Bottleneck Neural Network Language Models

24 March 2015

Diamantino Caseiro Google

In the last few years, language modeling techniques based on exponential models have consistently outperformed traditional n-gram models. Such techniques include L1-Regularized Maximum Entropy (L1-MaxEnt) and both Feedforward and Recurrent Neural Network Language Models (RNNLM). While more accurate, these models are also much more expensive to train and use. This is a problem for low-latency applications, where it is desirable to find n-gram approximations that can be used in the first pass of a speech recognition system. In this talk I will present Bottleneck Neural Network Language Models, a novel feedforward architecture designed to achieve low perplexity while allowing for n-gram approximations. This model is similar to MaxEnt models in the sense that its input is a rich set of features; however, these features are processed through a non-linear hidden layer to encourage generalization. In the talk, I will compare this architecture to other exponential models and present an effective algorithm for creating n-gram approximations. Results will be presented on standard data sets and on a state-of-the-art voicemail-to-text ASR system.
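The architecture described above can be illustrated with a minimal sketch: sparse context features (as in a MaxEnt model) are summed into a narrow, non-linear "bottleneck" hidden layer, whose output feeds a softmax over the vocabulary. All names, dimensions, and initialization choices here are illustrative assumptions, not details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 50        # vocabulary size (illustrative)
N_FEATS = 200     # number of sparse context features, e.g. hashed n-grams
BOTTLENECK = 8    # narrow non-linear hidden layer to encourage generalization

# Randomly initialized weights; a real model would be trained on text.
W_in = rng.normal(0.0, 0.1, (N_FEATS, BOTTLENECK))   # feature -> hidden
W_out = rng.normal(0.0, 0.1, (BOTTLENECK, VOCAB))    # hidden -> vocab logits

def next_word_probs(active_features):
    """P(word | context) given indices of the active sparse features."""
    # Sum the rows for the active features, then apply a non-linearity:
    # this bottleneck is what distinguishes the model from plain MaxEnt.
    h = np.tanh(W_in[active_features].sum(axis=0))
    logits = h @ W_out
    e = np.exp(logits - logits.max())                # numerically stable softmax
    return e / e.sum()

p = next_word_probs([3, 17, 42])   # three active context features
```

Because the hidden layer is small, contexts that share features map to similar hidden activations, which is the generalization effect the abstract refers to; an n-gram approximation can then be built by evaluating the model on a fixed set of contexts.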



Diamantino Caseiro received an M.Sc. degree in electrical and computer engineering from Instituto Superior Tecnico (IST), Lisbon, Portugal, in 1998, and a Ph.D. in computer science, also from IST, in 2003. He was an assistant professor in the computer engineering department of the same university from 2004 to 2007, and a member of the Spoken Language Systems Laboratory of INESC-ID from 1996 to 2007, where he specialized in weighted finite-state transducers and search algorithms for automatic speech recognition (ASR). From 2008 to 2014, he was a Principal Research Scientist at AT&T Labs Research, where he was responsible for language modeling, finite-state transducers, and search for ASR. Since 2014 he has been a Senior Research Scientist at Google, working on search for ASR and on massive-scale maximum entropy language modeling.