Perplexity, Smoothing, and What Words Mean

Source: DEV Community
By the end of this post, you'll know how to evaluate a language model using perplexity, why unseen n-grams break everything and how smoothing patches the holes, and how interpolation lets you mix n-gram orders instead of betting on one. You'll also understand why word meaning is harder to pin down than it looks, what kinds of relationships exist between words, and how a 1951 insight from philosopher Ludwig Wittgenstein laid the intellectual groundwork for word embeddings.

Two halves, one thread: the first half shows you the limits of n-gram language models. The second half shows you why those limits forced NLP to rethink how words are represented, which is where the deep learning side of NLP starts.

Where We Left Off

Last post, we built n-gram language models: chain rule, Markov assumption, unigrams, bigrams, MLE. We left knowing how to build one. Two questions were still open: how do you know if your model is any good? And what happens when the training data doesn't cover a word combination?
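To make the recap concrete, here is a minimal sketch of a bigram MLE model with a perplexity function. The toy corpus and the names `bigram_prob` and `perplexity` are illustrative, not from the original post; this is just enough code to show both open questions in action.

```python
import math
from collections import Counter

# Toy training corpus; sentences padded with boundary markers.
corpus = [
    ["<s>", "the", "cat", "sat", "</s>"],
    ["<s>", "the", "dog", "sat", "</s>"],
]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))

def bigram_prob(w1, w2):
    # MLE estimate: count(w1 w2) / count(w1).
    # Returns 0.0 for any unseen bigram -- the hole smoothing patches.
    return bigrams[(w1, w2)] / unigrams[w1]

def perplexity(sent):
    # exp of the average negative log-probability per predicted word:
    # lower perplexity means the model is less "surprised" by the text.
    n = len(sent) - 1  # number of bigram predictions
    log_prob = sum(math.log(bigram_prob(w1, w2))
                   for w1, w2 in zip(sent, sent[1:]))
    return math.exp(-log_prob / n)

print(perplexity(["<s>", "the", "cat", "sat", "</s>"]))  # ~1.19 on training data
```

Note that evaluating a sentence containing an unseen bigram crashes: `bigram_prob` returns 0.0, and `math.log(0)` raises a `ValueError`, so perplexity is undefined. That failure mode is exactly what the smoothing and interpolation sections address.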