Lecture Slides

Part of Speech Tagging

Markov Chains

Markov Chains and POS Tags

Hidden Markov Models

Calculate probabilities

Populating the Transition Matrix

Populating the Emission Matrix

In general, think of transition probabilities when working with tags only, and of emission probabilities when working with tags and words.
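For concreteness, here is a minimal sketch of how both matrices could be populated from counts over a tagged corpus. The function name, the flat list of (word, tag) pairs, and the add-alpha smoothing are illustrative assumptions, not part of the slides:

```python
import numpy as np

def build_matrices(tagged_corpus, tags, vocab, alpha=0.001):
    """Populate the transition matrix A (tag -> tag) and the emission
    matrix B (tag -> word) from a corpus of (word, tag) pairs."""
    tag_idx = {t: i for i, t in enumerate(tags)}
    word_idx = {w: i for i, w in enumerate(vocab)}

    trans_counts = np.zeros((len(tags), len(tags)))
    emit_counts = np.zeros((len(tags), len(vocab)))

    prev_tag = None
    for word, tag in tagged_corpus:
        emit_counts[tag_idx[tag], word_idx[word]] += 1          # (tag, word) pair
        if prev_tag is not None:
            trans_counts[tag_idx[prev_tag], tag_idx[tag]] += 1  # (tag, tag) pair
        prev_tag = tag

    # Add-alpha smoothing, then normalize each row into a probability distribution.
    A = (trans_counts + alpha) / (trans_counts + alpha).sum(axis=1, keepdims=True)
    B = (emit_counts + alpha) / (emit_counts + alpha).sum(axis=1, keepdims=True)
    return A, B
```

Note how the transition counts are accumulated over consecutive tag pairs only, while the emission counts pair each tag with the word it produced.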

Viterbi Algorithm

Viterbi: Initialization

You will now populate a matrix C of dimension (num_tags, num_words), where num_words is the number of words in the input sequence. This matrix holds the probabilities that tell you which part of speech each word most likely belongs to.


Now, to populate the first column, you multiply the initial distribution $\pi$ for each tag by the corresponding emission probability:

$$c_{i,1} = \pi_i \, b_{i,\operatorname{cindex}(w_1)}$$

where $i$ is the index of the tag in the initial distribution and $\operatorname{cindex}(w_1)$ is the column index of word 1 in the emission matrix. With that, you are done populating the first column of your new C matrix. You will also need to keep track of which part of speech you are coming from. Hence we introduce a matrix D, which lets you store the labels representing the different states you go through when finding the most likely sequence of POS tags for the given sequence of words $w_1, \ldots, w_K$. At first you set the first column of D to 0, because you are not coming from any POS tag yet.
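Below is a minimal sketch of this initialization step, assuming numpy arrays pi (the initial distribution) and B (the emission matrix), plus a cindex helper that maps a word to its column in B; all of these names are illustrative:

```python
import numpy as np

def viterbi_init(pi, B, words, cindex):
    """Set up the Viterbi matrices: C holds probabilities, D holds backpointers."""
    num_tags, K = len(pi), len(words)
    C = np.zeros((num_tags, K))
    D = np.zeros((num_tags, K), dtype=int)

    # First column: initial tag distribution times the emission
    # probability of the first word under each tag.
    C[:, 0] = pi * B[:, cindex(words[0])]
    D[:, 0] = 0  # not coming from any POS tag
    return C, D
```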


Viterbi: Forward Pass

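The forward pass fills in each remaining column of C from the previous one. In the notation introduced above, the standard Viterbi recurrence is

$$c_{i,j} = \max_{k} \; c_{k,j-1} \, a_{k,i} \, b_{i,\operatorname{cindex}(w_j)}, \qquad d_{i,j} = \operatorname*{argmax}_{k} \; c_{k,j-1} \, a_{k,i},$$

where $a_{k,i}$ is the transition probability from tag $k$ to tag $i$, and $d_{i,j}$ records which previous tag produced the maximum.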

This will be best illustrated with an example:

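As a stand-in for the worked example, here is a minimal sketch of the forward pass in code, continuing the assumed names from the initialization sketch above (A is the transition matrix):

```python
def viterbi_forward(A, B, C, D, words, cindex):
    """Fill columns 1..K-1 of C and D using the Viterbi recurrence."""
    num_tags, K = C.shape
    for j in range(1, K):
        for i in range(num_tags):
            # Probability of arriving at tag i for word j from every previous tag k.
            probs = C[:, j - 1] * A[:, i] * B[i, cindex(words[j])]
            C[i, j] = probs.max()     # best probability of any path ending in tag i
            D[i, j] = probs.argmax()  # which previous tag that best path came from
    return C, D
```

Once the last column is filled, the most likely final tag is the argmax over the last column of C, and following the backpointers in D from there recovers the full tag sequence.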