transition
when working with tags only and of emission
when working with tags and words.You will now populate a matrix C of dimension (num_tags, num_words). This matrix will have the probabilities that will tell you what part of speech each word belongs to.
Now to populate the first column, you just multiply the initial π distribution, for each tag, times $b_{i, \operatorname{cindex}\left(w_{1}\right)}$Where the i, corresponds to the tag of the initial distribution and the $cindex(w1)$ is the index of word 1 in the emission matrix. And that's it, you are done with populating the first column of your new C matrix. You will now need to keep track what part of speech you are coming from. Hence we introduce a matrix D, which allows you to store the labels that represent the different states you are going through when finding the most likely sequence of POS tags for the given sequence of words $w_1 ,…,w_K$ At first you set the first column to 0, because you are not coming from any POS tag.
This will be best illustrated with an example: