28 September 2018

The basics of Hidden Markov Models with Forward Trellis

With anything related to Mathematics, I'm surprised how tutorials on the internet and research papers rush into complex equations and variables without first explaining the basic concept which can help a student get a grasp of what the equations mean.
I've had a Maths teacher who was so fond of equations, that he disliked having to explain concepts to students. He expected us to first go through the equation because according to him, the equations were intuitive. They are NOT!
There are far better, simpler ways everything can be explained, and that's how I intend to explain Hidden Markov Models.


Let's take the common example of having three possible days. The day can be either sunny, cloudy or rainy.

To have a simpler representation, I'll show them by their primary colors.

Our objective is to assume that we start with one of these days and then calculate the probability that the next day would be sunny, cloudy or rainy. Then we calculate the same for the next day and the day after that and so on, until we want to stop.

Although the above figure shows what the weather is on each of the five days, in reality we don't know it, and we want to calculate the probability that any particular day would be sunny, rainy or cloudy.

We refer to these weather situations as "States". So on any day the weather might be in a sunny state or a cloudy state or a rainy state.

Transition probability

Usually the transition probability matrix is created by people who note down the actual weather states for many days and then they statistically estimate that if today is sunny, then the probability that the sunny weather might transition into a cloudy weather is 0.3. If the weather is cloudy, the probability that it might transition into a rainy weather tomorrow is 0.4. They calculate these probabilities and create a probability transition matrix like this:

Note that in such a matrix, the probabilities of the row values add up to 1. The 0.5 in the first row and first column of the matrix indicates the probability that if today is sunny, tomorrow will be sunny too.

The Trellis 
When we start with knowing today's weather and try to calculate probabilities of the weather of future days, we don't know what the weather will be in the future. Knowledge about the future is hidden, and we call those as hidden states.

So if we represent each state (sunny/cloudy/rainy) in a row, it would look like this:

Suppose the states were not hidden, and if we knew what the weather is on each day, the weather transitions could either be shown like this:

Or like this:

Representing states in this manner is called "Trellis" because it looks like a Trellis.

In a Hidden Markov Model, we don't know the states, so we represent all circles as empty circles and we add an additional row for the "final state". The final state is nothing special. It is just the weather of the last day that you are considering. It was not even necessary to add to the trellis diagram, but some silly person decided to complicate the concept and represent the final state as a separate row, to be able to clearly indicate that once the Markov Model transitions into the final state, it stops there and does not transition any more.

Since the final state could be either sunny, rainy or cloudy, I decided to show that row with all three colors.

The transition probability matrix would now have an additional row to show the final state:

So what this silly transition probability matrix now shows us is that the probability of transitioning from any weather today to the weather tomorrow is 0.2 or 0.1 or 0.3, but once you reach the final state, the probability of transitioning to any other state tomorrow is zero.

Why do we need to do all this?

In this simple example, we assume that a person could have a picnic if the weather is sunny or cloudy. If the weather is likely to be rainy, the person sits at home and reads a book.

Interestingly, suppose we observe that somebody went for a picnic on day 1, day 2 and day 3 and spent day 4 and day 5 reading a book, we would be able to use the trellis diagram to estimate which days were sunny, cloudy and rainy (the weather states are "hidden" and we need to estimate it).
To do that, you'll need another matrix called the "emission probability matrix".

Emission probabilities

The rows in the trellis represent sunny, cloudy and rainy respectively. Based on the state (weather) of any day, a person may decide to either go on a picnic or read a book. The decision is called an "emission" and is defined numerically by probabilities in an emission probability matrix.

An "emission" on a day just means the probability that one of the decisions will be taken. Either to read a book or to go on a picnic.

This is our emission probability matrix:

The rows in the matrix show the decision taken and the columns show the probability of taking those decisions for each day. If you were considering 10 days, then this matrix would have five more columns.

Just like the transition probability matrix, even the emission probability matrix is calculated beforehand by somebody who observes the weather and a person's decision to read a book or go on a picnic on multiple days and says that these are the probabilities for the Hidden Markov Model. Note that here too, the row values add up to 1.

Forward Trellis (Viterbi algorithm)

All that the Forward Trellis method says is that if you have a set of observations about the decisions a person took on the five days:
O = {picnic, picnic, picnic, read book, read book}

Using the Viterbi algorithm you can find out the probability of whether each of the days were sunny, cloudy or rainy. It shows you the shortest path through the trellis.

In the transition probability matrix and the emission probability matrix, the rows are represented by "i" and the columns by "j" in many textbooks.
So each probability value in the transition probability matrix would be represented as aij.
Since in the emission probability matrix the columns represent the days, I'll represent the columns with "k" instead of "j". The probability values in the emission probability matrix are represented as bik.

Now, given the observations O = {picnic, picnic, picnic, read book, read book}, if we want to know the probability that day2 is cloudy...

...we just have to use the Viterbi formula.
It is:

Probability of day2 being cloudy = Emission probability of picnic for day2 * ((probability day1 being sunny * transition probability of sunny to cloudy) + (probability of day1 being cloudy * transition probability of cloudy to cloudy) + (probability of day1 being rainy * transition probability of rainy to cloudy)).

Day1 in this case will begin with the sunny state having probability = 1 and the cloudy and rainy states will have probability = 0 for day1. We assume we know that the first day is sunny.
If you wish, you could also assume you don't know the weather for the first day, and assign initial probabilities for each state on day1.

To calculate probabilities for day3, you'd first have to calculate all probabilities of sunny, cloudy and rainy for day2.

The three converging gray arrows in the above diagram are just to show that we use the transition probabilities of sunny to cloudy, cloudy to cloudy and rainy to cloudy.

That's all there is to it. It's so simple, but people explain it in such contorted, complicated ways that it's hard to grasp and understand why things are being done like this and what purpose it serves. Luckily for you, NRecursions comes to your rescue again :-)