16 October 2018

A better tutorial on the Haar features used in Viola Jones algorithm

One of the most confusing aspects of the Haar features used for the Viola Jones algorithm is the black and white rectangles. It took quite a while for me to figure it out since neither the research paper of Viola and Jones explained it well nor did any of the tutorials. Besides, the concept shouldn't be shown as black and white rectangles in the first place. So here's a change:

Instead of showing Haar features like this:

Show them like this:

It is necessary that people intuitively understand that it is not the black and white rectangles that are important, but the actual pixel values within the rectangles that are important. For good contrast, show them as yellow and red rectangles if you like.

Why are we using those rectangles?

If you were searching for a line in an image, you'd use a mask that's shaped like a line. Same way, when we search for a face, we can use a mask that is shaped like a face or to reduce computation, we could just search for parts of the face that almost always have dark and bright pixels in a certain pattern. The eyes and forehead are one such example. The pixels at the eyes will almost always be dark and the pixels at the forehead will almost always be brighter than the pixels at the eyes. So the black rectangle in the figure above just says that we are looking for a rectangular region where most of the pixels will be dark. The white rectangle above it says that wherever we intend to find such dark pixels, we want to be sure that it might be the eye area, by confirming that there is a bright pixeled forehead area just above it (the white rectangle area). All other Haar features are created with similar objectives. To find areas with bright pixels that are adjacent to areas with dark pixels. You can create your own Haar feature based on what kind of a pattern you want to search for.

How to do the calculations?

Start by normalizing pixel values. If you have your grayscale image pixel values in a 2D matrix M that can hold grayscale values from 0 to 255, then divide all values in the matrix by 255 to normalize them. M will now have values ranging from 0 to 1.

Haar value = ((sum of values within white rectangle area in M) divided by (number of pixels within white rectangle)) minus ((sum of values within black rectangle area in M) divided by (number of pixels within black rectangle)).

The closer the Haar value is to 1, the more likely it is, that you've found a facial feature you were trying to match. In the image above, we were trying to find areas where the darker pixels of the eye region have an area above them consisting of lighter pixels of the forehead.

Other Haar feature shapes

Don't worry when you see shapes like this:

It's the same concept. Simply take the sum of all pixels from both white areas in the normalized image matrix M and the sum of all pixels from both black areas in M and subtract in the same way we did earlier. This particular shape is to detect some dark diagonal feature. You can also create your own Haar feature shapes based on what facial feature you are trying to detect.

To learn more about Integral images and Haar features, I recommend Balazs Holczer's tutorials. Well explained, and it's pleasantly amusing to hear him say "Lots of lots of" and the way he says "Feeeeeatures" :-)

Integral images

Haar features

No comments: