One of the most confusing aspects of the Haar features used in the Viola-Jones algorithm is the black and white rectangles. It took me quite a while to figure them out, since neither the Viola and Jones research paper nor any of the tutorials explained them well. Besides, the concept shouldn't be shown as filled black and white rectangles in the first place. So here's a change:
Instead of showing Haar features like this:
Show them like this:
People need to understand intuitively that it is not the black and white rectangles that matter, but the actual pixel values within them. For good contrast, show them as yellow and red rectangles if you like.
Source of the original images: University of Oulu's website.
Why are we using those rectangles?
If you were searching for a line in an image, you'd use a mask shaped like a line. In the same way, when we search for a face, we could use a mask shaped like a face or, to reduce computation, just search for parts of the face that almost always have dark and bright pixels in a certain pattern. The eyes and forehead are one such example: the pixels at the eyes will almost always be darker than the pixels at the forehead. So the black rectangle in the figure above just says that we are looking for a rectangular region where most of the pixels are dark. The white rectangle above it means that wherever we find such dark pixels (the eye area), the rectangular area directly above it must have plenty of bright pixels (the forehead area).
Of course, plenty of other places in the picture will have similar bright and dark areas, but the area most likely to be the eyes and forehead will give the highest Haar value (calculated with the formula below). This is also why you should search not only for the eyes and forehead but also for the nose and lips, and check that the features you found are in the correct positions with respect to each other. That's how you can be confident that you have located a face. So, based on the pattern of dark and bright pixels you are searching for, you design your Haar feature (the black and white rectangles) so that it gives the highest value for whatever feature you are searching for.
How to do the calculations?
Start by normalizing the pixel values. If your grayscale image is stored in a 2D matrix M with values from 0 to 255, divide every value in the matrix by 255. M will then hold values ranging from 0 to 1.
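Here is a minimal sketch of that normalization step in Python. It assumes the image is held as a plain nested list (in practice you would more likely use a NumPy array), and the function name normalize and the tiny sample image are made up for illustration.

```python
def normalize(gray):
    """Scale 8-bit grayscale values (0-255) into the range 0.0-1.0."""
    return [[pixel / 255 for pixel in row] for row in gray]

# A tiny made-up 2x3 "image": bright top row, dark bottom row.
gray = [
    [255, 204, 255],
    [0, 51, 0],
]

M = normalize(gray)
print(M[0])  # [1.0, 0.8, 1.0]
print(M[1])  # [0.0, 0.2, 0.0]
```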
Haar value = ((sum of values within white rectangle area in M) divided by (number of pixels within white rectangle)) minus ((sum of values within black rectangle area in M) divided by (number of pixels within black rectangle)).
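To make the formula concrete, here is a small Python sketch. The helper names region_mean and haar_value, the (top, left, height, width) rectangle layout, and the 4x4 sample patch are all my own assumptions for illustration, and M is assumed to be already normalized to the 0 to 1 range.

```python
def region_mean(M, top, left, height, width):
    """Average of the normalized pixel values inside one rectangle."""
    total = sum(M[r][c]
                for r in range(top, top + height)
                for c in range(left, left + width))
    return total / (height * width)

def haar_value(M, white, black):
    """Mean of the white rectangle minus mean of the black rectangle.

    Each rectangle is a (top, left, height, width) tuple.
    """
    return region_mean(M, *white) - region_mean(M, *black)

# Made-up 4x4 normalized patch: bright "forehead" rows over dark "eye" rows.
M = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 0.5, 0.5, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
]
white = (0, 0, 2, 4)  # top two rows
black = (2, 0, 2, 4)  # bottom two rows

print(haar_value(M, white, black))  # 0.75
```

Swapping the two rectangles flips the sign, which is why it matters which region you mark as white and which as black.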
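The closer the Haar value is to 1, the more likely it is that you've found the facial feature you were trying to match. (The value can range from -1 to 1: it is exactly 1 when every pixel in the white rectangle is fully bright and every pixel in the black rectangle is fully dark.) In the image above, we were trying to find areas where the darker pixels of the eye region have, above them, an area of lighter forehead pixels.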
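The idea of looking for the spot with the highest Haar value can be sketched as a brute-force scan. This is only an illustration under my own assumptions: a white-over-black feature made of two stacked rectangles of equal size, a normalized nested-list image, and hypothetical helper names mean and best_match. It is not how the real detector is organized, and it does not use integral images.

```python
def mean(M, top, left, height, width):
    """Average pixel value inside one rectangle of M."""
    rows = M[top:top + height]
    values = [v for row in rows for v in row[left:left + width]]
    return sum(values) / len(values)

def best_match(M, rect_h, rect_w):
    """Slide a white-over-black feature over M.

    Returns (value, top, left) for the position with the highest Haar value,
    where (top, left) is the corner of the white (upper) rectangle.
    """
    best = None
    for top in range(len(M) - 2 * rect_h + 1):
        for left in range(len(M[0]) - rect_w + 1):
            value = (mean(M, top, left, rect_h, rect_w)
                     - mean(M, top + rect_h, left, rect_h, rect_w))
            if best is None or value > best[0]:
                best = (value, top, left)
    return best

# Made-up normalized image: a bright strip directly above a dark strip.
M = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 1.0, 1.0, 0.5],
    [0.5, 0.0, 0.0, 0.5],
    [0.5, 0.5, 0.5, 0.5],
]

print(best_match(M, 1, 2))  # (1.0, 1, 1)
```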
Other Haar feature shapes
Don't worry when you see shapes like this:
It's the same concept. Simply take the sum of all pixels from both white areas in the normalized image matrix M and the sum of all pixels from both black areas in M, divide each sum by the number of pixels it covers, and subtract in the same way as before. This particular shape detects a dark diagonal feature. You can also create your own Haar feature shapes based on the facial feature you are trying to detect.
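As a rough sketch of the four-rectangle case in Python: the function names, the (top, left, height, width) rectangle layout, and the choice of which quadrants are black are my assumptions here, since the layout depends on the figure. M is again assumed to be normalized.

```python
def rect_values(M, top, left, height, width):
    """All pixel values inside one rectangle of M."""
    return [M[r][c]
            for r in range(top, top + height)
            for c in range(left, left + width)]

def haar_value_multi(M, white_rects, black_rects):
    """Pooled mean of all white rectangles minus pooled mean of all black ones."""
    white = [v for rect in white_rects for v in rect_values(M, *rect)]
    black = [v for rect in black_rects for v in rect_values(M, *rect)]
    return sum(white) / len(white) - sum(black) / len(black)

# Made-up 4x4 normalized patch with a dark diagonal (top-left to bottom-right).
M = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]
black_rects = [(0, 0, 2, 2), (2, 2, 2, 2)]  # top-left and bottom-right quadrants
white_rects = [(0, 2, 2, 2), (2, 0, 2, 2)]  # top-right and bottom-left quadrants

print(haar_value_multi(M, white_rects, black_rects))  # 1.0
```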
To learn more about integral images and Haar features, I recommend Balazs Holczer's tutorials. They are well explained, and it's pleasantly amusing to hear him say "Lots of lots of" and the way he says "Feeeeeatures" :-)
Integral images
Haar features
1 comment:
"the concept shouldn't be shown as black and white rectangles in the first place"
Thanks for putting this out. I've also spent a good amount of time getting this cleared. The Viola paper too shows these as solid white and black rectangles and that was the source of all my confusion!