19 October 2018

Why I'm not yet saying goodbye to Google and adopting DuckDuckGo

There is an old Chinese saying: "Before you throw away the old bucket, make sure the new one does not leak".

There has been an array of privacy concerns about Google: about the amount of information it stores about you, and about the fact that this information can be handed over to law enforcement agencies when they ask for it. Anecdotes circulate about how Google knows your girlfriend is pregnant even before you do.

The alternative was DuckDuckGo: a privacy-friendly search engine that is said not to store any personally identifiable information about you. It is growing in popularity because of that, and also because of its !bang feature. It makes money via ads through a Bing and Yahoo alliance, and through non-personally-identifiable tracking of search results that lead to ecommerce websites like Amazon.

I was almost convinced. I had just switched my web browser's search option from Google to DuckDuckGo, but there was a niggling thought at the back of my head about the company.

One search through DuckDuckGo's own search engine led me to this.

These guys had created an annoying, intrusive way to advertise themselves a few years ago. That is a big red flag. A company that can do something filthy like this can also do a lot of other filthy things behind the scenes. I already had my doubts about the so-called privacy, given their Bing-Yahoo alliance (Yahoo is now under Oath).
A similar red flag was raised by the Brave web browser, which illegally replaced ads shown on websites. People who do things like that can also cheat you while pretending to be nice and privacy-friendly.

DuckDuckGo: I don't trust you. 

None of this is of much consequence anyway. The internet is not a place where you can actually be guaranteed privacy, especially when using free tools. Even Tor has law enforcement agencies spying on it. There is good reason for that, though, given that illegal activities online are increasing.
One of the main reasons I prefer Google search is its warnings about harmful websites. Its search results are also better than those of DuckDuckGo or any other search engine.

16 October 2018

Installing Netbeans 8.2 on Ubuntu 16.04 or 18.04 for Python functionality

Most Python editors are either not very functional, take up way too much memory, or are just a pain to install. If you want a simple editor for Python, try Geany. Note that if you use Python 3, you'll have to specify it in Geany's compile and execute commands.
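For reference, here is what those two settings look like on my setup (a sketch of my own; the dialog is under Geany's Build > Set Build Commands, and "%f" is Geany's placeholder for the current file):

```
Compile:  python3 -m py_compile "%f"
Execute:  python3 "%f"
```
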

If you are a Netbeans fan, you'll need Java 8 to be able to install Netbeans.

You can either install Java from this PPA:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

Or install the default OpenJDK:
sudo apt-get install openjdk-8-jdk
Now open the environment file:
sudo vi /etc/environment

Add the following line to the environment file (for Netbeans to be able to find Java). The path below assumes the OpenJDK package; adjust it if you installed Oracle Java:

JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"

Save and exit, then reload the environment:

source /etc/environment

Now download the Python plugin for Netbeans 8.2. You'll get a zip file. Extract it and open Netbeans.
Go to the Tools > Plugins menu option. 
Select the "Downloaded" tab.
Click "Add Plugins".
Select the folder that you just extracted from the zip file and you'll see many ".nbm" files.
Press Ctrl+A to select all of them and press "Ok".
Click the "Install" button on the bottom left corner. Accept the license terms and click "Install".
Restart Netbeans.

Select File > New Project > Python > Python Project.
You'll see a "Manage" button which you can use to change the python platform to Python 3 if you need to.

Click the "New" button, navigate to /usr/bin/ and you'll find a "python3" file among a lot of files. Select it and click "Ok".
Now select Python 3 and click "Make Default".
Click "Close".
Click "Finish".

That's it! Your Python 3 project is ready to run in Ubuntu with Netbeans.

A better tutorial on the Haar features used in the Viola-Jones algorithm

One of the most confusing aspects of the Haar features used in the Viola-Jones algorithm is the black and white rectangles. It took me quite a while to figure them out, since neither Viola and Jones's research paper nor any of the tutorials explained them well. Besides, the concept shouldn't be illustrated as filled black and white rectangles in the first place. So here's a change:

Instead of showing Haar features like this:

Show them like this:

People need to understand intuitively that it is not the black and white rectangles that matter, but the actual pixel values within the rectangles. For good contrast, show them as yellow and red rectangles if you like.

Why are we using those rectangles?

If you were searching for a line in an image, you'd use a mask shaped like a line. In the same way, when we search for a face, we could use a mask shaped like a face, or, to reduce computation, just search for parts of the face that almost always have dark and bright pixels in a certain pattern. The eyes and forehead are one such example. The pixels at the eyes will almost always be dark, and the pixels at the forehead will almost always be brighter than the pixels at the eyes. So the black rectangle in the figure above just says that we are looking for a rectangular region where most of the pixels are dark. The white rectangle above it means that wherever we intend to find such dark pixels (the eye area), the rectangular area directly above it must have plenty of bright pixels (the forehead area).

Of course there will be plenty of other places in the picture with similar bright and dark areas, but the area most likely to be the eyes and forehead will give the highest Haar value (calculated with the formula below). This is also why you should search not only for the eyes and forehead, but also for the nose and lips, and ensure that the features you find are in the correct positions with respect to each other. That's how you ensure you have located a face. So, based on the black and white pixel pattern you are searching for, you design your Haar feature (the black and white rectangles) so that it yields the highest value for that pattern.

How to do the calculations?

Start by normalizing pixel values. If you have your grayscale image pixel values in a 2D matrix M that can hold grayscale values from 0 to 255, then divide all values in the matrix by 255 to normalize them. M will now have values ranging from 0 to 1.

Haar value = (sum of values within the white rectangle area in M) / (number of pixels within the white rectangle)
           - (sum of values within the black rectangle area in M) / (number of pixels within the black rectangle)

In other words: the mean of the pixel values under the white rectangle minus the mean under the black rectangle.

The closer the Haar value is to 1, the more likely it is that you've found the facial feature you were trying to match. In the image above, we were trying to find areas where the darker pixels of the eye region have an area of lighter forehead pixels above them.
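To make the arithmetic concrete, here is a minimal sketch of the formula above in NumPy. This is my own code, not from the Viola-Jones paper; the rectangle coordinates and the toy image are made up for illustration:

```python
import numpy as np

def haar_value(gray, white_rect, black_rect):
    """Haar value for one white/black rectangle pair.

    gray: 2D array of grayscale values in [0, 255].
    Each rectangle is (row, col, height, width).
    """
    m = gray.astype(np.float64) / 255.0  # normalize to [0, 1]

    def mean_of(rect):
        r, c, h, w = rect
        region = m[r:r + h, c:c + w]
        return region.sum() / region.size

    # (mean of white area) - (mean of black area); closer to 1 => better match
    return mean_of(white_rect) - mean_of(black_rect)

# Toy 8x8 image: a bright "forehead" band above a dark "eyes" band.
img = np.zeros((8, 8), dtype=np.uint8)
img[0:4, :] = 230  # bright upper half
img[4:8, :] = 25   # dark lower half

print(haar_value(img, white_rect=(0, 0, 4, 8), black_rect=(4, 0, 4, 8)))  # ~0.804
```

A value near 1 means the region matches the bright-above-dark pattern well; a value near 0 means there is no contrast; a negative value means the pattern is inverted.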

Other Haar feature shapes

Don't worry when you see shapes like this:

It's the same concept. Take the sum of all pixels from both white areas in the normalized image matrix M and the sum of all pixels from both black areas in M, then subtract the same way we did earlier (dividing each sum by the number of pixels in its rectangles). This particular shape detects a dark diagonal feature. You can also create your own Haar feature shapes based on which facial feature you are trying to detect.
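As a sketch of how this generalizes (again my own code, with hypothetical coordinates and a toy image): compute an integral image once, and each rectangle sum then costs only four array lookups, no matter how large the rectangle is.

```python
import numpy as np

def integral_image(gray):
    """Integral image of the normalized image, padded with a zero row/column."""
    m = gray.astype(np.float64) / 255.0
    ii = np.zeros((m.shape[0] + 1, m.shape[1] + 1))
    ii[1:, 1:] = m.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, rect):
    """Sum of normalized pixels in rect = (row, col, height, width): 4 lookups."""
    r, c, h, w = rect
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def haar_value_ii(ii, white_rects, black_rects):
    """Mean over all white rectangles minus mean over all black rectangles."""
    white_area = sum(h * w for _, _, h, w in white_rects)
    black_area = sum(h * w for _, _, h, w in black_rects)
    return (sum(rect_sum(ii, r) for r in white_rects) / white_area
            - sum(rect_sum(ii, r) for r in black_rects) / black_area)

# Toy diagonal pattern: bright top-left and bottom-right, dark elsewhere.
img = np.zeros((4, 4), dtype=np.uint8)
img[0:2, 0:2] = 255
img[2:4, 2:4] = 255
ii = integral_image(img)

v = haar_value_ii(ii,
                  white_rects=[(0, 0, 2, 2), (2, 2, 2, 2)],
                  black_rects=[(0, 2, 2, 2), (2, 0, 2, 2)])
print(v)  # 1.0 -- a perfect match for this diagonal feature
```

This is why the Viola-Jones detector can evaluate thousands of features per window in real time: the cost of each feature is a handful of additions, independent of the rectangle sizes.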

To learn more about Integral images and Haar features, I recommend Balazs Holczer's tutorials. Well explained, and it's pleasantly amusing to hear him say "Lots of lots of" and the way he says "Feeeeeatures" :-)

Integral images

Haar features

14 October 2018

Skills required in the field of Machine Learning and Data Science

While doing a literature survey for an assignment, I came across this Medium post by Jeff Hale, where he lists the skills and technologies that were most in demand for Machine Learning jobs during 2017 and 2018.

What a job-seeker should know is that it isn't the tallest bar in the graph they should be chasing. These graphs only show you what the industry wants. Many job descriptions list an array of skills, but the recruit is actually made to work on something much more mundane, like data cleaning or preparing presentations.

It's far more important to identify which area of Machine Learning interests you. Do your own little research or build hobby projects specific to that area of interest and see if you can integrate it with other Machine Learning paradigms.
When you look for a job, don't just look for the skill-set they require or the projects they work on. Look for what your role would be in the project and have a very careful look at the Glassdoor reviews and Indeed reviews about the company culture. During the interview, make sure you get to meet the actual people who would be supervising you and if possible, even the team you'd be working with. Past experiences have shown that the way you are treated before, during and after the interview are a very good indicator of how you'll be treated after you join the company. You can even identify potentially toxic people you'd want to avoid.

Machine Learning and Data Science technologies are here to stay. If you've got yourself equipped with the necessary basic skills, you won't find yourself struggling to land yourself a job. What you do have to worry about is whether you end up as just a soldier in an army of data gorillas or whether you build products that you enjoy building, do research that is fulfilling and make the world a better place for everyone.