30 December 2018

The silo-world of messaging apps

Messaging apps are convenient. Everybody agrees on that. It's easy to access notifications, easy to share files, and there's none of the "formality" baggage that email comes with.

Yet messaging apps lack something fundamental that services like email and SMS have: universality.

With email and SMS, you can send messages across any service providers. You can't, however, send a message from WhatsApp to Hangouts, WeChat or Telegram. This could have been avoided simply by following a protocol like XMPP, or by creating a global standard that all messaging apps follow.

So now we're stuck in a world where we're forced to install whichever app a group of people uses, even if we don't want to.

Indifference among people
Oddly, even though the objective of messaging apps is to bring people together, that's not what actually happens.
One of my classmates got back in touch with a few other classmates after sixteen years, via phone and email. He took the initiative to reach out to more classmates, and after two months of effort, he had put many of them in touch with each other. Everyone was happy about the reunion. They asked him to create a WhatsApp group, and he told them he doesn't use WhatsApp. So they created the group themselves and all of them joined it, leaving him out. They simply didn't care that he wasn't part of the group; to them, it was his fault for not using WhatsApp. Needless to say, when I saw this cold response from everyone, I refused to join the group too. I don't support people who don't give a damn about others.

Another group of classmates at a different college formed a WhatsApp group for discussions with their teachers, and the teachers agreed to send notifications about the exam syllabus and other updates via the group. One classmate openly said he doesn't use WhatsApp and was heckled and mocked by the others. When the teachers shared lab exam dates or assignment presentation dates, the group would not pass this information on to "the social outcast". When he asked, they rudely told him that since he chose not to use WhatsApp, it was his responsibility to ask them for updates. Oddly, even though the teachers realized that what was going on was wrong, they continued using WhatsApp instead of ensuring that all communication went uniformly to every student via email. The bullies had skillfully manipulated the teachers with circumstantial evidence and kept maligning the "outcast" through character assassination. What really went wrong here was that the teachers sourced their opinions from the bullies and from the gullible people who believed the misinformation the bullies circulated (much like the Stanford Prison Experiment). Everybody made assumptions about the "outcast", and made the crucial mistake of never talking to him about the matter.

Messaging apps end up dividing people when the app creator's objective is to build a tiny walled garden where everybody uses only their app.

I don't support social ostracism in the name of forming messaging app groups. Apps are not the only way to communicate. In fact, they are one of the worst ways.

What you can do

  • Tech identification: First, learn the science of identifying an appropriate communication medium. You can't use a messaging app to form a group of a diverse set of people, since many of them may not use apps. Use a more common medium like email. It's not convenience that matters here. It's uniform communication and participation. If you don't understand this, you don't have the maturity to form a group.
  • Talk to people: Ask everyone what they are comfortable with. If somebody is not very tech-savvy or does not use the tech y'all chose, either check whether everyone else is ok with using the tech this person proposes, or offer to always keep that person informed. It's your responsibility to keep the group united.
  • Prevent power misuse: It is said that if you want to test a person's character, give them power and see what they do with it. When you give someone the responsibility of forming a group or communicating crucial information, make sure it goes to a person of integrity and maturity. Other personality types can use their 'position of power' to pursue a personal vendetta or to pump up their ego. If you see power being misused, don't support the person, no matter how convincing they sound.
  • Know the nature of communication: When there is an urgent piece of information to communicate, the ONLY reliable way to do it is a phone call. That's the only way you know the person has received the message. If you use email, SMS or a messaging app, don't assume the other person has seen the message unless they respond with an "ok". Even then, the response or read receipt could have come from somebody else looking at the message. Make the phone call instead.

Here's hoping for an RFC that would be accepted universally. Messaging apps need to be able to talk to each other.

UPDATE: On the 30th of December 2018, I wrote to the WhatsApp team about creating a better messaging ecosystem, mentioned the incidents of bullying, and asked if they could offer a messaging option that wouldn't require installing the app on a phone. When I had written to them on 31 July 2017, they replied that they had no plans to do so. To my December email, I received two updates by mid-January 2019 stating that my communication was on hold, and then the updates stopped.
On 26th Jan I saw this in the newspaper:

Now that's a good move!

11 November 2018

Which is better? Octave or Matlab?

I'd say choose Matlab every time if you can. In my experience, it's much faster than Octave.

A Sudoku program I developed used to take two minutes to run 2000 iterations in Matlab.
The same program took 13 minutes to run 500 iterations in Octave.
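Putting both timings on a per-iteration footing makes the gap easier to see. This is only arithmetic on the two numbers above, written in Python:

```python
# Timings quoted above for the same Sudoku program.
matlab_seconds, matlab_iterations = 2 * 60, 2000
octave_seconds, octave_iterations = 13 * 60, 500

matlab_per_iteration = matlab_seconds / matlab_iterations  # 0.06 s
octave_per_iteration = octave_seconds / octave_iterations  # 1.56 s
speedup = octave_per_iteration / matlab_per_iteration

print(f"Matlab: {matlab_per_iteration * 1000:.0f} ms per iteration")
print(f"Octave: {octave_per_iteration * 1000:.0f} ms per iteration")
print(f"Matlab was roughly {speedup:.0f} times faster per iteration")
```

That works out to about 60 ms versus 1560 ms per iteration, roughly 26 times faster on this one program.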

The reason, apparently, is that Matlab uses Intel's Math Kernel Library (MKL) internally. I have personally seen the significant speed boost MKL gives, when I tried it with some C++ code I developed long ago.

Apart from this, the Matlab GUI is far better than Octave's in terms of customizability and plain-old user friendliness.

Granted, Octave is being built by the open source community, but the community should have seriously considered processing speed before starting the project. Even now, switching to MKL isn't impossible: the speed-up I saw in my C++ program came from replacing the old code with function calls to MKL, and it worked.

Good websites to download master's or PhD theses / dissertations

To find a good university website, or any other website where theses have been published and are available for free, your best bet is to start with a search engine.

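E.g.: If you are looking for a thesis on swarm intelligence for image segmentation, search with the terms:
"partial fulfillment" swarm image segmentation masters thesis pdf
The quoted phrase "partial fulfillment" helps because most theses carry the line "submitted in partial fulfillment of the requirements for the degree of..." on their title page.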

Some other websites that can help are:

08 November 2018


Continued from the previous Aha!

Pachcha Malayalam

To be continued...

07 November 2018

Can Initiative Q be trusted?

As a personal opinion, I'd say no.

In general, any scheme that seems too good to be true probably is, especially when money is involved. Initiative Q has many of the hallmarks that scams have:

  • The offer of free money (every sane person knows that you should stay away from this)
  • The "by invite only" exclusivity that gets people salivating (Gmail used this tactic when it started).
  • The offer of more imaginary money if you invite more people.
  • The ever-decreasing amount of imaginary free money, pushing you to throw caution to the wind and join early. A classic.
  • The inconsistency of the offered free money. The website shows $25000 available and decreasing, but if you click an invite link, the amount goes up to $33000.
  • Most importantly, the fact that you can't properly verify the credentials of the people who set up this scheme.
  • The fact that they "trust" you to approve new joinees but still offer you extra imaginary money if you approve them (even though they know that you may not personally know the new joinees), showing that they don't really care if the new users are genuine.
If you want to try it out for the fun of it, I'd recommend creating a new email ID under a fake name and using an Initiative Q password that doesn't resemble any password you'd normally use. Definitely do not join using your Facebook or Google account. Also, every time you finish visiting their website, clear your browser cache and cookies just to be safe.

Discussions on Reddit suggest this scheme is either a way to harvest email IDs and passwords for marketing, a way to learn people's password patterns, or just a social experiment to check how many people are gullible enough to fall for a scam.

To be fair to Initiative Q, it does have proponents who say it is not a scam and that it has the potential to become a new currency if enough people back it, but they also say it won't make you super-rich.
Only time will tell.

05 November 2018

A simple tutorial showing some basic PGMPy program code and explanations


First, it's recommended that you have the latest version of Python 3 installed.
Python 3 uses pip3 to install the packages you'll be importing into your programs, so make sure pip3 is installed too.
Install git.
It was my personal preference not to use Anaconda for installations (although it has a lot of packages, many Python applications still end up not working because of dependency problems with Anaconda). If you do use Anaconda (note that Miniconda also exists), try to ensure that whatever Python packages you install using Anaconda are not also installed using pip3, as there can be version conflicts.

Caution: It's better to set up a virtual environment first, because different Python packages depend on specific versions of each other, and installing them system-wide can break the default Linux Python, which in turn can cause problems with the operating system. I've found this tutorial to be good for installing pyenv and using it.

Now install a bunch of peripheral packages (not all of them are required; if you want to install just the bare minimum, see the documentation here). Inside a virtual environment, sudo is not needed:
pip3 install sphinx nose
pip3 install networkx
pip3 install numpy
pip3 install cython
pip3 install pandas
pip3 install setuptools
pip3 install ipython
pip3 install matplotlib
pip3 install gensim
pip3 install spacy
pylab is part of matplotlib, so it doesn't need a separate install. Tkinter (used by matplotlib's Tk backend) is not a pip package; on Ubuntu, install it with:
sudo apt-get install python3-tk

Now clone PGMPy and install it:
git clone https://github.com/pgmpy/pgmpy
cd pgmpy
git checkout dev
python3 setup.py install

If you are wondering which IDE to use for Python, I've put up my own little review here. I prefer LiClipse because it supports refactoring and autocomplete reasonably well.
You may also want to try PythonAnywhere.


Try creating a basic Bayesian network like this:

from pgmpy.models import BayesianModel
import networkx as nx
import matplotlib.pyplot as plt

# Define the model structure by passing a list of edges (D -> G and I -> G).
model = BayesianModel([('D', 'G'), ('I', 'G')])
# BayesianModel is a networkx directed graph, so networkx can draw it directly.
nx.draw(model, with_labels=True)
plt.show()


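If you encounter an error like this:
"AttributeError: module 'matplotlib.pyplot' has no attribute 'ishold'"
see this issue for the solution. As far as I know, it comes from older networkx versions calling a matplotlib function that was removed in matplotlib 3.0, so upgrading networkx should also help.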

To add conditional probabilities to each of those nodes, you can do this:

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
#import networkx as nx
#import matplotlib.pyplot as plt

# Defining the model structure. We can define the network by just passing a list of edges.
model = BayesianModel([('D', 'G'), ('I', 'G')])
#nx.draw(model, with_labels = True); plt.show()

# Defining individual CPDs.
cpd_d = TabularCPD(variable='D', variable_card=2, values=[[0.6, 0.4]])
cpd_i = TabularCPD(variable='I', variable_card=2, values=[[0.7, 0.3]])
cpd_g = TabularCPD(variable='G', variable_card=3,
                   values=[[0.3, 0.05, 0.9,  0.5],
                           [0.4, 0.25, 0.08, 0.3],
                           [0.3, 0.7,  0.02, 0.2]],
                   evidence=['I', 'D'],
                   evidence_card=[2, 2])
# Associating the CPDs with the network
model.add_cpds(cpd_d, cpd_i, cpd_g)
# check_model() raises an error if something is wrong and returns True otherwise
if model.check_model():
    print("Your network structure and CPDs are correctly defined. The probabilities in the columns sum to 1. Good job!")

print("Showing all the CPDs one by one")
for cpd in model.get_cpds():
    print(cpd)
print("You can also access them like this:")
c = model.get_cpds()
print(c[2])  # CPDs are stored in the order they were added, so c[2] is G's CPD
print("Number of values G can take on. The cardinality of G is:")
print(model.get_cpds('G').variable_card)

Output CPD tables:

Here, "variable_card" has nothing to do with a "card". It refers to the cardinality of the variable, i.e. the number of states it can take. The same goes for evidence_card.
Have a look at the CPD tables and you'll see that the variable cardinality of G is 3, because we want to specify three states for G: G_0, G_1 and G_2. For the variable cardinality and evidence cardinality, I feel the creators of PGMPy could have programmed it to automatically detect the number of rows instead of expecting us to specify the cardinality.

So the variable cardinality specifies the number of rows of a CPD table, and the evidence cardinalities together specify the number of columns: there is one column for every combination of evidence states, so G's table has 2 × 2 = 4 columns. Once you've specified the CPDs, you can add them to the network and you're ready to start doing inferences.
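To see where the column count comes from, here's a small pure-Python sketch (no pgmpy needed) that takes the G table from the example above, works out the expected shape from the cardinalities, and checks that every column sums to 1. The column ordering it assumes, with the first evidence variable ('I') changing slowest and the last ('D') fastest, is my reading of how pgmpy lays out its printed tables; the helper names are made up for illustration.

```python
from itertools import product

# The CPD of G from the example above: 3 rows (states of G), 4 columns.
g_values = [[0.3, 0.05, 0.9, 0.5],
            [0.4, 0.25, 0.08, 0.3],
            [0.3, 0.7, 0.02, 0.2]]
evidence = ['I', 'D']
evidence_card = [2, 2]


def expected_shape(variable_card, evidence_card):
    """Rows = states of the variable; columns = product of the evidence cardinalities."""
    columns = 1
    for card in evidence_card:
        columns *= card
    return variable_card, columns


def column_labels(evidence, evidence_card):
    """One label per column, last evidence variable changing fastest (assumed order)."""
    return [", ".join(f"{var}_{state}" for var, state in zip(evidence, states))
            for states in product(*(range(card) for card in evidence_card))]


rows, columns = expected_shape(3, evidence_card)
assert len(g_values) == rows and all(len(row) == columns for row in g_values)

for j, label in enumerate(column_labels(evidence, evidence_card)):
    column_sum = sum(row[j] for row in g_values)
    print(f"{label}: column sums to {column_sum:.2f}")
```

Each printed label tells you which combination of I and D that column of the table is conditioned on.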

Let's try a slightly larger network

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination
import networkx as nx
import matplotlib.pyplot as plt

# Defining the model structure. We can define the network by just passing a list of edges.
model = BayesianModel([('D', 'G'), ('I', 'G'), ('G', 'L'), ('I', 'S')])

# Defining individual CPDs.
cpd_d = TabularCPD(variable='D', variable_card=2, values=[[0.6, 0.4]])
cpd_i = TabularCPD(variable='I', variable_card=2, values=[[0.7, 0.3]])
cpd_g = TabularCPD(variable='G', variable_card=3,
                   values=[[0.3, 0.05, 0.9,  0.5],
                           [0.4, 0.25, 0.08, 0.3],
                           [0.3, 0.7,  0.02, 0.2]],
                   evidence=['I', 'D'],
                   evidence_card=[2, 2])

cpd_l = TabularCPD(variable='L', variable_card=2,
                   values=[[0.1, 0.4, 0.99],
                           [0.9, 0.6, 0.01]],
                   evidence=['G'],
                   evidence_card=[3])

cpd_s = TabularCPD(variable='S', variable_card=2,
                   values=[[0.95, 0.2],
                           [0.05, 0.8]],
                   evidence=['I'],
                   evidence_card=[2])

# Associating the CPDs with the network
model.add_cpds(cpd_d, cpd_i, cpd_g, cpd_l, cpd_s)
# Getting the local independencies of a variable.
print("Local independencies of G:")
print(model.local_independencies('G'))
# Getting all the local independencies in the network
print("Local independencies of other nodes:")
print(model.local_independencies(['D', 'I', 'S', 'G', 'L']))
# Active trail: For any two variables A and B in a network, if any change in A influences the values of B, then we say that there is an active trail between A and B.
# In pgmpy, active_trail_nodes gives the set of nodes that are affected by any change in the node passed in the argument.
print("Active trail for D:")
print(model.active_trail_nodes('D'))
print("Active trail for D when G is observed:")
print(model.active_trail_nodes('D', observed='G'))

infer = VariableElimination(model)
print('Variable Elimination:')
print(infer.query(['G']) ['G'])
print(infer.query(['G'], evidence={'D': 0, 'I': 1}) ['G'])
print(infer.map_query(['G'], evidence={'D': 0, 'I': 1}))
print(infer.map_query(['G'], evidence={'D': 1, 'I': 0, 'L': 0, 'S': 0}))

nx.draw(model, with_labels = True); plt.show()

What is going on in the code:

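The variable elimination queries are pretty much self-explanatory: you query for the state of a node, given that some other nodes are in certain states (the evidence, or observed variables). Inference is done either via standard variable elimination, which gives a probability distribution over the node's states, or via a MAP query, which gives the single most likely state. PGMPy also lets you specify the order in which variables are eliminated. There's more info about this in their documentation.
What the line infer.query(['G']) ['G'] does is this: infer.query(['G']) returns a Python dict (a mapping from keys to values, with exactly one value per key, so not really a multimap) whose keys are the queried variable names, and the trailing ['G'] picks out the result stored under the key 'G'. map_query returns the same kind of dict, holding the most likely state of each queried variable. Its output looks like this: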

{'G': 0}

Here, 'G' is the key of the dict, and to access the value associated with 'G' (which is zero in this case), you just use ['G']. So infer.query(['G']) ['G'] is the equivalent of doing:

q = infer.query(['G'])
print(q['G'])
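If the dict syntax is still confusing, here's the same pattern with a plain Python dict. fake_query is a hypothetical stand-in, not a pgmpy function; it just returns the kind of {variable: value} dict that map_query gives back:

```python
def fake_query(variables):
    """Hypothetical stand-in for a pgmpy query: returns {variable name: result}."""
    return {variable: 0 for variable in variables}


result = fake_query(['G'])
print(result)        # {'G': 0}, the whole dict
print(result['G'])   # 0, the value stored under the key 'G'

# Calling and indexing in one go, like infer.query(['G']) ['G']:
print(fake_query(['G'])['G'])   # 0
```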

Why this tutorial 
For anyone new to Python or PGMPy, a lot of this syntax looks very confusing, and the documentation does not explain it deeply enough either. The objective of this tutorial was to clear up those basic doubts so that you could navigate the rest of the library on your own. Hope it helped.

PGMPy is created by Indians, and is quite a good library for Probabilistic Graphical Models in Python. You'll also find libraries for Java, C++, R, Matlab, etc. If you want to try out your network model manually, there is an excellent tool called SamIam.

More documentation for PGMPy is here, and if you want their e-book, an internet search will lead you to it.

If you'd like to say thank you, you could be a sweetheart and send me a chocolate from my Amazon WishList :-)

19 October 2018

Why I'm not yet saying goodbye to Google and adopting DuckDuckGo

There is an old Chinese saying: "Before you throw away the old bucket, make sure the new one does not leak".

There has been an array of privacy concerns about Google: about the amount of information it stores about you, and about the fact that this information can be revealed to law enforcement agencies when they ask for it. There are even anecdotes about Google knowing your girlfriend is pregnant before you do.

The alternative was DuckDuckGo: a privacy-friendly search engine that is said not to store any personally identifiable information about you, and which is growing in popularity because of that and because of its !bang feature. It makes money from ads via an alliance with Bing and Yahoo, and also through non-personally-identifiable tracking of search results that lead to e-commerce websites like Amazon.

I was almost convinced. I had just switched my web browser's search option from Google to DuckDuckGo, but there was a niggling thought at the back of my mind about DuckDuckGo.

One search through DuckDuckGo's own search engine led me to this.

These guys had created an annoying, intrusive way to advertise themselves a few years ago. This is a big red flag. A company that can do something filthy like this can also do a lot of other filthy things behind the scenes. I already had my doubts about the so-called privacy because of their Bing-Yahoo alliance (Yahoo is now under Oath).
A similar red flag was observed with the Brave web browser, which replaced the ads shown on websites with its own, without the publishers' consent. People who can do things like that can also cheat you while pretending to be nice and privacy-friendly.

DuckDuckGo: I don't trust you. 

None of this is of much consequence anyway. The internet is not a place where you can actually be guaranteed privacy, especially when using free tools. Even Tor has law-enforcement agencies spying on it. There's good reason for that, though, given that illegal activity online is increasing.
One of the main reasons I prefer Google search is its warnings about harmful websites. Also, its search results are better than those of DuckDuckGo or any other search engine.

16 October 2018

Installing Netbeans 8.2 on Ubuntu 16.04 or 18.04 for Python functionality

Most Python editors are either not very functional, take up way too much memory, or are just a pain to install. If you want a simple editor for Python, try Geany. Note that if you use Python 3, you'll have to specify it in Geany's compile and execute commands.

If you are a Netbeans fan, you'll need Java 8 to be able to install Netbeans.

You could either choose to install from this PPA:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

Or install from the default JDK:
sudo apt-get install openjdk-8*
Now open the environment file:
sudo vi /etc/environment

Add the following line to the environment file (for Netbeans to be able to find Java):

Save and exit.

source /etc/environment

Now download the Python plugin for Netbeans 8.2. You'll get a zip file. Extract it and open Netbeans.
Go to the Tools > Plugins menu option.
Select the "Downloaded" tab.
Click "Add Plugins".
Select the folder that you just extracted from the zip file and you'll see many ".nbm" files.
Press Ctrl+A to select all of them and press "Ok".
Click the "Install" button on the bottom left corner. Accept the license terms and click "Install".
Restart Netbeans.

Select File > New Project > Python > Python Project.
You'll see a "Manage" button which you can use to change the python platform to Python 3 if you need to.

Click the "New" button, navigate to /usr/bin/ and you'll find a "python3" file among a lot of files. Select it and click "Ok".
Now select Python 3 and click "Make Default".
Click "Close".
Click "Finish".

That's it! Your Python 3 project is ready to run in Ubuntu with Netbeans.

A better tutorial on the Haar features used in the Viola-Jones algorithm

One of the most confusing aspects of the Haar features used in the Viola-Jones algorithm is the black and white rectangles. It took me quite a while to figure them out, since neither Viola and Jones's research paper nor any of the tutorials explained them well. Besides, the concept shouldn't be shown as filled black and white rectangles in the first place. So here's a change:

Instead of showing Haar features like this:

Show them like this:

It is necessary that people intuitively understand that it is not the black and white rectangles that are important, but the actual pixel values within the rectangles that are important. For good contrast, show them as yellow and red rectangles if you like.

Why are we using those rectangles?

If you were searching for a line in an image, you'd use a mask shaped like a line. In the same way, when we search for a face, we could use a mask shaped like a face, or, to reduce computation, we could just search for parts of the face that almost always have dark and bright pixels in a certain pattern. The eyes and forehead are one such example. The pixels at the eyes will almost always be dark, and the pixels at the forehead will almost always be brighter than those at the eyes. So the black rectangle in the figure above just says that we are looking for a rectangular region where most of the pixels are dark. The white rectangle above it means that wherever we expect such dark pixels (the eye area), the rectangular area directly above it must have plenty of bright pixels (the forehead area).

Of course, there will be plenty of other places in the picture with similar bright and dark areas, but the area that is most likely to be the eyes and forehead will give the highest Haar value (calculated using the formula below). This is also why you should search not only for the eyes and forehead, but also for the nose and lips, and ensure that the features you found are in the correct positions with respect to each other. That's how you can be sure you've located a face. So, based on the pattern of dark and bright pixels you are searching for, design your Haar feature (the black and white rectangles) so that it gives the highest value for that feature.

How to do the calculations?

Start by normalizing pixel values. If you have your grayscale image pixel values in a 2D matrix M that can hold grayscale values from 0 to 255, then divide all values in the matrix by 255 to normalize them. M will now have values ranging from 0 to 1.

Haar value = ((sum of values within white rectangle area in M) divided by (number of pixels within white rectangle)) minus ((sum of values within black rectangle area in M) divided by (number of pixels within black rectangle)).

The closer the Haar value is to 1, the more likely it is that you've found the facial feature you were trying to match. In the image above, we were trying to find areas where the darker pixels of the eye region have an area of lighter forehead pixels above them.
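Here's a minimal pure-Python sketch of the calculation above. It assumes a grayscale image stored as a list of rows of 0-255 values, and a two-rectangle feature with the white (forehead) rectangle directly above an equally sized black (eye) rectangle. The function names are my own:

```python
def normalize(image):
    """Divide every 0-255 grayscale value by 255 so M holds values from 0 to 1."""
    return [[pixel / 255 for pixel in row] for row in image]


def region_mean(m, top, left, height, width):
    """Sum of the values inside a rectangle of M, divided by its number of pixels."""
    total = sum(m[r][c]
                for r in range(top, top + height)
                for c in range(left, left + width))
    return total / (height * width)


def haar_value_white_above_black(m, top, left, height, width):
    """White rectangle at (top, left); black rectangle of the same size directly below it."""
    white = region_mean(m, top, left, height, width)
    black = region_mean(m, top + height, left, height, width)
    return white - black


# A tiny 4x4 "image": bright forehead rows on top, dark eye rows below.
image = [
    [250, 250, 250, 250],
    [250, 250, 250, 250],
    [20, 20, 20, 20],
    [20, 20, 20, 20],
]
M = normalize(image)
print(round(haar_value_white_above_black(M, 0, 0, 2, 4), 3))
```

A perfectly uniform patch gives 0, and the bigger the brightness contrast between the two rectangles, the closer the value gets to 1.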

Other Haar feature shapes

Don't worry when you see shapes like this:

It's the same concept. Take the sum of all pixels from both white areas in the normalized image matrix M and divide it by the number of pixels in both white areas, do the same for both black areas, and subtract in the same way we did earlier. This particular shape is meant to detect a dark diagonal feature. You can also create your own Haar feature shapes based on the facial feature you are trying to detect.
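Summing rectangles pixel by pixel gets slow when you slide thousands of features over an image, which is why Viola and Jones use integral images. Here's a sketch of the diagonal (four-rectangle) feature computed with an integral image, assuming white at top-left and bottom-right, black at top-right and bottom-left, and all four rectangles the same size. Again, the names are my own:

```python
def integral_image(m):
    """ii[r][c] = sum of m over rows < r and columns < c (one row/column of zero padding)."""
    height, width = len(m), len(m[0])
    ii = [[0.0] * (width + 1) for _ in range(height + 1)]
    for r in range(height):
        row_sum = 0.0
        for c in range(width):
            row_sum += m[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of any rectangle of the original matrix using just four lookups."""
    return (ii[top + height][left + width] - ii[top][left + width]
            - ii[top + height][left] + ii[top][left])


def haar_diagonal(ii, top, left, height, width):
    """Four height x width rectangles: white top-left/bottom-right, black top-right/bottom-left."""
    white = (rect_sum(ii, top, left, height, width)
             + rect_sum(ii, top + height, left + width, height, width))
    black = (rect_sum(ii, top, left + width, height, width)
             + rect_sum(ii, top + height, left, height, width))
    pixels_per_color = 2 * height * width
    return white / pixels_per_color - black / pixels_per_color


# Already-normalized 4x4 matrix M: bright top-left and bottom-right, dark elsewhere.
M = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
ii = integral_image(M)
print(haar_diagonal(ii, 0, 0, 2, 2))
```

The integral image is built once per image; after that, every rectangle sum costs four lookups no matter how large the rectangle is.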

To learn more about Integral images and Haar features, I recommend Balazs Holczer's tutorials. Well explained, and it's pleasantly amusing to hear him say "Lots of lots of" and the way he says "Feeeeeatures" :-)

Integral images

Haar features

14 October 2018

Skills required in the field of Machine Learning and Data Science

While doing a literature survey for an assignment, I came across this Medium post by Jeff Hale where he lists some of the skills and technologies that are most in demand during 2017 and 2018 for jobs in Machine Learning.

What a job-seeker should know is that it isn't the highest bar in the graph they should be looking out for. These graphs only show you what the industry wants. Many job descriptions list an array of skills, but the recruit is actually made to work on something much more mundane, like data cleaning or preparing presentations.

It's far more important to identify which area of Machine Learning interests you. Do your own little research or build hobby projects specific to that area of interest and see if you can integrate it with other Machine Learning paradigms.
When you look for a job, don't just look for the skill-set they require or the projects they work on. Look for what your role would be in the project and have a very careful look at the Glassdoor reviews and Indeed reviews about the company culture. During the interview, make sure you get to meet the actual people who would be supervising you and if possible, even the team you'd be working with. Past experiences have shown that the way you are treated before, during and after the interview are a very good indicator of how you'll be treated after you join the company. You can even identify potentially toxic people you'd want to avoid.

Machine Learning and Data Science technologies are here to stay. If you've equipped yourself with the necessary basic skills, you won't struggle to land a job. What you do have to worry about is whether you end up as just a soldier in an army of data gorillas, or whether you build products you enjoy building, do research that is fulfilling and make the world a better place for everyone.

28 September 2018

The basics of Hidden Markov Models with Forward Trellis

With anything related to Mathematics, I'm surprised how tutorials on the internet and research papers rush into complex equations and variables without first explaining the basic concept which can help a student get a grasp of what the equations mean.
I've had a Maths teacher who was so fond of equations, that he disliked having to explain concepts to students. He expected us to first go through the equation because according to him, the equations were intuitive. They are NOT!
There are far better, simpler ways everything can be explained, and that's how I intend to explain Hidden Markov Models.


Let's take the common example of three possible kinds of weather. A day can be either sunny, cloudy or rainy.

To keep the representation simple, I'll show each of them by its primary color.

Our objective is to start with one of these weather types and calculate the probability that the next day will be sunny, cloudy or rainy. Then we calculate the same for the day after that, and so on, until we want to stop.

Although the above figure shows what the weather is on each of the five days, in reality we don't know it, and we want to calculate the probability that any particular day would be sunny, rainy or cloudy.

We refer to these weather situations as "States". So on any day the weather might be in a sunny state or a cloudy state or a rainy state.

Transition probability

Usually the transition probability matrix is created by people who note down the actual weather for many days and then statistically estimate, for example, that if today is sunny, the probability that the sunny weather will transition into cloudy weather tomorrow is 0.3, and that if today is cloudy, the probability of rainy weather tomorrow is 0.4. They calculate these probabilities and create a transition probability matrix like this:

Note that in such a matrix, the values in each row add up to 1. The 0.5 in the first row and first column of the matrix is the probability that if today is sunny, tomorrow will be sunny too.

The Trellis 
When we start by knowing today's weather and try to calculate the probabilities of the weather on future days, we don't know what the weather will actually be. That knowledge about the future is hidden, so we call those states hidden states.

So if we represent each state (sunny/cloudy/rainy) in a row, it would look like this:

Suppose the states were not hidden, and if we knew what the weather is on each day, the weather transitions could either be shown like this:

Or like this:

Representing states in this manner is called a "trellis" because the diagram looks like a garden trellis.

In a Hidden Markov Model, we don't know the states, so we draw all circles as empty circles and add an extra row for the "final state". The final state is nothing special. It is just the weather of the last day you are considering. It wasn't even necessary to add it to the trellis diagram, but some silly person decided to complicate the concept and represent the final state as a separate row, to clearly indicate that once the Markov Model transitions into the final state, it stops there and does not transition any more.

Since the final state could be either sunny, rainy or cloudy, I decided to show that row with all three colors.

The transition probability matrix would now have an additional row to show the final state:

So what this silly transition probability matrix now shows us is that the probability of transitioning from any weather today to the weather tomorrow is 0.2 or 0.1 or 0.3, but once you reach the final state, the probability of transitioning to any other state tomorrow is zero.

Why do we need to do all this?

In this simple example, we assume that a person could have a picnic if the weather is sunny or cloudy. If the weather is likely to be rainy, the person sits at home and reads a book.

Interestingly, suppose we observe that somebody went for a picnic on day 1, day 2 and day 3 and spent day 4 and day 5 reading a book. We can then use the trellis diagram to estimate which days were sunny, cloudy and rainy (the weather states are "hidden" and we need to estimate them).
To do that, you'll need another matrix called the "emission probability matrix".

Emission probabilities

The rows in the trellis represent sunny, cloudy and rainy respectively. Based on the state (weather) of any day, a person may decide to either go on a picnic or read a book. The decision is called an "emission" and is defined numerically by probabilities in an emission probability matrix.

An "emission" on a day just means the probability that one of the decisions will be taken. Either to read a book or to go on a picnic.

This is our emission probability matrix:

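The rows in the matrix show the decision taken and the columns show the probability of taking those decisions for each day. If you were considering 10 days, this matrix would have five more columns. (Many textbooks lay the emission matrix out differently: because the emission probabilities depend only on the weather and not on which day it is, they use one row per weather state and one column per possible decision, here picnic and read book.)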

Just like the transition probability matrix, even the emission probability matrix is calculated beforehand by somebody who observes the weather and a person's decision to read a book or go on a picnic on multiple days and says that these are the probabilities for the Hidden Markov Model. Note that here too, the row values add up to 1.
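Here is a small Python sketch of an emission matrix in the common textbook layout (one row per weather state, one column per decision). The numbers are hypothetical, chosen only to match the story: picnics are likely when it is sunny or cloudy, and reading is likely when it is rainy.

```python
# Hypothetical emission probabilities, laid out as states x observations.
STATES = ["sunny", "cloudy", "rainy"]
OBSERVATIONS = ["picnic", "read book"]

# EMISSION[state][observation] = P(observation | state)
EMISSION = {
    "sunny":  {"picnic": 0.9, "read book": 0.1},
    "cloudy": {"picnic": 0.7, "read book": 0.3},
    "rainy":  {"picnic": 0.1, "read book": 0.9},
}


def emission_probability(state, observation):
    """Look up how likely a decision is, given the weather."""
    return EMISSION[state][observation]


if __name__ == "__main__":
    for state in STATES:
        # Each state's row should add up to 1.
        total = sum(EMISSION[state][o] for o in OBSERVATIONS)
        print(state, EMISSION[state], "row sum:", round(total, 10))
    print(emission_probability("cloudy", "picnic"))  # 0.7
```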

Forward Trellis (Viterbi algorithm)

All that the Forward Trellis method says is that if you have a set of observations about the decisions a person took on the five days:
O = {picnic, picnic, picnic, read book, read book}

Using the Viterbi algorithm, you can find out whether each of the days was most likely sunny, cloudy or rainy. It finds the most probable path through the trellis.

In the transition probability matrix and the emission probability matrix, the rows are represented by "i" and the columns by "j" in many textbooks.
So each probability value in the transition probability matrix would be represented as a_ij.
Since in the emission probability matrix the columns represent the days, I'll represent the columns with "k" instead of "j". The probability values in the emission probability matrix are represented as b_ik.

Now, given the observations O = {picnic, picnic, picnic, read book, read book}, if we want to know the probability that day2 is cloudy...

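...we just have to use this formula.
It is:

Probability of day2 being cloudy = Emission probability of picnic when the weather is cloudy * ((probability of day1 being sunny * transition probability of sunny to cloudy) + (probability of day1 being cloudy * transition probability of cloudy to cloudy) + (probability of day1 being rainy * transition probability of rainy to cloudy)).

Strictly speaking, adding up over all three possible weathers of day1 like this is the forward algorithm. The Viterbi algorithm works on the same trellis, but keeps only the largest of the three products instead of adding them, and remembers which state that largest product came from, so that the most probable sequence of weather can be traced back at the end.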

In this example, day1 begins with the sunny state having probability = 1, and the cloudy and rainy states having probability = 0, because we assume we know that the first day is sunny.
If you wish, you could also assume you don't know the weather for the first day, and assign initial probabilities for each state on day1.
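Here is that single step written as a Python sketch. The transition and emission numbers are hypothetical (not the ones from the diagrams), the final state is left out for simplicity, and day1 is taken to be certainly sunny, as in the text.

```python
# One step of the trellis formula: probability of day2 being cloudy.
# All numbers are hypothetical; the final state is left out for simplicity.
STATES = ["sunny", "cloudy", "rainy"]

# TRANSITION[today][tomorrow]
TRANSITION = {
    "sunny":  {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy":  {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
}

# EMISSION[state][observation] = P(observation | state)
EMISSION = {
    "sunny":  {"picnic": 0.9, "read book": 0.1},
    "cloudy": {"picnic": 0.7, "read book": 0.3},
    "rainy":  {"picnic": 0.1, "read book": 0.9},
}

# Day1: we assume we know it was sunny.
day1 = {"sunny": 1.0, "cloudy": 0.0, "rainy": 0.0}


def next_day(previous, target, observation):
    """Sum over yesterday's states, then weight by today's emission probability."""
    incoming = sum(previous[s] * TRANSITION[s][target] for s in STATES)
    return EMISSION[target][observation] * incoming


if __name__ == "__main__":
    p = next_day(day1, "cloudy", "picnic")
    print(round(p, 4))  # 0.7 * (1.0*0.3 + 0.0*0.4 + 0.0*0.3) = 0.21
```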

To calculate probabilities for day3, you'd first have to calculate all probabilities of sunny, cloudy and rainy for day2.

The three converging gray arrows in the above diagram are just to show that we use the transition probabilities of sunny to cloudy, cloudy to cloudy and rainy to cloudy.
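Repeating that step day after day fills in the whole trellis. The Python sketch below does it for all five observations with hypothetical numbers, once with a sum (the forward algorithm) and once with a max plus back-pointers (the Viterbi algorithm), so the most probable weather sequence can be read off at the end. As simplifying assumptions, the final state is left out, and day1's values are the initial probabilities themselves, as in the text; many textbooks also multiply them by the day1 emission probability.

```python
# Forward and Viterbi passes over the five-day weather trellis.
# All probabilities are hypothetical; the final state is left out for simplicity.
STATES = ["sunny", "cloudy", "rainy"]

# TRANSITION[today][tomorrow]
TRANSITION = {
    "sunny":  {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy":  {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
}

# EMISSION[state][observation] = P(observation | state)
EMISSION = {
    "sunny":  {"picnic": 0.9, "read book": 0.1},
    "cloudy": {"picnic": 0.7, "read book": 0.3},
    "rainy":  {"picnic": 0.1, "read book": 0.9},
}

# Day1 is assumed to be sunny. As in the text, these values are used as-is for
# day1; many textbooks also multiply them by the day1 emission probability.
INITIAL = {"sunny": 1.0, "cloudy": 0.0, "rainy": 0.0}

OBSERVATIONS = ["picnic", "picnic", "picnic", "read book", "read book"]


def forward(observations):
    """Return one dict per day with the summed probability of each state."""
    trellis = [dict(INITIAL)]
    for obs in observations[1:]:
        prev = trellis[-1]
        trellis.append({
            s: EMISSION[s][obs] * sum(prev[p] * TRANSITION[p][s] for p in STATES)
            for s in STATES
        })
    return trellis


def viterbi(observations):
    """Return the most probable weather sequence and its probability."""
    best = [dict(INITIAL)]
    back = [{}]  # back[t][s] = best previous-day state leading into s at position t
    for obs in observations[1:]:
        prev = best[-1]
        scores, pointers = {}, {}
        for s in STATES:
            # Keep only the largest incoming product instead of the sum.
            p_best = max(STATES, key=lambda p: prev[p] * TRANSITION[p][s])
            scores[s] = EMISSION[s][obs] * prev[p_best] * TRANSITION[p_best][s]
            pointers[s] = p_best
        best.append(scores)
        back.append(pointers)
    # Trace the back-pointers from the best final state back to day1.
    last = max(STATES, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


if __name__ == "__main__":
    for day, probs in enumerate(forward(OBSERVATIONS), start=1):
        print(day, {s: round(v, 4) for s, v in probs.items()})
    path, prob = viterbi(OBSERVATIONS)
    print(path, round(prob, 7))
```

With these made-up numbers, the Viterbi path comes out as sunny, sunny, cloudy, rainy, rainy, which matches the intuition of three picnic days followed by two reading days.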

That's all there is to it. It's so simple, but people explain it in such contorted, complicated ways that it's hard to grasp and understand why things are being done like this and what purpose it serves. Luckily for you, NRecursions comes to your rescue again :-)

27 August 2018

Converting pgm files to jpg or png

Some software does not recognize the PGM image file format, even though it has been around since the 1980s.

Luckily, Ubuntu has a pre-installed program named ImageMagick, whose mogrify command can batch-convert all PGM files in the current folder to PNG or JPG:

mogrify -format png *.pgm
mogrify -format jpg *.pgm

Try using it to convert from other formats too.

08 August 2018

Are you data privacy literate?

I'm always surprised when I meet highly educated engineers and doctors who still believe that the positions of the planets and stars affect their daily luck and destiny. It's the result of their childhood curiosity and intelligence being channeled into accepting unscientific information just because many others do. Likewise, wasn't there a single person who could stand against sati and dowry for all those centuries? "Hey, she's part of my family. How dare you suggest burning her!!!"

Today, it's about standing against data misuse even when society mocks you for doing so. It turns out there are still too many people who are insensitive to others' need for privacy.

If you are not paying for a product, YOU are the product
Few people realize the gravity of this.

Being a lifelong learner

A few years ago, people were told that if they didn't know how to use a computer, they were as good as illiterate. That school of thought has been upgraded. Today, you are illiterate if you don't know or care how to protect your personal data.

I'm often asked why I don't use a certain messaging app, and get mocked by the "cool-crowd", for whom throwing caution to the wind is ok as long as everyone else is. Just like the enthusiastic users of radioactive toothpaste. If you love the messaging app so much, why don't you marry it? ;-)

I recently installed the messaging app, taking care to first deny all unnecessary permissions, and then tried sending a message. I couldn't, unless I allowed it access to data on my phone. That itself is a huge red flag. An app doesn't need such details to send a message. With these details, a lot of conclusions can be drawn from the messages and calls you make. Even with encryption, don't forget who has the key to decrypt the message. A lot can also be deduced just from the metadata associated with the messages. Messaging apps compromise privacy, and there's a lot the government can do to monitor people, just the way corporations may.
There's also the un-ending stream of notifications. Every Tom, Dick and Harry has created messaging app groups which you are expected to join.

So what do I do with an intrusive, irritating third-party app that forces me into sharing personal details?

Cartoon from Wumo

Oddly, free messaging apps are still popular in spite of people telling their friends to stop using them. Free web-based email was invasive enough. Now these companies have access to even more intimate information about you through your smartphone.

I hope you've heard about apps that switch on your phone's microphone without your permission to listen in, or misuse your phone for bitcoin mining. It turns out there are hundreds of such apps.

But my data is already out there...

Sure it is, but remember that data analytics works well only with lots of data. So you can stop putting more data out there right now. In fact, don't start today. Start yesterday!
Everything you type in a chat is potentially being analyzed. NLP (natural language processing) can analyze the grammar and associate it with context. Your contacts from various platforms are being combined to learn more about who you are and whom you interact with. A trainer from a networking company once told me that ISPs build profiles of people based on their internet searches and activities, and any deviation from that pattern gets recorded as an anomaly. The history of your life, locations and activities is being recorded. The content of the files you upload is being analyzed. It's like standing on a rooftop and yelling out your intimate personal details to strangers. You don't have to.

Behind the scenes

You should know what goes on in such corporations. If the recent data-misuse scandal isn't enough, the "god mode" of an app-based taxi company is another example. Internet searches will show you more. I wouldn't blindly believe privacy policies, since history shows that corporations can flout those rules, or have cleverly written clauses that let them do whatever they want with your data while you give them your precious trust.

Any company (fraudsters too) can buy your personal data from any of the many data-gathering companies and piece together various intimate aspects of your personal life. For many of you, that sounds OK until you realize that someone you personally know could be working in such a company and looking at that information.

Don't believe me? Did you hear about Mr. Professional Stalker? Or how app-based taxi employees secretly stalked their exes and celebrities?
You only know about these because these made it to the news. There are a zillion other companies (and fraudsters) using your data and there can be people you personally know, who could be looking at a history of your activities you don't even remember, and using analytics to associate that data to make conclusions about you.


Yesterday I received a call from someone who had some details of my bank account and was asking me about transactions that I had not done. He calmly reassured me he was not asking for my password or pin details. I promptly reported this number to the bank and they confirmed it was a fraud call. For those of you who don't know, this is called Social Engineering. A method of questioning that makes you unsuspectingly give out personal data which you think is safe, but they can associate that data with more data they already gathered about you to commit fraud. If you still think this is too far fetched and will never happen to you, then you are very naive.

A few years back, a girl looked at me like I was an idiot when I casually told her why she shouldn't have posted her vacation details on Facebook. What harm could it do, she thought. https://m.youtube.com/watch?v=e0qrEnCbGIE

Now the rich and powerful are realizing that their own data is being misused, and are formulating laws to restrict it. Don't rejoice yet. These people would only formulate laws that protect themselves. You are still responsible for your own safety.

Yet, many of you still don't care. Just lazy, eh? Too busy? Well, can't blame you really... on my phone, I was surprised at the plethora of options to restrict apps from accessing the internet and the huge number of settings I had to visit to turn off features that compromised my security/privacy. It does take a lot of time to do this. It's worth it though. Do it everywhere: Google account, Facebook, LinkedIn, Twitter, location services... everywhere. Don't give out your number at grocery stores or pizza stores. Don't use public WiFi unless you can ensure safety. Don't use public proxy servers or free VPNs. There's no free lunch.

Sensitivities need to adapt with changing times. You may be ok with your data being misused, but there are an increasing number of people who are not ok with it. Respect their need for privacy and security.

While you are busy telling people that you don't care about privacy, could you prove it by leaving the door of your house unlocked too?

01 August 2018

Causes of phone battery bloating / swelling

Lithium-ion batteries are known to perform poorly when subjected to heat. It turns out that it's not just performance that declines: gases formed by electrolyte decomposition can also cause the battery to bloat and potentially explode.

A few causes I know of are:

  • Using the phone for a phone-call while the phone is being charged.
  • Using the phone in a hot environment or subjecting it to heavy use (watching videos or gaming) while it is being charged.
  • Using mobile hotspot for long durations (more than half an hour) at a stretch.
  • Continuing to charge the phone even after it has reached 100% charge.

I'm surprised that many phones today have a moulded build with a non-removable battery. Although the battery can be replaced at a service center, the "non-removableness" means that if the battery gets bloated, it won't have any space to expand into, and it'll permanently damage the phone. The better option is to go for phones with removable batteries. Here, when the battery expands, it'll pop off the back cover and at least your phone's components won't get damaged. You just have to replace the battery instead of replacing the entire phone.

17 July 2018

Learning R through tutorials

R can be daunting initially, but you'll become comfortable with it in a month, compared to something like Matlab which takes longer to master. A good tutorial can get you comfortable even faster.

One good tutorial I found was this.

08 July 2018

Take a screenshot of a rectangular area of the screen in Ubuntu or Mac without any extra software

This was a niggling problem for so many years, and I didn't know the solution was right under my nose.

In Ubuntu, if you want to take a screenshot of the entire screen, use the PrtScn key on the keyboard. Everyone knows that.

To take a screenshot of the current active window, press the Alt and PrtScn keys together. Most people know that too.

What most people don't know is that you can take a screenshot of a rectangular selection of an area on your screen by pressing Shift and PrtScn together, and then using the mouse pointer to select an area on the screen.

Oh... how much time it would have saved me had I known this earlier!!!

On a Mac, you can do the same by pressing Command, Shift and 4 together.

17 June 2018

Privacy in the age of Augmented Reality

It is said that the stone age lasted 2.5 million years.
The agricultural age lasted a few millennia.
The industrial age lasted a few centuries.
The digital age lasted a few decades.

Now we are said to be at the brink of a new age: "The age of Augmented Reality".

We are already there. We have virtual reality headsets. We pick up our phone to internet-search for answers when we have questions.
More importantly, we are at the doorstep of mind-controlled devices. Microsoft has already applied for a patent. You could eventually even communicate with people or animals without having to speak a word.

While it's fascinating to see what technology has to offer, there's also the element of loss of privacy. Governments and corporations got access to our homes via the internet. Now they have even more intimate information about us via our smartphones. With brain-controlled devices, they are poised to have access to the very depths of our thoughts.

I had an SMS conversation with a friend about a cab service, and within a couple of hours, and again the next day, I received SMS advertisements for that cab service. I had never received ads about that service before in my life. So the harsh reality is that even if you avoid messaging apps, your mobile service providers are monitoring and monetizing your SMSes, and perhaps your conversations too.

There's plenty more.

In spite of knowing this, most people happily allow companies and governments access to their personal data. Because the services are free!
Given the way my data was used without my consent to send me an SMS ad, I wonder if we would have privacy even if we paid for using Gmail, Facebook, WhatsApp etc.

Are our thoughts going to remain private?
I'm very sure that companies designing mind-control apps and devices will also be devising elaborate strategies to convince people to happily give them access to their minds and thoughts, in the same carefree way we do with free email and messaging services.

People born before the 1950s used to envy us when they saw how fascinating the digital age was. They weren't able to adapt quickly enough to use it, but they wished they could. They told us how lucky we were to be born into the "jackpot generation", which is witnessing a phenomenal change in technology.
Some people don't envy us though. They envy our grandparents, saying that although they lived in a world with less access to medical facilities, information, travel and a lot of other things we enjoy, they lived happier lives where life moved at a more comfortable pace. Food, air and water were purer. Privacy was something they had a lot more control over.

There will come a time, though, when people will ask whether privacy is really so necessary. A time when information and thought will be so pervasive that it'll become a lot easier to trust someone because you already know everything about them. This would remove barriers to communication, and basically the entire world, not just humans, but even animals and perhaps trees and plants, would be able to function as a single cooperative entity, sharing knowledge and purpose. Perhaps leading to a time when we finally answer the ultimate question of why we exist in this universe!

22 May 2018

Sick or sleepy? You ruin it for everyone by forcing these kids/employees to push themselves.


Put simply, would you prefer that one sick employee or student stays away from everyone else for the duration of their contagious illness, or that more people get infected and productivity plummets?

In a competitive world, it's sad how people push themselves (or are pushed) to show up for work even when sick. You don't have to. Even with a chronic illness or a family member's illness, there are ways to manage it. Businesses always have a contingency plan for employee absence. It's their job to arrange for that, and not something that you have to worry about when sick.

What you should do:
Even if your illness is not contagious, stay at home and get rest. A company will never sacrifice their profitability for you, so you have no obligation to sacrifice your health for them either. I've seen this happen with multiple people. Sacrificing health for work or schooling is never worth it.


Again, put simply: Would you prefer that people remain productive throughout the day by taking naps in-between or that they remain productive for half a day, making mistakes (which will cost you a lot in re-work) and struggling to remain awake and focus?

When you feel sleepy, you aren't supposed to drink tea/coffee to shake off the feeling. You are supposed to sleep.
There's a 'laziness-related' social stigma attached to sleeping in class or in the office. It's high time we got rid of this sleep-depriving culture.
Human beings are not meant to sleep only at night. We naturally have either biphasic or polyphasic sleep. This means that when you feel like sleeping, it's perfectly OK to sleep or take a nap. You don't have to apologize to anybody. If you want to be superhuman and avoid sleep, take a serious look at how sleep deprivation can wreak havoc on your body.
I had a teacher who scolded students for yawning in her class. I have another teacher who feels insecure when students feel sleepy in his class. Neither of them seemed to realize that it wasn't the fault of their teaching. All they had to do was allow the students to take a 10-minute nap.

What you should do:
If you are at a school or college, the authorities should either specifically allow for a nap break during afternoon classes, where students can rest their head on the table and take a short nap, or arrange for monitored sleep boxes.
If you work at a company, convince your bosses that a short nap can help boost productivity without negatively affecting morale. Don't worry. There's plenty of evidence about this already. Companies can definitely introduce sleep boxes.

Sleep deprivation begins early in life and continues unless you stop it:

  • Waking through the night when the kid is born.
  • Kids' sleep cycles and duration being ruined when woken up for school.
  • The horrible sleep-depriving culture of "burning the midnight oil" or waking up early during exams.
  • Eating improperly cooked or burnt food, leading to sleep loss.
  • Having to wake up early to go to office.
  • Staying up to complete a project (which never gets complete).

In an ideal world, one should be allowed to sleep and wake up as per their natural sleep cycle rather than society's insensitive work cycle. When one is sick, one should be able to take rest instead of worrying about losing out in the rat race. Advice like this can only show you the way. You are the one who decides to walk the path.

You won't believe it until you measure it

If you feel you are getting enough sleep, write down how many hours of sleep you think you are getting. Then keep a piece of paper and a pen by your bedside. Every night and morning, note down the approximate times you fell asleep and woke up.
Have two columns.
  • Column 1: Number of hours of un-interrupted sleep. 
  • Column 2: Total sleep = un-interrupted sleep + remaining sleep duration in hours.
Interrupted sleep happens when you are woken by noise at night, need to go to the toilet at midnight, are woken by discomfort in your stomach after having eaten bad food etc.
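If you prefer to tally the log on a computer, here is a small Python sketch. It assumes each night is written down as a list of (fell asleep, woke up) times, treats the longest single stretch as the "un-interrupted sleep" of Column 1, and adds up all the stretches for Column 2. The example times are made up.

```python
from datetime import datetime, timedelta

# One night = list of (fell asleep, woke up) stretches, as "HH:MM" strings.
# These example times are made up for illustration.
NIGHT = [("23:30", "02:45"), ("03:05", "06:30")]


def stretch_hours(start, end):
    """Length of one sleep stretch in hours, allowing it to cross midnight."""
    fmt = "%H:%M"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    if t1 <= t0:  # woke up after midnight
        t1 += timedelta(days=1)
    return (t1 - t0).total_seconds() / 3600


def tally(night):
    """Return (Column 1, Column 2): longest un-interrupted stretch and total sleep."""
    hours = [stretch_hours(s, e) for s, e in night]
    return max(hours), sum(hours)


if __name__ == "__main__":
    uninterrupted, total = tally(NIGHT)
    print(f"Column 1: {uninterrupted:.2f} h, Column 2: {total:.2f} h")
```

For the made-up night above, it prints Column 1: 3.42 h, Column 2: 6.67 h.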

Do this experiment at least for a week or two.
I assure you, you'll be startled at how little sleep you are getting. Do let me know what your results were and I'll tell you how to improve.

19 May 2018

How to install R and R Studio on Ubuntu 16.04

The desktop installation of R is very simple if you are ok with Ubuntu's apt version.

You will need to have libcurl, libxml2, gtk2 and openssl installed on Ubuntu if you want to install certain packages, so also run these commands:
sudo apt-get update
sudo apt-get install libcurl4-openssl-dev
sudo apt-get install libssl-dev
sudo apt-get install libxml2-dev
sudo apt-get install libgtk2.0-dev

Method 1 (recommended):
One way that worked for me is to add the repository:
sudo add-apt-repository 'deb https://ftp.ussg.iu.edu/CRAN/bin/linux/ubuntu xenial/'
sudo apt-get update

Now you might get an error like this:
W: GPG error: https://ftp.ussg.iu.edu/CRAN/bin/linux/ubuntu xenial/ InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 51716619E084DAB9
W: The repository 'https://ftp.ussg.iu.edu/CRAN/bin/linux/ubuntu xenial/ InRelease' is not signed.
N: Data from such a repository can't be authenticated and is therefore potentially dangerous to use.
N: See apt-secure(8) manpage for repository creation and user configuration details.
W: There is no public key available for the following key IDs:

The solution is to add the key:
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys KEYID
where you have to replace KEYID with the key ID shown after NO_PUBKEY in the error (51716619E084DAB9 in the example above).
Now update and install.
sudo apt-get update
sudo apt-get install r-base 
sudo apt-get install r-base-dev

Method 2: 
sudo apt-get update
sudo apt-get install r-base

Download the jpeg62 dependency from here and run dpkg to install it.
sudo dpkg -i libjpeg62_6b2-2_amd64.deb

Download R Studio from here and run dpkg to install it.
sudo dpkg -i rstudio-xenial-1.1.453-amd64.deb

Method 3:
However, if you want the latest version (advisable if you plan to install packages like arules) or you want to upgrade the version installed above, follow this:
sudo gedit /etc/apt/sources.list
Add this line to the sources.list file (substitute your Ubuntu version's codename; xenial is for Ubuntu 16.04):
deb http://cran.rstudio.com/bin/linux/ubuntu xenial/
sudo apt-get update
sudo apt-get install r-base

16 May 2018

Using email like chat

Many people prefer using a messaging app for chats and email for a formal, letter-like communication. Certain people insist on always starting an email with a greeting like "Dear x", "Hi" or "Hello". It doesn't necessarily have to be that way.

If we rigorously stuck to tradition, we would be writing on cave walls and living in the jungle. There's nothing wrong with introducing change. There's nothing wrong with reducing the degree of formality to make someone feel comfortable or to make technology more cordial and usable.

If you are someone who does not like using invasive smartphone apps and are in a bit of a hurry to start/continue a conversation in a chat messenger, feel free to use email. Many modern email clients support the "email conversation" format, so it's easy to follow the conversation even if multiple people are in the conversation. It's easy to add attachments, include/exclude people, use highlighting and make use of all the other functionalities that an email offers.
Given that there are smartphone apps for email, you can even get your updates in the same way you'd get them from a messaging app.

When you are having a conversation with someone in the same way that you'd want to converse on a messaging app, go ahead and skip the formalities of email. Use email like a messaging app.
When using email for formal communication, use the standard letter-writing style and use all necessary formalities.

Let's keep technology flexible and make it comfortable for everyone. It's like how the heads of Google spoke about their choice to wear t-shirts instead of suits; they said "You don't have to wear a suit to be serious".