N Recursions: 2018

30 December 2018

The silo-world of messaging apps

Messaging apps are convenient. Everybody agrees on that. It's easy to access notifications, easy to share files, and they don't carry the "formality" baggage that email comes with.

Yet, messaging apps have a fundamental flaw that services like email and SMS do not have: a lack of universality.

With email and SMS, you can send messages between any service providers. However, you can't send messages between WhatsApp and Hangouts or WeChat or Telegram. All it would have taken was for every messaging app to follow a protocol like XMPP, or for a global standard to be created that all of them followed.

So now we are stuck in a world where we compulsorily have to install whatever app a group of people uses, even if we don't want to.

Indifference

Oddly, even though the objective of messaging apps is to bring people together, that does not necessarily happen.
One of my classmates got in touch with a few other classmates after sixteen years, via phone and email. He took the initiative to get in touch with more classmates, and with two months of effort, he put many other classmates in touch with each other. Everyone was happy about the reunion. They asked him to create a group on a messaging app and he told them he doesn't use that app. So they created the group themselves and all of them joined it, leaving him out. They just didn't care that he wasn't part of the group. To them, it was his fault that he doesn't use the app. Needless to say, when I saw this cold response from everyone, I refused to join the group too.

What you can do

Tech identification: First, learn the science of identifying an appropriate communication medium. You can't use a messaging app to form a group of a diverse set of people, since many of them may not use apps. Use a more common medium like email. It's not convenience that matters here. It's uniform communication and participation.
Talk to people: Ask everyone what they are comfortable with. If somebody is not very tech-savvy or does not use the tech y'all chose, either check if everyone else is ok with using the tech this person proposes or offer to keep the person informed always. It's your responsibility to keep the group united.
Know the nature of communication: When there is an urgent piece of info to communicate, the ONLY reliable way to do it is to have a phone call. That's the only way you know the person has received the message. If you use email, SMS or a messaging app, don't assume that the other person has seen the message unless they respond with an "ok". Often, the response or read-receipt could be a result of somebody else looking at the message too. Do the phone call instead.

Hoping for an RFC that would be accepted universally. Messaging apps need the ability to talk to each other.

11 November 2018

Which is better? Octave or Matlab?

I'd say choose Matlab if you can. It's much faster than Octave.

Speed
A Sudoku program I developed used to take two minutes to run 2000 iterations in Matlab. The same program took 13 minutes to run 500 iterations in Octave.

The reason, apparently, is that Matlab uses Intel's Math Kernel Library (MKL) internally. I have personally seen the significant speed boost MKL gives, when I tried it on some C++ code I developed long ago.

GUI
Apart from this, there's the Matlab GUI, which is far better than Octave's in terms of customizability and plain-old user friendliness.

Granted, Octave is built by the open source community, but the community should have seriously considered processing speed before starting the project. Even now it isn't impossible to make the switch to MKL. The speed changes I had seen in my C++ program came after I replaced the old code with function calls to MKL, and it worked.

Good websites to download masters or PhD thesis / dissertation

To find a good university website or any other website where theses have been published and are available for free, your best bet is to first try using a search engine.

Eg: If you are looking for swarm intelligence using image segmentation, search with the terms:

"partial fulfillment" swarm image segmentation masters thesis pdf

Some other websites that can help are:

Update [June 2020]: On a side note, if researchers need to not only see the papers but also see the code, there's a website called Papers With Code.

08 November 2018

Aha!

Continued from the previous Aha!

Pachcha Malayalam


To be continued...

05 November 2018

A simple tutorial showing some basic PGMPy program code and explanations

Installation

Firstly, it's recommended you have the latest version of Python3 installed.
Python3 uses pip3 to install packages that you'll be importing into your programs, so ensure pip3 is installed too.
Install git.
I earlier preferred not to use Anaconda or miniconda due to version conflicts, unavailability of packages and dependency issues, but it's actually ok to use it. It's good software. If any packages are missing, you can install them. This tutorial assumes you aren't using Anaconda.

Another option is to set up a virtual environment first, because different Python packages depend on different versions of each other, and they can cause problems with the default Linux python, which will then lead to problems with the operating system. I've found this tutorial to be good for installing and using pyenv.

(Also have a look at Poetry)

Now install a bunch of peripheral packages (Not all of them are required. If you want to install just the bare-minimum, see the documentation here):
sudo pip3 install sphinx nose networkx numpy cython pandas setuptools IPython matplotlib pylab python-tk gensim spacy

Clone PGMPy and install:
git clone https://github.com/pgmpy/pgmpy
cd pgmpy
git checkout dev
sudo python3 setup.py install

If you are wondering which IDE to use for Python, I've put up my own little review here. I prefer LiClipse because my laptop has just 2GB RAM and LiClipse supports refactoring and autocomplete reasonably well. You could use PyCharm or VS Code (VS Code is also good enough for computers with low RAM or low processing power) too.
Also know about PythonAnywhere.

Tutorial

Try creating a basic Bayesian network like this:

from pgmpy.models import BayesianModel
import networkx as nx
import matplotlib.pyplot as plt

# Defining the model structure. We can define the network by just passing a list of edges.
model = BayesianModel([('D', 'G'), ('I', 'G')])
nx.draw(model, with_labels = True);
plt.show()

If you encounter this error: "AttributeError: module 'matplotlib.pyplot' has no attribute 'ishold'", see this issue for the solution.

To add conditional probabilities to each of those nodes, you can do this:

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
#import networkx as nx
#import matplotlib.pyplot as plt

# Defining the model structure. We can define the network by just passing a list of edges.
model = BayesianModel([('D', 'G'), ('I', 'G')])
#nx.draw(model, with_labels = True); plt.show()

# Defining individual CPDs.
cpd_d = TabularCPD(variable='D', variable_card=2, values=[[0.6, 0.4]])
cpd_i = TabularCPD(variable='I', variable_card=2, values=[[0.7, 0.3]])
cpd_g = TabularCPD(variable='G', variable_card=3,
                   values=[[0.3, 0.05, 0.9, 0.5],
                           [0.4, 0.25, 0.08, 0.3],
                           [0.3, 0.7, 0.02, 0.2]],
                  evidence=['I', 'D'],
                  evidence_card=[2, 2])
# Associating the CPDs with the network
model.add_cpds(cpd_d, cpd_i, cpd_g)
if model.check_model():
    print("Your network structure and CPD's are correctly defined. The probabilities in the columns sum to 1. Good job!")

print("Showing all the CPD's one by one")
for i in model.get_cpds():
    print(i)
print("You can also access them like this:")
c = model.get_cpds()
print(c[0])
print(model.get_cpds('G'))
print("Number of values G can take on. The cardinality of G is:")
print(model.get_cardinality('G'))

Output CPD tables:

Here, the "variable_card" isn't about a "card". It's the cardinality of the variable. Same with evidence_card.
Have a look at the CPD tables and you'll see that variable cardinality for G is 3 because you want to specify three types of states for G. There's G_0, G_1 and G_2. For the variable cardinality and evidence cardinality, I feel the creators of PGMPy could've programmed it to automatically detect the number of rows instead of expecting us to specify the cardinality.

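So the variable cardinality specifies the number of rows of a CPD table, and the evidence cardinalities together specify the number of columns (one column for every combination of evidence states, so evidence_card=[2, 2] gives 2 x 2 = 4 columns). Once you've specified the CPDs, you can add them to the network and you are ready to start doing inference.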
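To make the rows-and-columns rule concrete, here is a small sketch in plain Python (it does not use PGMPy at all). The helper names expected_cpd_shape and columns_sum_to_one are my own, made up for illustration; they only check the same values that were passed to TabularCPD for G above.

```python
from math import prod  # available from Python 3.8 onwards


def expected_cpd_shape(variable_card, evidence_card=()):
    """Rows = states of the variable, columns = combinations of evidence states."""
    # prod(()) is 1, so a node with no parents gets a single column.
    return variable_card, prod(evidence_card)


def columns_sum_to_one(values, tol=1e-9):
    """Each column is a probability distribution over the variable's states."""
    n_cols = len(values[0])
    return all(
        abs(sum(row[c] for row in values) - 1.0) <= tol for c in range(n_cols)
    )


# The same numbers used for cpd_g in the tutorial code.
g_values = [[0.3, 0.05, 0.9, 0.5],
            [0.4, 0.25, 0.08, 0.3],
            [0.3, 0.7, 0.02, 0.2]]

shape = expected_cpd_shape(variable_card=3, evidence_card=[2, 2])
actual_shape = (len(g_values), len(g_values[0]))
print(shape, actual_shape, columns_sum_to_one(g_values))
```

If the two shapes differ, or a column does not add up to 1, that is the kind of mistake model.check_model() is there to catch.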

Try a slightly larger network

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination
import networkx as nx
import matplotlib.pyplot as plt

# Defining the model structure. We can define the network by just passing a list of edges.
model = BayesianModel([('D', 'G'), ('I', 'G'), ('G', 'L'), ('I', 'S')])

# Defining individual CPDs.
cpd_d = TabularCPD(variable='D', variable_card=2, values=[[0.6, 0.4]])
cpd_i = TabularCPD(variable='I', variable_card=2, values=[[0.7, 0.3]])
cpd_g = TabularCPD(variable='G', variable_card=3,
                   values=[[0.3, 0.05, 0.9, 0.5],
                           [0.4, 0.25, 0.08, 0.3],
                           [0.3, 0.7, 0.02, 0.2]],
                  evidence=['I', 'D'],
                  evidence_card=[2, 2])

cpd_l = TabularCPD(variable='L', variable_card=2,
                   values=[[0.1, 0.4, 0.99],
                           [0.9, 0.6, 0.01]],
                   evidence=['G'],
                   evidence_card=[3])

cpd_s = TabularCPD(variable='S', variable_card=2,
                   values=[[0.95, 0.2],
                           [0.05, 0.8]],
                   evidence=['I'],
                   evidence_card=[2])

# Associating the CPDs with the network
model.add_cpds(cpd_d, cpd_i, cpd_g, cpd_l, cpd_s)
# Getting the local independencies of a variable.
print("Local independencies of G:")
print(model.local_independencies('G'))
# Getting all the local independencies in the network
print("Local independencies of other nodes:")
print(model.local_independencies(['D', 'I', 'S', 'G', 'L']))
# Active trail: For any two variables A and B in a network if any change in A influences the values of B then we say that there is an active trail between A and B.
# In pgmpy active_trail_nodes gives a set of nodes which are affected by any change in the node passed in the argument.
print("Active trail for D:")
print(model.active_trail_nodes('D'))
print("Active trail for D when G is observed:")
print(model.active_trail_nodes('D', observed='G'))

infer = VariableElimination(model)
print('Variable Elimination:')
print(infer.query(['G']) ['G'])
print(infer.query(['G'], evidence={'D': 0, 'I': 1}) ['G'])
print(infer.map_query(['G']))
print(infer.map_query(['G'], evidence={'D': 0, 'I': 1}))
print(infer.map_query(['G'], evidence={'D': 1, 'I': 0, 'L': 0, 'S': 0}))

nx.draw(model, with_labels = True); plt.show()

What is going on in the code:

The variable elimination queries are pretty much self-explanatory: you are querying for the state of a node, given that some other nodes are in certain other states (the evidence or observed variables). Inference is done via standard variable elimination or via a MAP query. PGMPy also allows you to do variable elimination by specifying the order in which you want to eliminate variables. There's more info about this in their documentation.
The expression infer.query(['G']) ['G'] works because infer.query(['G']) returns a Python dict (a map from keys to values), and the trailing ['G'] picks out the entry stored under the key 'G'. A dict looks like this when printed:

{'G': 0}

Here, 'G' is the key of the dict, and to access the value associated with 'G' (which is zero in this case), you just have to use ['G']. So it is the equivalent of doing:

q = infer.query(['G'])
print(q['G'])
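If you are curious what the queries on G actually compute, here is a rough sketch that gets the same kind of answer by brute-force enumeration in plain Python, without PGMPy. It only uses the D, I and G tables from the code above (L and S don't change the answer for G unless they are given as evidence). The function name posterior_g is made up, and I'm assuming the columns of G's table are ordered (I=0,D=0), (I=0,D=1), (I=1,D=0), (I=1,D=1), which is how I read evidence=['I', 'D'].

```python
# CPD values copied from the tutorial code.
P_D = [0.6, 0.4]
P_I = [0.7, 0.3]
P_G = [[0.3, 0.05, 0.9, 0.5],
       [0.4, 0.25, 0.08, 0.3],
       [0.3, 0.7, 0.02, 0.2]]


def posterior_g(evidence=None):
    """Return P(G | evidence) by summing the joint over D and I.

    evidence is a dict such as {'D': 0, 'I': 1}; missing keys are summed out.
    """
    evidence = evidence or {}
    scores = [0.0, 0.0, 0.0]
    for d in range(2):
        for i in range(2):
            # Skip combinations that contradict the evidence.
            if evidence.get('D', d) != d or evidence.get('I', i) != i:
                continue
            weight = P_D[d] * P_I[i]
            column = i * 2 + d  # assumed column order, see the note above
            for g in range(3):
                scores[g] += weight * P_G[g][column]
    total = sum(scores)
    return [s / total for s in scores]


marginal = posterior_g()
given = posterior_g({'D': 0, 'I': 1})
# A MAP query just reports the most probable state.
map_state = max(range(3), key=lambda g: marginal[g])
print(marginal, given, {'G': map_state})
```

Variable elimination arrives at the same numbers without enumerating every combination, which is what makes it usable on larger networks.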

Why this tutorial

For anyone new to Python or PGMPy, a lot of this syntax looks very confusing, and the documentation does not explain it deeply enough either. The objective of this tutorial was to clear up those basic doubts so that you could navigate the rest of the library on your own. Hope it helped.

PGMPy is created by Indians, and is quite a good library for Probabilistic Graphical models in Python. You'll also find libraries for Java, C++, R, Matlab etcetera. If you want to manually try out your network model, there is an excellent tool called SAMIAM.

More documentation for PGMPy is here, and if you want their e-book, an internet search will lead you to it.

16 October 2018

Installing Netbeans 8.2 on Ubuntu 16.04 or 18.04 for Python functionality

Most Python editors are either not very functional or they take up way too much memory or they are just a pain to install. If you want a simple editor for Python, try Geany. Note that if you use Python 3, you'll have to specify it in Geany's compile and execute commands.

If you are a Netbeans fan, you'll need Java 8 to be able to install Netbeans.

You could either choose to install from this PPA:
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

Or install from the default JDK:
sudo apt-get install openjdk-8*

Now open the environment file:
sudo vi /etc/environment

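Add the following line to the environment file (for Netbeans to be able to find Java). This path is for the Oracle JDK from the PPA; if you installed OpenJDK instead, the directory is typically /usr/lib/jvm/java-8-openjdk-amd64 on 64-bit systems: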
JAVA_HOME="/usr/lib/jvm/java-8-oracle"

Save and exit.

source /etc/environment
echo $JAVA_HOME

Now download the Python plugin for Netbeans 8.2. You'll get a zip file. Extract it and open Netbeans.

Go to the Tools > Plugins menu option.

Select the "Downloaded" tab.
Click "Add Plugins".
Select the folder that you just extracted from the zip file and you'll see many ".nbm" files.
Press Ctrl+A to select all of them and press "Ok".
Click the "Install" button on the bottom left corner. Accept the license terms and click "Install".
Restart Netbeans.

Select File > New Project > Python > Python Project.
You'll see a "Manage" button which you can use to change the python platform to Python 3 if you need to.

Click the "New" button, navigate to /usr/bin/ and you'll find a "python3" file among a lot of files. Select it and click "Ok".
Now select Python 3 and click "Make Default".
Click "Close".
Click "Finish".

That's it! Your Python 3 project is ready to run in Ubuntu with Netbeans.

A better tutorial on the Haar features used in Viola Jones algorithm

One of the most confusing aspects of the Haar features used in the Viola Jones algorithm is the black and white rectangles. It took me quite a while to figure them out, since neither Viola and Jones's research paper nor any of the tutorials explained them well. Besides, the concept shouldn't be shown as filled black and white rectangles in the first place. So here's a change:

Instead of showing Haar features like this:

Show them like this:

It is necessary that people intuitively understand that it is not the black and white rectangles that are important, but the actual pixel values within the rectangles that are important. For good contrast, show them as yellow and red rectangles if you like.

Source of the original images: University of Oulu's website.

Why are we using those rectangles?

If you were searching for a line in an image, you'd use a mask that's shaped like a line. In the same way, when we search for a face, we can use a mask that is shaped like a face or, to reduce computation, we could just search for parts of the face that almost always have dark and bright pixels in a certain pattern. The eyes and forehead are one such example. The pixels at the eyes will almost always be dark, and the pixels at the forehead will almost always be brighter than the pixels at the eyes. So the black rectangle in the figure above just says that we are looking for a rectangular region where most of the pixels will be dark. The white rectangle above it means that wherever we intend to find such dark pixels (the eyes area), the rectangular area directly above it must have plenty of bright pixels (the forehead area).

Of course there would be plenty of other places in the picture which would have similar bright and dark areas, but the area which is most likely to be the eyes and forehead, will give the highest Haar value (calculated in the formula below). This is also why you should not only search for the eyes and forehead, but also search for the nose and lips and ensure that the features you found are in the correct positions with respect to each other. That's how you'll be able to ensure that you have located a face. So based on what kind of black and white pixel pattern you are searching for, you have to design your Haar feature (the black and white rectangles) in such a way that it results in the highest value for whatever feature you are searching.

How to do the calculations?

Start by normalizing pixel values. If you have your grayscale image pixel values in a 2D matrix M that can hold grayscale values from 0 to 255, then divide all values in the matrix by 255 to normalize them. M will now have values ranging from 0 to 1.

Haar value = ((sum of values within white rectangle area in M) divided by (number of pixels within white rectangle)) minus ((sum of values within black rectangle area in M) divided by (number of pixels within black rectangle)).

The closer the Haar value is to 1, the more likely it is that you've found the facial feature you were trying to match. In the image above, we were trying to find areas where the darker pixels of the eye region have an area above them consisting of lighter pixels of the forehead.
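The calculation above can be sketched in a few lines of plain Python. The tiny 4x4 "image" and the function names rect_mean and haar_value are made up for illustration; rectangles are given as (top, left, height, width), which is just a convention I picked.

```python
def rect_mean(M, top, left, height, width):
    """Average of the normalized pixel values inside one rectangle."""
    total = sum(
        M[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )
    return total / (height * width)


def haar_value(M, white, black):
    """Mean of the white rectangle minus mean of the black rectangle."""
    return rect_mean(M, *white) - rect_mean(M, *black)


# Hypothetical grayscale patch: bright "forehead" rows above dark "eye" rows.
gray = [[230, 240, 235, 225],
        [220, 235, 245, 230],
        [30, 20, 25, 35],
        [40, 25, 30, 20]]

# Normalize 0..255 to 0..1, as described above.
M = [[pixel / 255 for pixel in row] for row in gray]

value = haar_value(M, white=(0, 0, 2, 4), black=(2, 0, 2, 4))
flipped = haar_value(M, white=(2, 0, 2, 4), black=(0, 0, 2, 4))
print(round(value, 3), round(flipped, 3))
```

The first value comes out close to 1 because the white rectangle sits on bright pixels and the black one on dark pixels. Swapping the rectangles gives a negative value, meaning the pattern is the opposite of what the feature is looking for.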

Other Haar feature shapes

Don't worry when you see shapes like this:

It's the same concept. Simply take the sum of all pixels from both white areas in the normalized image matrix M and the sum of all pixels from both black areas in M and subtract in the same way we did earlier. This particular shape is to detect some dark diagonal feature. You can also create your own Haar feature shapes based on what facial feature you are trying to detect.
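For a shape with two white and two black areas, one way to read "subtract in the same way" is to pool the pixels of each colour before averaging. The sketch below does that in plain Python; the names pooled_mean and haar_value_multi and the 4x4 patch are made up for illustration.

```python
def pooled_mean(M, rects):
    """Average over all pixels covered by a list of (top, left, height, width) rectangles."""
    total = 0.0
    count = 0
    for top, left, height, width in rects:
        for r in range(top, top + height):
            for c in range(left, left + width):
                total += M[r][c]
                count += 1
    return total / count


def haar_value_multi(M, white_rects, black_rects):
    """Pooled mean of the white areas minus pooled mean of the black areas."""
    return pooled_mean(M, white_rects) - pooled_mean(M, black_rects)


# Hypothetical normalized patch with a dark diagonal
# (top-left and bottom-right quadrants are dark).
M = [[0.1, 0.1, 0.9, 0.9],
     [0.1, 0.1, 0.9, 0.9],
     [0.9, 0.9, 0.1, 0.1],
     [0.9, 0.9, 0.1, 0.1]]

white_rects = [(0, 2, 2, 2), (2, 0, 2, 2)]  # top-right, bottom-left
black_rects = [(0, 0, 2, 2), (2, 2, 2, 2)]  # top-left, bottom-right

value = haar_value_multi(M, white_rects, black_rects)
print(round(value, 3))
```

Because the rectangles line up exactly with the dark diagonal in this patch, the value is high. On a uniformly gray patch it would be zero.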

To learn more about Integral images and Haar features, I recommend Balazs Holczer's tutorials. Well explained, and it's pleasantly amusing to hear him say "Lots of lots of" and the way he says "Feeeeeatures" :-)

Integral images

Haar features

14 October 2018

Skills required in the field of Machine Learning and Data Science

While doing a literature survey for an assignment, I came across this Medium post by Jeff Hale where he lists some of the skills and technologies that are most in demand during 2017 and 2018 for jobs in Machine Learning.

What a job-seeker should know is that it isn't the highest bar in the graph they should be looking out for. These graphs only show you what the industry wants. There are many job descriptions that list an array of skills but the recruit is actually made to work on something much more mundane like data cleaning or preparing presentations.

It's far more important to identify which area of Machine Learning interests you. Do your own little research or build hobby projects specific to that area of interest and see if you can integrate it with other Machine Learning paradigms.
When you look for a job, don't just look for the skill-set they require or the projects they work on. Look for what your role would be in the project and have a very careful look at the Glassdoor reviews and Indeed reviews about the company culture. During the interview, make sure you get to meet the actual people who would be supervising you and if possible, even the team you'd be working with. Past experiences have shown that the way you are treated before, during and after the interview is a very good indicator of how you'll be treated after you join the company. You can even identify potentially toxic people you'd want to avoid.

Machine Learning and Data Science technologies are here to stay. If you've got yourself equipped with the necessary basic skills, you won't find yourself struggling to land yourself a job. What you do have to worry about is whether you end up as just a soldier in an army of data gorillas or whether you build products that you enjoy building, do research that is fulfilling and make the world a better place for everyone.

13 October 2018

A better way to evaluate students and improve education

It's one thing to go through school and college because you are forced by your parents. It is another thing to do it because you really want to learn.

I've had the latter opportunity during the past one and a half years of pursuing my M.Tech in AI.
Getting back to the classroom environment after spending a decade as a professional working in the industry, one gets to view academia with a broader perspective and an understanding of how things work.

1. Poor textbooks

Wanting to revise some of the basics of integration, differentiation and partial differentiation, I looked up my old Bachelors of Engineering textbooks and was shocked at the content.
The explanation of concepts was almost non-existent. The problems that were worked out were lacking many steps that would help a student understand it. There was hardly any explanation about the history of the techniques. There was no explanation about how these techniques would be used in practical applications.

How to improve:

A consortium of students can evaluate textbooks to check if they convey the concepts in a way that newbie students would be able to understand even if there was no teacher to teach the subject. Stop buying any textbooks that are not good enough.
Create a wiki which collates the best three internet sources that teach any particular concept very well.
Teachers across the globe who are best known for teaching a particular subject well could prepare course material and distribute it for free to the world (one such excellent source is Coursera).
Create a template for textbooks, which specifies how the author should introduce the topic. First a brief history of what was lacking which caused the introduction of a new technology or technique. Then a comparison of how the technique fares with respect to other techniques. Then an introduction to the technique itself and finally, how the technique is used in various real-world applications.

2. Poor explanation in classrooms

I remembered that a majority of the teachers in schools and colleges were people who either didn't know the subject well enough or didn't know how to break it down into concepts that could be digested by the students easily. It isn't entirely their fault either. They themselves were a product of the same education system that didn't care if the students actually learnt anything. I've never heard of colleges conducting any screening/audition sessions for teachers to check if they had the ability to actually teach!

On a side-note, an often ignored point in classrooms is sleepiness. When you feel thirsty, you don't go and start exercising. You drink water. Same way, when you feel sleepy, you aren't supposed to drink coffee to stay awake. You are supposed to sleep. When students feel sleepy in class, they should be allowed a 10 minute nap instead of being asked to remain awake.

How to improve:

Before hiring a teacher, conduct a session where they are asked to explain at least three different randomly chosen topics of varying difficulty. The teacher should be allowed sufficient time to prepare for the topics, but then be rigorously evaluated on their ability to convey the topic in a simple manner that students can understand easily. They can choose any method to deliver the lecture. Just spoken words, the white-board, a slideshow or even VR.
The teachers can also use a template for teaching, where they first introduce the history of the technology, compare it to other techniques, teach the actual technique and then explain how it is used in real-world applications.
Repetition helps. So it can also help to adopt a teaching technique where the teacher covers the entire syllabus in a few days, where the basics of all concepts are touched upon, and then takes up each topic one-by-one. It's a reality that a student's mind might wander during a lecture, and the second repetition can help them get back on track.
Allow sufficient breaks and nap-time when students feel sleepy. A ten minute nap can be refreshing for them and it'd help if teachers can also take small breaks.

3. Wasted, unsafe practicals

Lab sessions were introduced so that students would get hands-on experience with whatever they learn. However, not all labs go as they are intended. They either follow a mundane list of to-do things or are just plain boring. Many schools and colleges don't use safety equipment either. I heard of a student at NTTF who lost an eye when a piece of metal shot into his eye while he was working on machining it.

How to improve:

Use safety equipment. There's no excuse.
Use half the lab time to allow students to practice what the lab manual stipulates and the other half, where the teacher challenges them to try creative things (that aren't dangerous) to tweak their existing understanding and see what happens if they try something different. This is a precious childhood trait we all have, which gets crushed by years of disciplining. It helps to unleash this trait in a safe, controlled environment and observe how learning actually becomes fun.

4. Scrap the written exam

Every student learns differently and at a varied pace. You can't put everyone in a similar class with horrible teachers and expect them to actually learn.
In all those subjects I didn't score well in, my parents, relatives and I used to think I was too dumb to learn. In later years I learnt that it was the above three points that made the subject boring and un-learnable. The subject was actually easy. It was very interesting too, when I looked at it after many years. Yet, at the time I studied it, it seemed horrible.
Moreover, many of those who were extremely adept at memorizing information and reproducing it with perfection in the exam hall were clueless when they were asked to generalize and creatively apply the concept. They didn't even know where to start.
A high score in an exam does not mean the student is intelligent. It means they have the capability to assimilate information and remember it for longer than others. This does not mean they would be able to apply the concept well in real-life.
So when you hire people into your organization, think about the role. If all you want are people who do what they are told, go ahead and hire those with a high GPA. If you want people who love applying concepts and building things, hire those who create their own personal hobby projects. One crucial point to note is that you shouldn't manage the latter bunch of people in the same way as you'd manage the former. The creative bunch of people deserve a lot more trust, freedom of thought and expression. If you constrain them, it's as good as having not hired them at all.

How to improve:

Either scrap the written exam altogether or create two types of exams that students can choose from. One is the typical exam where people can memorize things and write them down. The other is an exam where students are given challenges to apply what they've learnt and even come up with new discoveries. Don't make goldfish climb trees. Don't crush the confidence of children by showing them a written-exam score which does not really tell them anything about their innate skills.

Perhaps the education system would only be able to change once the industry starts being more specific about the kind of people it hires. Our roads are in a bad shape today because of people who don't really care about creating good roads. Many doctors can't diagnose patients well because they never really wanted to become a doctor. Many engineers disregard safety and design best-practices because they never really wanted to be engineers. You can see this in every other profession: Religion, politics, education, manufacturing, sales, aviation...

Isn't it time we had a system which could evaluate children for what they are best at, and allowed them to pursue that as a career interest? To allow people to pursue what they love doing and are good at doing. If anything, it'll lead to a happier, more comfortable world to live in.

28 September 2018

The basics of Hidden Markov Models with Forward Trellis

With anything related to Mathematics, I'm surprised how tutorials on the internet and research papers rush into complex equations and variables without first explaining the basic concept which can help a student get a grasp of what the equations mean. There are far better, simpler ways everything can be explained, and that's how I intend to explain Hidden Markov Models.

States

Let's take the common example of having three possible days. The day can be either sunny, cloudy or rainy.

To have a simpler representation, I'll show them by their primary colors.

Our objective is to assume that we start with one of these days and then calculate the probability that the next day would be sunny, cloudy or rainy. Then we calculate the same for the next day and the day after that and so on, until we want to stop.

Although the above figure shows what the weather is on each of the five days, in reality we don't know it, and we want to calculate the probability that any particular day would be sunny, rainy or cloudy.

We refer to these weather situations as "States". So on any day the weather might be in a sunny state or a cloudy state or a rainy state.

Transition probability

Usually the transition probability matrix is created by people who note down the actual weather states for many days and then they statistically estimate that if today is sunny, then the probability that the sunny weather might transition into a cloudy weather is 0.3. If the weather is cloudy, the probability that it might transition into a rainy weather tomorrow is 0.4. They calculate these probabilities and create a probability transition matrix like this:

Note that in such a matrix, the probabilities of the row values add up to 1. The 0.5 in the first row and first column of the matrix indicates the probability that if today is sunny, tomorrow will be sunny too.
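The "note down the weather for many days and estimate" step is just counting. Here is a rough sketch in plain Python, where the weather history is made up and the function name estimate_transitions is my own: count how often each state is followed by each other state, then divide every row by its total so the row adds up to 1.

```python
from collections import Counter


def estimate_transitions(history, states):
    """Estimate P(tomorrow | today) from a list of observed daily states."""
    counts = {s: Counter() for s in states}
    # Pair every day with the day that follows it.
    for today, tomorrow in zip(history, history[1:]):
        counts[today][tomorrow] += 1

    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        # A state that was never left gets an all-zero row in this sketch.
        matrix[s] = {t: (counts[s][t] / total if total else 0.0) for t in states}
    return matrix


states = ["sunny", "cloudy", "rainy"]
# Hypothetical record of eleven consecutive days.
history = ["sunny", "sunny", "cloudy", "rainy", "rainy", "sunny",
           "cloudy", "cloudy", "rainy", "sunny", "sunny"]

matrix = estimate_transitions(history, states)
for today in states:
    print(today, [round(matrix[today][t], 2) for t in states])
```

With so few days the estimates are crude. A real transition matrix would be built from a much longer record.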

The Trellis
When we start with knowing today's weather and try to calculate probabilities of the weather of future days, we don't know what the weather will be in the future. Knowledge about the future is hidden, and we call those hidden states.

So if we represent each state (sunny/cloudy/rainy) in a row, it would look like this:

Suppose the states were not hidden, and if we knew what the weather is on each day, the weather transitions could either be shown like this:

Or like this:

This way of representing states is called a "trellis" because it looks like a garden trellis.

In a Hidden Markov Model, we don't know the states, so we represent all circles as empty circles and we add an additional row for the "final state". The final state is nothing special. It is just the weather of the last day that you are considering. It was not even necessary to add to the trellis diagram, but some silly person decided to complicate the concept and represent the final state as a separate row, to be able to clearly indicate that once the Markov Model transitions into the final state, it stops there and does not transition any more.

Since the final state could be either sunny, rainy or cloudy, I decided to show that row with all three colors.

The transition probability matrix would now have an additional row to show the final state:

So what this silly transition probability matrix now shows us is that the probability of transitioning from any weather today to the weather tomorrow is 0.2 or 0.1 or 0.3, but once you reach the final state, the probability of transitioning to any other state tomorrow is zero.

Why do we need to do all this?

In this simple example, we assume that a person could have a picnic if the weather is sunny or cloudy. If the weather is likely to be rainy, the person sits at home and reads a book.

Interestingly, if we observe that somebody went for a picnic on day 1, day 2 and day 3 and spent day 4 and day 5 reading a book, we would be able to use the trellis diagram to estimate which days were sunny, cloudy and rainy (the weather states are "hidden" and we need to estimate them).
To do that, you'll need another matrix called the "emission probability matrix".

Emission probabilities

The rows in the trellis represent sunny, cloudy and rainy respectively. Based on the state (weather) of any day, a person may decide to either go on a picnic or read a book. The decision is called an "emission" and is defined numerically by probabilities in an emission probability matrix.

An "emission" on a day just means the probability that one of the decisions will be taken. Either to read a book or to go on a picnic.

This is our emission probability matrix:

The rows in the matrix show the decision taken and the columns show the probability of taking those decisions for each day. If you were considering 10 days, then this matrix would have five more columns.

Just like the transition probability matrix, even the emission probability matrix is calculated beforehand by somebody who observes the weather and a person's decision to read a book or go on a picnic on multiple days and says that these are the probabilities for the Hidden Markov Model. Note that here too, the row values add up to 1.

Forward Trellis (Viterbi algorithm)

All that the Forward Trellis method says is that if you have a set of observations about the decisions a person took on the five days:
O = {picnic, picnic, picnic, read book, read book}

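Using the trellis you can find out how probable it is that each of the days was sunny, cloudy or rainy. The Viterbi algorithm goes one step further and shows you the single most probable path through the trellis (this is the most likely sequence of weather states, not the "shortest" path).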

In the transition probability matrix and the emission probability matrix, the rows are represented by "i" and the columns by "j" in many textbooks.
So each probability value in the transition probability matrix would be represented as a_ij.
Since in the emission probability matrix the columns represent the days, I'll represent the columns with "k" instead of "j". The probability values in the emission probability matrix are represented as b_ik.

Now, given the observations O = {picnic, picnic, picnic, read book, read book}, if we want to know the probability that day2 is cloudy...

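...we just have to use the forward formula. (The Viterbi formula is the same, except that it keeps only the largest of the three bracketed terms instead of adding them up.)
It is: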

Probability of day2 being cloudy = Emission probability of picnic for day2 * ((probability of day1 being sunny * transition probability of sunny to cloudy) + (probability of day1 being cloudy * transition probability of cloudy to cloudy) + (probability of day1 being rainy * transition probability of rainy to cloudy)).

Day1 in this case will begin with the sunny state having probability = 1 and the cloudy and rainy states will have probability = 0 for day1. We assume we know that the first day is sunny.
If you wish, you could also assume you don't know the weather for the first day, and assign initial probabilities for each state on day1.

To calculate probabilities for day3, you'd first have to calculate all probabilities of sunny, cloudy and rainy for day2.

The three converging gray arrows in the above diagram are just to show that we use the transition probabilities of sunny to cloudy, cloudy to cloudy and rainy to cloudy.
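To make the calculation concrete, here's a small Python sketch of it. Treat it as an illustration under assumptions rather than the exact model from this post: only the 0.5 (sunny to sunny), 0.3 (sunny to cloudy) and 0.4 (cloudy to rainy) values come from the text above, and every other number is made up. I've also assumed the common textbook layout where the emission probabilities are indexed by weather and decision, which differs from the per-day matrix shown earlier. The function names forward and viterbi are my own. The first function adds up the three incoming arrows (usually called the forward step) and the second keeps only the strongest arrow (the Viterbi step), which lets it trace back the most likely weather sequence.

```python
STATES = ["sunny", "cloudy", "rainy"]

# Transition probabilities: A[today][tomorrow]. Only 0.5, 0.3 (sunny row)
# and 0.4 (cloudy -> rainy) come from the post; the rest are invented.
A = {
    "sunny":  {"sunny": 0.5, "cloudy": 0.3, "rainy": 0.2},
    "cloudy": {"sunny": 0.3, "cloudy": 0.3, "rainy": 0.4},
    "rainy":  {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
}

# Emission probabilities: B[weather][decision]. All values are invented.
B = {
    "sunny":  {"picnic": 0.9, "read book": 0.1},
    "cloudy": {"picnic": 0.6, "read book": 0.4},
    "rainy":  {"picnic": 0.1, "read book": 0.9},
}

# As in the post, day 1 is assumed to be known: it is sunny.
START = {"sunny": 1.0, "cloudy": 0.0, "rainy": 0.0}

O = ["picnic", "picnic", "picnic", "read book", "read book"]


def forward(observations):
    """One column of values per day, summing over all incoming arrows."""
    # Following the post, day 1 is just START. A textbook forward pass
    # would also multiply in the emission probability for day 1.
    columns = [dict(START)]
    for decision in observations[1:]:
        prev = columns[-1]
        columns.append({
            s: B[s][decision] * sum(prev[p] * A[p][s] for p in STATES)
            for s in STATES
        })
    return columns


def viterbi(observations):
    """Most likely weather sequence: keep the best incoming arrow only."""
    best = [dict(START)]
    back = []  # back[d][s] = best previous state leading into s
    for decision in observations[1:]:
        prev = best[-1]
        column, pointers = {}, {}
        for s in STATES:
            p_best = max(STATES, key=lambda p: prev[p] * A[p][s])
            column[s] = B[s][decision] * prev[p_best] * A[p_best][s]
            pointers[s] = p_best
        best.append(column)
        back.append(pointers)
    # Start from the best final state and follow the pointers backwards.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(forward(O)[1]["cloudy"])  # value for "day2 is cloudy"
print(viterbi(O))
```

With these made-up numbers, the day2 cloudy value is simply 0.6 * (1 * 0.3 + 0 + 0), because sunny is the only day1 state with a non-zero probability. The values in each column are not normalised, so to read them as "the probability that day2 is cloudy" you would divide by the sum of that day's column.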

That's all there is to it. It's so simple, but people explain it in such contorted, complicated ways that it's hard to grasp and understand why things are being done like this and what purpose it serves. Luckily for you, NRecursions comes to your rescue again :-)

27 August 2018

Converting pgm files to jpg or png

Some software does not recognize the pgm image file format, even though it was created in the 1980s.

Luckily, Ubuntu has a pre-installed tool named ImageMagick, with which you can use this command to batch-convert all pgm files in the current folder to png or jpg.

mogrify -format png *.pgm
or
mogrify -format jpg *.pgm

Try using it to convert from other formats too.
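If ImageMagick isn't available on your machine, the same conversion can be sketched in plain Python using only the standard library. This is a rough sketch under assumptions, not a replacement for mogrify: it only handles 8-bit grayscale pgm files (maxval up to 255), writes png only, and the function names read_pgm, pgm_to_png and convert_folder are my own.

```python
import struct
import zlib
from pathlib import Path


def read_pgm(data):
    """Return (width, height, 8-bit pixels) from P2/P5 PGM bytes."""
    tokens, pos = [], 0
    while len(tokens) < 4:  # magic number, width, height, maxval
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":  # a comment runs to the end of the line
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    magic = tokens[0]
    width, height, maxval = (int(t) for t in tokens[1:])
    if maxval > 255:
        raise ValueError("this sketch only handles 8-bit PGM files")
    if magic == b"P5":  # binary: one whitespace byte, then raw pixel bytes
        raster = data[pos + 1:pos + 1 + width * height]
    elif magic == b"P2":  # ASCII: whitespace-separated numbers
        raster = bytes(int(t) for t in data[pos:].split()[:width * height])
    else:
        raise ValueError("not a PGM file")
    if len(raster) != width * height:
        raise ValueError("PGM pixel data is truncated")
    # Stretch the gray levels so that maxval maps to 255.
    return width, height, bytes(v * 255 // maxval for v in raster)


def png_chunk(kind, payload):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    return (struct.pack(">I", len(payload)) + kind + payload
            + struct.pack(">I", zlib.crc32(kind + payload)))


def pgm_to_png(data):
    """Convert the bytes of a PGM file into the bytes of a grayscale PNG."""
    width, height, pixels = read_pgm(data)
    # Every PNG scanline starts with a filter byte; 0 means "no filter".
    rows = b"".join(b"\x00" + pixels[y * width:(y + 1) * width]
                    for y in range(height))
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), then
    # compression, filter and interlace methods all 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n" + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"IDAT", zlib.compress(rows))
            + png_chunk(b"IEND", b""))


def convert_folder(folder="."):
    """Write a .png next to every .pgm in the folder (originals are kept)."""
    for path in Path(folder).glob("*.pgm"):
        path.with_suffix(".png").write_bytes(pgm_to_png(path.read_bytes()))
```

Calling convert_folder() with no arguments converts the pgm files in the current folder, much like the mogrify command above.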

08 August 2018

Are you data privacy literate?

I'm very surprised when I meet highly educated Engineers and Doctors who still believe that the positions of the planets and stars have some effect on their daily luck and destiny. It's the result of their childhood curiosity and intelligence being channeled into believing and accepting unscientific information just because many others do so. Likewise, wasn't there a single person who could stand against sati and dowry for all those centuries? "Hey she's part of my family. How dare you suggest burning her!!!".

In today's times it's about standing against data misuse when society mocks you for doing so. Turns out there are still too many people who are insensitive to people's need for privacy.

If you are not paying for a product, YOU are the product

Few people realize the gravity of this.

Being a lifelong learner

A few years ago people were told that if they didn't know how to use a computer, they were as good as being illiterate. That school of thought has been upgraded. Today, you are illiterate if you don't care/know how to protect your personal data.

I'm often asked why I don't use a certain messaging app, and get mocked by the "cool-crowd", for whom throwing caution to the wind is ok as long as everyone else is. Just like the enthusiastic users of radioactive toothpaste. If you love the messaging app so much, why don't you marry it? ;-)

I recently installed the messaging app, taking care to first deny all unnecessary permissions, and then tried sending a message. I couldn't do so unless I allowed it access to data on my phone. That itself is a huge red flag. An app doesn't need such details to send a message. With these details, a lot of conclusions can be drawn from the messages and calls you make. Even with encryption, don't forget who has the key to decrypt the message. Apparently a lot can be deduced just from the metadata associated with the messages. Messaging apps also compromise privacy, and there's a lot the government can do to monitor people, just the way corporations may.
There's also the un-ending stream of notifications. Every Tom, Dick and Harry has created messaging app groups which you are expected to join.

So what do I do with an intrusive, irritating third-party app that forces me into sharing personal details?
UNINSTALL.

Cartoon from Wumo

Oddly, free messaging apps are still popular in spite of people telling their friends to stop using them. Free web-based email was invasive enough. Now they have access to even more intimate information about you through your smartphone.

Hope you've heard of how they switch on your phone's microphone without your permission to listen in, or misuse your phone for bitcoin mining. Turns out there are hundreds of such apps.

But my data is already out there...

Sure it is, but remember that data analytics works well only with lots of data. So you can stop putting more data out there right now. In fact, don't start today. Start yesterday!
Everything you type in a chat is potentially being analyzed. NLP can actually understand the grammar and associate it with context. Your contacts from various platforms are being integrated to know more about who you are and whom you interact with. A trainer from a networking company once told me that people's profiles get created by the ISP, based on their internet searches and activities, and any deviation from that pattern gets recorded as an anomaly. The history of your life, locations and activities is being recorded. The content in the files you upload is being analyzed. It's like standing on a rooftop and yelling out your intimate personal details to strangers. You don't have to.

Behind the scenes

You should know what goes on in such corporations. If the recent scandal of misusing data isn't enough, the god mode of app-based taxis is another. Internet searches will show you more. I wouldn't blindly believe privacy policies, since it's historically known that corporations can flout those rules or have cleverly written clauses that help them do whatever they want with your data while you give them your precious trust.

Any company (fraudsters too) can buy your personal data from any of the many data gathering companies and associate various intimate aspects of your personal life. For many of you, that sounds ok until you realize that someone you personally know can be working in such a company and looking at that information.

Don't believe me? Did you hear about Mr. Professional Stalker? Or how app-based taxi employees secretly stalked their exes and celebrities?
You only know about these because these made it to the news. There are a zillion other companies (and fraudsters) using your data and there can be people you personally know, who could be looking at a history of your activities you don't even remember, and using analytics to associate that data to make conclusions about you.

Crime

Yesterday I received a call from someone who had some details of my bank account and was asking me about transactions that I had not done. He calmly reassured me he was not asking for my password or pin details. I promptly reported this number to the bank and they confirmed it was a fraud call. For those of you who don't know, this is called Social Engineering: a method of questioning that makes you unsuspectingly give out personal data which you think is safe, but which they can associate with more data they have already gathered about you to commit fraud. If you still think this is too far-fetched and will never happen to you, then you are very naive.

A few years back a girl looked at me like I was an idiot when I casually told her why she shouldn't have posted her vacation details on Facebook. What harm could it do, she thought. https://m.youtube.com/watch?v=e0qrEnCbGIE

Now the rich and powerful are realizing that their own data is being misused, and are formulating laws to restrict it. Don't rejoice yet. These people would only formulate laws that protect themselves. You are still responsible for your own safety.

Why you should care about data privacy.
What kind of data about you is being sold
A chilling glimpse of how widely companies share user data
How much is your personal data worth. Should companies pay you for your data? Here's a simple Data Worth Calculator.

Yet, many of you still don't care. Just lazy eh? Too busy? Well, can't blame you really...in my phone, I was surprised at the plethora of options to restrict apps from accessing the internet and the huge number of settings I had to visit to turn off features that compromised my security/privacy. It does take a lot of time to do this. It's worth it though. Do it everywhere. Google account, Facebook, LinkedIn, Twitter, location services...everywhere. Don't give out your number at grocery stores or pizza stores. Don't use public WiFi unless you can ensure safety. Don't use public proxy servers or free VPNs. There's no free lunch.

Sensitivities need to adapt with changing times. You may be ok with your data being misused, but there are an increasing number of people who are not ok with it. Respect their need for privacy and security.

While you are busy telling people that you don't care about privacy, could you prove it by leaving the door of your house unlocked too?

01 August 2018

Causes of phone battery bloating / swelling

Lithium Ion batteries are known to function poorly when subjected to heat. It turns out that it's not just function that declines, but gases formed by electrolyte decomposition can cause the battery to bloat and potentially explode.

A few causes I know of are:

Using the phone for a phone-call while the phone is being charged.
Using the phone in a hot environment or subjecting it to heavy use (watching videos or gaming) while it is being charged.
Using mobile hotspot for long durations (more than half an hour) at a stretch.
Continuing to charge the mobile even after it has reached 100% charge.

I'm surprised that many phones today have a moulded build with a non-removable battery. Although the battery can be replaced at a service center, the "non-removableness" means that if the battery gets bloated, it won't have any space to expand into, and it'll permanently damage the phone. The better option is to go for phones with removable batteries. Here, when the battery expands, it'll pop off the back cover and at least your phone's components won't get damaged. You just have to replace the battery instead of replacing the entire phone.

17 July 2018

Learning R through tutorials

R can be daunting initially, but you'll become comfortable with it in a month, compared to something like Matlab which takes longer to master. A good tutorial can speed up your comfort-level even more.

One good tutorial I found was this.

08 July 2018

Take a screenshot of a rectangular area of the screen in Ubuntu or Mac without any extra software

This was a niggling problem for so many years, and I didn't know the solution was right under my nose.

In Ubuntu, if you want to take a screenshot of the entire screen, use the PrtScn key on the keyboard. Everyone knows that.

To take a screenshot of the current active window, press Alt key and the PrtScn key together. Most people know that too.

What most people don't know is that you can take a screenshot of a rectangular selection of an area on your screen by pressing Shift and PrtScn together, and then using the mouse pointer to select an area on the screen.

Oh...how much time it'd have saved me if I knew this earlier!!!

On a Mac you can do the same by pressing the keys Command, Shift and 4 together.

17 June 2018

Privacy in the age of Augmented Reality

It is said that the stone age lasted 2.5 million years.
The agricultural age lasted a few millennia.
The industrial age lasted a few centuries.
The digital age lasted a few decades.

Now we are said to be at the brink of a new age: "The age of Augmented Reality".

We are already there. We have virtual reality headsets. We pick up our phone to internet-search for answers when we have questions.
More importantly, we are at the doorstep of mind-controlled devices. Microsoft has already applied for a patent. You could eventually even communicate with people or animals without having to speak a word.

Privacy
While it's fascinating to see what technology has to offer, there's also the element of loss of privacy. Governments and corporations got access to our homes via the internet. Now they have even more intimate information about us via our smartphones. With brain-controlled devices, they are poised to have access to the very depths of our thoughts.

I had an SMS conversation with a friend about a cab service, and within a couple of hours, and again on the next day, I received SMS advertisements about that cab service. I had never received ads about that service before in my life. So now the harsh reality is that even if you avoid messaging apps, your mobile phone service providers are monitoring and monetizing your SMSes and perhaps your conversations too.

It was nothing short of shocking to hear that certain apps automatically turn on your phone's microphone to listen to your conversations or your TV usage!
Your smartphone camera can be used to spy on you.
Your webcam can be remotely activated to spy on you.
Even your Internet Service Provider knows a lot about you.

There's plenty more.

In spite of knowing this, most people happily allow companies and governments to have access to their personal data. Because the services are free!
Given the way my data was used without my consent for sending me an SMS ad, I wonder if we would have privacy even if we paid for using GMail, Facebook, WhatsApp etc.

Are our thoughts going to remain private?
I'm very sure that companies designing mind control apps and devices will also be devising an elaborate strategy to convince people to happily give them access to their minds and thoughts. In the same care-free way we do with free email and messaging services.

People born before the 1950s used to envy us when they saw how fascinating the digital age was. They weren't able to adapt quickly enough to use it, but they wished they could. They told us how lucky we were to be born into the "jackpot generation" which is witnessing a phenomenal change in technology.
Some people don't envy us though. They envy our grandparents, saying that although they lived in a world with less access to medical facilities, information, travel and a lot of other things we enjoy, they lived happier lives where life moved at a more comfortable pace. Food, air and water were purer. Privacy was something they had a lot more control over.

There may come a time though, when people will ask if privacy is really so necessary? A time when information and thought will be so pervasive that it'll become a lot easier to trust someone because you already know everything about them. This would remove barriers of communication and basically the entire world, not just humans, but even animals and perhaps trees and plants would be able to function as a single cooperative entity, sharing knowledge and purpose. Perhaps leading to a time when we finally answer the ultimate question of why we exist in this universe!

22 May 2018

Sick or sleepy? You ruin it for everyone by forcing these kids/employees to push themselves.

Sickness

Put simply, would you prefer that one sick employee or student remains away from everyone else for the duration of their contagious illness, or would you prefer that more people get infected and productivity plummets?

In a competitive world, it's sad how people push themselves (or are pushed) to show up for work even when sick. You don't have to. Even with a chronic illness or a family member's illness, there are ways to manage it. Businesses always have a contingency plan for employee absence. It's their job to arrange for that, and not something that you have to worry about when sick.

What you should do:
Even if your illness is not contagious, stay at home and get rest. A company will never sacrifice their profitability for you, so you have no obligation to sacrifice your health for them either. I've seen this happen with multiple people. Sacrificing health for work or schooling is never worth it.

Sleep

Again, put simply: Would you prefer that people remain productive throughout the day by taking naps in-between or that they remain productive for half a day, making mistakes (which will cost you a lot in re-work) and struggling to remain awake and focused?

When you feel sleepy, you aren't supposed to drink tea/coffee to shake off the feeling. You are supposed to sleep.
There's some kind of a 'laziness-related' social stigma attached to sleeping in class or in office. It's high time we got rid of this sleep-depriving culture.
Human beings are not meant to sleep only at night. We naturally have either Biphasic sleep or Polyphasic sleep. This means that when you feel like sleeping, it's perfectly ok to sleep or take a nap. You don't have to apologize to anybody. If you want to be super-human and avoid sleep, you need to take a serious look at how sleep deprivation can wreak havoc on your body.
I had a teacher who scolded students for yawning in her class. I have another teacher who feels insecure when students feel sleepy in his class. Neither of them seems to have noticed that it wasn't the fault of their teaching. All they had to do was allow the students to take a 10 minute nap.

What you should do:
If you are at a school or college, the authorities should either specifically allow for a nap break during afternoon classes, where students can rest their head on the table and take a short nap, or arrange for monitored sleep boxes.
If you work at a company, convince your bosses that a short nap can help boost productivity without negatively affecting morale. Don't worry. There's plenty of evidence about this already. Companies can definitely introduce sleep boxes.

Sleep deprivation begins early in life and continues unless you stop it:

Waking through the night when the kid is born.
Kids' sleep cycles and duration being ruined when woken up for school.
The horrible sleep-depriving culture of "burning the midnight oil" or waking up early during exams.
Eating improperly cooked or burnt food, leading to sleep loss.
Having to wake up early to go to office.
Staying up to complete a project (which never gets completed).

In an ideal world, one should be allowed to sleep and wake up as per their natural sleep cycle rather than society's insensitive work cycle. When one is sick, one should be able to take rest instead of worrying about losing out in the rat race. Advice like this can only show you the way. You are the one who decides to walk the path.

You won't believe it until you measure it

If you feel you are getting enough sleep, write down how many hours of sleep you think you are getting. Then, keep a piece of paper and a pen by your bedside. Every night and morning, note down the approximate time you woke and slept.
Have two columns.

Column 1: Number of hours of un-interrupted sleep.
Column 2: Total sleep = un-interrupted sleep + remaining sleep duration in hours.

Interrupted sleep happens when you are woken by noise at night, need to go to the toilet at midnight, are woken by discomfort in your stomach after having eaten bad food etc.

Do this experiment at least for a week or two.
I assure you, you'll be startled at how little sleep you are getting. Do let me know what your results were and I'll tell you how to improve.