11 November 2018

Which is better? Octave or Matlab?

I'd say choose Matlab every time if you can. It's much faster than Octave.

Speed
A Sudoku program I developed used to take two minutes to run 2000 iterations in Matlab.
The same program took 13 minutes to run 500 iterations in Octave.

The reason, apparently, is that Matlab uses Intel's Math Kernel Library (MKL) internally, and I have personally seen the significant speed boost MKL gives, when I tried it with some C++ code I developed long ago.

GUI
Apart from this, there's the Matlab GUI, which is far better than Octave's in terms of customizability and plain old user-friendliness.



Granted, Octave is built by the open-source community, but processing speed should have been a serious consideration from the start of the project. Even now it isn't impossible to make the switch to MKL. The speed-up I saw in my C++ program came after I simply replaced the old code with function calls to MKL, and it worked.

Good websites to download a master's or PhD thesis / dissertation

To find a good university website, or any other website where theses are published and available for free, your best bet is to first try a search engine.

E.g., if you are looking for theses on swarm intelligence for image segmentation, search with the terms:
"partial fulfillment" swarm image segmentation masters thesis pdf


Some other websites that can help are:

08 November 2018

Aha!


Continued from the previous Aha!



Pachcha Malayalam





To be continued...


07 November 2018

Can Initiative Q be trusted?



As a personal opinion, I'd say no.

In general, any scheme that seems too good to be true probably is, especially when money is involved. Initiative Q appears to have many of the lures that scams typically share.

  • The offer of free money (every sane person knows that you should stay away from this)
  • The "by invite only" exclusivity that gets people salivating (Gmail used the same tactic when it started).
  • The offer of more imaginary money if you invite more people.
  • The ever-decreasing amount of imaginary free money, pushing you to throw caution to the wind and join early. A classic.
  • The non-constancy of offered free money. On the website it shows $25000 is available and decreasing, but if you click a link invite, the amount goes up to $33000.
  • Most importantly, the fact that you can't properly verify the credentials of the people who set up this scheme.
  • The fact that they "trust" you to approve new joiners, but still offer you extra imaginary money for approving them (even though they know you may not personally know these people), showing that they don't really care whether the new users are genuine.
Precautions:
If you want to try it out for the fun of it, I'd recommend you create a new email ID under a fake name and use an Initiative Q password that does not resemble any of the passwords you'd normally use. Definitely do not join it using your Facebook or Google account. Also, every time you've finished visiting their website, clear your browser cache and cookies just to be safe.

Discussions on Reddit point to this scheme being either a way to harvest email IDs and passwords for marketing, a way to learn people's password patterns, or just a social experiment to see how many people are gullible enough to fall for a scam.

To be fair to Initiative Q, it has proponents who say it is not a scam and that it has the potential to become a new currency if a significant number of people back it, though they also say it won't make you super-rich.
Only time will tell.


05 November 2018

A simple tutorial showing some basic PGMPy program code and explanations

Installation

Firstly, it's recommended you have the latest version of Python3 installed.
Python3 uses pip3 to install packages that you'll be importing into your programs, so ensure pip3 is installed too.
Install git.
It's my personal preference not to use Anaconda for installations. If you do use Anaconda (note that Miniconda also exists), make sure that whatever Python packages you install with Anaconda are not also installed with pip3, as there can be version conflicts.

Now install a bunch of peripheral packages (Not all of them are required. If you want to install just the bare-minimum, see the documentation here):
sudo pip3 install sphinx nose
sudo pip3 install networkx
sudo pip3 install numpy
sudo pip3 install cython
sudo pip3 install pandas
sudo pip3 install setuptools
sudo pip3 install IPython
sudo pip3 install matplotlib
Note: pylab ships as part of matplotlib, so it doesn't need a separate pip install, and python-tk is a system package rather than a pip package:
sudo apt-get install python3-tk
sudo pip3 install gensim
sudo pip3 install spacy

Now clone PGMPy and install:
git clone https://github.com/pgmpy/pgmpy
cd pgmpy
git checkout dev
sudo python3 setup.py install


If you are wondering which IDE to use for Python, I've put up my own little review here. I prefer LiClipse because it supports refactoring and autocomplete reasonably well.


Tutorial

Try creating a basic Bayesian network like this:




from pgmpy.models import BayesianModel
import networkx as nx
import matplotlib.pyplot as plt

# Defining the model structure. We can define the network by just passing a list of edges.
model = BayesianModel([('D', 'G'), ('I', 'G')])
nx.draw(model, with_labels=True)
plt.show()


If you encounter an error like this:
"AttributeError: module 'matplotlib.pyplot' has no attribute 'ishold'",
See this issue for the solution.

To add conditional probabilities to each of those nodes, you can do this:

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
#import networkx as nx
#import matplotlib.pyplot as plt

# Defining the model structure. We can define the network by just passing a list of edges.
model = BayesianModel([('D', 'G'), ('I', 'G')])
#nx.draw(model, with_labels = True); plt.show()

# Defining individual CPDs.
cpd_d = TabularCPD(variable='D', variable_card=2, values=[[0.6, 0.4]])
cpd_i = TabularCPD(variable='I', variable_card=2, values=[[0.7, 0.3]])
cpd_g = TabularCPD(variable='G', variable_card=3,
                   values=[[0.3, 0.05, 0.9,  0.5],
                           [0.4, 0.25, 0.08, 0.3],
                           [0.3, 0.7,  0.02, 0.2]],
                  evidence=['I', 'D'],
                  evidence_card=[2, 2])
# Associating the CPDs with the network
model.add_cpds(cpd_d, cpd_i, cpd_g)
if model.check_model():
    print("Your network structure and CPDs are correctly defined. The probabilities in each column sum to 1. Good job!")

print("Showing all the CPDs one by one:")
for i in model.get_cpds():
    print(i)
print("You can also access them like this:")
c = model.get_cpds()
print(c[0])
print(model.get_cpds('G'))
print("Number of values G can take on. The cardinality of G is:")
print(model.get_cardinality('G'))



Output CPD tables:



Here, "variable_card" has nothing to do with a "card"; it refers to the cardinality of the variable, i.e. the number of states the variable can take. The same goes for evidence_card. Have a look at the CPD tables and you'll see that the cardinality of G is 3 because you want G to have three states: G_0, G_1 and G_2. (For both the variable cardinality and the evidence cardinality, I feel the creators of PGMPy could have detected the number of rows automatically instead of expecting us to specify the cardinality.)

So the variable cardinality specifies the number of rows of a CPD table, and the evidence cardinalities specify the number of columns. Once you've specified the CPDs, you can add them to the network and you are ready to start doing inference.
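As a sanity check on those rules, here's a small plain-Python sketch (no pgmpy needed) showing the shape a TabularCPD values table must have, using G's CPD from above: variable_card rows, and one column per combination of evidence states.

```python
# Shape check for a CPD values table (using G's CPD from above).
# Rows = variable_card (states of G); columns = product of evidence_card.
variable_card = 3        # G has states G_0, G_1, G_2
evidence_card = [2, 2]   # cardinalities of the evidence variables I and D

values = [[0.3, 0.05, 0.9,  0.5],
          [0.4, 0.25, 0.08, 0.3],
          [0.3, 0.7,  0.02, 0.2]]

n_cols = 1
for c in evidence_card:
    n_cols *= c          # 2 * 2 = 4 columns

assert len(values) == variable_card
assert all(len(row) == n_cols for row in values)
# Each column is a distribution over G's states, so it must sum to 1
# (this is essentially what model.check_model() verifies):
assert all(abs(sum(col) - 1.0) < 1e-9 for col in zip(*values))
print("CPD table shape:", (variable_card, n_cols))  # CPD table shape: (3, 4)
```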


Let's try a slightly larger network




from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination
import networkx as nx
import matplotlib.pyplot as plt

# Defining the model structure. We can define the network by just passing a list of edges.
model = BayesianModel([('D', 'G'), ('I', 'G'), ('G', 'L'), ('I', 'S')])

# Defining individual CPDs.
cpd_d = TabularCPD(variable='D', variable_card=2, values=[[0.6, 0.4]])
cpd_i = TabularCPD(variable='I', variable_card=2, values=[[0.7, 0.3]])
cpd_g = TabularCPD(variable='G', variable_card=3,
                   values=[[0.3, 0.05, 0.9,  0.5],
                           [0.4, 0.25, 0.08, 0.3],
                           [0.3, 0.7,  0.02, 0.2]],
                  evidence=['I', 'D'],
                  evidence_card=[2, 2])

cpd_l = TabularCPD(variable='L', variable_card=2,
                   values=[[0.1, 0.4, 0.99],
                           [0.9, 0.6, 0.01]],
                   evidence=['G'],
                   evidence_card=[3])

cpd_s = TabularCPD(variable='S', variable_card=2,
                   values=[[0.95, 0.2],
                           [0.05, 0.8]],
                   evidence=['I'],
                   evidence_card=[2])

# Associating the CPDs with the network
model.add_cpds(cpd_d, cpd_i, cpd_g, cpd_l, cpd_s)
# Getting the local independencies of a variable.
print("Local independencies of G:")
print(model.local_independencies('G'))
# Getting all the local independencies in the network
print("Local independencies of other nodes:")
print(model.local_independencies(['D', 'I', 'S', 'G', 'L']))
# Active trail: for any two variables A and B in a network, if a change in A influences the values of B, we say that there is an active trail between A and B.
# In pgmpy active_trail_nodes gives a set of nodes which are affected by any change in the node passed in the argument.
print("Active trail for D:")
print(model.active_trail_nodes('D'))
print("Active trail for D when G is observed:")
print(model.active_trail_nodes('D', observed='G'))

infer = VariableElimination(model)
print('Variable Elimination:')
print(infer.query(['G'])['G'])
print(infer.query(['G'], evidence={'D': 0, 'I': 1})['G'])
print(infer.map_query(['G']))
print(infer.map_query(['G'], evidence={'D': 0, 'I': 1}))
print(infer.map_query(['G'], evidence={'D': 1, 'I': 0, 'L': 0, 'S': 0}))

nx.draw(model, with_labels = True); plt.show()



What is going on in the code:

The variable elimination queries are pretty much self-explanatory: you query for the state of a node, given that some other nodes are in certain other states (the evidence, or observed variables). Inference is done via standard variable elimination or via a MAP query. PGMPy also allows you to do variable elimination by specifying the order in which you want to eliminate variables. There's more info about this in their documentation.
What infer.query(['G']) returns is a Python dict that maps each queried variable name to a factor object; indexing it with ['G'] retrieves the factor for G, which prints as a probability table. map_query works the same way but returns the most likely state of each variable, so its output looks like this:

{'G': 0}

Here, 'G' is the key of the dict, and to access the value associated with 'G' (zero in this case), you just use ['G']. So infer.query(['G'])['G'] is the equivalent of doing:

q = infer.query(['G']) 
print(q['G'])
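To see what these queries actually compute, here's the marginal P(G) worked out by hand in plain Python (no pgmpy needed), summing out D and I from the CPDs defined above. The columns of G's CPD are assumed to be ordered over (I, D) as (0,0), (0,1), (1,0), (1,1), matching the tables in the code.

```python
# Marginal P(G) by direct summation:
# P(G=g) = sum over d, i of P(D=d) * P(I=i) * P(G=g | I=i, D=d)
P_D = [0.6, 0.4]
P_I = [0.7, 0.3]
# Columns ordered over (I, D): (0,0), (0,1), (1,0), (1,1)
P_G_given_ID = [[0.3, 0.05, 0.9,  0.5],
                [0.4, 0.25, 0.08, 0.3],
                [0.3, 0.7,  0.02, 0.2]]

P_G = [0.0, 0.0, 0.0]
for g in range(3):
    for i in range(2):
        for d in range(2):
            col = 2 * i + d  # column index for evidence state (i, d)
            P_G[g] += P_I[i] * P_D[d] * P_G_given_ID[g][col]

print([round(p, 4) for p in P_G])           # [0.362, 0.2884, 0.3496]
# The most likely state of G — what map_query(['G']) reports as {'G': 0}:
print(max(range(3), key=lambda g: P_G[g]))  # 0
```

This is exactly the distribution the first infer.query(['G']) call prints as a table, and its argmax is the MAP answer.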



Why this tutorial 
 
For anyone new to Python or PGMPy, a lot of this syntax looks confusing, and the documentation does not explain it deeply enough either. The objective of this tutorial was to clear up those basic doubts so that you can navigate the rest of the library on your own. Hope it helped.

PGMPy is created by Indians, and is quite a good library for probabilistic graphical models in Python. You'll also find libraries for Java, C++, R, Matlab, etc. If you want to manually try out your network model, there is an excellent tool called SAMIAM.

More documentation for PGMPy is here, and if you want their e-book, an internet search will lead you to it.