27 February 2017

12 February 2017

How to speed up MySQL inserts

One would assume that if an application had two threads, and while one thread inserted data into one table and the other thread inserted data into another table, the inserts would be faster than if both threads inserted into the same table? No. It isn't faster.
The various threads that MySQL has, are to manage the other different operations.

There however is a technique that can improve insert speed further.

Smart programmers already know that using a batch insert is more efficient and faster than calling INSERT multiple times from their program.

What you perhaps do not know, is that after you send the batch of insert statements to MySQL, they are executed internally as separate inserts.

INSERT INTO Table1 VALUES (1,2,3) 
INSERT INTO Table1 VALUES (4,5,6)

What if you could combine it into a single insert statement like this:

INSERT INTO X VALUES (1,2,3),(4,5,6)

It'll actually execute faster! Some people report a 2x increase in write speed. Others, report a 40x increase.


How to do it? 

Just set the rewriteBatchedStatements=true value in the connection string.

So if you normally connect to MySQL like this...

Connection conn = DriverManager.getConnection("jdbc:mysql://"+ domainName +"/"+configRef.getBallastDatabaseNameOfSQL()+"?" + "user=" + username + "&password=" + password);

...you just have to change it to:

Connection conn = DriverManager.getConnection("jdbc:mysql://"+ domainName +"/"+configRef.getBallastDatabaseNameOfSQL()+"?" + "user=" + username + "&password=" + password + "&rewriteBatchedStatements=true");


In my opinion, MySQL should have kept this as default behaviour and the user should have had to disable it only if they were doing very large batch inserts.

Do try this out even if you don't execute too many inserts. In applications dealing with a lot of data though, this option can come as a life saver! You save a humongous amount of time with these super fast inserts.

Avoiding memory problems when working with huge amounts of data

There would be times when you run applications with libraries like Apache Storm or similar programs, and you'd be surprised when these applications crash and restart with no apparent reason.

The reason in many cases (as you'll infer from the error logs) is that the application ran out of memory. You just didn't have enough RAM.

Why this happens:
If you specify (using -Xmx2g) that your application can use a maximum of 2GB memory, you have to remember that there are other applications running that might also be free to use that much memory.

Let's say you have 4GB RAM.
App1 takes up 2GB.
App2 takes up 2GB.

App1 deletes some large datastructures that it's using, before creating a new datastructure.

The request for creation of a new datastructure makes the Garbage Collector try to allocate more memory. Sometimes, operations happen so quickly that the GC has no time to remove the unused memory before allocating new memory.

Since you've run out of RAM space, the OS would typically allow your application to use virtual memory. To do that, you need a swap space. If you don't have one, your application might crash with the message that there was insufficient memory. Your application might not even catch this exception and yet crash.

In many managed servers like AWS, the nodes don't have a default swap space.  You have to create one. It's very simple and completely worth it.

How much swap space to create?
Go for double the size of your existing RAM. I know that sounds like a lot, but believe me; I've seen how a system with 4GB RAM has used more than 4GB of swap space when I had allocated it 10GB of swap space. Hardware is cheap, and we now have disks with plenty of space, so create large swap partitions/files.

How to create the swap space?
First check if it's already there with

free -m

and check for how much disk space is available with

df -h

On an Ubuntu 14.04 system, just run these commands:

sudo fallocate -l 6G /swapfile (the "6G" means 6GB of disk space)

sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

To make the swap file permanent:

sudo vi /etc/fstab

and at the bottom of the fstab file, add this line (press tab instead of a space wherever you see a space in the line below):
/swapfile   none    swap    sw    0   0

If you like tweaking things more, you can also tweak the swappiness setting which determines how often the OS will transfer data from the RAM to swap space.