Wednesday, June 11, 2008

MySQL and PostgreSQL performance comparison (using TPC-b based benchmark).

One of these recent days I was reading some feeds from MySQL planet and found this post comparing postgresql and mysql. Well, I found the post interesting, so I decided to do a little benchmarking of my own also, but this time with a workload based on the TPC-b benchmark. You can read more about this benchmark here. Basically, it is a stress test of the database back-end. From the description on the website: "TPC Benchmark B is designed to be a stress test on the core portion of a database system." Full specification is available here. In a nutshell, it is comprised of a single type of transaction that performs selects, updates and inserts on four distinct tables in the system: accounts, tellers, branches and history. The transaction profile is the following:

UPDATE accounts
SET Abalance = Abalance + :delta
WHERE Aid = :Aid;

SELECT Abalance
INTO :Abalance FROM accounts
WHERE Aid = :Aid;

UPDATE tellers
SET Tbalance = Tbalance + :delta
WHERE Tid = :Tid;

UPDATE branches
SET Bbalance = Bbalance + :delta
WHERE Bid = :Bid;

INSERT INTO history(Tid, Bid, Aid, delta, time)
VALUES (:Tid, :Bid, :Aid, :delta, CURRENT);

The runs conducted varied the number of costumers: 1,8,16,32,64. These represented the number of simultaneous clients that were actually emulated.

I configured PostgreSQL (8.3.1), using postgresql.conf, according to the following options:

  • max_connections = 100
  • shared_buffers = 200MB
  • effective_cache_size = 2GB

As for MySQL (5.1.24-rc) I used the transactional storage engine InnoDb. Then I configured the engine according to the following options, in my.cnf:

  • innodb_buffer_pool_size = 500M
  • innodb_flush_log_at_trx_commit=1
  • innodb_thread_concurrency=0
  • thread_cache_size=25

With respect to the hardware, I used two nodes, one for executing the emulated clients (node A) and another to holding the databases (node B). The following table details the environment.

Node ANode B
CPUDual AMD Opteron(tm) at 1.5GHz
Memory4 Gb3 Gb
StorageOne 55 Gb dedicated volume for each node (RAID5).
NetworkEthernet: 1 Gbps
Operating SystemUbuntu 7.10

Both of the machines were dedicated, meaning that no other job was running concurrently with the benchmark. As for the network I am unable to claim the same, as the physical network was shared among people here in the lab accessing other servers. The benchmark implementation was developed within the context of the GORDA project and integrated into Bristlecone benchmarking framework, from Continuent.

So, taking into account the aforementioned settings, I performed several runs in which the difference among them was the number of concurrent connections opened to the database. From the runs, I have measured latency (ms - milliseconds) and throughput (TPM - transactions per minute). The samples from each of the runs were taken by droping the initial two minutes (avoiding the ramp up effects). Additionally, the outliers samples were removed by taking only the 98th percentile of set of samples. The results are depicted in the following two figures.

From these two figures, one may immediately conclude that the system is overloaded even when there are only eight clients in the system. This is predictable, since TPC-b does not add think time on transaction submission, hence the system is in overload. As a consequence the latency increases, due to queuing and the throughput remains stable. One interesting conclusion is that MySQL outperforms completely PostgreSQL in throughput. Since TPC-b is an update, stress test it seems reasonable to consider that MySQL is more suitable for workloads that characterized by update peaks.

What I would like to do next is to compare both of these DBMS using workload with more read operations (TPC-w?), and also compare them using some TPC-c based benchmark for modeling a more complex OLTP setting, and not a stressing it as TPC-b does. When I have the time, I will do this and let you know about the results.