Wednesday, May 4, 2011

Redis Pipelines and Transactions


Redis is a fantastic NoSQL database. The main reason I really like Redis is that it allows you to do very powerful things, yet the data model is simple and intuitive. All this is backed by remarkable ease of use and solid performance.
Redis supports two nice features, pipelines and transactions, that have a direct impact on the way user commands are handled and on performance. While the two features are distinct, they do share some similarities and can actually be combined.
In this blog post I would like to shed some light on how these features can be used separately and jointly, and to examine the potential impact each use case could have. I will be using Jedis as a client but first, the usual disclaimer that goes with this kind of post: the examples are intentionally simple and far from being scientifically accurate. You are welcome to experiment and adapt.


Pipelines

The usual way a Redis client interacts with the server is to issue a command, wait till it gets executed and then read the returned response. This is kind of obvious, but every command then pays for a full round trip to the server, so when the number of commands grows, performance can take a hit.
One way of improving this is by using pipelining, in which case you send your commands to the server in sequence and read the combined responses at the end in one block, when the pipeline is "closed".
Pipelining is a common technique in programming and it works particularly well for Redis. Of course, you cannot pipeline commands when one of them depends on the response to another, but if your unit of work is composed of several commands that can be run sequentially, you should consider pipelining them.
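To make the idea concrete at the protocol level, here is a minimal sketch of what "sending commands in sequence" looks like on the wire. It only builds the request bytes in the Redis multi-bulk format and talks to no server; the class and method names (RespPipelineSketch, encode, pipeline) are made up for this illustration and are not part of Jedis.

```java
import java.nio.charset.StandardCharsets;

class RespPipelineSketch {
    // Encode one command in the Redis multi-bulk request format:
    // *<number of arguments>\r\n, then for each argument $<byte length>\r\n<argument>\r\n
    static String encode(String... args) {
        StringBuilder sb = new StringBuilder();
        sb.append('*').append(args.length).append("\r\n");
        for (String arg : args) {
            int byteLength = arg.getBytes(StandardCharsets.UTF_8).length;
            sb.append('$').append(byteLength).append("\r\n").append(arg).append("\r\n");
        }
        return sb.toString();
    }

    // A pipelined batch is the encoded commands written one after another,
    // with no read in between (hypothetical helper for this sketch).
    static String pipeline(String[]... commands) {
        StringBuilder sb = new StringBuilder();
        for (String[] command : commands) {
            sb.append(encode(command));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String batch = pipeline(
                new String[] {"SET", "key0", "value0"},
                new String[] {"SET", "key1", "value1"});
        // Make the line terminators visible when printing.
        System.out.println(batch.replace("\r\n", "\\r\\n"));
    }
}
```

A real client writes these bytes to the socket and only later reads the responses, which come back in the same order as the commands.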
Let's take a look at our first example. The following snippet sets 100000 keys/values without using any pipelining. In all the following examples, the jedis variable refers to a Jedis client created before each test and disconnected afterwards.


On my machine this test averages around 5.75 seconds. Nothing surprising. Now let's use a pipeline to do the same thing.


The average is around 0.7 seconds this time, which is a huge improvement over the previous example for the same result.
There are a couple of things to notice here. Every call to pipeline.set() effectively sends the SET command to Redis (you can easily see this by setting a breakpoint inside the loop and querying Redis with redis-cli). The call to pipeline.execute() is when the reading of all the pending responses happens. A pipeline that issues a huge number of commands might cause an out-of-memory error or a timeout exception at this point, even though the data has safely reached the database. So be careful and think about what this implies for your particular use case.

Essentially, pipelining is a client-side operation supported by the ability of the Redis protocol to handle bulk responses. Remember that pipelining does not imply that the commands are queued on the client side before being sent - only reading responses is delayed and handled in bulk at the end. Pipelining does not guarantee that the commands are atomic either; that's the role of transactions.
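The difference between the two reading strategies can be illustrated with a small stand-in for a connection. This is only a sketch: FakeConnection is a hypothetical in-memory class that pretends the server answers every command with OK, and it exists purely to show how many responses sit unread at any one time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class PendingRepliesSketch {
    // Hypothetical stand-in for a connection: send() makes a response
    // available, read() consumes the oldest unread response.
    static class FakeConnection {
        private final Deque<String> unread = new ArrayDeque<>();
        int peakUnread = 0;

        void send(String command) {
            unread.addLast("OK"); // pretend the server answered SET with OK
            peakUnread = Math.max(peakUnread, unread.size());
        }

        String read() {
            return unread.removeFirst();
        }
    }

    // Plain usage: read each response right after sending the command.
    static List<String> plain(FakeConnection connection, int n) {
        List<String> responses = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            connection.send("SET key" + i + " value" + i);
            responses.add(connection.read());
        }
        return responses;
    }

    // Pipelined usage: send everything first, read all responses at the end.
    static List<String> pipelined(FakeConnection connection, int n) {
        List<String> responses = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            connection.send("SET key" + i + " value" + i);
        }
        for (int i = 0; i < n; i++) {
            responses.add(connection.read());
        }
        return responses;
    }

    public static void main(String[] args) {
        FakeConnection a = new FakeConnection();
        FakeConnection b = new FakeConnection();
        plain(a, 1000);
        pipelined(b, 1000);
        System.out.println("plain peak unread: " + a.peakUnread);
        System.out.println("pipelined peak unread: " + b.peakUnread);
    }
}
```

Both variants end up with the same list of responses. What changes is when they are read, and therefore how many of them pile up before the client gets to them.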

Transactions

Redis supports transactions as a way of executing a series of commands atomically. A transaction is initiated by a client call to the MULTI command. All the commands issued subsequently by the same client are queued on the server till the transaction gets executed or discarded. For every queued command, the server returns QUEUED as a result right away. Once a transaction is executed by a call to the EXEC command, the server will execute all the queued commands sequentially and atomically.
One interesting property of Redis transactions is that the server will go on executing all of the commands without interruption even if one (or several) of them fail. At the end of a transaction the server returns the list of results to the client. The following example executes the SET command within a transaction.
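The queuing behaviour described above can be modelled with a tiny in-memory toy. To be clear, this is not Redis and not Jedis: ToyTransactionServer is a hypothetical class that understands only a handful of commands and reports errors as plain strings, just enough to show QUEUED replies, the result list returned by EXEC, and the fact that a failing command does not stop the ones after it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ToyTransactionServer {
    private final Map<String, String> data = new HashMap<>();
    private List<String[]> queue = null; // non-null while inside MULTI

    // Handle one command from a single client and return its reply.
    Object handle(String... command) {
        String name = command[0];
        if (name.equals("MULTI")) {
            queue = new ArrayList<>();
            return "OK";
        }
        if (name.equals("DISCARD")) {
            queue = null; // drop everything that was queued
            return "OK";
        }
        if (name.equals("EXEC")) {
            if (queue == null) {
                return "ERR EXEC without MULTI";
            }
            List<String[]> queued = queue;
            queue = null;
            List<Object> results = new ArrayList<>();
            for (String[] q : queued) {
                results.add(run(q)); // keep going even if one command fails
            }
            return results;
        }
        if (queue != null) {
            queue.add(command); // inside MULTI: queue instead of executing
            return "QUEUED";
        }
        return run(command);
    }

    private Object run(String[] command) {
        switch (command[0]) {
            case "SET":
                data.put(command[1], command[2]);
                return "OK";
            case "GET":
                return data.get(command[1]);
            case "INCR":
                try {
                    long value = Long.parseLong(data.getOrDefault(command[1], "0")) + 1;
                    data.put(command[1], Long.toString(value));
                    return value;
                } catch (NumberFormatException e) {
                    return "ERR value is not an integer or out of range";
                }
            default:
                return "ERR unknown command";
        }
    }

    public static void main(String[] args) {
        ToyTransactionServer server = new ToyTransactionServer();
        server.handle("SET", "name", "bob");
        System.out.println(server.handle("MULTI"));         // OK
        System.out.println(server.handle("SET", "a", "1")); // QUEUED
        System.out.println(server.handle("INCR", "name"));  // QUEUED
        System.out.println(server.handle("SET", "b", "2")); // QUEUED
        System.out.println(server.handle("EXEC"));          // list of three results
    }
}
```

In this run the INCR on a non-numeric value fails, yet the SET queued after it is still applied, which is the property described above.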


This example averages around 5.7 seconds, which is more or less the same as for the first plain SETs example. While this might seem surprising at first, it really isn't. Redis transactions are not as heavyweight as what one might intuitively expect when thinking about database transactions. Remember that every command issued within a transaction is simply queued on the server. On the other hand, a transaction containing a huge number of commands can put a strain on the memory available for the server. Also, the same remark on reading the result of a pipeline applies here.

Combining Pipelines and Transactions

So there are some similarities between pipelines and transactions: both operations are initiated by the client to handle a series of interrelated commands. However, transactions offer stronger guarantees and are managed mainly on the server. It makes sense then to combine the two features to enjoy their combined benefits. The next example does just that.


This time, the average is around 0.8 seconds, which shows that pipelining the transaction in our example turned out to be a good idea. The combined gain would be even greater if the server were accessed by multiple concurrent clients. One caveat: the size of the returned result list is doubled this time, because it contains both the queuing results and the execution results.
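The caveat about the result size follows from the replies the server produces: one OK for MULTI, one QUEUED per queued command, and finally the EXEC reply holding one result per command. The sketch below simply builds that sequence for n SET commands. The class and method names are made up for the illustration, and how a given client version exposes these replies to the caller may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class PipelinedTransactionReplies {
    // Replies read back after pipelining MULTI, n SET commands and EXEC.
    static List<Object> replies(int n) {
        List<Object> all = new ArrayList<>();
        all.add("OK");                      // reply to MULTI
        for (int i = 0; i < n; i++) {
            all.add("QUEUED");              // one reply per queued SET
        }
        // reply to EXEC: one result per executed SET
        all.add(new ArrayList<String>(Collections.nCopies(n, "OK")));
        return all;
    }

    // Count individual values, flattening the nested EXEC reply.
    static int countValues(List<Object> replies) {
        int count = 0;
        for (Object reply : replies) {
            count += (reply instanceof List) ? ((List<?>) reply).size() : 1;
        }
        return count;
    }

    public static void main(String[] args) {
        List<Object> all = replies(3);
        System.out.println(all);
        System.out.println(countValues(all) + " values for 3 commands");
    }
}
```

For n commands this gives 2n + 1 values, against the n results a plain EXEC hands back, hence the roughly doubled size.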

It is legitimate to ask at this point whether pipelining transactions is always a good idea. After all, we cannot assume that a typical operation consists of at least 100000 commands. So how do pipelined transactions behave for significantly smaller command sequences? Are we always better off pipelining transactions? Let's write a small test for that. The following code performs plain transactions followed by pipelined transactions for a growing sequence size.
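A harness for this kind of comparison could be sketched as follows. Everything here is an assumption made for illustration: the class name, the doubling of the sequence size, the number of runs, and above all the two workloads, which are placeholders to be replaced with real client calls (a plain transaction and a pipelined one, each issuing the given number of commands).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntConsumer;

class GrowingSizeBenchmark {
    // Average time in nanoseconds of workload.accept(size) over the given number of runs.
    static long averageNanos(IntConsumer workload, int size, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            workload.accept(size);
        }
        return (System.nanoTime() - start) / runs;
    }

    // Time both workloads for sizes 1, 2, 4, ... up to maxSize.
    // Each map value is {plain average, pipelined average} in nanoseconds.
    static Map<Integer, long[]> compare(IntConsumer plain, IntConsumer pipelined,
                                        int maxSize, int runs) {
        Map<Integer, long[]> results = new LinkedHashMap<>();
        for (int size = 1; size <= maxSize; size *= 2) {
            long plainNanos = averageNanos(plain, size, runs);
            long pipelinedNanos = averageNanos(pipelined, size, runs);
            results.put(size, new long[] {plainNanos, pipelinedNanos});
        }
        return results;
    }

    public static void main(String[] args) {
        // Placeholder workloads: swap in real transaction code here.
        IntConsumer plain = n -> { /* MULTI, n commands, EXEC, reading each reply */ };
        IntConsumer pipelined = n -> { /* same commands sent through a pipeline */ };
        for (Map.Entry<Integer, long[]> e : compare(plain, pipelined, 64, 100).entrySet()) {
            System.out.println(e.getKey() + " commands: plain=" + e.getValue()[0]
                    + "ns pipelined=" + e.getValue()[1] + "ns");
        }
    }
}
```

Timing with System.nanoTime() and no warm-up is crude, so the numbers are only good for spotting a trend, not for precise measurement.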


Granted, this test is wildly inaccurate. However, when executed, the results show a clear tendency: starting from a reasonably small command sequence (5 to 10 commands), pipelined transactions outperform plain ones. In most cases, pipelining a transaction pays off, since a typical transaction is composed of at least four commands (MULTI, a couple of commands, then EXEC or DISCARD). Have fun and experiment!