Thursday, May 23, 2013

Optimistic Locking in Neo4j

Optimistic locking is a technique commonly employed with relational databases to control concurrent access to data. User interactions often span multiple system transactions (web and database), and rather than locking the data, which might hurt performance, optimistic locking lets us detect write conflicts so that the system stays consistent. For this to be efficient, the likelihood of conflicting updates to the same data should be fairly low.

Optimistic locking is typically implemented with a timestamp or a version number: when committing an update, the application checks whether the version held in the database has been modified by a different transaction in the meantime. If there is a conflict, the timestamps (or version numbers) will differ, in which case the application can, for example, abort the transaction or present the conflicting versions to the user to resolve the conflict.
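To make the mechanism concrete, here is a minimal sketch of version-number optimistic locking in Python. It is a toy in-memory model, not Neo4j code, and the names `VersionedStore`, `Record` and `ConflictError` are hypothetical. Each write states the version it expects, and the store rejects the write if the version has moved on.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when the stored version no longer matches the expected one."""


@dataclass
class Record:
    value: str
    version: int = 1


@dataclass
class VersionedStore:
    """Hypothetical in-memory store illustrating optimistic locking."""
    records: dict = field(default_factory=dict)

    def read(self, key):
        rec = self.records[key]
        return rec.value, rec.version

    def update(self, key, new_value, expected_version):
        rec = self.records[key]
        # Only write if nobody changed the record since the caller read it.
        # (Single-threaded toy: a real database must do this check-and-write atomically.)
        if rec.version != expected_version:
            raise ConflictError(
                f"{key}: expected v{expected_version}, found v{rec.version}"
            )
        rec.value = new_value
        rec.version += 1
        return rec.version


store = VersionedStore({"bob": Record("Bob")})
_, seen = store.read("bob")              # two users both read version 1
store.update("bob", "Robert", seen)      # first writer wins, record moves to version 2
try:
    store.update("bob", "Bobby", seen)   # second writer still holds version 1
except ConflictError as err:
    print("conflict:", err)
```

The second writer is rejected, and the application can then retry, abort, or show both versions to the user. In a database, putting the version check inside the same statement as the write is what makes the comparison and the update atomic.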

This pattern has been documented in a number of places.

Nothing stops us from applying the same technique to graph resources with Cypher if an application can benefit from optimistic locking.

First, we need to add a timestamp when we create a new node. This is easily done with Cypher's built-in timestamp() function, which returns the current time in milliseconds since midnight, January 1, 1970 UTC.
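The Python sketch below models that creation step in memory; it does not talk to Neo4j. The helpers `cypher_like_timestamp` and `create_node` are hypothetical. The first simply reproduces the unit of Cypher's timestamp(), and in Cypher itself the stamp would be set as part of the CREATE, for instance with a property entry such as `updated: timestamp()`.

```python
import time


def cypher_like_timestamp():
    """Milliseconds since 1970-01-01T00:00:00Z, the same unit as Cypher's timestamp()."""
    return int(time.time() * 1000)


def create_node(graph, node_id, **props):
    """Hypothetical helper: store a node and stamp it with an 'updated' property."""
    node = dict(props)
    node["updated"] = cypher_like_timestamp()
    graph[node_id] = node
    return node


graph = {}
bob = create_node(graph, "bob", name="Bob")
print(bob)  # the 'updated' value depends on the current clock
```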

When the node needs to be updated, we can add a condition on the "updated" property to the query to ensure that we are not overwriting a different version. Assuming the "updated" property was initially set to 1369167566868, an update query guarded by that value will modify the node only if the stored timestamp still matches.

The application logic needs a way to find out whether the query resulted in an update. The update query therefore returns the number of updated nodes, which we can safely expect to be 1 in this case. Executing the same statement again will update nothing and return 0, since the successful update replaced the timestamp that the condition checks against.
Alternatively, we could have returned the node itself, a boolean, or anything else that serves the same purpose.
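Here is a minimal in-memory sketch, in Python, of the guarded update and its returned count. The function `update_if_unchanged` and the dictionary layout are hypothetical: the check on "updated" plays the role of the condition in the Cypher query, and the return value plays the role of the returned count.

```python
import time


def update_if_unchanged(graph, node_id, expected_updated, **changes):
    """Apply `changes` only if the node's 'updated' stamp still equals expected_updated.

    Returns 1 if the node was updated and 0 otherwise, like a query returning a count.
    """
    node = graph.get(node_id)
    if node is None or node["updated"] != expected_updated:
        return 0
    node.update(changes)
    # Refresh the stamp so any writer still holding the old value now fails.
    # max(..., old + 1) guarantees a new value even within the same millisecond.
    node["updated"] = max(int(time.time() * 1000), expected_updated + 1)
    return 1


graph = {"bob": {"name": "Bob", "updated": 1369167566868}}
print(update_if_unchanged(graph, "bob", 1369167566868, age=42))  # 1
print(update_if_unchanged(graph, "bob", 1369167566868, age=43))  # 0, stamp has changed
```

The `max(...)` guard is an addition of this sketch. Without it, two updates within the same millisecond could leave the timestamp unchanged, which is one reason some applications prefer an incrementing version number.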

Good, but the wonderful world of graphs is also populated with relationships. In a similar way, we can track modifications on relationships using timestamps. However, it is legitimate to wonder at this point whether we could extend optimistic locking to nodes and relationships at the same time since both are first-class citizens of the graph.

Let's create another node to keep Bob company.

With some imagination, we can write a query that creates a relationship between the two nodes only if neither has been modified, and that sets a new timestamp on all three entities (both nodes and the new relationship) if the update succeeds.
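The same idea extended to two nodes and a relationship, again as a hypothetical in-memory Python model rather than the original Cypher. The node ids `bob` and `alice`, the second timestamp and the relationship type `KNOWS` are placeholders. The relationship is created only if both stamps still match, and the two nodes and the new relationship then share one fresh stamp.

```python
import time


def relate_if_unchanged(graph, a, b, a_updated, b_updated, rel_type):
    """Create a relationship a->b only if neither node changed; stamp all three.

    `graph` holds "nodes" (id -> properties) and "rels" (list of relationship dicts).
    Returns the number of relationships created (1 or 0).
    """
    nodes = graph["nodes"]
    if nodes[a]["updated"] != a_updated or nodes[b]["updated"] != b_updated:
        return 0  # one of the nodes was touched by someone else: change nothing
    # One shared new stamp, guaranteed to differ from both old values.
    now = max(int(time.time() * 1000), a_updated + 1, b_updated + 1)
    nodes[a]["updated"] = now
    nodes[b]["updated"] = now
    graph["rels"].append({"start": a, "end": b, "type": rel_type, "updated": now})
    return 1


graph = {
    "nodes": {
        "bob": {"name": "Bob", "updated": 1369167566868},
        "alice": {"name": "Alice", "updated": 1369167570000},  # placeholder node
    },
    "rels": [],
}
print(relate_if_unchanged(graph, "bob", "alice", 1369167566868, 1369167570000, "KNOWS"))  # 1
print(relate_if_unchanged(graph, "bob", "alice", 1369167566868, 1369167570000, "KNOWS"))  # 0
```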

Optimistic locking and transactions

In all the previous examples, we relied on the fact that a query has no side effects when the update is unsuccessful. The situation is different if we need to run multiple queries, for example for bulk updates, or if we want to return the conflicting version of the data when an update fails.

In such cases, we can use the newly added transactional HTTP endpoint to control the transactional behaviour. POSTing statements to the endpoint (http://localhost:7474/db/data/transaction) creates a new transaction and executes the statements within its context.
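A sketch, using only Python's standard library, of the request that opens a transaction and runs a statement inside it. The request is built but not sent. The function name is hypothetical, and the body shape `{"statements": [{"statement": ..., "parameters": ...}]}` is the endpoint's JSON format as I understand it; verify it against the documentation for your server version.

```python
import json
import urllib.request

BASE = "http://localhost:7474/db/data/transaction"


def begin_transaction_request(statement, parameters):
    """Build (but do not send) the POST that opens a transaction and runs one statement."""
    body = {"statements": [{"statement": statement, "parameters": parameters}]}
    return urllib.request.Request(
        BASE,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


req = begin_transaction_request("RETURN 1", {})
print(req.get_method(), req.full_url)  # POST http://localhost:7474/db/data/transaction
```

Sending it with `urllib.request.urlopen(req)` should return a response that identifies the newly created transaction, which is where a transaction id like the one in the next paragraph comes from.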

If the update is successful (and assuming the transaction id is 3), we can commit the transaction by POSTing to http://localhost:7474/db/data/transaction/3/commit. Otherwise, sending a DELETE request to http://localhost:7474/db/data/transaction/3/ rolls the transaction back.
Please read the documentation for the new transactional HTTP endpoint to fully understand its behaviour.
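Finally, a sketch of the commit-or-rollback decision, again building the request without sending it. `finish_transaction_request` and its parameters are hypothetical: it commits only when the guarded statements updated as many entities as expected, and otherwise rolls the whole transaction back.

```python
import json
import urllib.request

BASE = "http://localhost:7474/db/data/transaction"


def finish_transaction_request(tx_id, updated_count, expected_count=1):
    """Build (but do not send) the request that ends transaction `tx_id`.

    Commit only if the guarded statements updated as many entities as expected;
    otherwise roll back so that no partial change survives.
    """
    if updated_count == expected_count:
        # POST .../{tx_id}/commit, here with an empty list of extra statements.
        body = json.dumps({"statements": []}).encode("utf-8")
        return urllib.request.Request(
            f"{BASE}/{tx_id}/commit",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
    # DELETE .../{tx_id}/ rolls the open transaction back.
    return urllib.request.Request(f"{BASE}/{tx_id}/", method="DELETE")


req = finish_transaction_request(3, updated_count=0)
print(req.get_method(), req.full_url)  # DELETE http://localhost:7474/db/data/transaction/3/
```

Keeping the decision in the client means a conflict detected by any statement in the transaction discards all of its changes, not just the failed one.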

Thanks to my friend and colleague @oakinger for the initial question and idea!