Monday, March 24, 2014

Implementing Graph-Based Applications

Graphs have proven widely applicable for modeling a range of business problems and domains. Yet the flexibility that graphs bring requires an additional level of attention to implementation, and an adaptation of familiar programming idioms, to reap the benefits while avoiding common pitfalls.

Wednesday, December 4, 2013

The Warehouse and the Shop Floor: Separation of Concerns Based on Data Flow

Today, a cornucopia of NoSQL and Big Data technologies is available to us, each exposing a particular data model and implementing a unique set of features. These different offerings are capable of modeling a diversity of domains and addressing wide-ranging concerns, from scalability to evolvability of the data model. However, when creating a new system or extending an existing one, choosing the right tools for the job can be surprisingly hard, and a number of problems arise.

Monday, August 26, 2013

Bad Data Handbook Review

Bad Data Handbook from O'Reilly is a collection of essays and articles by different authors whose common theme is data, or “bad” data to be precise. The “badness” of the data in this case is a perceived quality rather than an inherent one. Arguably, data can be surprising, unpredictable, defective or deficient, but rarely thoroughly bad.

The different chapters are generally well written and they can be read in any order. The book contains a wide range of interesting situations, from machine learning war stories, to data quality issues, to modelling and processing concerns. To be clear, this book is not a programming guide but it is full of practical advice and recommendations.

Thursday, August 22, 2013

Designing Graph-Based Applications

Building graph-based applications is understandably different from developing against relational databases, or from other non-relational data models, such as document or column family stores. The graph model is unique with its ability to accommodate highly connected, partially structured datasets that can evolve over time in terms of complexity and structure. Additionally, graphs are naturally capable of providing a wide range of ad-hoc queries on top of such datasets.
To fully harness the power of graphs, it is worth reexamining traditional design and implementation practices and considering the impact the specific nature of graphs can have on them. In the common context of object-oriented languages and multi-tier architectures, some of the intriguing questions are about how to design data access and business logic to handle graph data effectively. For instance, should an object mapping framework be used, or should we stick to a graph representation as long as possible?

Thursday, May 23, 2013

Optimistic Locking in Neo4j

Optimistic locking is a technique commonly employed with relational databases to control concurrent access to data. User interactions often span multiple system transactions (web and database), and rather than locking the data, which might impact performance, optimistic locking detects write conflicts to ensure that the system stays consistent. For this to be efficient, the probability of conflicts over the same data should be fairly low.
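The core idea is independent of any particular database: each record carries a version number, a writer remembers the version it read, and the write is applied only if that version is still current. Here is a minimal in-memory sketch in Java; the class and method names are illustrative and not part of any Neo4j API.

```java
// Minimal sketch of optimistic locking: a write succeeds only if the
// version the writer originally read is still the current one.
class VersionedRecord {
    private String value;
    private long version = 0;

    synchronized long readVersion() { return version; }
    synchronized String value() { return value; }

    // Returns true if the update was applied, false on a write conflict.
    synchronized boolean update(long expectedVersion, String newValue) {
        if (version != expectedVersion) {
            return false; // another writer got there first
        }
        value = newValue;
        version++;
        return true;
    }
}

public class OptimisticLockingDemo {
    public static void main(String[] args) {
        VersionedRecord record = new VersionedRecord();
        long v1 = record.readVersion(); // first client reads version 0
        long v2 = record.readVersion(); // second client also reads version 0
        boolean first = record.update(v1, "first write");   // applied
        boolean second = record.update(v2, "second write"); // conflict detected
        System.out.println(first + " " + second); // true false
    }
}
```

Note that the conflicting writer is not blocked at any point; it simply learns after the fact that its read is stale and can retry, which is exactly why low contention is a prerequisite for this approach.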

Thursday, May 2, 2013

Labels and Schema Indexes in Neo4j

Neo4j recently introduced the concept of labels and their sidekick, schema indexes. Labels are a way of attaching one or more simple types to nodes, while schema indexes allow labelled nodes to be automatically indexed by one or more of their properties. Those indexes are then used implicitly by Cypher as secondary indexes and to infer the starting point(s) of a query.

In this blog post I would like to shed some light on how these new constructs work together. Some details will inevitably be specific to the current version of Neo4j and might change in the future, but I still think it’s an interesting exercise.
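To make the interplay concrete, here is a small Cypher sketch using the 2.x-era syntax; the Person label and name property are purely illustrative.

```cypher
// Create a node carrying a label, then index that label on a property.
CREATE (p:Person {name: 'Alice'});
CREATE INDEX ON :Person(name);

// Cypher can now use the schema index implicitly: the label plus the
// property predicate let it infer the starting point of the query.
MATCH (p:Person)
WHERE p.name = 'Alice'
RETURN p;
```

No index is named in the MATCH itself; the planner picks it up from the label and predicate, which is what makes schema indexes feel like secondary indexes in a relational database.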

Wednesday, May 4, 2011

Redis Pipelines and Transactions

Redis is a fantastic NoSQL database. The main reason I really like Redis is that it lets you do very powerful things, yet the data model is simple and intuitive. All this is backed by remarkable ease of use and solid performance.
Redis supports two nice features, pipelines and transactions, that have a direct impact on the way user commands are handled and on performance. While the two features are distinct, they do share some similarities and can actually be combined.
In this blog post I would like to shed some light on how these features can be used separately and jointly, and to examine the potential impact each use case could have. I will be using Jedis as a client but first, the usual disclaimer that goes with this kind of post: the examples are intentionally simple and far from being scientifically accurate. You are welcome to experiment and adapt.
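To give a flavour of the two features, here is a minimal Jedis sketch; it assumes a Redis instance on localhost:6379 and the key names are illustrative.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;
import redis.clients.jedis.Response;
import redis.clients.jedis.Transaction;

public class PipelineVsTransaction {
    public static void main(String[] args) {
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Pipeline: commands are buffered client-side and flushed in
            // one round trip; replies become readable only after sync().
            Pipeline pipeline = jedis.pipelined();
            pipeline.set("counter", "0");
            Response<Long> incremented = pipeline.incr("counter");
            pipeline.sync();
            System.out.println(incremented.get());

            // Transaction: MULTI/EXEC queues the commands on the server
            // and executes them atomically when exec() is called.
            Transaction tx = jedis.multi();
            tx.set("greeting", "hello");
            tx.incr("counter");
            tx.exec();
        }
    }
}
```

The key distinction to keep in mind: a pipeline is purely a client-side optimisation that saves round trips, while a transaction is a server-side guarantee of atomicity; the combination of the two is what the rest of the post explores.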

Thursday, October 15, 2009

Spring, Web Services and Functional Testing

Functional tests are particularly interesting for web services because they exercise a fully integrated web service stack from the outside, giving the same point of view as the clients. In a sense, what really matters in a web service is its interface rather than its implementation, and functional testing allows us to focus on exactly that.

In this post I'd like to present an automated way to implement functional tests for your web services using Spring. The approach relies mainly on Spring TestContext Framework, an embedded Jetty instance and Spring Web Services (you saw it coming, didn't you?).
The outline is as follows: functional tests are implemented using Spring TestContext and JUnit 4 (or any of the other supported testing frameworks). In the ApplicationContext that Spring TestContext creates for the test, we include an embedded Jetty instance that loads and runs the target web service application. The actual test simply invokes that web service through a WebServiceTemplate (Spring-WS) and validates the response with XMLUnit. Finally, we run our tests using Maven (a plain mvn test).
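The outline above can be sketched as a test class like the following. The context file, endpoint and payloads are hypothetical placeholders; the real wiring (embedded Jetty, WebServiceTemplate bean) lives in the Spring context referenced by @ContextConfiguration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.custommonkey.xmlunit.XMLAssert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;
import org.springframework.ws.client.core.WebServiceTemplate;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration("classpath:functional-test-context.xml")
public class EchoServiceFunctionalTest {

    // Configured in the test context to point at the embedded Jetty endpoint.
    @Autowired
    private WebServiceTemplate webServiceTemplate;

    @Test
    public void echoesTheRequestPayload() throws Exception {
        StreamSource request = new StreamSource(
                new StringReader("<echoRequest>hello</echoRequest>"));
        StringWriter responseWriter = new StringWriter();

        // Invoke the fully deployed service over the wire, as a client would.
        webServiceTemplate.sendSourceAndReceiveToResult(
                request, new StreamResult(responseWriter));

        // XMLUnit compares documents rather than raw strings.
        XMLAssert.assertXMLEqual(
                "<echoResponse>hello</echoResponse>", responseWriter.toString());
    }
}
```

Because the Jetty instance is just another bean in the test's ApplicationContext, its lifecycle is tied to the test run: Spring TestContext starts it before the first test and tears it down afterwards, with no manual deployment step.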