Wednesday, December 4, 2013

The Warehouse and the Shop Floor: Separation of Concerns Based on Data Flow

Today, a cornucopia of NoSQL and Big Data technologies is available to us, each exposing a particular data model and implementing a unique set of features. These different offerings are capable of modeling a diversity of domains and addressing wide-ranging concerns, from scalability to evolvability of the data model. However, when creating a new system or extending an existing one, choosing the right tools for the job can be surprisingly hard. A number of problems arise:

Monday, August 26, 2013

Bad Data Handbook Review

Bad Data Handbook from O'Reilly is a collection of essays and articles by different authors, with data, or “bad” data to be precise, as their common theme. The “badness” of the data in this case is more a perceived quality than an inherent one. Arguably, data can be surprising, unpredictable, defective or deficient, but rarely thoroughly bad.

The different chapters are generally well written and can be read in any order. The book covers a wide range of interesting situations, from machine learning war stories to data quality issues to modelling and processing concerns. To be clear, this book is not a programming guide, but it is full of practical advice and recommendations.

Thursday, August 22, 2013

Designing Graph-Based Applications

Building graph-based applications is understandably different from developing against relational databases or other non-relational data models, such as document or column family stores. The graph model is unique in its ability to accommodate highly connected, partially structured datasets that can evolve over time in complexity and structure. Additionally, graphs naturally support a wide range of ad-hoc queries over such datasets.
To fully harness the power of graphs, it is worth reexamining traditional design and implementation practices and considering the impact the specific nature of graphs can have on them. In the common context of object-oriented languages and multi-tier architectures, some intriguing questions arise about how to design data access and business logic to handle graph data effectively. For instance, should an object mapping framework be used, or should we stick to a graph representation for as long as possible?
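To make the question concrete, here is a minimal sketch (my own illustration with hypothetical data, plain Python dicts, and no particular framework) of querying a bare graph representation directly. An ad-hoc traversal like this is trivial against the raw structure, whereas an object mapping layer would have to anticipate and expose it:

```python
from collections import deque

# Adjacency list: node id -> list of (relationship type, neighbour id).
graph = {
    "alice": [("KNOWS", "bob"), ("WORKS_AT", "acme")],
    "bob": [("KNOWS", "carol")],
    "carol": [],
    "acme": [],
}

def reachable_via(graph, start, rel_type):
    """Collect all nodes reachable from `start` following only `rel_type` edges."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for rel, neighbour in graph.get(node, []):
            if rel == rel_type and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

print(reachable_via(graph, "alice", "KNOWS"))  # {'bob', 'carol'}
```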

Thursday, May 23, 2013

Optimistic Locking in Neo4j

Optimistic locking is a technique commonly employed with relational databases to control concurrent access to data. User interactions often span multiple system transactions (web and database), and rather than locking the data, which might hurt performance, optimistic locking detects write conflicts so that the system stays consistent. For this to be efficient, the probability of conflicts over the same data should be fairly low.
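The classic recipe can be sketched with a version counter: each read returns the data together with its version, and a later write is rejected if the version has moved on in the meantime. This is a minimal illustration only; the class and method names are hypothetical and this is not Neo4j's API:

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale read."""

class Store:
    def __init__(self):
        self._data = {}      # key -> value
        self._version = {}   # key -> version number

    def read(self, key):
        """Return the value together with the version it was read at."""
        return self._data[key], self._version[key]

    def write(self, key, value, expected_version=None):
        current = self._version.get(key, 0)
        if expected_version is not None and expected_version != current:
            # Someone else wrote since we read: reject instead of silently overwriting.
            raise ConflictError(f"{key}: expected v{expected_version}, found v{current}")
        self._data[key] = value
        self._version[key] = current + 1

store = Store()
store.write("node-1", {"name": "Alice"})            # creates v1
value, version = store.read("node-1")               # user A reads at v1
store.write("node-1", {"name": "Alicia"}, version)  # user A's update succeeds -> v2
try:
    store.write("node-1", {"name": "Aly"}, version)  # stale version: conflict
except ConflictError as e:
    print("conflict detected:", e)
```

No lock is held between the read and the write; the cost of the occasional retry is paid only when a conflict actually occurs.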

Thursday, May 2, 2013

Labels and Schema Indexes in Neo4j

Neo4j recently introduced the concept of labels and their sidekick, schema indexes. Labels are a way of attaching one or more simple types to nodes, while schema indexes allow labelled nodes to be indexed automatically by one or more of their properties. Those indexes are then used implicitly by Cypher as secondary indexes and to infer the starting point(s) of a query.

In this blog post, I would like to shed some light on how these new constructs work together. Some details will inevitably be specific to the current version of Neo4j and might change in the future, but I still think it’s an interesting exercise.
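As a rough mental model (a toy of my own, not Neo4j internals), a schema index can be pictured as a map from a property value to the set of nodes carrying a given label, which is exactly what lets Cypher pick the starting node(s) of a query instead of scanning every node:

```python
from collections import defaultdict

# Hypothetical node store: id -> labels and properties.
nodes = {
    1: {"labels": {"Person"}, "props": {"name": "Alice"}},
    2: {"labels": {"Person"}, "props": {"name": "Bob"}},
    3: {"labels": {"Company"}, "props": {"name": "Acme"}},
}

def build_index(nodes, label, prop):
    """Index the nodes carrying `label` by the value of `prop`."""
    index = defaultdict(set)
    for node_id, node in nodes.items():
        if label in node["labels"] and prop in node["props"]:
            index[node["props"][prop]].add(node_id)
    return index

# Rough analogue of: CREATE INDEX ON :Person(name)
person_by_name = build_index(nodes, "Person", "name")

# Rough analogue of: MATCH (p:Person) WHERE p.name = 'Alice'
# The index yields the starting node directly.
print(person_by_name["Alice"])  # {1}
```

Note that node 3 carries the same `name` property but not the `Person` label, so it never enters this index; the label scopes what gets indexed.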