Monday, August 26, 2013

Bad Data Handbook Review

Bad Data Handbook from O'Reilly is a collection of essays and articles by different authors whose common theme is data, or “bad” data to be precise. The “badness” of the data in this case is more a perceived quality than an inherent one. Arguably, data can be surprising, unpredictable, defective or deficient, but it is rarely thoroughly bad.

The chapters are generally well written and can be read in any order. The book covers a wide range of interesting situations, from machine learning war stories, to data quality issues, to modelling and processing concerns. To be clear, this book is not a programming guide, but it is full of practical advice and recommendations.

Thursday, August 22, 2013

Designing Graph-Based Applications

Building graph-based applications is understandably different from developing against relational databases or against other non-relational data models, such as document or column family stores. The graph model stands out for its ability to accommodate highly connected, partially structured datasets whose complexity and structure can evolve over time. Additionally, graphs naturally support a wide range of ad-hoc queries on top of such datasets.
To fully harness the power of graphs, it is worth reexamining traditional design and implementation practices and considering how the specific nature of graphs affects them. In the common context of object-oriented languages and multi-tier architectures, some of the intriguing questions concern how to design data access and business logic so that they handle graph data effectively. For instance, should an object mapping framework be used, or should we try to stick to a graph representation as long as possible?
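
To make the two options in that question concrete, here is a minimal sketch in Python. It does not reflect any particular graph database or mapping framework: the `Graph` class, the `KNOWS` relationship type, the `Person` class and the function names are all hypothetical, chosen only for illustration. The first option maps nodes into domain objects at the data access boundary; the second keeps working with the graph and answers the question through a traversal.

```python
from collections import deque
from dataclasses import dataclass, field


class Graph:
    """Hypothetical in-memory property graph (not a real database API)."""

    def __init__(self):
        self.nodes = {}   # node id -> dict of properties
        self.edges = {}   # (source id, relationship type) -> list of target ids

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, rel_type, target):
        self.edges.setdefault((source, rel_type), []).append(target)

    def neighbors(self, node_id, rel_type):
        return self.edges.get((node_id, rel_type), [])

    def reachable(self, start, rel_type, max_depth):
        """Breadth-first traversal: ids reachable within max_depth hops."""
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            current, depth = queue.popleft()
            if depth == max_depth:
                continue  # do not expand beyond the requested depth
            for nxt in self.neighbors(current, rel_type):
                if nxt not in seen:
                    seen.add(nxt)
                    found.append(nxt)
                    queue.append((nxt, depth + 1))
        return found


# Option 1: object mapping. Nodes become domain objects early on.
@dataclass
class Person:
    name: str
    friends: list = field(default_factory=list)


def load_person(graph, node_id):
    """Map a node and its direct KNOWS neighbors to Person objects (one level only)."""
    person = Person(graph.nodes[node_id]["name"])
    person.friends = [
        Person(graph.nodes[n]["name"]) for n in graph.neighbors(node_id, "KNOWS")
    ]
    return person


# Option 2: stay with the graph and only extract the values that are needed.
def names_within(graph, node_id, max_depth):
    """Names of everyone reachable over KNOWS within max_depth hops."""
    return [graph.nodes[n]["name"] for n in graph.reachable(node_id, "KNOWS", max_depth)]


g = Graph()
g.add_node("alice", name="Alice")
g.add_node("bob", name="Bob")
g.add_node("carol", name="Carol")
g.add_edge("alice", "KNOWS", "bob")
g.add_edge("bob", "KNOWS", "carol")
```

In this sketch the mapped `Person` only sees one level of relationships unless the mapping code is extended, whereas the traversal in `names_within` can change depth with a single argument. That trade-off between typed, familiar objects and flexible ad-hoc queries is one way to read the question above.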