The Secret to RavenDB

Well, I have finally got back to writing my blog, and I felt the first thing I had to do was reveal the secret to RavenDB. This is not an in-depth technical analysis of the inner workings of RavenDB but a very lightweight introduction to the key concept behind its design. It may also apply to many of the other NoSQL document-based solutions available.

If you are just starting out with RavenDB and come from an SQL background, then read this article now. I just wish there had been a post out there that cleanly laid out this concept when I first started, so that is what I am trying to do here.

Indexing, Indexing, it’s all about Indexing

The secret to RavenDB is that the Index is the single most important thing you are going to write.

Why?

Well, first you have to learn that you cannot join properly through a query. The only way to join, aggregate and generally mess with data is through an Index.

Most developers I know who have approached the platform have come from an SQL background. They end up using queries in a typical CRUD way: you write some code that puts some data into RavenDB, then you write some code that queries it back out again. In simple cases this works, and many developers take it no further.

The problems start when you start to join data together. RavenDB provides an Include method that allows the retrieval of a “joined” document, but there is no easy way to step out to a second-level join on another document related to the one being “joined”. The RavenDB documentation misleadingly rambles on about storing a de-normalized copy in the prime document, and this is just distracting.
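To make that limitation concrete, here is a rough sketch of a single-level Include. The Customer and Region classes and the IncludeExample helper are hypothetical, and the using line assumes the RavenDB 4.x or later client (older clients kept these types under Raven.Client), so treat it as an illustration rather than a recipe.

```csharp
using Raven.Client.Documents.Session;

// Hypothetical document classes, assumed for illustration only.
public class Region
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string CountryId { get; set; }
}

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string RegionId { get; set; }
}

public static class IncludeExample
{
    public static string GetRegionName(IDocumentSession session, string customerId)
    {
        // One request: load the customer and ask the server to send the
        // document referenced by RegionId along with it.
        Customer customer = session
            .Include<Customer>(c => c.RegionId)
            .Load(customerId);

        // Already tracked by the session thanks to the Include,
        // so this does not go back to the server.
        Region region = session.Load<Region>(customer.RegionId);

        // The Country behind region.CountryId was not part of the Include,
        // so loading it here would be a separate request.
        return region.Name;
    }
}
```

The first hop is cheap, but every further hop is another load written by hand, which is exactly the work an Index can do once in the background.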

Why does RavenDB not have the basic functionality to do a join in a query? Surely someone missed something?

To understand why RavenDB does this you need to understand a little bit about what RavenDB is doing under the hood.

It’s a long way from a database Index

One of the things that is misleading is that an Index in the world of RavenDB is nothing like its SQL equivalent. Take whatever concepts you understand about an SQL Index and simply press the delete key in your mind. RavenDB Indexes are a new concept and a fresh start. Open your mind Quaid….

Understand that an Index is a bit of code that executes when a document is saved. There is an event source on the document that fires on modification and Indexes are listening for this event. When it fires the Index runs against the changed document and the output of that Indexing process is potentially stored.

Now this Indexing process executes asynchronously in the background, so it does not hold up the save. There may be many Indexes firing on a single document modification. This is why the platform is eventually consistent: if you select data from an Index immediately after a modification, you may get a stale result because the Index has not had time to run.
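If a particular read really cannot tolerate a stale result, the client can be asked to wait for the Index to catch up. The sketch below assumes the RavenDB 4.x or later client, an Index deployed under the name "CustomerIndex", and hypothetical Customer and CustomerIndexEntry classes; none of these names come from RavenDB itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Raven.Client.Documents.Session;

// Hypothetical classes, assumed for illustration only.
public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class CustomerIndexEntry
{
    public string CountryName { get; set; }
}

public static class StaleResultsExample
{
    public static List<Customer> CustomersInCountry(
        IDocumentSession session, string countryName, bool waitForIndex)
    {
        // Statistics() exposes, after execution, whether the results were stale.
        var query = session
            .Query<CustomerIndexEntry>("CustomerIndex")
            .Statistics(out var stats);

        if (waitForIndex)
        {
            // Wait until indexing has caught up. This trades latency for
            // freshness, so it is best kept for the few reads that need it.
            query = query.Customize(x => x.WaitForNonStaleResults());
        }

        // Filter on an Index field, then return the matching documents.
        List<Customer> customers = query
            .Where(entry => entry.CountryName == countryName)
            .OfType<Customer>()
            .ToList();

        if (stats.IsStale)
        {
            Console.WriteLine("Index was stale; results may lag behind recent writes.");
        }

        return customers;
    }
}
```

Most screens can live with a slightly stale list, so waiting is usually the exception rather than the default.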

It’s like a backwards database.

Instead of writing a query that joins two bits of data together at the point of access, the join actually happens inside the Index at the point of modification.

RavenDB provides a LoadDocument method for use within the Index, and calls to it can be chained together.

So:-

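// Constructor of a multi-map Index class (one deriving from AbstractMultiMapIndexCreationTask).
// Customer, Region, Country and Order are the document classes assumed by the description below.
public CustomerIndex()
{
    // The map runs in the background whenever a Customer document changes.
    AddMap<Customer>(customers => from customer in customers
        // Follow the references while indexing: Customer -> Region -> Country.
        let region = LoadDocument<Region>(customer.RegionId)
        let country = LoadDocument<Country>(region.CountryId)
        // Passing a list of ids loads all of the referenced documents.
        let orders = LoadDocument<Order>(customer.OrderIds)
        select new
        {
            Id = customer.Id,
            CustomerName = customer.Name,
            RegionName = region.Name,
            CountryName = country.Name,
            OrdersProcessed = orders.Count(o => o.IsProcessed),
            ProductIds = orders.Select(o => o.ProductId)
        });
    // Store every output field so queries can read them straight from the Index.
    StoreAllFields(FieldStorage.Yes);
}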

What is happening here is that the Region document is being loaded from the RegionId on the Customer object, then the Country document is being loaded from the CountryId on the Region object. The Orders collection is there to illustrate that you can do this with Lists.

The Index then returns and stores a new object that contains the aggregate of all this work.

Now you would probably never do all of this within a single Index, but it is an illustrative example of how you can piece together a new object out of a related graph of documents. The important thing to understand is that this all happens in the background shortly after the saving of the Customer document.

The user then queries the cached Index result with further parameters to retrieve the aggregate DTO objects of choice. This usually ends up being a very simple operation, as most of the complex work was done by the Index. Users get their data almost instantly because they are essentially accessing cached, pre-calculated data.
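A query against such an Index might look roughly like this. It assumes the RavenDB 4.x or later client, where ProjectInto reads stored Index fields into a DTO (older clients used a differently named projection method), an Index deployed as "CustomerIndex" that stores its fields, and a hypothetical CustomerSummary DTO and CustomerQueries helper.

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client.Documents;
using Raven.Client.Documents.Linq;
using Raven.Client.Documents.Session;

// Hypothetical DTO whose property names match the fields the Index stores.
public class CustomerSummary
{
    public string Id { get; set; }
    public string CustomerName { get; set; }
    public string RegionName { get; set; }
    public string CountryName { get; set; }
    public int OrdersProcessed { get; set; }
}

public static class CustomerQueries
{
    public static List<CustomerSummary> InCountry(IDocumentSession session, string countryName)
    {
        return session
            // Query the Index by name; no joins are needed at this point.
            .Query<CustomerSummary>("CustomerIndex")
            // The only work left for the query is a simple filter.
            .Where(c => c.CountryName == countryName)
            // Build the DTO from the fields stored by the Index instead of
            // returning the underlying Customer documents.
            .ProjectInto<CustomerSummary>()
            .ToList();
    }
}
```

All of the joining and counting has already happened by the time this runs, which is why the query itself stays so small.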

The end result of all of this is one fundamental rule of RavenDB.

Documents are used to store and modify data. Queries run from cached aggregate Indexes.

Which conveniently explains why……

CQRS now makes sense

One of my previous criticisms of RavenDB was that they tried to shove a CQRS based architecture down everyone’s throat. I still stand by the assertion that CQRS is not a good architecture for a transactional SQL database but I now fully understand why it makes so much sense in RavenDB.

Much of the difficult synchronisation bit of CQRS is now done automatically by an Index without anyone noticing.

Commands are used to modify documents. Queries are used to read data from Indexes. In order to do this successfully you have to split them out. They run from different data sources.
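The split can be modelled in a few lines of plain C#. This is a deliberately tiny in-memory stand-in, not RavenDB's API: TinyStore, RunIndexer and the record types are all made up for illustration, and the model ignores details such as re-indexing a customer when its region changes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document and read-model types, for illustration only.
public record Customer(string Id, string Name, string RegionId);
public record Region(string Id, string Name);
public record CustomerSummary(string Id, string CustomerName, string RegionName);

// In-memory stand-in for a document database. This is NOT RavenDB's API.
public class TinyStore
{
    private readonly Dictionary<string, Customer> _customers = new();
    private readonly Dictionary<string, Region> _regions = new();
    private readonly Dictionary<string, CustomerSummary> _index = new();
    private readonly Queue<string> _pendingIndexing = new();

    // Command side: writes touch documents only and return immediately.
    public void Save(Region region) => _regions[region.Id] = region;

    public void Save(Customer customer)
    {
        _customers[customer.Id] = customer;
        _pendingIndexing.Enqueue(customer.Id); // indexing happens later
    }

    // Stand-in for the background indexer: the "join" runs here, after the write.
    public void RunIndexer()
    {
        while (_pendingIndexing.Count > 0)
        {
            var customer = _customers[_pendingIndexing.Dequeue()];
            var region = _regions[customer.RegionId];
            _index[customer.Id] =
                new CustomerSummary(customer.Id, customer.Name, region.Name);
        }
    }

    // Query side: reads touch the index only, so they reduce to a simple filter.
    public List<CustomerSummary> QueryByRegionName(string regionName) =>
        _index.Values.Where(s => s.RegionName == regionName).ToList();
}
```

In this model, a customer saved a moment ago does not appear in QueryByRegionName until RunIndexer has run, which is the same eventual consistency described earlier, and the write path and read path never share a data structure.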

Splitting your data layer is no longer a choice. Forget that really clever repository data layer you designed for Entity Framework. It’s over, put it away.

To some extent I now understand what the RavenDB team have been trying to tell us for a long time but the message was getting caught up in the complex architectural language that can frequently cause us all confusion.

No-one really got around to explaining to all the developers that it was due to the way that Indexes work. You never really query a document, you only ever query an Index.

Still not got it?

If you still don’t understand all of this then the best way I have found to really hit this home to SQL developers is by explaining it like this:-

“Imagine a database where you could only query a View. Indexes are like Triggers that create cached Views. You write the Index to create a virtually cached View and then you write a query against the cached View. All joins and aggregation have to be done by the Index to create a perfect pre-calculated View. Your queries then just become simple where statements.”

The next question most SQL developers ask is why? The simple answer is scalability and performance. Most work is done at the point of save rather than every time a user asks for the data. This means a user is only ever accessing pre-calculated data, which is significantly quicker. The Index can also be distributed, cached and used in a more scalable way because it is a copy derived from the original master document. Sharding and replication mean big performance benefits.

Summary

The secret of RavenDB is the Index. Learning to write an Index is an essential requirement of using RavenDB properly. Forget the SQL way of querying and understand that most work is done by an Index at the point of save. You only ever query an Index and you will almost certainly return a pre-calculated DTO object.

It is only when you have assimilated this knowledge that most of the discussions and technical documentation about RavenDB make sense. That’s why their documentation starts with explaining Indexes.

The next time a developer tells you that RavenDB was really slow or didn’t do this or that, simply ask him if he wrote an Index as part of the process. If the answer is no then he never really grasped RavenDB fully.
