time to read 12 min | 2323 words

This is another stream of consciousness post, with me trying to figure out the best way to resolve a given problem.

Update: I ended up solving this in a drastically different way. I'll post about this later on. I am keeping this here because it is a good post about how I think about a problem.

A while ago, I posted a visual description of how Map/Reduce is supposed to work. That was while I was working on RavenDB’s map/reduce implementation, but that isn’t actually how the current RavenDB map/reduce works.

Instead, it works like this:

Map phase:

for item in docs:
    for result in reduce(map(item)):
        Persist(item.id, result.key, result)

And the reduce phase:

for result in reduce(results):
     WriteToIndex(result)

There are a few things that are interesting here. First, we run both map and reduce during the map phase. Why is that?

Well, to do an immediate reduction of the values, of course. This has two major benefits.

  • It means that in the reduce phase, the reduce accepts the output of the reduce – this is important because it prepares us for the next step that we would have to take, multiple reduce steps.
  • Second, it means that if you have multiple results from the same document that share the same key, they would be reduced immediately, rather than leaving multiple results with the same key.

You might notice an issue with the code above, though. It seems like we are running reduce once, and only once.

Indeed, this is how RavenDB 1.0 behaves. It only runs the reduce once, regardless of how many entries you have per key.

Wait! I probably need to explain things. Let us talk for a second about the following map/reduce index:

//map
from order in docs.Orders
from line in order.OrderLines
select new
{
   line.Product,
   line.Qty
}

//reduce
from result in results
group result by result.Product into g
select new
{
    Product = g.Key,
    Qty = g.Sum(x=>x.Qty)
}

Now, if we have an order with two line items for the same product, they would be immediately reduced (on the same document) and saved as the map results.
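
To see what that means in practice, here is a small standalone sketch of mapping a single order and immediately reducing its own output. This is plain LINQ over in-memory objects with assumed Order/OrderLine shapes, not RavenDB code:

using System;
using System.Collections.Generic;
using System.Linq;

class OrderLine { public string Product; public int Qty; }
class Order { public List<OrderLine> OrderLines; }

class ImmediateReduceSketch
{
    static void Main()
    {
        var order = new Order
        {
            OrderLines = new List<OrderLine>
            {
                new OrderLine { Product = "products/1", Qty = 2 },
                new OrderLine { Product = "products/1", Qty = 3 }, // same product twice
                new OrderLine { Product = "products/2", Qty = 1 },
            }
        };

        // Map: one output per order line.
        var mapped = from line in order.OrderLines
                     select new { line.Product, line.Qty };

        // Immediate reduce over this single document's map output.
        var persisted = from result in mapped
                        group result by result.Product into g
                        select new { Product = g.Key, Qty = g.Sum(x => x.Qty) };

        foreach (var r in persisted)
            Console.WriteLine(r.Product + ": " + r.Qty);
        // products/1: 5   <- the two line items were collapsed before being persisted
        // products/2: 1
    }
}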

Now, the reduce key in this case is the line item product. So when we execute the reduce, we load all the map results that share the same reduce key and run them through the reduce function together.

As I said, this is how RavenDB 1.0 behaves. And it works really nicely, except that it behaves less nicely if you have a lot of results for the same reduce key. What happens if we have a really popular product, one that was purchased in a million different orders?

Every time that we would get a new order for this product, we would have to re-reduce the entire set. That means that we would have to re-reduce 1 million items.

We recently got a problem report when one of our customers ran into an issue with running map/reduce indexes over the entire US census data. One of the problematic indexes was something like:

//map 
from person in docs.CensusPeople
select new
{
  person.State,
  Count = 1
}

//reduce
from result in results
group result by result.State into g
select new
{
  State = g.Key,
  Count = g.Sum(x=>x.Count)
}

As you can imagine, this sort of thing is going to have a lot of items for the same key.

For example, for California we would need to run reduce over about 38 million items, and for Texas it would be over 25 million items. This is really not what we had in mind, so we turned to a long-standing bug in RavenDB and started to implement multi-step reduce.

The issue is how to do so. Ideally, we do not want to have to make any changes between map/reduce indexes that have a large number of items per key and map/reduce indexes that have a small number of items per key.

The question is how to do this, and how to make sure that we don’t affect the overall system performance. As I mentioned before, it is very easy to modify things to fit one particular scenario, while forgetting all about the other scenarios.

Things get interesting after that, and here is where you might get lost, because this part is mostly written from the point of view of the implementers.

The actual behavior of the system in the map phase is more complex, because we need to invalidate old items. It looks more like this:

for item in docs:
    keys = DeleteMapResultsFor(item.id)
    for result in reduce(map(item)):
        keys.Remove(result.key)
        Persist(item.id, result.key, result)
    ReReduceRemovedKeys(keys)

Instead of this, we will have this:

for item in docs:
    oldResults = DeleteMapResultsFor(item.id)
    keys = new HashSet<string>(oldResults.Select(x=>x.Key))
    lookups = oldResults.ToLookup(x=>new {x.Id, x.Key})

    for result in reduce(map(item)):
        keys.Remove(result.key)
        int bucketId
        if not lookups.TryFind(new { item.Id, result.key }, out bucketId):
            bucketId = -1
        Persist(item.id, result.key, bucketId, result)
    ReReduceRemovedKeys(keys)

Note that we now have the notion of a bucket, and by default that bucket is set to –1. But note that we keep the same bucketId if the result already has one; this will be important later on.

The interesting thing happens in the reduce phase:

def Reduce(string key, int level):
    bool hasMore = true
    bool hadMore = false
    while hasMore:
        results = GetMappedResults(key, level, out hasMore)
        hadMore |= hasMore
        if hasMore:
            newBucketId = GetNewBucketId(key, level)
            UpdateBucketId(results, newBucketId)
        for result in reduce(results):
            Persist(key, level + 1, result)
            if not hadMore:
                WriteToIndex(key, result)
    if hadMore:
        ScheduleReduce(key, level + 1)

Here is where the important things happen. If we have fewer than 1,024 items for the same key, we just proceed normally; there is nothing to see there.

If we have more than that, then we create a bucket for all of those results and schedule a re-reduce for the next level up.

In other words, it looks like this. Here is the map phase; notice that we start out with all of the bucket ids being –1.

[Figure: the map results for the reduce key, all with bucket id –1]

When running the reduce phase with level = 0, we get three buckets, like this:

 

[Figure: the reduce phase at level = 0 splits the map results into three buckets]

We now need to re-reduce those, and this is where we are called with level = 1. Let us assume that the results of buckets 1 & 2 are still over 1,024 items, so we will have:

 

[Figure: re-reducing at level = 1; buckets 1 & 2 still hold more than 1,024 items]

And finally:

[Figure: the final reduce output for the key]
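
To make the bucketing flow concrete, here is a small standalone sketch of the idea: reduce at most 1,024 entries at a time, feed each bucket's output to the next level, and stop once a level fits into a single reduce call. The helper names and the flow are assumptions for illustration, not RavenDB's implementation:

using System;
using System.Linq;

static class BucketedReduceSketch
{
    const int BucketSize = 1024;

    // The census reduce: sum the counts that share a reduce key.
    static int Reduce(int[] counts) => counts.Sum();

    static int ReduceKey(int[] mapResults)
    {
        var current = mapResults;
        var level = 0;
        while (current.Length > BucketSize)
        {
            // Split this level into buckets of 1,024 and reduce each bucket on its own.
            current = current
                .Select((value, i) => new { value, bucket = i / BucketSize })
                .GroupBy(x => x.bucket)
                .Select(g => Reduce(g.Select(x => x.value).ToArray()))
                .ToArray();
            level++;
        }

        // The last level is small enough to reduce in one go and write to the index.
        Console.WriteLine("final reduce at level " + level + " over " + current.Length + " entries");
        return Reduce(current);
    }

    static void Main()
    {
        // 5,000 map results for one key: level 0 -> 5 buckets -> final reduce at level 1.
        var counts = Enumerable.Repeat(1, 5000).ToArray();
        Console.WriteLine(ReduceKey(counts)); // 5000
    }
}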

 

So far, this looks nice, but there are several practical problems that we still need to solve.

To start with, when does this end? We have users who write map/reduce queries like this:

//map #1
from customer in docs.Customers
select new 
{
   CustomerId = customer.Id,
   CustomerName = customer.Name,
   OrderId = (string)null,
}

// map #2
from order in docs.Orders
select new
{
  order.CustomerId,
  CustomerName = (string)null,
  OrderId = order.Id
}

//reduce
from result in results
group result by result.CustomerId into g
let name = g.FirstOrDefault(x=>x.CustomerName!=null).CustomerName
from item in g
select new
{
   CustomerId = g.Key,
   CustomerName = name,
   item.OrderId
}

This is a frowned upon (but working) way of allowing you to query and sort by the customer name while searching for orders. The problem with this method is that if we have 15,000 orders per customer, we are going to have the same number come out of the reduce phase as well.

Now, the reason this is frowned upon? Because while this is using map/reduce, it isn’t actually… you know… reducing the data. In order to resolve this issue, we are going to make sure that all of the items generated from a single reduce step will always go into the same bucket. This means that we keep pretty much the same behavior as we have now; it is going to be inefficient, but that was always going to be the case.

We are also going to limit the number of levels to three, which still gives us the ability to handle over a billion results (1,024 × 1,024 × 1,024 is a little over a billion) before a reduce phase would need to see more than 1,024 items at once.

Take the California example: we would have 37,691,912 people, which map to 37,691,912 map results, all at bucket –1. Then we have 36,809 buckets for the second level, and finally 35 buckets at the third level. All of which are computed for the final result.

The next step from here is to actually handle updates, which means that we have to keep track of the bucket ids going forward. Say we start by deleting a person: that means that we need to delete their map result, which means that we need to re-reduce the bucket it belongs to at the appropriate level, then the level above it, and so on. In total, we would have to compute 1,024 + 1,024 + 35 items, instead of 37,691,912.
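
A back-of-the-envelope sketch of that update cost, where the bucket size and the helper name are assumptions rather than RavenDB code:

using System;

static class UpdateCostSketch
{
    static long ItemsTouchedOnUpdate(long mapResultsForKey, int bucketSize = 1024)
    {
        long level1 = (mapResultsForKey + bucketSize - 1) / bucketSize; // entries produced by level 0
        long level2 = (level1 + bucketSize - 1) / bucketSize;           // entries produced by level 1

        // One bucket re-reduced at level 0, one at level 1, plus all the top-level entries.
        return bucketSize + bucketSize + level2;
    }

    static void Main()
    {
        // Roughly 2,100 items touched, versus re-reducing all 37,691,912 in the single-step scheme.
        Console.WriteLine(ItemsTouchedOnUpdate(37691912));
    }
}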

Okay, enough talking, let us see if I have straightened things out enough to actually be able to implement this.

time to read 2 min | 273 words

I don’t believe that I ever did this, but I was just completely blown away by Skills Matter.

Please note, I never actually dealt with them as a student, although I got great feedback from the people I taught about them in that capacity. I am talking here solely about dealing with them as a service provider.

I have been working with them since 2008, and we have a great working relationship. I am routinely dealing with many businesses, and almost always you have this… friction. I usually have great rapport with the technical people with whom I work, but when it comes to dealing with other departments, it can be… annoying.

With Skills Matter, not only was that never the case, they go out of their way to make it easy and fun to work for them. I can’t talk about the “straw that broke the camel’s back” that was the trigger for this post, but I can talk about some other things.

From spreading the word about RavenDB and NHibernate by organising talks with me for the Skills Matter community, to helping me with a payment dispute that I had with a hotel (one that I reserved, but they took care of all the details of getting my money back and saved me tons of international calls and angst), to being willing and able to accommodate screw ups (oops, I missed the plane) in the most pleasant way possible, and all the stuff they do for the developer community.

In my time working with them, I found them to be honest, hardworking, professional, ethical and in general Very Good People.

time to read 9 min | 1607 words

There is a chance that you’ll look at me strangely for calling this the “Feature of the Day”. But it is actually quite an important little feature.

Here is the deal, let us say that you have the following index:

public class Orders_Search : AbstractIndexCreationTask<Order, Orders_Search.ReduceResult>
{
    public class ReduceResult
    {
        public string Query { get; set; }
        public DateTime LastPaymentDate { get; set; }
    }

    public Orders_Search()
    {
        Map = orders => from order in orders
                        let lastPayment = order.Payments.LastOrDefault()
                        select new
                        {
                            Query = new object[]
                            {
                                order.FirstName, 
                                order.LastName, 
                                order.OrderNumber, 
                                order.Email, 
                                order.Email.Split('@'),
                                order.CompanyName,
                                order.Payments.Select(payment => payment.PaymentIdentifier),
                                order.LicenseIds
                            },
                            LastPaymentDate = lastPayment == null ? order.OrderedAt : lastPayment.At
                        };
    }
}

And you are quite happy with it. But that is the client side perspective. We don’t have any types on the server, so you can’t just execute this there. Instead, we send a string representing the index to the server. That string is actually the output of the LINQ expression, which looks like this:

[Figure: the index definition rendered as a single unformatted line of code]

This is… somewhat hard to read, I think you’ll agree. So we had some minimal work done to improve this, and right now what you’ll get is (you’ll likely see it roll off the screen, that is expected):

docs.Orders
    .Select(order => new {order = order, lastPayment = order.Payments.LastOrDefault()})
    .Select(__h__TransparentIdentifier0 => new {Query = new System.Object []{__h__TransparentIdentifier0.order.FirstName, __h__TransparentIdentifier0.order.LastName, __h__TransparentIdentifier0.order.OrderNumber, __h__TransparentIdentifier0.order.Email, __h__TransparentIdentifier0.order.Email.Split(new System.Char []{'@'}), __h__TransparentIdentifier0.order.CompanyName, __h__TransparentIdentifier0.order.Payments
    .Select(payment => payment.PaymentIdentifier), __h__TransparentIdentifier0.order.LicenseIds}, LastPaymentDate = __h__TransparentIdentifier0.lastPayment == null ? __h__TransparentIdentifier0.order.OrderedAt : __h__TransparentIdentifier0.lastPayment.At})

This is still quite confusing, actually. But still better than the alternative.

As I said, it seems like a little thing, but those things are important. An index in its compiled form that is hard to understand for a user is a support issue for us. We needed to resolve this issue.

The problem is that source code beautifying is non-trivial. I started playing with parsers a bit, but it was all way too complex. Then I had an epiphany. I didn’t actually care about the code, I just wanted it sorted. There aren’t many C# code beautifiers around, but there are a lot for JavaScript.

I started with the code from http://jsbeautifier.org/, which Rekna Anker had already ported to C#. From there, it was a matter of making sure that, for my purposes, the code generated the right output. I had to teach it C# idioms such as @foo, null coalescing and lambda expressions, but that sounds harder than it actually was. With that done, we got this output:

docs.Orders.Select(order => new {
    order = order,
    lastPayment = order.Payments.LastOrDefault()
}).Select(__h__TransparentIdentifier0 => new {
    Query = new System.Object[] {
        __h__TransparentIdentifier0.order.FirstName,
        __h__TransparentIdentifier0.order.LastName,
        __h__TransparentIdentifier0.order.OrderNumber,
        __h__TransparentIdentifier0.order.Email,
        __h__TransparentIdentifier0.order.Email.Split(new System.Char[] {
            '@'
        }),
        __h__TransparentIdentifier0.order.CompanyName,
        __h__TransparentIdentifier0.order.Payments.Select(payment => payment.PaymentIdentifier),
        __h__TransparentIdentifier0.order.LicenseIds
    },
    LastPaymentDate = __h__TransparentIdentifier0.lastPayment == null ? __h__TransparentIdentifier0.order.OrderedAt : __h__TransparentIdentifier0.lastPayment.At
})

And this is actually much better. Still not good enough, mind; we can do better than that. It is a simple change:

docs.Orders.Select(order => new {
    order = order,
    lastPayment = order.Payments.LastOrDefault()
}).Select(this0 => new {
    Query = new System.Object[] {
        this0.order.FirstName,
        this0.order.LastName,
        this0.order.OrderNumber,
        this0.order.Email,
        this0.order.Email.Split(new System.Char[] {
            '@'
        }),
        this0.order.CompanyName,
        this0.order.Payments.Select(payment => payment.PaymentIdentifier),
        this0.order.LicenseIds
    },
    LastPaymentDate = this0.lastPayment == null ? this0.order.OrderedAt : this0.lastPayment.At
})

And now we have got something far more readable.
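
For the curious, the rename itself can be as small as a regex pass over the beautified source. A minimal sketch of that idea (an illustration, not the actual RavenDB code):

using System;
using System.Text.RegularExpressions;

static class TransparentIdentifierRename
{
    // __h__TransparentIdentifier0 -> this0, __h__TransparentIdentifier1 -> this1, ...
    static string Simplify(string source) =>
        Regex.Replace(source, @"__h__TransparentIdentifier(\d+)", "this$1");

    static void Main()
    {
        var before = "docs.Orders.Select(__h__TransparentIdentifier0 => __h__TransparentIdentifier0.order.Email)";
        Console.WriteLine(Simplify(before));
        // docs.Orders.Select(this0 => this0.order.Email)
    }
}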

time to read 1 min | 70 words

Well, it took a while, but the next version of the NHibernate Profiler is ready to go to a private beta.

We go to private beta before the product is done, because we want to get as much early feedback as we can.

We have 10 spots for NHibernate Profiler v2.0 and 10 spots for Entity Framework Profiler v2.0.

If you are interested, please send me an email about this.

time to read 2 min | 252 words

You might have noticed that I am talking a lot about support and operations recently.  This is because we have been doing a lot of work around that area. Making sure that RavenDB is even more forthcoming and open about what is going on.

This week it has been about making sure that the shutdown phase of RavenDB is as transparent as it can be. Debugging those sorts of issues is a PITA, because you very rarely really stop to consider them. But we got some feedback from customers about a common set of issues there.

Under some circumstances, shutting down RavenDB might take a while. We identified several things that can cause this, mostly indexing in progress.

The first thing we did was change the way we are doing indexing to allow aborting it during shutdown, but there is still a set of operations that might take a while and that we have to complete even in a shutdown scenario.

For example, we might be just in the middle of flushing to disk, and we really want that to complete successfully (otherwise we would need to run an expensive check on startup).
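
The general shape of that trade-off, sketched with standard .NET cancellation (the names and structure are assumptions, not RavenDB's actual shutdown code):

using System;
using System.Threading;
using System.Threading.Tasks;

static class ShutdownSketch
{
    static async Task IndexingLoop(CancellationToken shutdown)
    {
        while (!shutdown.IsCancellationRequested)
        {
            IndexNextBatch();          // safe to abandon between batches

            // A flush in progress must complete even if shutdown was requested mid-way,
            // otherwise we would pay for an expensive check on the next startup.
            await FlushToDiskAsync();
        }
    }

    static void IndexNextBatch() => Thread.Sleep(50);  // stand-in for real indexing work
    static Task FlushToDiskAsync() => Task.Delay(50);  // stand-in for a real flush

    static void Main()
    {
        var cts = new CancellationTokenSource();
        var indexing = Task.Run(() => IndexingLoop(cts.Token));

        Thread.Sleep(120);
        cts.Cancel();                  // shutdown requested
        indexing.Wait();               // waits for the in-flight batch and flush to finish
        Console.WriteLine("shutdown complete");
    }
}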

Therefore, we added this:

[Figure: shutdown log output showing which operations RavenDB is still waiting on]

You’ll still have to wait, sure. But now if you watch the logs you can see why, and have some understanding about how long this is going to take.

time to read 2 min | 299 words

I wanted to improve the RavenDB query optimizer, so it can merge similar indexes into a single index.

It is actually fairly simple to do (conceptually). But that leaves us with the problem of what to do with the indexes that we just superseded.

The whole point here is to reduce the number of indexes, not to create an index that will never be used.

So the next step was to make sure that we can cleanup unused auto indexes. That is great…

Except that we don’t have any way of tracking unused indexes, so before we could do that, we have to add tracking for query times.

Okay, let us do that.

Wait, we don’t have a place to put those in, and anyway, we don’t want to save it all the time, that would mean IO for every query, and what about concurrent queries?

So we need to add a way to save this, and make sure that there is some batching going on in there, so we won’t go to the disk all the time. And wait, there is this other feature over there that needs something similar, so we might as well solve both problems at once…

Okay, let us add that.
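
A minimal sketch of the kind of batched tracking this calls for (the names and the flush interval are assumptions, not RavenDB's implementation):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

static class QueryTimeTracker
{
    static readonly ConcurrentQueue<KeyValuePair<string, DateTime>> pending =
        new ConcurrentQueue<KeyValuePair<string, DateTime>>();

    static readonly Timer flushTimer =
        new Timer(_ => Flush(), null, TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(5));

    // Called on the query path: cheap, lock free, and no IO.
    public static void RecordQuery(string indexName)
    {
        pending.Enqueue(new KeyValuePair<string, DateTime>(indexName, DateTime.UtcNow));
    }

    // Called by the timer: drain everything and persist it as a single batch.
    static void Flush()
    {
        var batch = new List<KeyValuePair<string, DateTime>>();
        KeyValuePair<string, DateTime> item;
        while (pending.TryDequeue(out item))
            batch.Add(item);

        if (batch.Count > 0)
            PersistLastQueryTimes(batch);   // one storage write for the whole batch
    }

    static void PersistLastQueryTimes(List<KeyValuePair<string, DateTime>> batch)
    {
        // single write to storage goes here
    }
}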

Now, where were we again?

The annoying thing about this? You only realize that those are actually problems when you run some deep analysis on the actual feature request. And usually you have to have multiple cycles to get it all out.

It is also very easy to go down one branch, all the way to the bottom, then go up to another branch, which was opened up because of the work you just did. Makes for a good workday, but makes it harder to predict what you are going to do today.

time to read 1 min | 193 words

Is something that you probably wouldn’t even notice, to be perfectly honest. We are going to work on our map/reduce implementation.

This is freakishly complex, because we need to do updatable, persistent map/reduce. It got so complex that I decided that I can’t really implement this on my own in the RavenDB solution and moved to spiking the solution in isolation.

If you care, you can look at this here. There would be additional optimizations to worry about in RavenDB, but it is pretty much all there, in less than 400 lines of code.

I couldn’t find anything out there which was nearly as useful. Most of the map/reduce implementations are about distributing the work load and scheduling it. None of them really deal with the notion of updatable map/reduce results.

Note that the Storage layer there serves both to show that we can persist and restart from any point, and to provide behavior that is critical to the implementation (for example, scheduling).

I’ll probably do a set of posts about this, but for now, here is the source, have fun poking at it: https://github.com/ayende/updatable-persistent-map-reduce

time to read 1 min | 136 words

That is what I call things like this:

[Figure: the code sample from the question, using abstract names such as MyClass, Field1 and Field2]

It was part of a question I was asked, and the question contained things like:

I've a field (Field2) in MyClass as nested Dictionary and i have an index for Field 1.

There are several issues with this style of question:

  • It makes it harder to answer, because you have to keep a mapping of that in your head.
  • There is no meaning to the question; we can’t figure out whether this is a good or bad scenario.
  • Usually the problem description contains references to the real model, which I haven’t seen and don’t know anything about.

Annoying.
