time to read 1 min | 187 words

Here is an interesting challenge. In Candy Crush (which I do not have a problem with), you have 5 lives. Lives renew at a rate of about one per 30 minutes, so it is pretty common to get to this stage:

[Image: the Candy Crush out-of-lives screen]

Now, you can go and change your system clock, and then you’ll get 5 more lives, and you can play some more.

Now, there are probably very good reasons why this is done in this manner: it ensures players stay hooked, and it is just inconvenient enough that the number of lives you have still has meaning.

However, let us say that we wanted to stop that. How would you go about approaching this? Remember, we are talking about an app on a phone, and it isn't something super serious if it gets broken, but we want to avoid the super easy workaround.

How would you solve this if it were on a computer instead of a phone? What are the different considerations? What if this were something that was very important?
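
One obvious direction, sketched below with illustrative names (a hint, not a full answer), is to never trust a clock that moves backwards: persist the last observed time, and grant lives only for forward movement.

using System;

// Sketch of one direction, with illustrative names: lives are granted only
// for forward movement of the clock, so setting the clock back gains nothing.
public class LifeCounter
{
    private static readonly TimeSpan RegenInterval = TimeSpan.FromMinutes(30);
    private const int MaxLives = 5;

    // Both fields would be persisted (and ideally obfuscated) on the device.
    private DateTime _lastSeenUtc;
    private int _lives;

    public LifeCounter(DateTime nowUtc)
    {
        _lastSeenUtc = nowUtc;
        _lives = MaxLives;
    }

    public int GetLives(DateTime nowUtc)
    {
        if (nowUtc > _lastSeenUtc)
        {
            // Clock moved forward: credit whole 30-minute intervals only.
            var regained = (int)((nowUtc - _lastSeenUtc).Ticks / RegenInterval.Ticks);
            if (regained > 0)
            {
                _lives = Math.Min(MaxLives, _lives + regained);
                _lastSeenUtc += TimeSpan.FromTicks(regained * RegenInterval.Ticks);
            }
        }
        // If the clock moved backwards (or stood still), grant nothing and keep
        // _lastSeenUtc as-is, so rolling the clock forward again doesn't help.
        return _lives;
    }
}

Of course, on a phone the persisted state itself can be tampered with, which is exactly where the rest of the considerations come in.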

time to read 1 min | 92 words

Here is something that you aren't likely to know about, mostly because it is an internal feature that we use to fix strange stuff.

You can debug into the indexing functions, and see exactly what happens during indexing / transformers:

[Screenshot: stepping through an indexing function in the debugger]

To be fair, this is probably something that you won't really have a chance to see, but for debugging the really hard stuff, it is invaluable.

time to read 1 min | 130 words

I just got an email from NBC Universal about their RavenDB usage. We've previously talked with them about it, and I was very pleased to hear that they have completed their upgrade to RavenDB 2.x.

I was even happier to hear the results of the upgrade:

  • Index times decreased (75% improvement)
  • Replication + indexing times decreased (40% improvement)
  • Development server RavenDB restore times decreased (75% improvement)
  • Memory leak/crash frequency decreased (1) (60% improvement – stronger devices are experiencing no crashes)
  • Support from Hibernating Rhinos will now improve, as they are more capable of supporting the new version

I like getting such news from customers.

  1. The scenario they are talking about is overloading the system. It exhibits itself as high memory utilization. On small devices, that can lead to crashes due to out-of-memory errors.

time to read 8 min | 1594 words

This is a story about a bug, RavenDB-1280, to be exact. It is really bad when we start with a race condition bug, and it only gets worse from there.

The story begins with the following index:

public class EmailIndex : AbstractIndexCreationTask<EmailDocument, EmailIndexDoc>
{
    public EmailIndex()
    {
        Map = emails => from email in emails
                        let text = LoadDocument<EmailText>(email.Id + "/text")
                        select new
                        {
                            email.To,
                            email.From,
                            email.Subject,
                            Body = text == null ? null : text.Body
                        };
    }
}

So far, this is a pretty standard index using LoadDocument. The LoadDocument feature allows us to also index data from related documents, and the database will ensure that when the loaded document is changed, the requesting document will also be re-indexed.

What does this mean? It means that when we change the text of an email, we will also re-index that email's document. Internally, we store the references in something that looks like this (using the relational model, because it is probably easier to follow for most of you):

CREATE TABLE DocumentReferences
(
   Source,
   Reference
)

So in the case above, we will have (Source: emails/1 and Reference: emails/1/text).

Whenever we modify a document, we check the stored references for matches. If we were using a relational database, the code would be something like:

foreach ( var src in (SELECT Source FROM DocumentReferences WHERE Reference = @ref))
{
   TouchDocument(src);
}

So far, so good. This ensures that we always keep the index up to date. Obviously, this also means that the indexing process needs to update the references as well. Which brings us to the problematic part:

Parallel.For(0, iterations, i =>
{
    // First session / transaction: save the email document.
    using (var session = documentStore.OpenSession())
    {
        session.Store(new EmailDocument { Id = "Emails/" + i, To = "root@localhost", From = "nobody@localhost", Subject = "Subject" + i });
        session.SaveChanges();
    }

    // Second, separate session / transaction: save the related text document.
    using (var session = documentStore.OpenSession())
    {
        session.Store(new EmailText { Id = "Emails/" + i + "/text", Body = "MessageBody" + i });
        session.SaveChanges();
    }
});

This code will result in an interesting edge case. Because we are using two different sessions (and thus two different transactions), it is possible for the index to pick up the emails/1 document and start indexing it, but not pick up the emails/1/text document (it hasn’t been saved yet).

However, during the save of the emails/1/text document, there wouldn't be anything in the references storage, so we wouldn't know that we need to re-index emails/1. The result is that we violated our promise to re-index when the document changed.

As I said, just setting up the problem requires parallel thinking and a severe case of headache. Luckily, we had a great bug report from Barry Hagan, which included an occasionally failing test. (Even just writing “occasionally failing” causes me to throw up a little bit.)

After we identified the problem, we tried to solve it by holding an in-memory list of modified documents that were also required (and missing) during indexing via LoadDocument. Don't bother trying to follow that statement; it is complex, and it is hard, and we did it. And it worked, except that it didn't. It just moved the race condition from the entire process to the edges of the process. We had two guys sitting on this for a couple of days, in danger of tearing their hair out, with no luck narrowing it down.

Eventually I sat down and tried to figure out how to deal with it. I was able to build on their work, and narrow down exactly where the race condition now lived. Just thinking about trying to solve it there was way too hard: it would require multiple locks and pretty strange behavior overall. I wasn't happy with the implications, and that code had already created quite a lot of complications as it was.

So I started by reverting all of that code, leaving a clean slate to work with. Then I sat down and thought about it, and finally figured out a better way to do it. The problem was only relevant when we had a missing reference (an existing reference would properly trigger the re-indexing under all conditions). So I started with that: I gathered up all of the missing references and stored them in a db task. A db task is a RavenDB notion of a transactionally safe way of registering things to run in another transaction. So after the indexing transaction was done, we would then go and execute that task.
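
To make the shape of that concrete, here is a rough sketch (the names and types are illustrative, not RavenDB's actual internals):

using System.Collections.Generic;

// Illustrative sketch only; this is not RavenDB's actual internal code.
// During indexing, we record every document whose LoadDocument call returned
// null. The task is persisted as part of the indexing transaction, and is
// executed afterwards, in a transaction of its own.
public class TouchMissingReferenceDocumentsTask
{
    // Documents that called LoadDocument and got null back during indexing.
    public List<string> DocumentIdsToTouch = new List<string>();

    public void Execute(IDocumentStorage storage)
    {
        foreach (var id in DocumentIdsToTouch)
        {
            // Touching a document marks it as changed, forcing re-indexing.
            // If the related document is still missing, this is merely some
            // extra work; if it has since arrived, it will now be indexed.
            storage.TouchDocument(id);
        }
    }
}

// Hypothetical storage operation the task relies on.
public interface IDocumentStorage
{
    void TouchDocument(string documentId);
}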

The idea here is that doing it this way prevents us from having to deal with any issues around “what if the server crashes midway”, etc. It also means that this task runs in a separate transaction, after the indexing transaction is done. The only thing this task does is force re-indexing of all the documents that tried to call LoadDocument but got null as a result. Because we are now in a separate transaction, there are only two options:

  • The related document is still not there, in which case we are merely doing some extra work.
  • The related document is there, and will be indexed.

The first case deserves some more attention: even if the missing document arrives while we are re-indexing, we don't care. We have already set up the references and committed them, so its arrival will force yet another re-indexing, and in the end, everyone will be happy.

I have written this blog post while running a stress test on this feature, and it has taken long enough that I am now sure it works properly. So I'll call it a day and go do something more fun, maybe 3.0 work.

time to read 1 min | 195 words

I am proud to announce another service that Hibernating Rhinos now offers: a network of experts who are available for RavenDB work in many parts of the world.

This is in addition to our recent enterprise partnership agreement with Managed Designs, which gives us the ability to offer consulting, training and support services throughout most of Europe.

We now have experts available in:

  • Dublin, Ireland
  • London, UK
  • Toruń, Poland
  • Dallas, Texas, USA
  • Austin, Texas, USA
  • Chicago, Illinois, USA

Those experts are available for RavenDB consulting and support, so you can get a local expert backed by our own expertise on RavenDB. If you are interested in having one of those people show up, ping our support email.

In addition to that, I intend to be visiting the States starting mid-November, so if you are interested in having me show up for a day or two of consulting, or even a private RavenDB course, my schedule calls for visits to the following locations:

  • New York, NY
  • Trenton, NJ
  • Pittsburgh, PA
  • Orlando, FL
  • Raleigh, NC
  • Dallas, TX

Ping me if you are interested in a visit. In addition to that, if there are any user groups in those areas that might be interested, I would love to pop in and give a guest lecture.

time to read 4 min | 652 words

Writing in C (and using only the C standard library as building blocks, which explicitly excludes C++ and all its stuff), generate 1 million unique random numbers.

For reference, here is the code in C#:

var random = new Random();
var set = new HashSet<int>();

var sp = Stopwatch.StartNew();

while (set.Count < 1000 * 1000)
{
    set.Add(random.Next(0, int.MaxValue));
}

Console.WriteLine(sp.ElapsedMilliseconds);

It is a brute force approach, I'll admit, but it completes in about 150ms on my end. The solution must run in under 10 seconds.

This question just looks stupid; it can actually tell you quite a bit about the developer.

time to read 2 min | 356 words

Daniel Crabtree has asked a really interesting question:

I've been playing around with file I/O and I am trying to figure out when to use FileOptions.WriteThrough. In your experience, if durability is priority 1 and performance priority 2, is there any reason to use WriteThrough on file streams in addition to or instead of FlushToDisk?

From my experiments, I found the following:


WriteThrough

  • Durable
  • Calls to Write block
  • Slower than FlushToDisk

FlushToDisk

  • Durable
  • Calls to Flush block
  • Faster than WriteThrough

Both WriteThrough & FlushToDisk

  • Durable
  • Calls to Write block
  • Same performance as WriteThrough alone

I'm asking you, as I notice you've used both approaches.
You used WriteThrough:

Well… that caught up with me, it appears. Basically, I was a bit lazy with the terms I was using in those blog posts, so I think that I had better clarify.

WriteThrough is the .NET name for FILE_FLAG_WRITE_THROUGH, which tells the OS to skip any caching and go directly to the disk. Writes will block until the data is sent to the disk. In practice, this means that you are effectively calling fsync() after each and every write call. That is usually the wrong thing to do, since you can't get any benefit from OS buffers, batching, etc. You would usually want to do multiple writes and then flush all of those changes to disk at once, letting those changes take advantage of the work the OS can do to optimize your system.

Instead, you want to use Flush(). Note that even in Munin, both options are used, and WriteThrough can be removed, although Munin should by no means be seen as a good implementation of a storage engine.
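
To make the difference concrete, here is a minimal sketch (file names and sizes are illustrative) contrasting the two options on a FileStream:

using System.IO;

class DurabilityDemo
{
    static void Main()
    {
        var data = new byte[4096];

        // FILE_FLAG_WRITE_THROUGH: every Write call blocks until the data
        // reaches the disk, effectively an fsync() per write.
        using (var fs = new FileStream("writethrough.bin", FileMode.Create,
            FileAccess.Write, FileShare.None, 4096, FileOptions.WriteThrough))
        {
            for (var i = 0; i < 100; i++)
                fs.Write(data, 0, data.Length); // each call waits on the disk
        }

        // Buffered writes, then a single flush-to-disk at the end: the OS can
        // buffer and batch, and we pay for the disk only once per batch.
        using (var fs = new FileStream("flushed.bin", FileMode.Create,
            FileAccess.Write, FileShare.None, 4096))
        {
            for (var i = 0; i < 100; i++)
                fs.Write(data, 0, data.Length); // goes to OS buffers
            fs.Flush(true); // flushToDisk: true - flush the OS caches as well
        }
    }
}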

That said, you also have to be aware that Flush doesn't always do its work: http://connect.microsoft.com/VisualStudio/feedback/details/792434/flush-true-does-not-always-flush-when-it-should

It looks like this is fixed in 4.5, but be aware of it if you need to run on 4.0 or earlier versions.
