time to read 13 min | 2574 words

As part of the work we have been doing on Voron, I wrote a few benchmarks and looked at where the hot spots are. One of the major ones was this function:

    public override void Flush()
    {
        if (_flushMode == FlushMode.None)
            return;

        PagerState.Accessor.Flush();
        _fileStream.Flush(_flushMode == FlushMode.Full);
    }

This is effectively the “fsync()” call. The Accessor.Flush() call resolves to FlushViewOfFile(0, size), and _fileStream.Flush(true) resolves to FlushFileBuffers on Windows.

It isn’t surprising that this would be THE hot spot; it is the part where we actually have to wait for the hardware to do stuff, after all. But further investigation revealed that it wasn’t the FlushFileBuffers that was really costly, it was the FlushViewOfFile. What FlushViewOfFile does is scan all of the pages in the range and flush them to the OS (not to disk) if they are dirty. That is great, but it is effectively an O(N) operation. We have more knowledge about what is going on, so we can do better. We already know which pages are dirty, so we can use that instead of letting the OS do all the work.

But then we run into another problem. If we call FlushViewOfFile for every page separately, we are going to spend a lot of time just calling into the OS when we have to do a large write. So we need to balance the amount of data we send to FlushViewOfFile against the number of times we call FlushViewOfFile. Therefore, I came up with the following logic: we group calls to FlushViewOfFile as long as the pages are nearby (within 256KB of one another, which at 4KB per page is the 64 page threshold you’ll see in the code below). For example, flushing pages {4, 10, 200} results in one call covering pages 4 through 10 and a second call for page 200 alone. This gives us the best balance between reducing the number of pages that FlushViewOfFile needs to scan and the number of times we call FlushViewOfFile.

This now looks like this:

    public override void Flush(List<long> sortedPagesToFlush)
    {
        if (_flushMode == FlushMode.None || sortedPagesToFlush.Count == 0)
            return;

        // here we try to optimize the amount of work we do, we will only
        // flush the actual dirty pages, and we will do so in sequential order
        // ideally, this will save the OS the trouble of actually having to flush the
        // entire range
        long start = sortedPagesToFlush[0];
        long count = 1;
        for (int i = 1; i < sortedPagesToFlush.Count; i++)
        {
            var difference = sortedPagesToFlush[i] - sortedPagesToFlush[i - 1];
            // if the difference between them is not _too_ big, we will just merge it into a single call
            // we are trying to minimize both the size of the range that we flush AND the number of times
            // we call flush, so we need to balance those needs.
            if (difference < 64)
            {
                count += difference;
                continue;
            }
            FlushPages(start, count);
            start = sortedPagesToFlush[i];
            count = 1;
        }
        FlushPages(start, count);

        if (_flushMode == FlushMode.Full)
            _fileStream.Flush(true);
    }
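The FlushPages method itself isn’t shown in the post. Here is a minimal sketch of what it might look like, assuming a memory mapped view whose base address is held in a _baseAddress field, a 4KB page size in a _pageSize field, and the Win32 FlushViewOfFile API; the field and method names here are illustrative, not the actual Voron code:

    // requires: using System; using System.ComponentModel; using System.Runtime.InteropServices;
    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FlushViewOfFile(IntPtr lpBaseAddress, UIntPtr dwNumberOfBytesToFlush);

    private void FlushPages(long startPage, long count)
    {
        // translate the page range into a byte range within the mapped view
        var address = new IntPtr(_baseAddress.ToInt64() + startPage * _pageSize);
        var bytes = new UIntPtr((ulong)(count * _pageSize));

        // flush only this range of dirty pages to the OS, rather than the entire view
        if (FlushViewOfFile(address, bytes) == false)
            throw new Win32Exception(Marshal.GetLastWin32Error());
    }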

A side effect of this is that we are also more likely to be writing to the disk in a sequential fashion.

The end result of this change ranged from doubling the performance of the system in the worst case scenario to “just” 25% faster under the best conditions.

time to read 1 min | 97 words

With the public release of RavenDB 2.5, we want to hear a lot more from users about what they are doing with RavenDB. Therefore, we decided to have a contest.

Basically, we ask you to write a post about your RavenDB experience on the RavenDB page on Facebook. We will send a free RavenDB care package (which includes an awesome T-Shirt & laptop stickers) to the first 50 people to send us their stories.

We will also raffle 3 RavenDB DVDs among those who submit their stories. The contest will end on Sep 20.

time to read 2 min | 229 words

About six weeks ago, we actually released RavenDB 2.5 to the world. It was build 2666. I decided to do something a bit different and do a silent launch. In other words, we released it, let the people on the mailing list know about it, but we didn’t make a big fuss about it.

Now we have a new build, 2700 (which isn’t going to evoke… certain issues for some people), and we want to make as much noise as possible. Because RavenDB 2.5 is out, and it is really cool.

Here is some of the new stuff:

And that is just a taste.

In fact, we are going to do a Webinar about how cool RavenDB 2.5 is on Monday. You can register using the following link: https://www2.gotomeeting.com/register/551636514

And, of course, go and get the latest RavenDB from our site.

And have a great New Year, everyone. We will be off for the Holiday until next Monday…

time to read 14 min | 2649 words

One of the steps that we take before releasing a stable build is to push the latest bits to our own production servers and see what goes on. So far, this has been a pretty pleasant process, and it has mostly served to increase our confidence that we can go to production with that version. But sometimes it does exactly what it is supposed to do and finds the sort of bugs that are very hard to catch anywhere but production.

In this case, after several hours (8 – 12, I am guessing), we discovered that we would start getting errors such as EsentOutOfSessionsException on some of our sites. Esent sessions are the main way we access Esent, and we are pretty careful about managing them. Previously, there wasn’t really any way that you could get this error; indeed, this is pretty much the first time we have seen it outside of the lab. The difference in 2.5 is that we allowed detached sessions to be used along with DTC calls. This gives us the ability to have a commit pending between the Prepare & Commit phases of the operation.

Reviewing the code, I found some places where we weren’t properly disposing the sessions, which could explain that. So I fixed that and pushed a new version out. It took a bit longer this time, but the same error happened.

The good thing about having this happen on our production servers is that I have full access there. Of course, it is production, so outright debugging it is out of the question, but taking a dump and transferring that to my machine was easy enough.

Now, open it with WinDBG, run “.loadby sos clr” and start investigating.

First command, as always, is !threads. And there I could see several threads marked with Microsoft.Isam.Esent.Interop.EsentOutOfSessionsException. That was interesting; it meant that we had caught the problem as it was happening, which was great.

Next, it was time to look a bit at the actual memory. I ran: !DumpHeap -type Session

[image: !DumpHeap output showing roughly 35,000 Microsoft.Isam.Esent.Interop.Session instances on the heap]

My reaction was: Huh!!! There is absolutely zero justification for that.

Now, the only question is why. So I decided to look at the class that holds the transaction state, assuming that this is probably what is holding onto all those sessions. I ran: !DumpHeap -type EsentTransactionContext

[image: !DumpHeap output showing 317 EsentTransactionContext instances]

And that tells me quite a lot. There appear to be a total of 317 in-flight DTC transactions. Considering that I know what our usage is like, that is a huge number, and it tells me that something isn’t right here. This is especially true when you consider that we don’t have that many open databases holding in-flight transactions: !DumpHeap -type EsentInFlightTransactionalState -stat

[image: !DumpHeap -stat output showing 8 EsentInFlightTransactionalState instances]

In other words, we have 8 loaded databases, each of them holding its in-flight transactional state. And we have 317 open transactions and 35 thousand sessions. That is pretty ridiculous, especially given that I know we are supposed to have at most a single-digit number of concurrent DTC transactions at any one time. So somehow we are leaking transactions & sessions. But I am still very unhappy with just “we are leaking sessions”. That is something that I knew before we started debugging anything.

I can already tell that we probably need to add a more robust way of expiring transactions, and I added that, but the numbers don’t add up for me. Since this is pretty much all I can do with WinDBG, I decided to use another tool, MemProfiler. This gives me the ability to import the dump file and then analyze it in a much nicer manner. Doing so, I quickly found this out:

[image: MemProfiler view showing tens of thousands of Session instances sitting in the finalizer queue]

Huh?!

Sessions are finalizable, sure, but I am very careful about making sure to dispose of them, especially after the previous code change. There should be just 317 undisposed sessions, and having that many items in the finalizer queue can certainly explain things. But I don’t know how they got there. And the numbers don’t match up, either; we are missing about 7K items compared to the WinDBG numbers.

Okay, next, I pulled ClrMD and wrote the following:

    using System;
    using System.Collections.Generic;
    using Microsoft.Diagnostics.Runtime;

    var dt = DataTarget.LoadCrashDump(@"C:\Users\Ayende\Downloads\w3wp\w3wp.dmp");
    var moduleInfo = dt.ClrVersions[0].TryGetDacLocation();
    var rt = dt.CreateRuntime(moduleInfo);

    var clrHeap = rt.GetHeap();

    // first, collect all the Session instances sitting in the finalizer queue
    var finalized = new HashSet<ulong>();
    foreach (var ptr in rt.EnumerateFinalizerQueue())
    {
        var type = clrHeap.GetObjectType(ptr);
        if (type != null && type.Name == "Microsoft.Isam.Esent.Interop.Session")
        {
            finalized.Add(ptr);
        }
    }
    Console.WriteLine(finalized.Count);

    // then, count the Session instances on the heap that are *not* awaiting finalization
    var live = new HashSet<ulong>();
    foreach (var ptr in clrHeap.EnumerateObjects())
    {
        var type = clrHeap.GetObjectType(ptr);
        if (type != null && type.Name == "Microsoft.Isam.Esent.Interop.Session")
        {
            if (finalized.Contains(ptr) == false)
                live.Add(ptr);
        }
    }
    Console.WriteLine(live.Count);

This gave me 28,112 sessions in the finalizer queue and 7,547 sessions that are still live. So something is creating a lot of instances, but not using or referencing them?

I did a code review over everything once again, and I think that I got it. The culprit is this guy:

[image: code calling transactionContexts.GetOrAdd(id, createContext)]

Where createContext is defined as:

[image: createContext, a delegate that opens a new Esent Session and wraps it in an EsentTransactionContext]

Now, what I think is going on is that the concurrent dictionary (which is what transactionContexts is) might be calling createContext multiple times inside the GetOrAdd method. But because those calls create values that have to be disposed… Now, in the normal course of things, the worst case scenario is that we would have them in the finalizer queue and they would be disposed in due time. However, under load, we actually gather quite a few of them, and we run out of available sessions to operate with.
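This behavior is easy to demonstrate in isolation. Here is a small self-contained console program (not from the RavenDB code base) showing that ConcurrentDictionary.GetOrAdd may invoke its value factory more than once for the same key under contention, keeping only one of the produced values:

    using System;
    using System.Collections.Concurrent;
    using System.Threading;
    using System.Threading.Tasks;

    class GetOrAddRace
    {
        static int factoryCalls;

        static void Main()
        {
            var dict = new ConcurrentDictionary<int, object>();
            Parallel.For(0, 1000, i =>
            {
                dict.GetOrAdd(42, key =>
                {
                    Interlocked.Increment(ref factoryCalls);
                    Thread.Sleep(1); // widen the race window
                    return new object();
                });
            });
            // the dictionary only ever holds one value for key 42, yet the
            // factory typically reports having run more than once
            Console.WriteLine("factory calls: {0}", factoryCalls);
        }
    }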

At least, this is my current theory. I changed the code to be like this:

[image: the fixed code, which disposes the newly created context if it was not the instance that ended up in the dictionary]

So if my value wasn’t the one that actually got added, I’ll properly dispose of it. I’ll be pushing this to production in a bit and seeing what happens. Note that there isn’t any locking here, so we might still be generating multiple sessions. That is fine, as long as only one of them survives.
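In outline, the change follows the usual create-then-dispose-the-loser pattern; this is a sketch using the names from the snippets above, not the exact RavenDB code:

    // create the context eagerly, outside of GetOrAdd
    var newContext = createContext();

    // passing a value (rather than a factory) to GetOrAdd means exactly one
    // instance ends up in the dictionary, and we can tell whether it is ours
    var context = transactionContexts.GetOrAdd(id, newContext);

    if (ReferenceEquals(context, newContext) == false)
    {
        // another thread won the race; dispose our extra context (and its
        // session) now, instead of letting it linger in the finalizer queue
        newContext.Dispose();
    }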

time to read 3 min | 409 words

Another thing that is pretty common in development cycles is the notion of who can do more. Hours, that is, rather than work. That is a pretty important distinction.

In general, I value Work much more than Hours, for the simple reason that someone doing 12 hours a day in the office usually does a lot less actual work. Sprints are possible, and we do that sometimes, usually if there is a major production issue or we are gearing up for a release.

Then again, we have just released RavenDB 2.5, and we haven’t had the need for doing that. It was simpler & easier to push the date by a week than to do long hours just to hit an arbitrary point in time. I think that in the last six months, we have had people stay in the office past 5 – 6 PM twice.

There are three reasons for that. The two obvious ones are:

  • people doing 12 – 18 hours of work each day tend to do crappy work, so that is bad for the product.
  • people doing 12 – 18 hours of work each day also tend to have… issues. They burn out, quite rapidly, too. Leaving aside issues such as this one, people crash and burn.

I know that I said it before, but it is important to note: burnout will do nasty things to you. Leaving aside the proven physical and mental health issues that it causes, it boils down to this: I’ve burned out before, and it sucks. “Let us not do that” is a pretty important aspect of what I do on a daily basis. That is why I turned to building products, because being on the road 60% of the time isn’t sustainable, and if that is something that I feel, it is certainly true for the other people who work for Hibernating Rhinos.

But I said that there are three reasons, and the third might be just as important as the others. Hibernating Rhinos was built to be a place that people retire from. This is the ideal, and we are probably talking 40 years from now, considering all factors, but that is the idea. We aren’t a startup chasing the pot of gold for that one-in-a-hundred chance to make it rich.

And that is why I had to kick people out of the office and tell them to continue working on that issue tomorrow.
