The curse of memory fragmentation

We took a memory dump of a production server that was exhibiting high memory usage. Here are the relevant parts:

00007ffc25564158 2182611 169439788 System.Int32[]
00007ffbc93b9548 2157229 172578320 Sparrow.Json.LazyStringValue
00007ffbc9a659a0 4052771 226955176 Raven.Server.Documents.Indexes.IndexingStatsScope
00007ffbca3f1fd8 767090 436007568 System.Collections.Generic.Dictionary`2+Entry[[System.Int64, System.Private.CoreLib],[Voron.Impl.PagePosition, Voron]][]
00007ffbc960b320 11504945 552237360 Voron.Impl.PagePosition
00007ffc25563050 88286 588757088 System.Byte[]
0000023c5381fb50 1180785 3027208726 Free
Total 38317500 objects
Fragmented blocks larger than 0.5 MB:
Addr Size Followed by
0000023ca7946198 92.2MB 0000023cad573308 System.Byte[]
0000023cad5745d0 12.6MB 0000023cae202810 System.Byte[]
0000023cae20a828 50.7MB 0000023cb14be978 System.Threading.OverlappedData
0000023cb14beba8 50.2MB 0000023cb46fbdd8 System.Byte[]
0000023cb46ffe10 35.2MB 0000023cb6a36bb0 System.Byte[]
0000023cb6a36df0 26.3MB 0000023cb847d8f0 System.Byte[]
0000023cb8481928 150.9MB 0000023cc1b67d98 System.Byte[]
0000023cc1b6bdd0 157.5MB 0000023ccb8f2820 System.Byte[]

You can already see that there is a lot of fragmentation going on. In this case, there are a few things that we want to pay special attention to. First, there is about 3GB of free space, and we are seeing a lot of fragmented blocks.

Depending on your actual GC settings, you might be expecting some of this. We typically run with Server mode and RetainVM, which means that the GC will delay releasing memory back to the operating system, so in some cases a high amount of memory in the process isn't an issue by itself, but you do need to look at the scale of what is going on. If you are looking at the WinDBG output and seeing hundreds of thousands of fragments, it means that the GC will need to work that much harder when allocating. It also means that it can't really compact memory and optimize things for higher locality, keep objects from being promoted to a higher GC generation, etc.
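
For reference, these GC settings can be turned on in the application's runtimeconfig.json; this is a minimal sketch, and your deployment may set them elsewhere (MSBuild properties, environment variables, etc.):

{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": true,
      "System.GC.RetainVM": true
    }
  }
}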

This is also usually the result of pinned memory, typically for I/O or interop. Pinning can leave small buffers stuck all over the heap, but most I/O systems are well aware of that and use various tricks to avoid it. Typically they allocate buffers that are large enough to reside in the Large Object Heap, which doesn't get compacted very often (if ever). If you are seeing something like this in your application, the first thing to check is how many pinned buffers and objects you have.
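
To make that concrete, here is a contrived C# sketch (not our code; the names and sizes are made up for illustration) of the difference between pinning many small buffers and using a buffer big enough to land on the Large Object Heap:

using System;
using System.Runtime.InteropServices;

class PinningSketch
{
    // Pinning lots of small buffers scatters immovable objects across the
    // normal GC heaps, so the GC cannot compact the free space around them.
    static GCHandle[] PinSmallBuffers(int count)
    {
        var handles = new GCHandle[count];
        for (var i = 0; i < count; i++)
        {
            var buffer = new byte[4 * 1024]; // small allocation, goes on the regular heap
            handles[i] = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        }
        return handles; // the caller must eventually call Free() on each handle
    }

    // Arrays of 85,000 bytes or more are allocated on the Large Object Heap,
    // which is rarely compacted anyway, so pinning them doesn't block compaction.
    static byte[] AllocateLohBuffer() => new byte[128 * 1024];
}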

In our case, we intentionally made a change to the system that had the side effect of pinning small buffers in memory for a long time, mostly to see how bad that would be. The idea was to see if we could simplify buffer management somewhat. The answer was that this is quite bad, so we had to manage the buffers more proactively. We allocate a large buffer on the Large Object Heap, then slice it into multiple segments and pool those segments. This way we get small buffers that aren't wasting a lot of memory, while avoiding the heap fragmentation that would come from pinning them for longish periods.
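
Here is a rough sketch of that idea (the class and parameter names are illustrative, not the actual RavenDB implementation):

using System;
using System.Collections.Concurrent;

// One large allocation lands on the Large Object Heap; small segments are
// sliced out of it and pooled, so pinning a segment never fragments the
// regular GC heaps.
public sealed class SegmentPool
{
    private readonly byte[] _backingBuffer; // single LOH allocation
    private readonly ConcurrentQueue<ArraySegment<byte>> _segments =
        new ConcurrentQueue<ArraySegment<byte>>();

    public SegmentPool(int segmentSize = 8 * 1024, int segmentCount = 128)
    {
        // 8KB * 128 = 1MB, well above the 85,000 byte LOH threshold
        _backingBuffer = new byte[segmentSize * segmentCount];
        for (var i = 0; i < segmentCount; i++)
            _segments.Enqueue(new ArraySegment<byte>(_backingBuffer, i * segmentSize, segmentSize));
    }

    public ArraySegment<byte> Rent()
    {
        if (_segments.TryDequeue(out var segment))
            return segment;
        throw new InvalidOperationException("Pool exhausted"); // or grow by allocating another backing buffer
    }

    public void Return(ArraySegment<byte> segment)
    {
        _segments.Enqueue(segment);
    }
}

Because the backing array is a single Large Object Heap allocation, pinning any individual segment only pins memory that the GC wasn't going to move anyway, and the rest of the heap stays compactable.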