Oren Eini

CEO of RavenDB

a NoSQL Open Source Document Database

Get in touch with me:

oren@ravendb.net +972 52-548-6969

Posts: 7,565
|
Comments: 51,184
Privacy Policy · Terms
filter by tags archive
time to read 3 min | 548 words

imageThe RavenDB Security Report most significant finding is something that cannot be fixed. Let me try to explain the core of this issue.

We want RavenDB to be secured, and we have chosen to use the well known (and trusted) TLS infrastructure. This means that we can use HTTPS, client certificate authentication and TLS 1.2. Basically, this means that we have a very high degree security and we use a common (and trusted) methods for both trust and encryption on the wire.  That does leave us with the problem of where to get the certificates from. Browsers has been tightening security for a while now, and the kind of alerts you get for self signed certificates are too scary to show by default.

So we need a solution that will be trusted. One option is to generate and install a root certificate when installing RavenDB. I don’t really like this option, to start with, installing a root certificate seems like an invasive action, even if it was generated locally. But this doesn’t solve the problem of accessing the server remotely. The root certificate will be installed on the server, not the client. So that isn’t a good option for us.

Enter Let’s Encrypt and the ability to generate certificates for free. That is a perfect solution for the problem. It is possible to generate them during installation, it is trusted by all major browsers and voila, we are there. Except there is still one issue in place. In order to get the certificate, we need to prove to Let’s Encrypt that we own the domain. But we can’t expect every user to configure DNS or setup routing properly during installation. So instead of making the user do the work, the automatic Let’s Encrypt installation is going to do that using a domain that RavenDB controls (ravendb.community, development.run, ravendb.run, etc). As part of the installation, the local RavenDB instance will talk to our cloud API to complete the Let’s Encrypt challenge. Each user gets their own subdomain under one of the root domains we use and the certificate is being generate locally (the cloud API is involved only for setting up the DNS entries).

This is perfect, because it means that you can very easily get a secured cluster (with URLs such as https://a.oren.development.run) which will just work.

However, from the point of view of the customer, there is an issue. The customer doesn’t own these domains, they are owned by Hibernating Rhinos. This means that technically,  we can issue additional certificates for the cluster domain and even update the DNS records to point to another server. This is something that we will never do, but it is a concern that should be raised during security reviews. For production usage, we expect operators to use their own certificates and domains to ensure that they have full control of their environment.

This is the only issue in the security review that we couldn’t fix and had to document as a warning to users, because it is too convenient a feature and the expected usage scenario (development and quick setup mode) are not likely to concern themselves with the full blown process of defining DNS and certificates.

time to read 1 min | 151 words

imageThe RavenDB Security Report called out the fact that we were using 2048 bits RSA keys when we were generating certificates. RavenDB generates certificates during automatic setup and when you want to generate client certificates directly from RavenDB.

Now, 2048 bits RSA has no known attacks, it seems that there wouldn’t be any shock and awe at the cryptographic community if it would be broken at sometimes in the future.

Because of that, the general recommendation is to use at least 3072 bits, but I don’t like that number, so RavenDB is now using 4096 bits RSA keys when it needs to generate a certificate. This significantly increases the certificate generation time (to the point where it is humanly observable!), but that is a very rare operation, so we don’t really care.

time to read 1 min | 162 words

The bug from yesterday would only show when a particular query is being run concurrently, and not always then.

Here is the code that is responsible for the bug:

It is quite hard to see, because it is so subtle. The code here create a cached lambda that is global for the process. The lambda takes the current engine, the object to transform return the converted object.

So far, so good, right?

Except that in this case,  the lambda is capturing the engine parameter that is passed to the function. The engine is single threaded, and must not be used concurrently. The problem is that the code already handles this situation, and the current engine instance is passed to the lamda, where it is never used. The original engine instance is being used concurrently, violating its invariants and causing errors down the line.

The fix was to simply use the current engine instance that was passed to us, but this was really hard to figure out.

time to read 2 min | 251 words

imageAbout a decade ago I was working at a client, and I was trying to get some idea about the scope of the project. This was a fairly large B2B system, with some interesting requirements around business logic behaviors.

For the life of me, I couldn’t get the customer to commit to even rough SLA or hard numbers around performance, capacity and scale. Whenever I asked, I got some variant of: “It has to be fast, really fast.”

But what is fast, and under what scenario, that was impossible to figure out. I got quite frustrated by this issue and pressed hard on this topic, and finally got something that was close to a hard number, “it has to be like Google”.

That was a metric I could do something with. I went to last year’s Google financial statements, figured out how much money they spent that year and sent the customer a quote for 11.5 billion US dollars.

As you might imagine, that number caught people’s attention, especially since I sent it to quite a few people at the customer. On a call with the customer I explained, “You want it to be like Google, so I used the same budget as Google”.

From that point it was much easier to actually get performance and scale numbers, although I did have to cut a few zeroes (like, all of them) from the quote.

FUTURE POSTS

No future posts left, oh my!

RECENT SERIES

  1. Production Postmortem (52):
    07 Apr 2025 - The race condition in the interlock
  2. RavenDB (13):
    02 Apr 2025 - .NET Aspire integration
  3. RavenDB 7.1 (6):
    18 Mar 2025 - One IO Ring to rule them all
  4. RavenDB 7.0 Released (4):
    07 Mar 2025 - Moving to NLog
  5. Challenge (77):
    03 Feb 2025 - Giving file system developer ulcer
View all series

RECENT COMMENTS

Syndication

Main feed Feed Stats
Comments feed   Comments Feed Stats
}