Ayende @ Rahien

Apr 05 2018

RavenDB Security Report: Man in the middle for customer domains

time to read 3 min | 548 words

The most significant finding in the RavenDB Security Report is something that cannot be fixed. Let me try to explain the core of this issue.

We want RavenDB to be secure, and we have chosen to use the well-known (and trusted) TLS infrastructure. This means that we can use HTTPS, client certificate authentication and TLS 1.2. Basically, this gives us a very high degree of security, using common (and trusted) methods for both trust and encryption on the wire. That does leave us with the problem of where to get the certificates from. Browsers have been tightening security for a while now, and the kind of alerts you get for self-signed certificates are too scary to show by default.

So we need a solution that will be trusted. One option is to generate and install a root certificate when installing RavenDB. I don’t really like this option. To start with, installing a root certificate seems like an invasive action, even if it was generated locally. It also doesn’t solve the problem of accessing the server remotely: the root certificate will be installed on the server, not the client. So that isn’t a good option for us.

Enter Let’s Encrypt and the ability to generate certificates for free. That is a perfect solution for the problem. It is possible to generate the certificates during installation, they are trusted by all major browsers and voila, we are there. Except there is still one issue in place. In order to get the certificate, we need to prove to Let’s Encrypt that we own the domain. But we can’t expect every user to configure DNS or set up routing properly during installation. So instead of making the user do the work, the automatic Let’s Encrypt installation does that using a domain that RavenDB controls (ravendb.community, development.run, ravendb.run, etc). As part of the installation, the local RavenDB instance will talk to our cloud API to complete the Let’s Encrypt challenge. Each user gets their own subdomain under one of the root domains we use, and the certificate is generated locally (the cloud API is involved only for setting up the DNS entries).
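
To make the DNS part concrete, here is a minimal sketch of how the value of a DNS challenge record is derived under the ACME protocol that Let’s Encrypt uses. It assumes the DNS-01 challenge type, which is my reading of "setting up the DNS entries" and not something confirmed above. The function name, token and account thumbprint are hypothetical placeholders, and this is not RavenDB’s actual code.

```python
import base64
import hashlib


def dns01_txt_record(domain, token, account_thumbprint):
    """Return the (record name, TXT value) pair for an ACME DNS-01 challenge.

    `token` comes from the certificate authority and `account_thumbprint`
    is derived from the requester's ACME account key. Both are placeholder
    strings in this sketch.
    """
    # The key authorization ties the challenge token to the account key.
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # The TXT value is the unpadded base64url encoding of the SHA-256 digest.
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"_acme-challenge.{domain}", value


# Hypothetical inputs, for illustration only.
name, value = dns01_txt_record(
    "a.oren.development.run", "example-token", "example-thumbprint"
)
print(name, value)
```

Whoever can write that TXT record under the root domain can answer the challenge, which is why the DNS step has to go through an API run by the domain owner, while the private key itself can stay on the local machine.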

This is perfect, because it means that you can very easily get a secured cluster (with URLs such as https://a.oren.development.run) which will just work.

However, from the point of view of the customer, there is an issue. The customer doesn’t own these domains; they are owned by Hibernating Rhinos. This means that technically, we can issue additional certificates for the cluster domain and even update the DNS records to point to another server. This is something that we will never do, but it is a concern that should be raised during security reviews. For production usage, we expect operators to use their own certificates and domains to ensure that they have full control of their environment.

This is the only issue in the security review that we couldn’t fix and had to document as a warning to users. The feature is too convenient to give up, and users in the expected usage scenarios (development and quick setup mode) are not likely to concern themselves with the full-blown process of defining DNS and certificates.

Apr 04 2018

RavenDB Security Report: Non-high Strength RSA Keys

time to read 1 min | 151 words

The RavenDB Security Report called out the fact that we were using 2048-bit RSA keys when we were generating certificates. RavenDB generates certificates during automatic setup and when you want to generate client certificates directly from RavenDB.

Now, 2048-bit RSA has no known attacks, but it seems that there wouldn’t be any shock and awe in the cryptographic community if it were broken at some point in the future.

Because of that, the general recommendation is to use at least 3072 bits, but I don’t like that number, so RavenDB is now using 4096-bit RSA keys when it needs to generate a certificate. This significantly increases the certificate generation time (to the point where it is humanly observable!), but that is a very rare operation, so we don’t really care.

Apr 03 2018

Challenge: The invisible concurrency bug–Answer

time to read 1 min | 162 words

The bug from yesterday would only show up when a particular query was run concurrently, and even then not always.

Here is the code that is responsible for the bug:

It is quite hard to see, because it is so subtle. The code here creates a cached lambda that is global for the process. The lambda takes the current engine and the object to transform, and returns the converted object.

So far, so good, right?

Except that in this case, the lambda is capturing the engine parameter that is passed to the function. The engine is single threaded, and must not be used concurrently. The thing is, the code already handles this situation: the current engine instance is passed to the lambda, where it is never used. Instead, the original engine instance is used concurrently, violating its invariants and causing errors down the line.

The fix was to simply use the current engine instance that was passed to us, but this was really hard to figure out.
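
For readers who want to see the shape of this kind of bug, here is a minimal sketch in Python. It is not the original code, and every name in it (Engine, get_converter_buggy, get_converter_fixed) is made up for illustration. It shows a process-wide cached lambda that accepts the current engine as an argument but closes over the engine that happened to be passed in when the cache entry was created.

```python
class Engine:
    """Hypothetical stand-in for a single-threaded script engine."""

    def __init__(self, name):
        self.name = name

    def convert(self, value):
        return f"{self.name}:{value}"


_buggy_cache = {}
_fixed_cache = {}


def get_converter_buggy(engine, key):
    if key not in _buggy_cache:
        # BUG: the lambda ignores `current_engine` and captures `engine`,
        # the instance supplied by whichever caller populated the cache.
        _buggy_cache[key] = lambda current_engine, obj: engine.convert(obj)
    return _buggy_cache[key]


def get_converter_fixed(engine, key):
    if key not in _fixed_cache:
        # Fix: use only the engine handed to the lambda at call time.
        _fixed_cache[key] = lambda current_engine, obj: current_engine.convert(obj)
    return _fixed_cache[key]


first, second = Engine("first"), Engine("second")

get_converter_buggy(first, "transform")(first, 1)
buggy_result = get_converter_buggy(second, "transform")(second, 2)

get_converter_fixed(first, "transform")(first, 1)
fixed_result = get_converter_fixed(second, "transform")(second, 2)

print(buggy_result)  # first:2 (the second caller silently used the first engine)
print(fixed_result)  # second:2
```

Run sequentially like this, the buggy version merely returns a result from the wrong engine. With real threads, two callers would be driving the same captured engine at once, which is where the invariant violations come from.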

Apr 02 2018

Challenge: The invisible concurrency bug

time to read 1 min | 33 words

A customer reported an error when a certain query was run concurrently. It took a while, but we finally found the piece of code that is responsible for this:

Can you spot it?

Apr 01 2018

This will be $11,509,586,000, please (excluding tip)

time to read 2 min | 251 words

About a decade ago I was working at a client, and I was trying to get some idea about the scope of the project. This was a fairly large B2B system, with some interesting requirements around business logic behaviors.

For the life of me, I couldn’t get the customer to commit to even a rough SLA or hard numbers around performance, capacity and scale. Whenever I asked, I got some variant of: “It has to be fast, really fast.”

But what counts as fast, and under what scenario? That was impossible to figure out. I got quite frustrated by this issue and pressed hard on this topic, and finally got something that was close to a hard number: “it has to be like Google”.

That was a metric I could do something with. I went to last year’s Google financial statements, figured out how much money they spent that year and sent the customer a quote for 11.5 billion US dollars.

As you might imagine, that number caught people’s attention, especially since I sent it to quite a few people at the customer. On a call with the customer I explained, “You want it to be like Google, so I used the same budget as Google”.

From that point it was much easier to actually get performance and scale numbers, although I did have to cut a few zeroes (like, all of them) from the quote.

Oren Eini

CEO of RavenDB
