Oren Eini, CEO of RavenDB, a NoSQL open source document database

time to read 3 min | 478 words

After running into a few hurdles, I managed to get the Rust OpenSSL bindings to work, which means that it is now time to actually wire things properly into my network protocol. Let’s see how that works, shall we?

First, we have the OpenSSL setup:
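
In rough outline, the setup looks something like this (a minimal sketch, assuming the openssl crate; the file names and the accept-everything verify callback are placeholders, since the real checks happen later against the registered thumbprints):

    use openssl::ssl::{SslAcceptor, SslFiletype, SslMethod, SslVerifyMode};
    use std::net::TcpListener;

    fn listen(addr: &str) -> Result<(), Box<dyn std::error::Error>> {
        let mut builder = SslAcceptor::mozilla_intermediate(SslMethod::tls())?;
        builder.set_certificate_chain_file("server.pem")?; // placeholder file names
        builder.set_private_key_file("server.key", SslFiletype::PEM)?;
        // Ask for a client certificate, but accept anything at the TLS level so we
        // can send a proper error message over the encrypted tunnel if we need to.
        builder.set_verify_callback(SslVerifyMode::PEER, |_preverified, _ctx| true);
        let acceptor = builder.build();

        let listener = TcpListener::bind(addr)?;
        for stream in listener.incoming() {
            let stream = acceptor
                .accept(stream?)
                .map_err(|e| format!("TLS handshake failed: {:?}", e))?;
            // ... hand the SslStream over to the protocol handling code ...
            let _ = stream;
        }
        Ok(())
    }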

As you can see, this is pretty easy and there isn’t really anything there that is of actual interest. It does feel a whole lot easier than dealing with OpenSSL directly in C, though.

That said, when I started actually dealing with the client certificate, things got a lot more complicated. The first thing that I wanted to do was the authentication itself, which is defined as:

  • The client presents a client certificate (it can be any client certificate).
  • If the client doesn’t give a certificate, we accept the connection, send a message (using the encrypted tunnel) and abort.
  • If the client provides a certificate, it must be one that was previously registered with the server. That is what allowed_certs_thumbprints is for. If it isn’t, we accept the connection, write an error and abort.
  • If the client certificate has expired or is not yet valid, we accept, write an error and abort.

You get the gist. Here is what I had to do to implement the first part:
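
In rough outline, that first part looks like this (a minimal sketch, assuming an allowed_certs_thumbprints set of hex-encoded SHA-1 digests; the caller is expected to write the error back over the TLS stream and close the connection):

    use openssl::hash::MessageDigest;
    use openssl::ssl::SslStream;
    use std::collections::HashSet;
    use std::net::TcpStream;

    fn authenticate_certificate(
        stream: &SslStream<TcpStream>,
        allowed_certs_thumbprints: &HashSet<String>, // sketch: hex-encoded SHA-1 digests
    ) -> Result<(), String> {
        // No certificate at all? Report it so the caller can send the error and abort.
        let cert = stream
            .ssl()
            .peer_certificate()
            .ok_or_else(|| "No client certificate was provided".to_string())?;

        let digest = cert
            .digest(MessageDigest::sha1())
            .map_err(|e| format!("Unable to compute the certificate digest: {}", e))?;
        let thumbprint: String = digest.iter().map(|b| format!("{:02x}", b)).collect();

        // The certificate must have been registered with the server beforehand.
        if !allowed_certs_thumbprints.contains(&thumbprint) {
            return Err(format!(
                "Client certificate {} is not in the list of allowed certificates",
                thumbprint
            ));
        }
        Ok(())
    }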

Most of the code, actually, is about generating proper and clear error messages, more than anything else. I’m not sure how to get the friendly name from the certificate, but this seems to be a good enough stand-in for now.

We validate that we have a certificate, or send an error back. We validate that the certificate presented is known to us, or we send an error back.

The next part I wanted to implement was… really far harder than it should be. I just wanted to verify that the certificate’s not before / not after dates are valid. The problem is that the Rust bindings for OpenSSL do not expose that information. Luckily, because it is using OpenSSL, I can just call OpenSSL directly. That led me into some interesting research into how Rust calls out to C, how foreign types work and a lot of “fun” like that. Given that I’m doing this to learn, I suppose that this is a good thing, though.

Here is what I ended up with (take a deep breath):
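
The shape of it is roughly the following (a minimal sketch, assuming OpenSSL 1.1.x function names and the foreign_types crate for getting at the raw pointer; in the real version the declarations live inside authenticate_certificate itself):

    fn is_currently_valid(cert: &openssl::x509::X509) -> bool {
        use foreign_types::ForeignType; // trait needed to get the raw pointer out of the X509 wrapper
        use std::os::raw::{c_int, c_void};

        // Raw OpenSSL functions that the Rust bindings did not expose at the time.
        // Sketch: function names assume OpenSSL 1.1.x.
        extern "C" {
            fn X509_getm_notBefore(x: *const c_void) -> *const c_void;
            fn X509_getm_notAfter(x: *const c_void) -> *const c_void;
            fn X509_cmp_current_time(t: *const c_void) -> c_int;
        }

        unsafe {
            let ptr = cert.as_ptr() as *const c_void;
            let not_before = X509_getm_notBefore(ptr);
            let not_after = X509_getm_notAfter(ptr);
            // notBefore must be in the past (< 0) and notAfter in the future (> 0).
            X509_cmp_current_time(not_before) < 0 && X509_cmp_current_time(not_after) > 0
        }
    }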

Notice that I’m doing all of this (defining the external functions, defining helper functions) inside the authenticate_certificate function. Coming up with that was harder than expected, but I really liked the fact that it was possible, and that I can just shove this into a corner of my code and not have to make a Big Thing out of it.

And with that, I have the authentication portion of my network protocol in Rust done.

The next stage is going to be implementing a server that can handle more than a single connection at a time.

time to read 4 min | 654 words

I’m writing this post as I’m sitting in the airport, leaving the CodeMash conference. This has been my first “true” conference in a while, because I was able to not just speak and stand in a sponsor booth but actually participate in a lot of sessions and talk to a lot of people. I had a blast, and both the IRS and my wife consider this a work trip.

I have been presenting in international conferences for over a decade now and I wanted to put in a few words for people who are speaking at a technical conference. None of this is new, mind. If you have been reading any recommendations about how to present in conferences, I’m probably going to bore you. I’m writing this because I saw several sessions that had technical issues in how they were delivered. That is, the content was great, but the way it was delivered could be improved.

Probably the most important factor that I need to mention is: Make your content readable.

When you are presenting, use big fonts everywhere. That means that (ahead of time) you should make sure that your PPT’s content is readable from the end of the room. If you are presenting code in an IDE, make sure that you know how to increase the zoom so the code is readable. For that matter, syntax highlighting is not an optional feature when showing code. Use the default syntax highlighting for the language you are working with, and no dark themes.

I’m assuming that what you care about is the code, so you want to show that, in a way that the people in your talk are familiar with, so no new themes, aqua on pink color schemes, etc. Go with the default.

But code is just one factor of it. If you are showing output, make sure that this is readable. That means that if you are writing to the console, make sure that the console font is big enough, use colors for emphasis, etc.

If you are showing XML / JSON / data formats, make sure that they are pretty printed. If you dump something like this:

image

The audience is going to be too busy parsing the text to actually pay attention to what you are saying.

And if at all possible, use a console application and have no architecture.

If you are actually talking about a React app, you can’t do that, and if your talk is about architecture, you are going to need to show that. But in most cases, if you are showing off some new language feature, or talking about a particular service or API, you want to use a console app to demo it. This is because it allows you (and your audience) to focus solely on the problem at hand rather than look at the controller -> service -> repository magic via DI that hides a lot of the backend details. In general, you want to strip away as much as possible that isn’t directly core to your topic.

Another thing that I noticed: when you want to show additional data (files, artifacts, etc), make sure that you have them already opened before you start. If you are doing a demo, have screenshots available for when / if the demo messes up.

Everything I just mentioned might seem obvious, but you need to go over it before you go and present; it was surprising to see several speakers make some of these mistakes.

Now, to be fair, that might sound like I’m piling on people, but that isn’t my intention. I’m aggregating a lot of different small mistakes to point them out. They can detract from the presentation, but at least in the sessions that I was at, they didn’t kill the presentation for me.

time to read 2 min | 340 words

After getting really frustrated with the state of Rust & TLS, I decided to sit down and figure out what it would take to make the OpenSSL crate actually build successfully. Even though the crate claims to support vcpkg, it seems that there were issues there. I started from a clean slate, and checked that I have openssl installed via vcpkg:

image

I then got into a rabbit hole of errors in the build, first:

image

This seems like it wants to statically link to them by default, but when I set the env variable, I got:

image

Looking closely at the error (always read the error message), you can see that it is looking for a 64-bit build, but I have an x86 build.

That very likely explains the issues that I previously had. I tried to point it to the SSL build directory, and I’m pretty sure that I used the 32-bit directory. It rejected the attempted link, but didn’t bother to tell me about it.

To be fair, this isn’t Rust’s fault, it is link.exe’s fault for not providing a clear error about this case. Actually, this is exactly the kind of case where you are going to invest some time writing a feature whose only purpose is to give good errors when the user messes up. But that kind of attention to detail makes a world of difference.

Here is what fixed this for me.

image

And with that, I can build using:

image

Hurray! That is enough for now, I guess. I’ll get things actually working in another post.

time to read 3 min | 505 words

Computation during indexing opens up some nice features when we are talking about data modeling and working with your data. In this post, I want to discuss predicting the future with it. Let’s see how we can do that, shall we?

Consider the following document, representing a (simplified) customer model:

image

We have a customer that is making monthly payments. This is a pretty straightforward model, right?

We can do a lot with this kind of data. We can obviously compute the lifetime value of a customer, based on how much they paid us. We already did something very similar in a previous post, so that isn’t very interesting.

What is interesting is looking into the future. Let’s start simple, by figuring out what the next charge for this customer is going to be. For now, the logic is about as simple as it can be. Monthly customers pay by month, basically. Here is the index:

image
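
Roughly, such an index might look like this (a minimal sketch, assuming a Customer document with a Subscription field and a Payments list of { Date, Amount }):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Raven.Client.Documents.Indexes;

    // Sketch: the document shape is an assumption based on the description above.
    public class Payment
    {
        public DateTime Date { get; set; }
        public decimal Amount { get; set; }
    }

    public class Customer
    {
        public string Id { get; set; }
        public string Subscription { get; set; }
        public List<Payment> Payments { get; set; }
    }

    public class Customers_NextPayment : AbstractIndexCreationTask<Customer>
    {
        public Customers_NextPayment()
        {
            Map = customers =>
                from c in customers
                let lastPayment = c.Payments.OrderByDescending(p => p.Date).First()
                select new
                {
                    // Monthly customers pay by month, so the next charge is a month after the last one.
                    ExpectedPaymentDate = lastPayment.Date.AddMonths(1),
                    // Use the average of the last three payments as the expected amount.
                    ExpectedAmount = c.Payments.OrderByDescending(p => p.Date).Take(3).Average(p => p.Amount)
                };
        }
    }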

I’m using Linq instead of JS here because I’m dealing with dates and JS support for dates is… poor.

As you can see, we are simply looking at the last date and the subscription, figuring out how much we paid the last three times and using that as the expected next payment amount. That can allow us to do nice things, obviously. We can now do queries on the future, so finding out how many customers will (probably) pay us more than $100 on the 1st of February is both easy and cheap.

We can actually take this further, though. Instead of using a simple index, we can use a map/reduce one. Here is what this looks like:

image

And the reduce:

image
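
Roughly, the map and reduce might look like this (a minimal sketch, reusing the Customer and Payment classes from the previous sketch; the details are illustrative):

    using System;
    using System.Linq;
    using Raven.Client.Documents.Indexes;

    public class Customers_ProjectedIncome : AbstractIndexCreationTask<Customer, Customers_ProjectedIncome.Result>
    {
        public class Result
        {
            public DateTime Month { get; set; }
            public decimal Amount { get; set; }
        }

        public Customers_ProjectedIncome()
        {
            // Output one entry per existing payment, plus the next three projected payments.
            Map = customers =>
                from c in customers
                let lastPayment = c.Payments.OrderByDescending(p => p.Date).First()
                let expected = c.Payments.OrderByDescending(p => p.Date).Take(3).Average(p => p.Amount)
                from entry in c.Payments
                    .Select(p => new Result
                    {
                        Month = new DateTime(p.Date.Year, p.Date.Month, 1),
                        Amount = p.Amount
                    })
                    .Concat(Enumerable.Range(1, 3).Select(i => new Result
                    {
                        Month = new DateTime(lastPayment.Date.Year, lastPayment.Date.Month, 1).AddMonths(i),
                        Amount = expected
                    }))
                select entry;

            // Sum the amounts, existing and projected, per month.
            Reduce = results =>
                from r in results
                group r by r.Month into g
                select new Result
                {
                    Month = g.Key,
                    Amount = g.Sum(x => x.Amount)
                };
        }
    }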

This may seem a bit dense at first, so let’s decipher it, shall we?

We take the last payment date and compute the average of the last three payments, just as we did before. The fun part now is that we don’t compute just the single next payment, but the next three. We then output all the payments, both existing (that already happened) and projected (that will happen in the future) from the map function. The reduce function is a lot simpler, and simply sum up the amounts per month.

This allows us to effectively project data into the future, and this map/reduce index can be used to calculate expected income. Note that this is aggregated across all customers, so we can get a pretty good picture of what is going to happen.

A real system would probably have some uncertainty factor, but that touches on business strategy more than modeling, so I don’t think we need to go into that here.

time to read 2 min | 274 words

After trying (and failing) to use rustls to handle client authentication, I tried to use the rust-openssl bindings. They crapped out on me with a really scary link error. I spent some time trying to figure out what was going on, but given that I wanted to write Rust code, not deal with link errors, I decided to see if the final alternative in the Rust ecosystem would work: the native-tls package.

And… that is a no go as well. Which is sad, because the actual API was quite nice. The reason it isn’t going to work? The native-tls package just has no support for client certificate authentication when running as a server, so not usable for me.

That leaves me with strike three out of three:

  • rustls – native Rust API, easy to work with, but doesn’t allow accepting arbitrary client certificates, only ones from known issuers.
  • rust-openssl – I have built on top of OpenSSL before, so I know it works. However, trying to build it on Windows resulted in link errors, so that was out.
  • native-tls – doesn’t have support for client certificates, so not usable.

I think that at this point, I have three paths available to me:

  • Give up and maybe try doing something else with Rust.
  • Fork rustls and add support for accepting arbitrary client certificates. I’m not happy with this because it requires changing not just rustls but also probably the webpki package, and I’m not sure that the changes I have in mind wouldn’t hurt the security of the system.
  • Try to fix the OpenSSL link issue.

I think that I’ll go with the third option, but this is really annoying.

time to read 4 min | 613 words

In my last post on the topic, I showed how we can define a simple computation during the indexing process. That was easy enough, for sure, but it turns out that there are quite a few use cases for this feature that go quite far from what you would expect. For example, we can use this feature as part of defining and working with business rules in our domain.

For example, let’s say that we have some logic that determines whether a product is offered with a warranty (and for how long that warranty is valid). This is an important piece of information, obviously, but it is the kind of thing that changes on a fairly regular basis. For example, consider the following feature description:

As a user, I want to be able to see the offered warranty on the products, as well as to filter searches based on the warranty status.

Warranty rules are:

  • For new products made in house, full warranty for 24 months.
  • For new products from 3rd parties, parts only warranty for 6 months.
  • Refurbished products by us, full warranty, for half of the new warranty duration.
  • Refurbished 3rd party products, parts only warranty, 3 months.
  • Used products, parts only, 1 month.

Just from reading the description, you can see that this is a business rule, which means that it is subject to many changes over time. We can obviously create a couple of fields on the document to hold the warranty information, but that means that whenever the warranty rules change, we’ll have to go through all of the documents again. We’ll also need to ensure that any business logic that touches the document will re-run the logic to apply the warranty computation (to be fair, these sorts of things are usually done as a subscription in RavenDB, which alleviates that need).

Without further ado, here is the index to implement the logic above:
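
Roughly, as a JavaScript index, it might look like this (a minimal sketch; the Condition and MadeInHouse field names on the Products documents are assumptions):

    map('Products', function (product) {
        // Sketch: field names are assumptions, the rules follow the feature description above.
        var warranty = 'PartsOnly';
        var duration = 1; // months; the default covers used products

        if (product.Condition === 'New') {
            warranty = product.MadeInHouse ? 'Full' : 'PartsOnly';
            duration = product.MadeInHouse ? 24 : 6;
        } else if (product.Condition === 'Refurbished') {
            warranty = product.MadeInHouse ? 'Full' : 'PartsOnly';
            duration = product.MadeInHouse ? 12 : 3; // half of the new warranty for our own products
        }

        return {
            Warranty: warranty,
            WarrantyDurationInMonths: duration
        };
    })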

You can now query over the warranty type and its duration, project them from the index, etc. Whenever a document is updated, we’ll re-compute the warranty status and update the index.

This saves you from having additional fields in your model and greatly diminishes the cost of queries that need to filter on the warranty or its duration (since you don’t need to do this computation during the query, only once, during indexing).

If the business rule definition changes, you can update the index definition and RavenDB will effectively roll out your change to the entire dataset. That is nice, but even though I’m writing about cool RavenDB features, there are some words of caution that I want to mention.

Putting queryable business rules in the database can greatly ease your life, but be wary of putting too much business logic in there. In general, you want your business logic to reside right next to the rest of your application code, not running on a different server in a mode that is much harder to debug, version and diagnose. And if the level of complexity involved in the business rule exceeds some threshold (hard to define, but easy to know when you hit it), you should probably move from defining the business rules in an index to a subscription.

A RavenDB subscription allows you to get all changes to documents and apply your own logic in response. This is a reliable way to process data in RavenDB. It runs in your own code, under your own terms, so it can enjoy all the usual benefits of… well, being your code, and not mine. You can read more about them in this post and of course, the documentation.

time to read 1 min | 182 words

I ran into this very interesting blog post and I decided to see if I could implement this on my own, without requiring any runtime support. This turned out to be surprisingly easy, if you are willing to accept some caveats.

I’m going to assume that you have read the linked blog post, and here is the code that implement it:
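
In rough outline, the approach looks something like this (a minimal sketch based on the description below; all of the names are illustrative): a ConditionalWeakTable keeps a small finalizable object alive exactly as long as the instance you register, and that object’s finalizer pushes a token onto the queue once the instance has been collected.

    using System;
    using System.Collections.Concurrent;
    using System.Runtime.CompilerServices;

    public class PhantomReferenceQueue<T>
    {
        private readonly BlockingCollection<T> _queue = new BlockingCollection<T>();
        private readonly ConditionalWeakTable<object, PhantomReference> _table =
            new ConditionalWeakTable<object, PhantomReference>();

        private class PhantomReference
        {
            private readonly PhantomReferenceQueue<T> _owner;
            private readonly T _token;

            public PhantomReference(PhantomReferenceQueue<T> owner, T token)
            {
                _owner = owner;
                _token = token;
            }

            public void Suppress() => GC.SuppressFinalize(this);

            // Runs only after the associated instance has been collected, because the
            // ConditionalWeakTable keeps this object alive exactly as long as that instance.
            ~PhantomReference()
            {
                _owner._queue.Add(_token);
            }
        }

        // Associate an instance with this queue; token is what you get back once
        // the instance has been collected.
        public void Register(object instance, T token)
        {
            _table.Add(instance, new PhantomReference(this, token));
        }

        // Stop tracking the instance.
        public void Unregister(object instance)
        {
            if (_table.TryGetValue(instance, out var reference))
            {
                reference.Suppress();
                _table.Remove(instance);
            }
        }

        // Blocks until some registered instance has been collected.
        public T WaitForCollection()
        {
            return _queue.Take();
        }
    }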

Here is what it gives you:

  • You can have any number of queues.
  • You can associate an instance with a queue at any time, including long after it was constructed.
  • You can unregister from the queue at any time.
  • You can wait (or easily change it to do async awaits, of course) for updates about phantom references.
  • You can process such events in parallel.

What about the caveats?

This utilizes the finalizer internally to inform us when the associated value has been forgotten, so it takes longer than one would wish for. The implementation relies on ConditionalWeakTable to do its work, by creating a weak association between the instances you pass in and the PhantomReference class holding the handle that we’ll send to you once that value has been forgotten.

time to read 4 min | 734 words

The title of this post is pretty strange, I admit. Usually, when we think about modeling, we think about our data. If it is a relational database, this mostly means the structure of your tables and the relations between them. When using a document database, this means the shape of your documents. But in both cases, indexes are there merely to speed things up. Oh, a particularly important query may need an index, and that may impact how you lay out the data, but these are relatively rare cases. In relational databases and most non relational ones, indexes do not play any major role in data modeling decisions.

This isn’t the case for RavenDB. In RavenDB, an index doesn’t exist merely to organize the data in a way that makes it easier for the database to search for it. An index is actually able to modify and transform the data, on the current document or by pulling in data from related documents. A map/reduce index is even able to aggregate data from multiple documents as part of the indexing process. I’ll touch on the last one in more depth later in this series; first, let’s tackle the more obvious parts. Because I want to show off some of the new features, I’m going to use JS for most of the index definitions in these posts, but you can do the same using Linq / C# as well, obviously.

When brainstorming for this post, I got so many good ideas about the kind of non obvious things that you can do with RavenDB’s indexes that a single post has turned into a series, and I have two pages of notes to go through. Almost all of those ideas are basically some form of computation during indexing, but applied in novel manners to give you a lot of flexibility and power.

RavenDB prefers to have more work to do during indexing (which is batched and happens in the background) than during query time. This means that we can push a lot more work into the background and just let RavenDB handle it for us. Let’s start with what is probably the most basic example of computation during indexing, the Order’s Total. Consider the following document:

image

As you can see, we have the Order document and the list of the line items in this order. What we don’t have here is the total order value.

Now, actually computing how much you are going to pay for an order is complex. You have to take into account taxation, promotions, discounts, shipping costs, etc. That isn’t something that you can do trivially, but it does serve as an excellent simple example, and similar requirements exist in many fields.

We can obviously add a Total field to the order, but then we have to make sure that we update it whenever we update the order. This is possible, but if we have multiple clients consuming the data, this can be fairly hard to do. Instead, we can place the logic to compute the property in the index itself. Here is what it would look like:

image

The same index in JavaScript is almost identical:
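
Roughly, the JavaScript version might look like this (a minimal sketch, assuming the familiar Orders shape with Lines of Quantity, PricePerUnit and Discount):

    map('Orders', function (order) {
        // Sketch: the line item field names are assumptions.
        var total = 0;
        for (var i = 0; i < order.Lines.length; i++) {
            var line = order.Lines[i];
            total += line.Quantity * line.PricePerUnit * (1 - line.Discount);
        }
        return {
            Company: order.Company,
            Total: total
        };
    })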

In this case, they are very similar, but as the complexity grows, I find it is usually easier to express the logic as a JavaScript index rather than as a single (complex) Linq expression.

Such an index gives us a computed field, Total, that has the total value of an order. We can query on this field, sort by it and even project it back (if the field is stored). It allows us to always have the most up to date value and to have the database take care of computing it.

This basic technique can be applied in many different ways and affects the way we shape and model our data. Currently I have at least three more posts planned for this series, and I would love to hear your feedback, both on the kind of stuff you would like me to talk about and on the kind of indexes you are using in RavenDB and how they impacted your data modeling.

time to read 6 min | 1017 words

The task that I have for now is to add client authentication via X509 client certificate. That is both obvious and non obvious, unfortunately. I’ll get to that, but before I do so, I want to go back to the previous post and discuss this piece of code:

I’ll admit that I’m enjoying exploring Rust features, so I don’t know how idiomatic this code is, but it is certainly dense. This basically does the setup for a TCP listener and sets up the TLS details so we can accept a connection.

Rust allows us to define local functions (inside a parent function). This is mostly just a way to define a private function, since the nested function has no access to the parent scope. The open_cert_file function is just a way to avoid code duplication, but it is an interesting one. It is a generic function that accepts an open ended function of its own. Basically, it will open a file, read it and then pass it to the function it was provided. There is some error handling, but that is pretty much it.
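
In rough outline, such a helper might look like this (a minimal sketch, assuming the pemfile parsers that rustls shipped at the time, passed in as a plain function pointer):

    use std::fs::File;
    use std::io::{BufRead, BufReader};

    // Sketch: shown here as a free function; in the post it is nested inside the parent function.
    fn open_cert_file<T>(
        path: &str,
        parse: fn(&mut dyn BufRead) -> Result<Vec<T>, ()>,
    ) -> Result<Vec<T>, String> {
        let file = File::open(path).map_err(|e| format!("Unable to open {}: {}", path, e))?;
        let mut reader = BufReader::new(file);
        // Hand the reader over to whatever parser we were given (certs, RSA keys, PKCS8 keys...).
        parse(&mut reader).map_err(|_| format!("Unable to parse {}", path))
    }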

The next fun part happens when we want to read the certs and key files. The certs file is easy, since it can only ever appear in a single format, but the key may be either a PKCS8 file or an RSA private key. And unlike the certs, where we expect to get a collection, we need to get just a single value. To handle that we have:

image

First, we try to open and read the file as an RSA private key; if that isn’t successful, we’ll attempt to read it as a PKCS8 file. If either of those attempts was successful, we’ll try to get the first key, clone it and return. However, if there was an error in any part of the process, we abort the whole thing (and exit the function with an error).
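
Building on the open_cert_file sketch above, that fallback looks roughly like this (again, a minimal sketch against the pemfile module rustls had back then):

    fn load_private_key(path: &str) -> Result<rustls::PrivateKey, String> {
        // Try RSA first, then fall back to PKCS8.
        let keys = match open_cert_file(path, rustls::internal::pemfile::rsa_private_keys) {
            Ok(keys) if !keys.is_empty() => keys,
            _ => open_cert_file(path, rustls::internal::pemfile::pkcs8_private_keys)?,
        };

        // Take the first key found and clone it, or exit with an error.
        keys.first()
            .cloned()
            .ok_or_else(|| format!("No private keys found in {}", path))
    }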

From my review of Rust code, it looks like this isn’t particularly non-idiomatic code, although I’m not sure I would call it idiomatic at this point. The problem with this code is that it is pretty fun to write, and when you read it, it is obvious what is going on, but it is really hard to actually debug. There is too much going on in too little space and it is not easy to follow in a debugger.

The rest of the code is boring, so I’m going to skip that and start talking about why client authentication is going to be interesting. Here is the core of the problem:

image

In order to simplify my life, I’m using rustls’ Stream to handle transparent encryption and decryption. This is similar to how I would do it when using C#, for example. However, the stream interface doesn’t have any way for me to handle the handshake explicitly. Luckily, I was able to dive into the code and I think that given the architecture present, I can invoke the handshake manually on the ServerSession and then hand off the session as is to the stream.

What I actually had to do was to setup client authentication here:

image

And then manually complete the handshake first:

image
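
Putting the two together, it looks roughly like this (a minimal sketch against the rustls API of that era, with the certificate setup elided):

    use std::net::TcpStream;
    use std::sync::Arc;

    use rustls::Session; // brings is_handshaking() / complete_io() into scope

    fn accept_tls(
        mut tcp: TcpStream,
        root_store: rustls::RootCertStore,
    ) -> Result<rustls::ServerSession, std::io::Error> {
        // Ask rustls to require a client certificate.
        let client_auth = rustls::AllowAnyAuthenticatedClient::new(root_store);
        let config = rustls::ServerConfig::new(client_auth);
        // ... set_single_cert(...) and the rest of the setup elided in this sketch ...

        let mut session = rustls::ServerSession::new(&Arc::new(config));

        // Manually drive the handshake to completion before wrapping the session in
        // rustls::Stream, so the client certificate can be inspected first.
        while session.is_handshaking() {
            session.complete_io(&mut tcp)?;
        }
        Ok(session)
    }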

And this is when I ran into a problem. When trying to connect with a client certificate, I got the following error:

image

I’m assuming that this is because rustls is actually verifying the certificate against PKI, which is not something that I want. I don’t want to use PKI for this; instead, I want to register the allowed certificates’ thumbprints, but first I need to figure out how to make rustls accept any kind of client certificate. I’m afraid that this means that I have to break out the debugger again and dive into the code to figure out where we are being rejected and why…

After a short trip through the code, I got to something that looks promising:

image

This looks like something that I should be able to control to see if I like or dislike the certificate. Going inside it, it looks like I was right:

image

I think that I should be able to write an implementation of this that would do the validation without checking the issuer. However, it looks like my plan ran into a snag, see:

image

I’m not sure that I’m a good person to talk about the details of X509 certificate validation. In this case, I think that I could have done enough to validate that the cert is valid enough for my needs, but it seems like there isn’t a way to actually provide another implementation of the ClientCertVerifier, because the entire package is private. I guess that this is as far as I can go with rustls; I’m going to move to the OpenSSL bindings, which I’m more familiar with, and see how that works for me.

Okay, I tried using the rust OpenSSL bindings, and here is what I got:

image

So this is some sort of link error, and I could spend half a day trying to resolve it, or just give up on this for now. Looking around, it looks like there is also something called native-tls for Rust, so I might take a peek at it tomorrow.

time to read 13 min | 2462 words

Most developers have been weaned on relational modeling and have to make a non trivial mental leap when the time comes to model data in a non relational manner. That is hard enough on its own, but what happens when the data store that you use actually has multi model capabilities? As an industry, we are seeing more and more databases that take this path and offer multiple models to store and query the data as part of their core nature. For example, ArangoDB, CosmosDB, Couchbase, and of course, RavenDB.

RavenDB, for example, gives you the following models to work with:

  • Documents (JSON) – Multi master with any node accepting reads and writes.
    • ACID transactions over multiple documents.
    • Simple / full text queries.
    • Map/Reduce and aggregation queries.
  • Binary data – Attachments to documents.
  • Counters (Map<string, int64>) – CRDT multi master distributed counters.
  • Key/Value – strong distributed consistency via Raft protocol.
  • Graph queries – on top of the document model.
  • Revisions – built in audit trail for documents

With such a wealth of options, it can be confusing to select the appropriate tool for the job when you need to model your data. In this post, I aim to make sense of the options RavenDB offers and guide you toward making the optimal choices.

The default and most common model you’ll use is going to be the document model. It is the one most appropriate for business data and you’ll typically follow the Domain Driven Design approach for modeling your data and entities. In other words, we are talking about Aggregates, where each document is a whole aggregate. References between entities are either purely local to an aggregate (and document) or only between aggregates. In other words, a value in one document cannot point to a value in another document. It can only point to another document as a whole.

Most of your business logic will be focused on the aggregate level. Even when a single transaction modifies multiple documents, most of the business logic is done at each aggregate independently. A good way to handle that is using Domain Events. This allows you to compose independent portions of your domain logic without tying it all into one big knot.

We have talked about modifying documents so far, but a large part of what you’ll do with your data is query it and present it to users. In these cases, you need to make a conscious and explicit decision: whether your display model is going to be based on your documents or on a different source. You can use RavenDB ETL to project the data out to a different database, changing its shape to the appropriate view model along the way. RavenDB ETL allows you to replicate a portion of the data in your database to another location, with the ability to modify the results as they are being sent. This can be a great tool to aid you in bridging the domain model and the view model without writing a lot of code.

This is useful for applications that have a high degree of complexity in their domain and business rules. For most applications, you can simply project the relevant data out at query time, but for more complex systems, you may want to have strict physical separation between your read model and the domain model. In such a scenario, RavenDB ETL can greatly simplify the task of moving (and transforming) the data from the write side to the read side.

When it comes to modeling, we also need to take into account RavenDB’s map/reduce indexes. These allow you to define aggregations that will run in the background; in other words, at query time, the work has already been done. This in turn leads to blazing fast aggregation queries and can be another factor in the design of the system. It is common to use map/reduce indexes to aggregate the raw data into a more usable form and work with the results, either directly from the index or by using the output collection feature to turn the results of the map/reduce index into real documents (which can be further indexed, aggregated, etc).

Thus far, we have only touched on document modeling, mind. There are a bunch of other options as well. We’ll start from the simplest option, attachments. At its core, an attachment is just that: a way to attach some binary data to the document. As simple as it sounds, it has some profound implications from a modeling point of view. The need to store binary data somewhere isn’t new, obviously, and there have been numerous ways to resolve it. In a relational database, a varbinary(max) column is used. In a document database, I’ve seen the raw binary data stored directly in the document (either as raw binary data or as a BASE64 encoded value). In most cases, this isn’t a really good idea. It blows up the size of the document (and the table) and complicates the management of the data. Storing the data on the file system leads to a different set of problems: coordinating transactions between the database and the file system, organizing the data, securing paths such as “../../etc/passwd”, backups and restore, and many more.

Attachments

These are all things that you want your database to handle for you. At the same time, binary data is related to, but not part of, the document. For those reasons, we use the attachment model in RavenDB. This is meant to be viewed just like attachments in email. The binary data is not stored inside the document, but it is strongly related to it. Common use cases for attachments include the profile picture for a user’s document, the signatures on a lease document, the Excel spreadsheet with details about a loan for a payment plan document or the associated pictures from a home inspection report. A document can have any number of attachments, and an attachment can be of any size. This gives you a lot of freedom to attach (pun intended) additional data to your documents without any hassle. Like documents, attachments also work in multi master mode and will be replicated across the cluster with the document.

Counters

While attachments can be any raw binary data and have only a name (and optional mime type) for structure, counters are far more strictly defined. A counter is… well, a counter. It counts things. At the most basic level, it is just a named 64 bits integer that is associated with a document. And like attachments, a document may have any number of such counters. But why is it important to have a 64 bits integer attached to the document? How could something so small be important enough that we would need a whole new concept for it? After all, couldn’t we just store the same counter more simply as a property inside the document?

To understand why RavenDB has counters, we need to understand what they aren’t. They are related to the document, but not of the document. That means that an update to the counter is not going to modify the document as a whole. This, in turn, means that operations on counters can be very cheap, regardless of how many counters you have in a document or how often you modify them. Having the counter separate from the document allows us to do several important things:

  • Cheap updates
  • Distributed modifications

In a multi master cluster, if any node can accept any write, you need to be aware of conflicts, when two updates to the same value were made on two disjoint servers. In the case of documents, RavenDB detects and resolves this according to the pre-defined policy. In the case of counters, there is no such need. A counter in RavenDB is stored using a CRDT. This is a format that allows us to handle concurrent modifications to the same value without losing data or requiring expensive locks. This makes counters suitable for values that change often. A good example of counters is tracking views on a page or an ad: you can distribute the operations over a number of servers and still reach the correct final tally. This works both for increment and decrement, obviously.

Given that counters are basically just a map<string, int64>, you might expect that there isn’t any modeling work to be done here, right? But it turns out that there is actually quite a bit that can be done even with that simple an interface. For example, when tracking views on a page or downloads for a particular package, I’m interested not only in the total number of downloads, but also in the downloads per day. In such a case, whenever we want to note another download, we’ll increment both the overall downloads counter and another counter for downloads on that particular day. In other words, the name of the counter can hold meaningful information.
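
A minimal sketch of what that looks like with the client API (the document id and counter names here are illustrative):

    using System;
    using Raven.Client.Documents;

    public static class DownloadTracking
    {
        // Record a download both in the overall counter and in a per-day counter.
        // Sketch: packageDocId is whatever document represents the package.
        public static void RecordDownload(IDocumentStore store, string packageDocId)
        {
            using (var session = store.OpenSession())
            {
                var counters = session.CountersFor(packageDocId);
                counters.Increment("downloads");
                counters.Increment("downloads-" + DateTime.UtcNow.ToString("yyyy-MM-dd"));
                session.SaveChanges();
            }
        }
    }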

Key/Value

So far, all the data we have talked about was stored and accessed in a multi-master manner. In other words, we could choose any node in the cluster and make a write to it and it would be accepted. Data that is modified on multiple nodes at the same time would either be merged (counters), stored (attachments) or resolved (documents). This is great when you care about the overall availability of your system: we are always accepting writes and always moving forward. But that isn’t always what you want. There are situations where you might need a higher degree of consistency in your operations. For example, if you are selling a fixed number of items, you want to be sure that two buyers hitting “Purchase” at the same time don’t cause you problems just because their requests used different database servers.

In order to handle this situation, RavenDB offers the Cmp Xcng model. This is a cluster wide key/value store for your database; it allows you to store named values (integers, strings or JSON objects) in a consistent manner. This feature allows you to ensure consistent behavior for high value data. In fact, you can combine this feature with cluster wide transactions.

Cluster wide transactions allow you to combine operations on documents with Cmp Xcng ops to create a single consistent transaction. This mode enables you to perform conditional operations to modify your documents based on the globally consistent Cmp Xcng values.

A good example of Cmp Xcng values is pessimistic locks and their owners, used to generate a cluster wide lock that is guaranteed to be consistent and safe regardless of what is going on with the cluster. Another example is storing global configuration for your system on all the nodes in the cluster.
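
A minimal sketch of such a lock, using a cluster wide transaction and a compare exchange value (the key naming and exception handling here are illustrative):

    using Raven.Client.Documents;
    using Raven.Client.Documents.Session;

    public static class Locking
    {
        public static bool TryAcquireLock(IDocumentStore store, string resource, string owner)
        {
            using (var session = store.OpenSession(new SessionOptions
            {
                TransactionMode = TransactionMode.ClusterWide
            }))
            {
                // The compare exchange value is created atomically, cluster wide; if someone
                // else already holds the lock, SaveChanges will fail. (Sketch: key name and
                // the exception handling below are assumptions, not a definitive recipe.)
                session.Advanced.ClusterTransaction.CreateCompareExchangeValue("locks/" + resource, owner);
                try
                {
                    session.SaveChanges();
                    return true;
                }
                catch (Raven.Client.Exceptions.ConcurrencyException)
                {
                    return false;
                }
            }
        }
    }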

Graph Queries

Graph data stores are used to hold data about nodes and the edges between them. They are a great tool to handle tasks such as social networks, data mining and finding patterns in large datasets. RavenDB, as of release 4.2, has support for graph queries, but it does so in a novel manner. A node in RavenDB is a document, quite naturally, but unlike other features, such as attachments and counters, edges don’t have a separate physical existence. Instead, RavenDB is able to use the document structure itself to infer the edges between documents. This dynamic nature means that when the time comes to apply graph queries on top of your existing database, you don’t have to do a lot of prep work. You can start issuing graph queries directly, and RavenDB will work behind the scenes to make sure that all the data is found, and quickly, too.

The ability to perform graph queries on your existing document structure is a powerful one, but it doesn’t alleviate the need to model your data properly to best take advantage of this. So what does this mean, modeling your data to be usable both in a document form and for graph operations? Usually, when you need to model your data in a graph manner, you think mostly in terms of the connection between the nodes.

One way of looking at graph modeling in RavenDB is to be explicit about the edges, but I find this awkward and limiting. It is usually better to express the domain model naturally and allow the edges to pop up from the underlying data as you work with it. Edges in RavenDB are properties (or nested objects) that contain a reference to another document. If the edge is a nested object, then all the properties of the object are also the properties on the edge and can be filtered upon.

For best results, you want to model your edge properties as a single nested object that can be referred to explicitly. This is already a best practice when modeling your data, for better cohesiveness, but graph queries make this a requirement.

Unlike other graph databases, RavenDB isn’t limited to just the graph representation. A graph query in RavenDB is able to utilize the full power of RavenDB queries, which means that you can start your graph operation with a spatial query and then proceed to the rest of the graph pattern match. You should aim to do most of the work in the preparatory queries and not spend most of the time in graph operations.

A common example of a graph operation is fraud detection, with graph queries to detect multiple orders made using many different credit cards for the same address. Instead of trying to do the matches using just graph operations, we can define a source query on a map/reduce index that would aggregate all the results for orders on the same address. This would dramatically cut down on the amount of work that the database is required to do to answer your queries.

Revisions

The final topic that I want to discuss in this (already very long) post is the notion of Revisions. RavenDB allows the database administrator to define a revisions policy, in which case RavenDB will maintain, automatically and transparently, an immutable log of all changes to documents. This means that you have a built-in audit trail ready for use if you need it. Beyond just having an audit trail, revisions are also a very important feature in several key capabilities in RavenDB.

For example, using revisions, you can get the tuple of (previous, current) versions of any change made in the database using subscriptions. This allows you to define some pretty interesting backend processes, which have full visibility into all the changes that happen to the document over time. This can be very interesting for regression analysis, applying business rules and seeing how the data changes over time.

Summary

I tried to keep this post at a high level and not get bogged down in the details. I’m probably going to have a few more posts about modeling in general and I would appreciate any feedback you may have or any questions you can raise.
