Thursday, July 7, 2011

Concurrent Access to Models

Most people know that EMF models are inherently unsafe to access concurrently from multiple threads. It's immediately obvious when you look at the following code that has been generated with the standard JET templates.
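
As a rough illustration of the pattern, here is a simplified sketch of a generated setter. It uses plain-Java stand-ins instead of the real EMF notification plumbing; the class name `CompanyImpl`, the `name` attribute, and the `BiConsumer` adapter list are assumptions made for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Simplified stand-in for a generated model class (no EMF dependency).
class CompanyImpl {
    // Stand-in for the adapter list; ArrayList is not thread-safe.
    private final List<BiConsumer<String, String>> adapters = new ArrayList<>();

    // Plain field: neither volatile nor guarded by a lock.
    protected String name;

    public String getName() {
        return name;
    }

    public void setName(String newName) {
        String oldName = name; // unsynchronized read
        name = newName;        // unsynchronized write
        // Generated code would fire a SET notification here; another thread
        // may have changed 'name' between the two statements above.
        for (BiConsumer<String, String> adapter : adapters) {
            adapter.accept(oldName, newName);
        }
    }

    public void addAdapter(BiConsumer<String, String> adapter) {
        adapters.add(adapter);
    }
}
```

Nothing in this setter establishes ordering or visibility between threads, so two concurrent callers can report inconsistent old values or see stale state.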

Ed usually argues that it's the application's responsibility to control concurrent access to the model if it knows that multiple threads are involved. The application knows best how to do it efficiently for specific access patterns and, ideally, how to avoid deadlocks. Note that adding synchronized modifiers everywhere is counterproductive. It wouldn't make the model completely thread safe, because operations that span several calls would still not be atomic, and acquiring the per-object locks in no particular order would likely end up in deadlocks that are hard or impossible to resolve.

A model is basically nothing more than an object graph, and how a particular thread navigates through such an object graph is highly specific to the particular application. As a result, the most commonly implemented locking scope is the entire model: only one thread at a time can access the model, and all other threads must block on a single mutex.
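
A minimal sketch of such a model-wide mutex could look like this. `ModelLock`, `runExclusive`, and `GlobalLockDemo` are names invented for this example, and an `int` counter stands in for the model.

```java
import java.util.concurrent.locks.ReentrantLock;

// One lock for the whole model: every access has to go through it.
class ModelLock {
    private final ReentrantLock mutex = new ReentrantLock();

    void runExclusive(Runnable access) {
        mutex.lock(); // all other threads block here
        try {
            access.run();
        } finally {
            mutex.unlock();
        }
    }
}

class GlobalLockDemo {
    // Lets several threads modify the "model" and returns its final state.
    static int run(int threadCount, int incrementsPerThread) {
        ModelLock lock = new ModelLock();
        int[] model = {0}; // stand-in for a real object graph
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    lock.runExclusive(() -> model[0]++); // each access is wrapped
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return model[0];
    }
}
```

The result is correct, but the threads effectively run one after another whenever they touch the model.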

The EMF Transaction project supports a protocol for clients to read and write EMF models on multiple threads, but it has two major drawbacks:
  • It is very coarse-grained because the locking scope is the entire model.
  • It is intrusive because every single access to the model must be wrapped.
A one-way road! What if, instead of letting threads compete for the ability to access the model, we hand a separate model copy to each thread? This is neither coarse-grained nor intrusive because each thread can access all model elements at all times with normal application code; no wrapper commands are needed.

This approach obviously enables concurrent threads to access the (their) model at any time, but hey, isn't it extremely expensive to instantiate the entire model multiple times? Of course it is! So let's go further down this road and see what can be done to solve the footprint issues.

Let's assume that in the most common scenarios the models can be pretty big but the size of a single transaction, i.e., the number of objects changed between two consecutive commits, is rather small. Then we could refactor our model classes to delegate all model state access to a new kind of entity that can be shared among the model objects of all open transactions. Let's call these shared entities revisions and their managing container a session.

The model objects are now very cheap in terms of footprint because they only store a pointer to their current revision in addition to some general EMF infrastructure such as the list of adapters. The revisions contain all the modeled state plus a version number (which is explained below).
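
To make the idea concrete, here is a minimal sketch of that delegation, showing read access only. The classes `Revision`, `Session`, and `Company` and the single `name` attribute are assumptions for this example, not an existing API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Shared entity: all modeled state of one element plus a version number.
final class Revision {
    final long id;
    final int version;
    final String name;

    Revision(long id, int version, String name) {
        this.id = id;
        this.version = version;
        this.name = name;
    }
}

// Managing container of the shared revisions.
class Session {
    private final Map<Long, Revision> revisions = new ConcurrentHashMap<>();

    void put(Revision revision) {
        revisions.put(revision.id, revision);
    }

    Revision get(long id) {
        return revisions.get(id);
    }
}

// Lightweight model object: exists once per transaction and stores
// nothing but a pointer to its current revision.
class Company {
    private final Revision revision;

    Company(Revision revision) {
        this.revision = revision;
    }

    String getName() {
        return revision.name; // state access is delegated to the revision
    }

    Revision revision() {
        return revision;
    }
}
```

Two transactions that load the same element get two `Company` instances, but both point to the same `Revision`, so the modeled state exists only once in memory.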

Nice, now the model can be read by multiple threads without main memory being blown up. But with this design the original problem of concurrent write access is not addressed! The modifications that one thread applies to a model object end up in a shared revision, possibly overwriting changes made by other threads.

It's obvious that transaction-scoped writes must not alter the shared state. So we refactor our model classes again so that the setters automatically create and link copies of the shared revisions they are about to modify. Let's call them transactional revisions.
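
A minimal copy-on-write sketch of such a setter follows. The `transactional` flag and the method `copyForTransaction` are invented for this example; the only point is that the first write replaces the pointer to the shared revision with a pointer to a private copy.

```java
// A revision holds the modeled state plus a version number.
class Revision {
    final int version;
    final boolean transactional; // true for a private, transaction-scoped copy
    String name;

    Revision(int version, String name, boolean transactional) {
        this.version = version;
        this.name = name;
        this.transactional = transactional;
    }

    // The copy keeps the version it was derived from, for later conflict checks.
    Revision copyForTransaction() {
        return new Revision(version, name, true);
    }
}

class Company {
    private Revision revision;

    Company(Revision shared) {
        this.revision = shared;
    }

    String getName() {
        return revision.name;
    }

    void setName(String newName) {
        if (!revision.transactional) {
            // First write in this transaction: copy and re-link.
            revision = revision.copyForTransaction();
        }
        revision.name = newName; // only the private copy is ever modified
    }

    Revision revision() {
        return revision;
    }
}
```

Other transactions keep reading the untouched shared revision until the change is committed.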

A simple implementation of a commit operation would execute these steps:
  1. Check the versions of all transactional revisions against the versions of the current shared revisions to detect conflicting commits of other transactions.
  2. Move the transactional revisions into the session.
  3. Notify other transactions so that they can eventually adjust their revision pointers to the new shared revisions. Note that conflict potential in these other transactions can be detected early at this point in time!
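
The three steps above can be sketched as follows. This is a deliberately naive illustration under my own assumptions: `Session`, `Transaction`, and `ConflictException` are hypothetical classes, the whole commit runs under one session-wide monitor, and listeners only receive the ids of the changed objects.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

final class Revision {
    final long id;
    final int version;
    final String name;

    Revision(long id, int version, String name) {
        this.id = id;
        this.version = version;
        this.name = name;
    }
}

class ConflictException extends RuntimeException {
    ConflictException(long id) {
        super("Conflicting commit on object " + id);
    }
}

class Session {
    private final Map<Long, Revision> shared = new HashMap<>();
    private final List<Consumer<Set<Long>>> listeners = new ArrayList<>();

    synchronized void add(Revision revision) { shared.put(revision.id, revision); }
    synchronized Revision get(long id) { return shared.get(id); }
    synchronized void addListener(Consumer<Set<Long>> listener) { listeners.add(listener); }

    synchronized void commit(Map<Long, Revision> dirty) {
        // Step 1: compare base versions with the current shared versions.
        for (Revision tx : dirty.values()) {
            if (shared.get(tx.id).version != tx.version) {
                throw new ConflictException(tx.id);
            }
        }
        // Step 2: move the transactional state into the session as new versions.
        for (Revision tx : dirty.values()) {
            shared.put(tx.id, new Revision(tx.id, tx.version + 1, tx.name));
        }
        // Step 3: tell other transactions which objects have changed.
        Set<Long> changed = new HashSet<>(dirty.keySet());
        for (Consumer<Set<Long>> listener : listeners) {
            listener.accept(changed);
        }
    }
}

class Transaction {
    private final Session session;
    private final Map<Long, Revision> dirty = new HashMap<>();

    Transaction(Session session) { this.session = session; }

    String getName(long id) {
        Revision tx = dirty.get(id);
        return (tx != null ? tx : session.get(id)).name;
    }

    void setName(long id, String name) {
        Revision base = dirty.containsKey(id) ? dirty.get(id) : session.get(id);
        dirty.put(id, new Revision(id, base.version, name)); // keeps the base version
    }

    void commit() {
        session.commit(dirty);
        dirty.clear();
    }
}
```

If two transactions change the same object, the first commit bumps the shared version and the second one fails the check in step 1.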

That's it! Too simple?

It probably isn't that simple in many ways. But there's already a mature Eclipse technology available that takes care of all of the aforementioned aspects and more.

Surprise, surprise, it's the CDO Model Repository, a highly efficient and scalable runtime platform for your models. The following code snippet illustrates how to use CDO to let 100 threads modify the same model:

You may have noticed that in the above example code the commit operation of a background thread can fail because the company object has just been modified by a different thread. With CDO you can easily implement a pessimistic locking strategy by acquiring a single explicit write lock on the company object. Alternatively, you can register shipped or custom conflict resolvers with your transactions if you prefer to stay optimistic as long as possible.
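
The difference between the two strategies can be shown without any CDO code. The following sketch is not the CDO API: `CompanyStore` and its methods are invented for this example, a compare-and-set on an `AtomicReference` stands in for a versioned commit, and a plain retry loop is only a crude stand-in for what a conflict resolver does.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;

final class CompanyRevision {
    final int version;
    final int employees;

    CompanyRevision(int version, int employees) {
        this.version = version;
        this.employees = employees;
    }
}

class CompanyStore {
    private final AtomicReference<CompanyRevision> current =
            new AtomicReference<>(new CompanyRevision(1, 0));
    private final ReentrantLock writeLock = new ReentrantLock();

    CompanyRevision read() {
        return current.get();
    }

    // Succeeds only if nobody has committed since 'base' was read.
    boolean tryCommit(CompanyRevision base, int newEmployees) {
        return current.compareAndSet(base, new CompanyRevision(base.version + 1, newEmployees));
    }

    // Optimistic: commit may fail, then re-read and re-apply the change.
    void hireOptimistic() {
        while (true) {
            CompanyRevision base = read();
            if (tryCommit(base, base.employees + 1)) {
                return;
            }
        }
    }

    // Pessimistic: take the write lock first, so the commit cannot conflict
    // as long as all writers follow the same protocol.
    void hirePessimistic() {
        writeLock.lock();
        try {
            CompanyRevision base = read();
            if (!tryCommit(base, base.employees + 1)) {
                throw new IllegalStateException("Unexpected conflict while holding the lock");
            }
        } finally {
            writeLock.unlock();
        }
    }
}

class HundredThreadsDemo {
    // Lets 100 threads modify the same company and returns the final count.
    static int run(boolean pessimistic) {
        CompanyStore store = new CompanyStore();
        Runnable task = pessimistic ? store::hirePessimistic : store::hireOptimistic;
        Thread[] threads = new Thread[100];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(task);
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return store.read().employees;
    }
}
```

Both variants end up with all 100 modifications applied; they differ in whether a thread waits before it starts working or repeats its work after a failed commit.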

Happy multithreading!