GWT / GAE development blog
Articles about GWT/GAE and the development of TeamScape: A sports team portal

Datastore frameworks: The interview

Posted on March 27, 2010

In this article we will have a look at three of the low-level wrapper frameworks for the AppEngine datastore. It's an interview-style post where the author of each framework answers a number of questions related to the AppEngine datastore and their framework. There will also be a follow-up article where I solve some typical datastore scenarios with each of the frameworks and write about my experience with that.

Background

Anyone who has tried to use JDO/JPA on AppEngine will probably agree that it is too complex and has a steep learning curve. Offering a JDO/JPA solution is kind of like saying "Hey. We have a standardized API so you don't have to worry about the implementation details of the underlying datastore." The problem is that in the case of AppEngine, there is also fine print saying "But you should also be aware of these limitations, these GAE extensions and...etc...etc..."

This means that you MUST worry about the implementation details anyway, since the datastore <-> JDO/JPA mapping is far from optimal. If you don't know about the datastore limitations and think you can use JDO/JPA like you would on a relational datastore, you are in for a bumpy ride. And just to make this clear, once again, the root cause of this problem is not related to JDO/JPA or the DataNucleus implementation in general.

The low-level API is on the other end of the scale. It has an extremely simple API, but it's not type safe and only works with native entities, which requires lots of tedious work from the developer. What should a developer do?

The low-level wrapper frameworks to the rescue

I'm not the only one who thinks that the JDO/JPA solution is too complex and makes development harder than it has to be. There are, to my knowledge, 6 different attempts at offering an alternative, which is not a coincidence. These 6 are Objectify, Twig, SimpleDS, Siena, Slim3 and cloud2b. Update: There is also a 7th framework called jiql, which is a SQL/JDBC interface on top of AppEngine.

I have chosen to look at three of these: Objectify, Twig and SimpleDS. There is a good reason why I chose those three: they are designed specifically for the AppEngine datastore, with its features and limitations in mind. The other three frameworks are designed with other goals in mind.

Siena and cloud2b have multi-platform support and, like JDO/JPA, have a generic API to support this. Slim3 is a full-stack MVC framework that goes beyond the scope of a thin low-level datastore wrapper. Update: Apparently Slim3 can be used as a standalone datastore framework, which wasn't clear from the documentation at the time. The documentation has now been updated to describe this.

I'm sure that these frameworks are great and probably good choices for a multi-platform strategy. Or, in the case of Slim3, a complete server-side framework. But I'm not at all interested in that. I want to use a simple and efficient datastore framework that mainly acts as a convenience layer on top of the AppEngine low-level API. Objectify, Twig and SimpleDS fulfill this requirement. Read this thread for some debate on this matter and why I don't want to use SQL/JDBC interfaces for my project.

The interview

Ok. Let's get on with the juicy stuff. First, let me introduce the authors. (Lots of credit and thanks to the authors for this!)

Jeff Schnitzer, lead developer Objectify.

John Patterson, lead developer Twig.

Ignacio Coloma, lead developer SimpleDS.

The original plan was to wrap this article up with a nice debate, but one of the GAE/J threads turned into a fantastic debate that covers many different aspects of datastore philosophies and designs. The authors would need to repeat themselves here if we were to keep the debate, so I decided to skip it and refer to the debate thread instead. In one of the interview sections there are links to various stuff you should have a look at.

The interview has been split into 4 subpages to make it easier to read.

Table of contents

AppEngine datastore

The frameworks

Feature checklist

Summary

Next: AppEngine datastore


objectify evaluated

Posted on March 3, 2010

In this post we'll evaluate objectify with regards to performance and complexity compared to our current JDO solution.

Objectify is basically a framework wrapping the GAE datastore low-level API. There are other frameworks for this (see further down), but objectify seems to be the most comprehensive one with support for transactions, caching, cursors etc. It also provides a very clean and simple API using generics and annotations.

There is no need for me to describe objectify further since the author has done an excellent job documenting it on the wiki. I have used objectify 2.0.2 for this evaluation.

So why are we evaluating JDO alternatives?

  • The JDO GAE implementation is quite complex and hard to grasp for beginners, partly due to unsupported features, workarounds and extensions.
  • JDO is designed to work "everywhere", which means it is highly abstracted and very general in its nature. Great for portability, but it also means that it might not be a perfect match for a specific engine like GAE with its quite unique datastore.
  • The DataNucleus/JDO/JPA jars are quite big (~2.3 MB in total), while the low-level frameworks are in the KB range.
  • JDO/JPA *might* perform class path scanning at startup to find and register your entities, which could add to the load time. As Jeff points out in a comment below, the time spent would be proportional to the number of classes in your project.

Do note that JDO as a specification and as an implementation by DataNucleus does not necessarily have these drawbacks. I'm only discussing JDO as implemented on GAE in this post.

What about JPA? Well, JPA is designed for RDBMS, which the GAE datastore is not. I guess the datastore supports it mainly because there are a number of frameworks that use JPA. For apps like TeamScape, where we don't use any server-side framework, it makes no sense to use JPA. Google doesn't exactly seem to encourage developers to use JPA for the datastore either.

Why not use the low-level API directly in that case? The objectify introduction answers this question very well.

What about other low-level wrappers, like SimpleDS or Twig? Basically, I heard about objectify first ;-) . There is an interesting objectify-SimpleDS discussion here and an interesting objectify-Twig discussion here. I considered making this post a JDO/objectify/SimpleDS/Twig comparison, but I just don't have the time or energy for that. I do however make comments about them here and there. But check out each framework and compare for yourself. If you find something useful to discuss, feel free to post a comment in this post. My initial impression is that objectify is the most mature of the three frameworks with regards to development cycles, releases and documentation. Update: I just found two more frameworks for this, Siena and Slim3. And one more, cloud2db.

Why use objectify? Answering that is what this post is about. If the evaluation shows that objectify offers an edge over JDO, we will probably switch.

Before I get started, I really need to point out that I'm in no way an expert on databases, JDO or the GAE datastore. I do however consider this an advantage when writing blog posts like these, since I don't take knowledge and details of these things for granted. But let me know if anything is unclear so I can clarify it. Also, if you think something I write seems completely crazy, let me know as well. I refer to JDO and objectify as "frameworks" in this post, although this might not be an accurate term for them.

Ok, let's move on.

I will compare objectify and JDO (and might make remarks on the other frameworks too) with regards to:

  • API design/complexity and how it relates to the GAE Datastore implementation
  • Load time (GAE cold start time)
  • CRUD + Q performance (create, read, update, delete, query)
  • Portability

Framework/API design and complexity

Let's start by having a look at the two frameworks from a developer perspective. I had never used GAE or JDO until a couple of months ago, so I can evaluate both JDO and objectify from a beginner's perspective. Well, I did have a fair amount of knowledge of the GAE datastore when I got started with objectify, but that doesn't matter much for this comparison.

JDO@GAE is actually really well documented if you combine the GAE docs/articles, DataNucleus docs and the GAE Java forum. But for a beginner, it is a LOT of information to take in before you have a decent picture of the workflow. Once you feel comfortable with JDO, you have to dig into the DataNucleus/GAE extensions and how GAE features such as transactions, owned/unowned relationships, entity groups, joins, keys, etc. play with JDO. And once you think you are ready and start testing it, you will bump into lots of problems along the way that are caused by DataNucleus/GAE implementation details that you were not aware of. For me it was a lot of write - test - fail - rewrite cycles before I had something up and running.

I first heard about objectify in the GAE forum and decided to have a look at it after reading David Chandler's interesting posts on it. When I evaluate third party frameworks, one of the most important factors for me is good documentation (which loooots of frameworks lack). Anyways, my point is that I was very impressed by the objectify wiki. It has an excellent introduction and lots of examples, howtos, best practices, etc. It answered most of my questions before I even thought of asking them...! I checked out the documentation for SimpleDS and Twig as well. SimpleDS has decent documentation, but Twig has some work to do to reach up to the same level as objectify.

After reading through the whole wiki and checking out the JavaDoc, I set up a separate track using objectify instead of JDO for my app. It took me about 30 minutes to set up objectify, define an entity and write a Datastore implementation using the common objectify operations. And it worked on the first attempt. Do note that I only use very trivial datastore operations at this point (saveModel, getModel, getModelByQuery and deleteModel). The objectify API is very straightforward and it's easy to understand the purpose of each class/method.

So let's just do a quick comparison of the JDO API and the objectify API. We'll compare the main interfaces a developer is faced with. Remember here that JDO is designed to work everywhere and is very generalized because of this. It can never be as straightforward and simple as a framework targeted specifically for one engine, like objectify is for GAE in this case.

The factory wrappers: If you use JDO, it is common to create a PMF class that holds a PersistenceManagerFactory singleton. This is used to retrieve PersistenceManager instances. Objectify has a corresponding class called ObjectifyService that wraps an ObjectifyFactory singleton. The main difference is that you can use ObjectifyService.begin() to retrieve an Objectify instance directly, whereas in the JDO case you do something like: PMF.get().getPersistenceManager(). This is slightly easier in the objectify case, since ObjectifyService provides some convenience functions and you don't have to create a PMF-like class yourself.

The factory: The factories contain similar functionality, although the objectify one is clearly more light-weight and easier to understand.

The primary interface: PersistenceManager vs Objectify. This is where objectify really shines in simplicity compared to JDO. Objectify contains ~15 very straightforward methods, while PersistenceManager has something like 50-60 methods. It is much easier to get a quick overview of objectify.

I will not discuss the other classes, but if you open up each JavaDoc, you can see that JDO is a pretty large library while objectify is pretty slim.
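To show the shape of such a factory wrapper, here is a minimal sketch in plain Java. ExpensiveFactory and FactoryHolder are made-up names; ExpensiveFactory only stands in for PersistenceManagerFactory so that the sketch compiles without the JDO jars. In a real PMF class the field would hold a factory obtained from JDOHelper.getPersistenceManagerFactory(...) instead.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a factory that is expensive to build,
// such as JDO's PersistenceManagerFactory. Not a real JDO type.
final class ExpensiveFactory {
    // Counts how many factories were built, to show that only one ever is.
    static final AtomicInteger CREATED = new AtomicInteger();

    ExpensiveFactory() {
        CREATED.incrementAndGet();
    }

    // Stand-in for getPersistenceManager(): hands out a cheap per-request object.
    String newSession() {
        return "session-" + System.nanoTime();
    }
}

// Same shape as the PMF helper: one factory per JVM, reached through get().
final class FactoryHolder {
    private static final ExpensiveFactory INSTANCE = new ExpensiveFactory();

    private FactoryHolder() {
        // Utility class, never instantiated.
    }

    static ExpensiveFactory get() {
        return INSTANCE;
    }
}
```

With this in place, FactoryHolder.get().newSession() plays the role of PMF.get().getPersistenceManager(). ObjectifyService saves you from writing the holder class at all.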

Let's have a look at how some of the datastore implementation details are handled in the frameworks.

Entity relationships: The frameworks differ a bit here. JDO supports what we call owned relationships by having a direct reference to another entity. This is not supported by objectify. Objectify does however support parent/child relationships using keys and annotations. While you can configure JDO to automatically fetch all children (fetch groups) when you fetch the parent, it seems you have to manually handle this with objectify. You can of course use unowned relationships with both frameworks by keeping another entity's string key/long id in a field. Both frameworks also support embedded entities.

JDO does more work for you automatically in the case of owned relationships, but it is also much more complex and more error-prone. Also note that owned relationships are directly related to entity groups and transactions. If you don't need to perform atomic operations on a set of entities, you should generally NOT use owned relationships because of the limitations that entity groups impose. It is pretty straightforward to use unowned relationships in both frameworks.
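To make the unowned relationship idea a bit more concrete, here is a minimal sketch in plain Java. Team, Player and InMemoryStore are made up for illustration, and the in-memory store is only a stand-in for the datastore. No JDO or objectify API is involved. The point is simply that a Player holds the id of its Team rather than a reference to it, and that the lookup is done by hand.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity with a plain long id.
final class Team {
    final long id;
    final String name;

    Team(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical entity that points at its team by id only (unowned relationship).
final class Player {
    final long id;
    final String name;
    final long teamId;

    Player(long id, String name, long teamId) {
        this.id = id;
        this.name = name;
        this.teamId = teamId;
    }
}

// In-memory stand-in for the datastore, just enough to resolve the ids.
final class InMemoryStore {
    private final Map<Long, Team> teams = new HashMap<>();
    private final List<Player> players = new ArrayList<>();

    void save(Team team) {
        teams.put(team.id, team);
    }

    void save(Player player) {
        players.add(player);
    }

    // Resolving the relationship is a second, explicit lookup.
    Team teamOf(Player player) {
        return teams.get(player.teamId);
    }

    // The reverse direction is a query on the teamId field.
    List<Player> playersOf(long teamId) {
        List<Player> result = new ArrayList<>();
        for (Player player : players) {
            if (player.teamId == teamId) {
                result.add(player);
            }
        }
        return result;
    }
}
```

Since nothing ties the two entities together except the id, they do not have to live in the same entity group.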

Side note: One of the main differences between objectify and Twig is related to this. Twig does support owned relationships (Java style). The authors discuss this at length in the thread I referred to in the beginning of the post.

Transactions: From our perspective, the transaction handling is basically identical in JDO and objectify. Easy to use in both cases.

Entity refactoring: If you decide to refactor your entities, you can easily get into trouble in a live app if you're not careful. I know that JDO has support for schema mapping / metadata that can be used in these cases, but it is not really mentioned anywhere in the GAE docs and this thread seems to suggest that it is not implemented in GAE yet. Objectify has some basic support to handle this using annotations.

Caching: Objectify integrates very nicely with GAE memcache. All you have to do is to add the @Cached annotation to your entity, and objectify takes care of the rest. Very neat. JDO has some caching support as well that apparently uses memcache, as mentioned by Erik in a comment below. See this page.

Usage with GWT: You probably want to seamlessly pass your entities over the wire to and from GWT without using DTOs. First, your entities and all their fields need to be serializable. Second, your entity code must not refer to any server-specific libraries (for instance, you can't use the GAE Keys directly). Third, you must keep your entities in a package known to GWT (in our example, the shared package). Fourth, the datastore framework must accept "detached" entities to be saved directly to the datastore. For JDO, you have to define your entities as detachable and then detach them via PersistenceManager before you return them to GWT. This basically works out of the box with objectify as long as you follow the rules above. Very nice!

Dependencies: JDO with all its dependencies is huge. It's something like ~2 MB in total. Objectify on the other hand has no external dependencies and the jar is ~100 KB.

Another thing that makes sense to discuss is how well JDO integrates with the GAE datastore. The GAE JDO implementation has lots of unsupported JDO features and a bunch of extensions to make it work properly in the GAE environment. The GAE datastore is quite unique in many ways, while JDO is a standardized API. Because of this, there will always be some unsupported stuff, some necessary workarounds and some extensions needed for frameworks like JDO/JPA. Frameworks like objectify that are targeted specifically at the GAE datastore don't have this problem. The API is basically an abstraction layer above the low-level API, so the design of the API, its methods and its features makes more sense for a developer trying to relate to the GAE datastore.
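As a rough illustration of the first two rules, here is a sketch of an entity that only uses JDK types. SharedTeam and SerializationCheck are made-up names, and no framework annotations are shown. The round trip uses plain JDK serialization as a quick sanity check. GWT RPC has its own wire format, so this is not a substitute for testing against GWT.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical entity that would live in the GWT "shared" package.
// It uses a Long id instead of a GAE Key, so no server-only types leak in.
class SharedTeam implements Serializable {
    private static final long serialVersionUID = 1L;

    private Long id;
    private String name;

    // A no-argument constructor lets the object be rebuilt on the other side.
    SharedTeam() {
    }

    SharedTeam(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    Long getId() {
        return id;
    }

    String getName() {
        return name;
    }
}

// Serializes and deserializes an entity in memory with plain JDK serialization.
final class SerializationCheck {
    private SerializationCheck() {
    }

    static SharedTeam roundTrip(SharedTeam original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SharedTeam) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }
}
```

If a field of a server-only or non-serializable type sneaks into the entity, the round trip fails right away, which is a cheap early warning.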

Cold start time

Let's move on and talk a bit about the GAE cold start issue and startup time.

This issue has been under heavy debate lately, which is understandable considering the time it takes to cold start a JVM instance. Please read this appengine blog post, this performance faq, this forum post and this forum post to gain an understanding of the issue.

In TeamScape, I have seen load times from 5 seconds all the way up to 13 seconds. This is simply not acceptable.

Since I deploy new TeamScape versions regularly and we're developing the app, it will basically cold start every time someone accesses it. We basically need at least one visitor per minute to keep the instance running, and I don't think this is a likely scenario for TeamScape even when it's live. This is quite frustrating, so we need to do everything we can to improve the performance. I have enabled the pre-compilation (which is default now btw), but it is still way too slow. The best way to improve this is to remove any library we don't need in WEB-INF/lib and try to minimize the work done when our servlets are initialized.

One of the reasons I looked forward to evaluating objectify was this forum post where the poster reported very interesting performance improvements by switching from JDO to objectify. JDO/JPA does something called class path scanning to automatically find and register your entities. This is done each time a new JVM instance is spawned, so it adds significant time to your load time. The objectify authors seem to have made an active decision not to support this, which means that you have to manually register your entities. However, it also means that the cold start time for an objectify solution should be much faster than a JDO/JPA solution since classpath scanning can be avoided. Note: The text above is not entirely correct. Read the comments below.

To test this, I logged 10 cold start requests using the JDO solution. Then I removed all JDO code and dependencies, removed all DataNucleus/JDO/JPA related jars from the libs directory, turned off the enhancer, integrated objectify and deployed a new version. I logged 10 more cold start requests after that.

I'm going to report 3 values for each framework: shortest, longest and average. This should give a fairly good overview of the results.

            JDO       Objectify
Shortest    5.6 s     4.2 s
Longest     11.7 s    11.5 s
Average     7.8 s     7.2 s
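The three summary values are simple to derive from a list of logged request times. The sketch below shows one way to do it. ColdStartStats is a made-up helper, and it is only meant to show the arithmetic, not the tooling used for the measurements in this post.

```java
// Hypothetical helper that reduces a list of cold start times (in seconds)
// to the three values reported above: shortest, longest and average.
final class ColdStartStats {
    final double shortest;
    final double longest;
    final double average;

    private ColdStartStats(double shortest, double longest, double average) {
        this.shortest = shortest;
        this.longest = longest;
        this.average = average;
    }

    static ColdStartStats of(double[] seconds) {
        if (seconds.length == 0) {
            throw new IllegalArgumentException("At least one sample is required");
        }
        double min = seconds[0];
        double max = seconds[0];
        double sum = 0.0;
        for (double s : seconds) {
            min = Math.min(min, s);
            max = Math.max(max, s);
            sum += s;
        }
        return new ColdStartStats(min, max, sum / seconds.length);
    }
}
```

With only 10 samples and a spread of several seconds, the average moves a lot if a single request happens to be slow, which is worth keeping in mind when reading the numbers.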

There seems to be a slight improvement with objectify, but it is honestly very hard to tell since the range of values I got varied from 4-5 to 11 seconds in both cases. I really don't understand how the load time can vary that much. I have read all the documentation, faqs and threads I can find on this matter, and I still don't quite understand why the hell it takes up to 11 seconds to load an app like TeamScape. We only have one servlet that in my latest deployed version only does some guice stuff. I haven't measured the load time without Guice, so not sure how much it adds to the total load time. I give up on this for now, or this post will never be finished. But trust me, we will come back to this later when we look at performance and optimizations.

CRUD performance

Alright, time to look into datastore operations performance.

I have compared JDO and objectify performance for the typical cases of saving an entity, reading an entity, updating an entity, deleting an entity and querying for a set of entities. Below you can find the code that I used for testing each of the operations on both JDO and objectify. The listed code is what differed between the frameworks. I have omitted common code such as try-catch, error handling etc. Note that each operation is implemented in its own method. I have just collected the code in one place here for convenience.

JDO operations

// Create/Update
  PersistenceManager pm = PMF.get().getPersistenceManager();
  Model savedModel = pm.makePersistent(model);

// Read
  PersistenceManager pm = PMF.get().getPersistenceManager();
  Model model = (Model) pm.getObjectById(className, encodedKey);
  model = pm.detachCopy(model);

// Delete
  PersistenceManager pm = PMF.get().getPersistenceManager();
  pm.deletePersistent(model);

// Query
  PersistenceManager pm = PMF.get().getPersistenceManager();
  Query query = pm.newQuery(queryString);
  List<Model> results = (List<Model>) query.executeWithArray(parameterValues);
  results = (List<Model>) pm.detachCopyAll(results);

objectify operations

// Create/Update
  Objectify ofy = ObjectifyService.begin();
  Key<? extends Model> key = ofy.put(model);

// Read
  Objectify ofy = ObjectifyService.begin();
  Model model = ofy.get(className, id);

// Delete
  Objectify ofy = ObjectifyService.begin();
  ofy.delete(model);

// Query
  Objectify ofy = ObjectifyService.begin();
  Query<? extends Model> q = ofy.query(className).filter(fieldName, fieldValue);
  List<Model> models = new ArrayList<Model>();
  for (Model m : q) {
     models.add(m);
  }

Each operation was run 50 times, and I ran the whole test scenario on 5 different occasions for each framework. This means that each operation was run 250 times in total for each framework. The datastore was of course cleared between each test run for a fair comparison. All the tests were carried out completely on the server side, so no network traffic had any impact on the results. The reason I chose 50 iterations was simply that a larger number resulted in a DeadlineException since the request lasted more than 30 seconds. In order to increase the number of test runs (and possibly the accuracy of the results), each test operation would need to be handled in a separate request, which I did not have time for.
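A measurement loop like the one described above could look roughly like this. OperationTimer is a made-up helper and not the code used for the actual tests. The operation to measure is passed in as a Runnable, so the same loop can wrap either a JDO call or an objectify call.

```java
// Hypothetical timing helper: runs an operation a fixed number of times
// and returns the average wall-clock time per run in milliseconds.
final class OperationTimer {
    private OperationTimer() {
    }

    static double averageMillis(Runnable operation, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long totalNanos = 0L;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            operation.run();
            totalNanos += System.nanoTime() - start;
        }
        // Convert nanoseconds to milliseconds, then average over the runs.
        return totalNanos / 1_000_000.0 / iterations;
    }
}
```

Keeping the iteration count as a parameter makes it easy to stay below the 30 second request deadline, for instance by calling averageMillis(saveOperation, 50) once per request, where saveOperation is whatever Runnable wraps the datastore call.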

Here are the average run times for each operation. Unlike the cold start measurements, the values here were quite consistent over the different test runs, so it makes sense to present the results as an average value.

          JDO      Objectify
Create    57 ms    55 ms
Read      25 ms    20 ms
Update    56 ms    68 ms
Delete    71 ms    52 ms
Query     32 ms    26 ms

I didn't expect these figures to differ much since most of the time is spent in the datastore for these operations. Objectify seems to be slightly faster on read/query operations, which is what matters most. I was a bit surprised that the objectify update operation was so much slower than JDO, but perhaps there's an explanation for that. Delete is also quite a bit faster with objectify, although that is probably the least time-critical operation.
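To put the averages in perspective, the difference can be expressed as a percentage of the JDO time. The small helper below does that. RelativeDiff is a made-up name, and the sign convention (positive means objectify was faster) is my own choice.

```java
// Hypothetical helper for comparing two average run times.
final class RelativeDiff {
    private RelativeDiff() {
    }

    // Returns how much faster objectify was, as a percentage of the JDO time.
    // A negative result means objectify was slower.
    static double percentFaster(double jdoMillis, double objectifyMillis) {
        if (jdoMillis <= 0.0) {
            throw new IllegalArgumentException("jdoMillis must be positive");
        }
        return (jdoMillis - objectifyMillis) / jdoMillis * 100.0;
    }
}
```

Applied to the averages above, this gives roughly 3.5% for create, 20% for read, -21.4% for update, 26.8% for delete and 18.8% for query.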

Portability

Let's just briefly discuss this. JDO is designed with portability in mind, and objectify is designed to run on GAE, so it's probably safe to say that a JDO solution is more portable by definition. However, this is not the whole truth as far as I'm concerned. The DataNucleus/GAE JDO implementation has lots of extensions and unsupported features that require workarounds in the GAE environment. This means that any somewhat complex application probably requires some adaptation effort to run in another JDO compliant environment. The effort needed is of course related to the complexity of the database model.

Conclusion

Time to conclude what we have discussed. These observations of course apply to the GAE environment specifically, and not in general.

Documentation: Both frameworks are well documented on how they relate to GAE. It's a tie.

Features: Both frameworks have nice feature sets and most/all of the datastore features are exposed in both cases. A tie, I would presume.

Simplicity: objectify. The API, documentation and usage are much more straightforward with objectify.

Getting it up and running: objectify is a clear winner. A GAE newbie could probably get objectify up and running in a matter of hours. Not so much with JDO, no.

Cold start performance: Inconclusive. But an objectify solution can never be slower than a JDO solution, so a theoretical win for objectify :-) .

CRUD performance: Slightly in favor of objectify due to better read/query performance.

Portability: JDO. Obviously...

To sum up, if you are getting started with GAE and looking into a datastore solution, you should definitely consider objectify. The only reasons to choose JDO on GAE over objectify are if you:

  • Want your app to be engine independent. JDO is the way to go for portability reasons.
  • Have an existing app that already uses JDO and you want to port it to GAE.
  • Plan an application that has a very complex database model.
  • Like torturing yourself

TeamScape has a fairly simple database model and is targeted at GAE, so I have decided to make the switch. The main factors for this are simplicity and performance (objectify should, at least in theory, outperform JDO).

I will look into SimpleDS/Twig a bit more, but unless I find some extremely useful feature there that objectify is lacking, we will probably integrate objectify next.
