
February 14, 2007

Database invalidation and caches

I am often asked how ObjectGrid removes data that has been updated in a database, the assumption being that the application is storing database data in the cache.

The thing is, ObjectGrid, like any other cache, has no idea where the data in the cache came from. The only thing that knows the source of the data is the application, or an object relational mapper (if the cache is plugged into one), that put the data in the cache. This means the questions that need to be asked are the following:

Does my object relational mapper support cache invalidation when the database changes?

In other words, does an OR mapper such as Hibernate or a JPA implementation have a way to look for database changes and then tell the cache, through an SPI, to remove those entries? The OR mapper has the meta data and the database connections to do this job. The cache knows neither how to connect to the database nor which tables and columns are held in which maps in the cache, so it cannot do this job automatically.
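As a rough sketch of what such a bridge might look like, assume the mapper can produce a list of changed rows already translated into cache map names and keys. Everything here is hypothetical: `CacheSpi`, `Change` and `InvalidationBridge` are illustrative names, not part of ObjectGrid, Hibernate or JPA, and the change source would in practice be something like a query against a change-log table.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical bridge: the mapper detects changes, the cache only removes entries.
public class InvalidationBridge {

    // Hypothetical SPI that a cache could expose to an OR mapper.
    public interface CacheSpi {
        void invalidate(String mapName, Object key);
    }

    // One detected change, already translated from table/row to map/key
    // by the mapper, because only the mapper has that meta data.
    public static final class Change {
        public final String mapName;
        public final Object key;

        public Change(String mapName, Object key) {
            this.mapName = mapName;
            this.key = key;
        }
    }

    private final Supplier<List<Change>> changeSource; // assumed: supplied by the mapper
    private final CacheSpi cache;

    public InvalidationBridge(Supplier<List<Change>> changeSource, CacheSpi cache) {
        this.changeSource = changeSource;
        this.cache = cache;
    }

    // Ask the mapper for changes once and forward each one to the cache.
    // Returns the number of entries that were invalidated.
    public int pollOnce() {
        List<Change> changes = changeSource.get();
        for (Change change : changes) {
            cache.invalidate(change.mapName, change.key);
        }
        return changes.size();
    }
}
```

The point of the split is that all database knowledge stays on the mapper side of the SPI. The cache only ever sees a map name and a key.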

Can my application remove invalidated data from the cache?

If an application is using the cache directly, then it is the application's responsibility to remove data from the cache when the data changes on the backend. Again, the cache has neither the knowledge nor the meta data to do this.
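A minimal sketch of that responsibility, assuming the application owns both the read path and the write path. The `CustomerService` class is hypothetical, and plain in-memory maps stand in for the real backend and the real cache, so none of this is ObjectGrid API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical service: the application, not the cache, knows when data changed.
public class CustomerService {
    private final Map<Integer, String> database = new ConcurrentHashMap<>(); // stand-in for the backend
    private final Map<Integer, String> cache = new ConcurrentHashMap<>();    // stand-in for the cache

    // Read path: use the cached copy if present, otherwise load from the backend and cache it.
    public String find(int id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        String loaded = database.get(id);
        if (loaded != null) {
            cache.put(id, loaded);
        }
        return loaded;
    }

    // Write path: update the backend, then remove the now stale cache entry.
    public void update(int id, String name) {
        database.put(id, name);
        cache.remove(id);
    }

    // Exposed only so the effect of the invalidation can be observed.
    public boolean isCached(int id) {
        return cache.containsKey(id);
    }
}
```

This sketch ignores races between a concurrent `find` and `update`, and it only sees writes that go through the application. Changes made to the backend by anything else still need some other notification path.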

Can I remove or invalidate data in the cache?

Absolutely. APIs are provided that allow an application or an OR mapper bridge to easily remove entries from the cache.

Can the cache keep itself in sync?

Yes. Here, the question is whether changes made to one cache can cause its cache peers to invalidate or update themselves, and they can.
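To illustrate the idea only, here is a toy version of peer invalidation in which direct method calls stand in for whatever messaging a real cache uses between peers. `PeerCache` is a hypothetical class, and this is not how ObjectGrid implements it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical peer cache: a write on one peer invalidates the key on all others.
public class PeerCache {
    private final Map<String, String> entries = new HashMap<>();
    private final List<PeerCache> peers = new ArrayList<>();

    public void addPeer(PeerCache peer) {
        peers.add(peer);
    }

    // Store the value locally, then tell every peer to drop its copy of the key.
    public void put(String key, String value) {
        entries.put(key, value);
        for (PeerCache peer : peers) {
            peer.invalidateLocal(key);
        }
    }

    public String get(String key) {
        return entries.get(key);
    }

    // Called by a peer; removes the local copy without notifying anyone else.
    public void invalidateLocal(String key) {
        entries.remove(key);
    }
}
```

A peer that had its copy invalidated simply misses on the next `get` and has to fetch the fresh value again. Pushing the new value to peers instead of invalidating would be the "update" variant.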

Can a client/server cache do self consistency?

Yes. A client can keep a local copy of server side cached data and use optimistic locking or multi-version locking to ensure that, if changes are made, they only commit if the server side data didn't change in the meantime. A client can also use pessimistic locking with no local cache for higher consistency and server side locking. In this mode it behaves very much like a normal database.
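The optimistic case can be sketched with a version number per entry: the client remembers the version it read, and the server accepts the commit only if that version is still current. `VersionedStore` is a hypothetical class written for this illustration, not ObjectGrid's locking implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical server side store with optimistic, version-checked commits.
public class VersionedStore {

    // A value together with the version the server assigned to it.
    public static final class Versioned {
        public final String value;
        public final long version;

        public Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Versioned> serverData = new HashMap<>();

    // Create or overwrite an entry, starting again at version 1.
    public synchronized void create(String key, String value) {
        serverData.put(key, new Versioned(value, 1));
    }

    // A client reads the value and keeps the version as part of its local copy.
    public synchronized Versioned read(String key) {
        return serverData.get(key);
    }

    // Commit succeeds only if the server still holds the version the client read.
    public synchronized boolean commit(String key, long expectedVersion, String newValue) {
        Versioned current = serverData.get(key);
        if (current == null || current.version != expectedVersion) {
            return false; // someone else changed the entry; the client must re-read and retry
        }
        serverData.put(key, new Versioned(newValue, current.version + 1));
        return true;
    }
}
```

If two clients read the same entry and both try to commit, the first one wins and the second gets `false`, which tells it that its local copy is stale.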


So, to summarize, caches normally do not have meta data describing the source of the data cached in them. This means a cache can't check the backend looking for entries to invalidate. Even if it did have meta data, would that really be desirable? Do you want the meta data stored in two places, i.e. the OR mapper AND the cache? The cache could use the meta data from the OR mapper, e.g. the JPA mappings, if the JPA implementation provided this information to the cache, along with a db connection factory AND login credentials, as well as hints on the database vendor so that optimized routines can be used.

The bottom line is that caches absolutely give applications the ability to remove data from the cache. But the task of keeping the cache in sync with the database or another backend falls to the application or the OR mapper implementation that obtained the data and stored it in the cache.



This was a really good post. This is the first time I've read your blog, too! Someone brought up a good question the other night about design patterns for accessing a persistence layer that I explained adequately, but I wish I had addressed some of the things you just mentioned. I sort of addressed them, just not as clearly. I guess I learned my lesson!

Posted by: John "Z-Bo" Zabroski | Feb 21, 2007 4:03:44 PM
