
July 12, 2007

ObjectGrid 6.1 is out the door

WebSphere XD 6.1 went GA on April 28th. ObjectGrid was extensively reworked for the V6.1 release. It can still be used with or without WebSphere, but it requires a 1.4.2 JVM or later; Java 5 or 6 is fine.
Feature-wise, let's look at what we added. I'll give a quick overview in terms of deployment and programming model.

Deployment and topology management
We support very large grids of JVMs with the V6.1 release and automatically balance the partition primaries and replicas across the current set of JVMs. If a JVM fails, we promote replicas on the survivors to primaries and, where possible, repair any lost replicas by adding new ones on surviving JVMs. This means scaling out the grid is simply a matter of starting additional JVMs; data is automatically migrated to them once they register with the catalog service. We also scale in automatically if JVMs are stopped or killed.
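To make the failover behaviour concrete, here is a minimal sketch in plain Java. It is not ObjectGrid's placement code; the class `PlacementSketch` and all of its methods are hypothetical names invented for illustration. It only shows the idea: when a JVM disappears, the first surviving replica of each affected partition becomes the primary, and a new replica is added on another survivor if one is available.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not ObjectGrid's placement code.
class PlacementSketch {
    // partition id -> owning JVMs; index 0 is the primary, the rest are replicas
    private final Map<Integer, List<String>> placement = new HashMap<>();
    private final List<String> liveJvms = new ArrayList<>();

    void addJvm(String jvm) {
        liveJvms.add(jvm);
    }

    void place(int partition, String primary, String... replicas) {
        List<String> owners = new ArrayList<>();
        owners.add(primary);
        for (String replica : replicas) {
            owners.add(replica);
        }
        placement.put(partition, owners);
    }

    // Handle a JVM failure: removing it from each owner list promotes the next
    // replica to primary; lost copies are then repaired on survivors if possible.
    void fail(String jvm) {
        liveJvms.remove(jvm);
        for (List<String> owners : placement.values()) {
            int wanted = owners.size();
            owners.remove(jvm);
            for (String candidate : liveJvms) {
                if (owners.size() >= wanted) {
                    break;
                }
                if (!owners.contains(candidate)) {
                    owners.add(candidate);
                }
            }
        }
    }

    String primaryOf(int partition) {
        return placement.get(partition).get(0);
    }

    List<String> ownersOf(int partition) {
        return placement.get(partition);
    }
}
```

With JVMs A, B and C alive and partition 0 placed on A (primary) and B (replica), failing A leaves B as the primary and adds C as the replacement replica.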
We don't use multicast; all communication is point to point. The catalog service is a replicated service running on at least two machines, and it acts as a fault-tolerant controller for the grid. The catalog service uses well-known endpoints so that clients and new JVMs can find it, which is how we avoid multicast. These are the only static endpoints in the component.
All JVMs holding data automatically find unused ports for their endpoints, and the catalog service then lets peers and clients find each other's ports. This avoids the need to preconfigure individual JVMs with ports and allows multiple JVMs to be started easily on a single machine without worrying about port conflicts. A big driver for these changes was to cut TCO by eliminating per-JVM configuration and to organize large grids automatically, even on complex topologies.
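For readers wondering how a process can pick an unused port without configuration, one standard Java technique is to bind a `ServerSocket` to port 0, which asks the operating system for any free port. The sketch below shows only that technique; whether ObjectGrid does it this way internally is an assumption on my part, and `FreePortSketch` is a made-up name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

// Illustration of OS-assigned ports; not taken from ObjectGrid itself.
class FreePortSketch {
    // Port 0 tells the operating system to choose any currently unused port.
    static ServerSocket openOnFreePort() {
        try {
            return new ServerSocket(0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After binding, `getLocalPort()` returns the port that was actually assigned, and that is the value a JVM would publish to a directory such as the catalog service so that peers and clients can find it.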
Clients bootstrap to the catalog service and then interact with the grid JVMs directly. The catalog service isn't a factor in the runtime performance of the grid and shouldn't be thought of as a bottleneck.

This update allows scaling to very large numbers (thousands) of JVMs across subnets or even buildings without flooding multicast packets everywhere. We simply require that every member of every subnet is addressable from the others.

Programming model: EntityManager
We added a POJO-style programming model called EntityManager on top of the existing Map APIs. This lets you define a schema for what's stored in the grid, so that different applications using the same or different Java objects can share the data in the grid. It significantly simplifies programming against graphs of objects compared with a Map-based API, because we handle navigating relationships on behalf of the application. The application basically looks up the root of a graph using a finder method and then just walks around the object graph. When the application is done, we detect the changes and write them back to the grid. It's a little slower than using the Map APIs directly, of course, but it's a lot easier to program to. You can use annotations or XML files to specify how the data is mapped to POJOs.

Programming model: DataGrid Services
We added agents that can be deployed in the grid and then invoked by clients for a particular key, for a subset of keys, or on all partitions. These act like stored procedures that run at in-memory speeds against their local data partitions, which lets a grid search or process its data at grid speeds. For example, 100 GB of data stored in 100 JVMs running on 100 four-core blades can now use all 400 cores to search or process the 100 GB in parallel, at local memory speeds. The results of the processing can be returned to the clients.
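The shape of this is easy to sketch in plain Java. The code below is not the DataGrid API; `AgentSketch` and `callOnAllPartitions` are hypothetical names, and in-process lists stand in for partitions that would really live in separate JVMs. It just shows the pattern: ship the same piece of logic to every partition, run it next to the data in parallel, and gather the per-partition results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration only; not the DataGrid agent API.
class AgentSketch {
    // Runs the same agent against every partition in parallel and returns
    // one result per partition, in partition order.
    static <T, R> List<R> callOnAllPartitions(List<List<T>> partitions,
                                              Function<List<T>, R> agent) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (List<T> partition : partitions) {
                Callable<R> task = () -> agent.apply(partition);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A client could, for instance, pass an agent that sums or filters the entries in each partition and then combine the handful of partial results itself, instead of pulling all the data across the network.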

Programming model: Query support
We added support for an EJBQL-like query language, which allows the data within a partition to be searched easily. Indexing of the data by attribute is supported.

Programming model: Continuous Streaming SQL support
We allow continuous queries to be defined that process all changes to data in the grid over a given time window. This allows "top 50" style lists to be calculated in real time as the data changes.
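As a rough picture of what a continuous query over a time window computes, here is a small sketch in plain Java. It does not use the product's streaming query syntax; `TopNWindowSketch` and its methods are hypothetical, and it assumes change events arrive with non-decreasing timestamps. Each change is counted, changes older than the window are expired, and the top N keys can be read at any moment.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not the product's streaming query engine.
class TopNWindowSketch {
    private static final class Event {
        final String key;
        final long timestampMillis;

        Event(String key, long timestampMillis) {
            this.key = key;
            this.timestampMillis = timestampMillis;
        }
    }

    private final long windowMillis;
    private final Deque<Event> events = new ArrayDeque<>();
    private final Map<String, Integer> counts = new HashMap<>();

    TopNWindowSketch(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    // Record one change; timestamps are assumed to be non-decreasing.
    void onChange(String key, long timestampMillis) {
        events.addLast(new Event(key, timestampMillis));
        counts.merge(key, 1, Integer::sum);
        expire(timestampMillis);
    }

    // Drop events that have fallen out of the window and decrement their counts.
    private void expire(long now) {
        while (!events.isEmpty() && events.peekFirst().timestampMillis <= now - windowMillis) {
            Event old = events.removeFirst();
            counts.computeIfPresent(old.key, (k, c) -> c == 1 ? null : c - 1);
        }
    }

    // Keys with the most changes inside the window; ties are broken alphabetically.
    List<String> top(int n) {
        List<String> keys = new ArrayList<>(counts.keySet());
        keys.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(keys.subList(0, Math.min(n, keys.size())));
    }
}
```

Calling `top(50)` after each change gives the kind of continuously updated list described above.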

Programming model: Client side replicated data
Clients can request that an asynchronous replica of grid side data be pushed to that client. This allows clients to locally store copies of data on the grid and have that data updated in real time as the grid side data changes.

Policy based replication
We don't force you to always use synchronous replication. You tell us how much replication you want. You can have two sync replicas for each primary, or none. You can have a single sync replica, a single async replica, or one of each at the same time. We can also do topology-aware replication to avoid placing primaries and replicas on machines with common failure modes.
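The difference between the two replica kinds can be shown with a toy write path in plain Java. This is not how ObjectGrid is configured or implemented; `ReplicationSketch` and its members are hypothetical, and in-memory maps stand in for replicas on other JVMs. Sync replicas are updated before the write returns, while async replicas catch up later.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration only; none of these names come from ObjectGrid.
class ReplicationSketch {
    final Map<String, String> primary = new HashMap<>();
    final List<Map<String, String>> syncReplicas = new ArrayList<>();
    final List<Map<String, String>> asyncReplicas = new ArrayList<>();
    private final Queue<String[]> pending = new ArrayDeque<>();

    ReplicationSketch(int syncCount, int asyncCount) {
        for (int i = 0; i < syncCount; i++) {
            syncReplicas.add(new HashMap<>());
        }
        for (int i = 0; i < asyncCount; i++) {
            asyncReplicas.add(new HashMap<>());
        }
    }

    // Sync replicas are updated before put() returns; async replicas only see
    // the change once shipPending() runs (a background thread in a real system).
    void put(String key, String value) {
        primary.put(key, value);
        for (Map<String, String> replica : syncReplicas) {
            replica.put(key, value);
        }
        if (!asyncReplicas.isEmpty()) {
            pending.add(new String[] {key, value});
        }
    }

    void shipPending() {
        String[] change;
        while ((change = pending.poll()) != null) {
            for (Map<String, String> replica : asyncReplicas) {
                replica.put(change[0], change[1]);
            }
        }
    }
}
```

The trade-off is visible in `put`: every sync replica adds work to the write itself, whereas an async replica costs the writer almost nothing but can lag behind the primary.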

Easy deployment within WebSphere J2EE
Whilst we remain fully functional with or without WebSphere, we do need to work well with WebSphere too. J2EE applications can make use of ObjectGrid very easily by just including some XML files in a folder within a WAR or EJB module. On detecting the XML files, ObjectGrid automatically starts the services for that module and stops them when the module is stopped. This makes it very easy for WebSphere Application Server customers to use ObjectGrids within their clusters.

Wiki based documentation
We decided to put the programming and administration guides into a wiki rather than the info center. This allows us to keep them updated and receive comments quickly, and it is more in line with industry directions for documentation.

To finish
We did an incredible amount of work for V6.1 and I'm happy to see it out the door. It allows customers to deploy coherent caches very easily and, if required, run parallel business logic colocated with the data stored in the grid. Customers can choose the quality of replication, which allows them to control their performance/cost/quality trade-offs very closely. You can download a fully functional standalone ObjectGrid from this link: http://www.ibm.com/developerworks/downloads/ws/wsdg/learn.html This trial version allows you to try it out using J2SE or your favorite application server.

The wiki is located at this link: http://www-03.ibm.com/developerworks/wikis/display/objectgrid/Getting+started

I'll cover some common scenarios and solutions in upcoming blog posts.


Comments

How does it compare to Tangosol Coherence?

Thanks

Posted by: | Jul 23, 2007 1:21:47 PM

It's very similar. I think our replication technology is ahead: we offer sync and async replication depending on the customer's need, rather than just sync. That said, I'm hoping someone from Tangosol sees this, as I am legally not allowed to look at their product, so it's hard for me to be sure what I'm saying is correct. We seem to be lower cost also. Performance-wise, I don't know; previous tests showed us being faster sometimes and them being quicker at other times, depending on the scenario. "We're competitive" is probably the best way of putting it.

Posted by: Billy | Jul 23, 2007 1:29:17 PM
