December 14, 2006
Compute Grids versus Data Grids
Compute grids are the typical grid computing task most people are familiar with so far. Data grids are a relatively recent entry. What's the difference between the two? There are certainly many similarities: both can use thousands of processors to perform a job, but that is where the similarities end. This isn't really one versus the other; it's about which one is appropriate for the problem to be solved.
Compute grid characteristics
A compute job is typically stateless. The job is split into many parallel steps which are dispatched to a set of machines for processing. The steps produce some form of output that may be aggregated. Typically, the time taken to fetch the data needed by a step is very small relative to how long the data takes to be processed. The various internet grid computing initiatives are excellent examples of this type of job. A step can usually be placed on any server with access to any resources needed by the job; affinity isn't typically needed. The process running the job step may be launched for each step (better isolation) or shared between subsequent steps (limited value unless starting the process is very expensive, i.e. an application server).
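The shape of such a job can be sketched in plain Java. This is a minimal illustration, not any particular grid product's API: the thread pool stands in for grid nodes, and `expensiveCompute` is a hypothetical step whose cost is dominated by processing rather than data fetch.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ComputeGridSketch {

    // Hypothetical compute-heavy step: stateless, needs almost no input data.
    static long expensiveCompute(int step) {
        long sum = 0;
        for (int i = 0; i < 1000; i++) sum += (long) step * i;
        return sum;
    }

    public static void main(String[] args) throws Exception {
        // The pool stands in for a farm of grid nodes; any node can run any step.
        ExecutorService grid = Executors.newFixedThreadPool(4);

        // Split the job into independent, stateless steps.
        List<Callable<Long>> steps = IntStream.range(0, 8)
            .mapToObj(i -> (Callable<Long>) () -> expensiveCompute(i))
            .collect(Collectors.toList());

        // Dispatch the steps in parallel and aggregate their outputs.
        long total = 0;
        for (Future<Long> f : grid.invokeAll(steps)) {
            total += f.get();
        }
        grid.shutdown();
        System.out.println(total); // prints 13986000
    }
}
```

Because the steps share no state, they can be placed on any available machine, which is exactly why affinity is rarely needed for this class of job.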
A data grid is similar, but here the ratio of computation cost to data fetch cost is reversed: fetching the data takes longer than processing it. A data grid is fast because it already has the data loaded in the memory of a grid of servers. These servers may use replication to make this data 'persistent'. The data is usually automatically partitioned and placed on the servers comprising the grid. As servers are added and removed, the data may be redistributed to balance it over the available servers in the grid.
The data is partitioned across the servers. Each piece of data/partition is hosted by one or more servers, and the job logic is available within the container hosting the data. When a client submits a job for a particular piece of data, the job can be routed to a server/container holding the data needed for the job and the result returned. Another model is that the job requires processing data matching a query/filter/predicate. Here, the job is multicast to all partitions; each partition runs the query, processes the results, and returns them to the client. Either way, the client receives a list of results for the set of keys requested, and the job may run in parallel on all machines hosting relevant partitions.
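The routing model above can be sketched as follows. This is a hypothetical in-process illustration, not a real grid container API: each map stands in for one partition's memory, keys are placed by a simple hash, and `runAt` routes the job to the partition that owns the key so the logic runs next to the data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class DataGridRouting {
    static final int PARTITIONS = 4;

    // Each map stands in for one partition's in-memory data (a grid container).
    static final List<Map<String, Integer>> partitions = new ArrayList<>();
    static {
        for (int i = 0; i < PARTITIONS; i++) partitions.add(new HashMap<>());
        put("a", 1);
        put("b", 2);
        put("c", 3);
    }

    // Simple hash placement: each key maps deterministically to one partition.
    static int partitionFor(String key) {
        return Math.abs(key.hashCode() % PARTITIONS);
    }

    static void put(String key, int value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    // Route the job to the partition owning the key; the job runs where the data lives.
    static int runAt(String key, Function<Integer, Integer> job) {
        Integer value = partitions.get(partitionFor(key)).get(key);
        return job.apply(value);
    }

    public static void main(String[] args) {
        System.out.println(runAt("b", v -> v * 10)); // prints 20
    }
}
```

The query/multicast model is the same idea in reverse: instead of routing one job to one partition, the job is sent to every partition and each one scans only its own slice of the data.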
Another pattern is the aggregation or reduction pattern (popularized by Google). Here, the job consists of selecting a subset or all of the data and applying a predicate to reduce it to a single result. Examples: calculate the sum of all entries, find the minimum value, find the best matching hotel property or the best matching web page. Each partition applies the predicate to all of its matching data. The reduced results from each partition are then aggregated into a grid-wide result, which is returned to the client.
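The two-stage reduction can be sketched like this. Again a minimal illustration with hypothetical names, not a product API: `gridMin` reduces each partition's slice locally, then aggregates the partial results into one grid-wide answer (here, the minimum value).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class GridReduce {

    // Stage 1: each partition reduces its local slice (this would run in
    // parallel on the grid). Stage 2: the partial results are aggregated
    // into a single grid-wide result.
    static int gridMin(List<List<Integer>> partitions) {
        List<Integer> partials = new ArrayList<>();
        for (List<Integer> partition : partitions) {
            partials.add(Collections.min(partition)); // local reduce per partition
        }
        return Collections.min(partials); // grid-wide aggregation
    }

    public static void main(String[] args) {
        // Stand-in for data spread across three partitions.
        List<List<Integer>> partitions = Arrays.asList(
            Arrays.asList(4, 8, 15),
            Arrays.asList(16, 23),
            Arrays.asList(42)
        );
        System.out.println(gridMin(partitions)); // prints 4
    }
}
```

The key property is that only the small per-partition partials cross the network; the bulk of the data never leaves the server that hosts it.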
Both patterns are useful and both can utilize the power of the grid to process the data at extremely high speeds.
Compute grid or data grid? It depends
So which one do you use? Clearly it depends on the cost of accessing the data versus processing it. Longer processing times relative to data fetch costs mean a compute grid is more appropriate, and vice versa. While not first in these spaces, WebSphere XD is clearly aiming at supporting both types of grid in farms of thousands of servers. ObjectGrid supports and will support data grids, and Business Grid supports compute grid workloads. XD will be able to manage both types of job along with OLTP workloads and balance the assignment of resources to tasks based on business priorities.
Sounds very familiar :-)
Posted by: Cameron Purdy | Dec 16, 2006 7:57:28 PM