April 06, 2006
Categorizing grids over coffee
We were talking this over today at an informal design meeting, i.e. a coffee break. There are so many flavors of grid. Here are the ones we've identified over coffee today. We characterized them by what moves to what: does code move to data, or does data move to code? Does it run in a managed data center, or on unused MIPS on the internet or a corporate LAN?
Internet Grid: Send Data to code.
This seems to be the typical internet grid workload. An agent runs as a screen saver and occasionally pulls the algorithm of the month to the desktop. It then pulls work destined for that algorithm from a central server, spends hours processing it (protein folding, AIDS molecules, etc.), sends the answer back, and repeats. The workers don't use JDBC connections or similar to get the data. All the data is contained in the job itself, in a single packet. It's common to see these types of grid organized into pipelines, with each grid's output feeding the input of the next grid in the pipeline. One stage in such a pipeline may be an instance of a DataGrid.
IBM's community grid product is an example of this.
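Here's a rough sketch of that pull loop in Python. Everything in it (the `WorkUnit`, `CentralServer` and `run_agent` names, and the sum-of-squares "algorithm") is made up for illustration and is not any real product's API. The point is only that the agent fetches the code once and that each work unit carries all of its own data.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkUnit:
    """A self-contained job: everything the worker needs is in the payload."""
    unit_id: int
    payload: list  # hypothetical input data, shipped whole with the job


class CentralServer:
    """Hypothetical stand-in for the server that hands out work and collects answers."""

    def __init__(self, units):
        self.pending = deque(units)
        self.results = {}

    def fetch_algorithm(self):
        # Made-up "algorithm of the month": a sum of squares.
        return lambda payload: sum(x * x for x in payload)

    def next_work_unit(self):
        return self.pending.popleft() if self.pending else None

    def submit(self, unit_id, answer):
        self.results[unit_id] = answer


def run_agent(server):
    """Pull the code once, then loop: pull data, compute locally, send the answer back."""
    algorithm = server.fetch_algorithm()
    while True:
        unit = server.next_work_unit()
        if unit is None:
            break
        # No database or side-channel lookups: the payload is the whole input.
        server.submit(unit.unit_id, algorithm(unit.payload))


server = CentralServer([WorkUnit(1, [1, 2, 3]), WorkUnit(2, [4, 5])])
run_agent(server)
print(server.results)
```

A pipeline would just feed `server.results` from one stage in as the work units of the next.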
This is very similar to the internet grid above, except that it runs in a data center, possibly on a virtualized set of resources managed according to service level policies defined by administrators. These code chunks can use JDBC or access a DataGrid because they run in a corporate data center, where typically firewalls etc. don't have to be negotiated. This means the input messages for each job can be smaller, as implied data can be fetched from the source. We're assuming this doesn't overload the source, for example, a database. That could easily happen if enough nodes are running in the grid. A data source like a DataGrid is much better suited to this than a database, as it can be scaled more easily and cheaply than a database can.
WebSphere XD's business grid is an example of this.
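To make the "smaller input messages" point concrete, here's a minimal sketch in Python. An in-memory SQLite database stands in for the corporate database or DataGrid, and the `positions` table, the account names and the `run_job` function are all invented for the example. The job message holds only a key, and the worker fetches the rest from the shared source.

```python
import sqlite3

# Stand-in for a corporate database (or DataGrid) reachable from every worker.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE positions (account TEXT, symbol TEXT, quantity INTEGER)")
source.executemany(
    "INSERT INTO positions VALUES (?, ?, ?)",
    [("acct-1", "IBM", 100), ("acct-1", "XYZ", 50), ("acct-2", "IBM", 10)],
)


def run_job(job, conn):
    """The job message carries only a key; the worker fetches the implied data itself."""
    rows = conn.execute(
        "SELECT quantity FROM positions WHERE account = ?", (job["account"],)
    ).fetchall()
    return {"account": job["account"], "total_quantity": sum(q for (q,) in rows)}


# Each job is a tiny message, unlike the self-contained packets of an internet grid.
jobs = [{"account": "acct-1"}, {"account": "acct-2"}]
results = [run_job(job, source) for job in jobs]
print(results)
```

Every job costs the source one query, which is exactly why a large enough grid can swamp a single database.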
Data Grids: Jobs with large datasets, send work to data
Here, we've preloaded a large amount of data into a partitioned cluster of JVMs. We have some code to run against all or a subset of this data, so we send the logic to the servers holding the data, run the logic against the local slice in each server, and then aggregate the results on the submitting system. The dataset may be static or changing in real time. The various ObjectGrid primary servers may be subscribed to feeds updating the data constantly. For example, a pricing grid in an investment bank would be consuming market data to keep its dataset up to the second.
ObjectGrid is an example of this.
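The send-work-to-data pattern can be sketched in a few lines of Python. This is not the ObjectGrid API: the `Partition` class, `partition_for`, `scatter_gather` and the prices are all made up, and threads in one process stand in for a cluster of JVMs. It just shows logic being run against each local slice and the partial answers being combined by the submitter.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 3


class Partition:
    """Hypothetical stand-in for one server holding a slice of the data."""

    def __init__(self):
        self.prices = {}

    def put(self, symbol, price):
        self.prices[symbol] = price

    def run(self, logic):
        # The logic comes to the data; only the small result travels back.
        return logic(self.prices)


def partition_for(key):
    # Stable hash so a given key always lands on the same partition.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


partitions = [Partition() for _ in range(NUM_PARTITIONS)]
for symbol, price in {"IBM": 82.0, "AAA": 10.0, "BBB": 55.5, "CCC": 7.25}.items():
    partitions[partition_for(symbol)].put(symbol, price)  # made-up sample prices


def local_max(prices):
    """Logic shipped to each partition: summarize the local slice only."""
    return max(prices.items(), key=lambda kv: kv[1], default=None)


def scatter_gather(logic, reduce_fn):
    """Run the logic on every partition, then aggregate on the submitting side."""
    with ThreadPoolExecutor(max_workers=NUM_PARTITIONS) as pool:
        partials = list(pool.map(lambda p: p.run(logic), partitions))
    return reduce_fn([p for p in partials if p is not None])


def highest_price():
    return scatter_gather(local_max, lambda parts: max(parts, key=lambda kv: kv[1]))


before = highest_price()
# A feed update lands on whichever partition owns the key.
partitions[partition_for("CCC")].put("CCC", 90.0)
after = highest_price()
print(before, after)
```

The second call picks up the feed update without any reload, which is the real-time case described above.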
Nothing earth-shattering here; I'm just getting tired of describing a DataGrid and then being told that's not a grid, etc. I guess my objective is to make ObjectGrid easy to use even on large grids, as easy to use as a storage server like my Network Appliance filer, which is about as close to a toaster in terms of ease of use as I've ever seen.