December 24, 2006

DataGrid and NUMA/Bluegene

Multicore processors allow many CPU cores symmetric access to a single cache/memory space/bus. Some designs, such as AMD's current one, use two multicore CPUs with a fast link between them that lets chip A access chip B's memory, i.e. it's asymmetric, or NUMA (non-uniform memory access). An application running on the AMD chips pays a penalty when chip A accesses memory on chip B: even over a high speed link, it's still slower than local memory.

DataGrid type architectures are very well suited to NUMA architectures. At a macro architecture level, blade based systems are NUMA. Each blade has CPU, local memory and a network to the other blades, and lots of them can be connected together. Each blade can access data on other blades using the network, but local memory is clearly fastest. Hence, DataGrid uses partitioned architectures: we have some data in memory, and we want some CPU power and network to run the DataGrid agent and the application next to it.

A problem that's been mentioned is that network bandwidth can become an issue for DataGrids where the data is changed frequently. Usually changes are replicated to another agent, and this needs network. Suppose one agent makes changes at 2k transactions per second and each transaction changes 20KB of data. That's 40MB/sec, which fits in a Gb network connection (about 125MB/sec at best). Now, this assumes a single agent per box. ObjectGrid can place multiple agents in a single JVM, so we could have 2k x N transactions per second with enough CPU power. Network wise, a problem should start to become apparent. If each agent needs 40MB/sec then even with 2 agents per JVM/box, Gb ethernet starts to run out of oomph. This also ignores that these agents may be backups for other agents, which means they are also receiving 40MB/sec of data per agent. It also ignores the bandwidth for incoming client requests looking for data in the agent at a rate of 2k per second.
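The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for illustration, units are decimal (1MB = 1000KB), and 125MB/sec is just the theoretical ceiling of a 1Gb link (1000Mb / 8) before any protocol overhead.

```python
# Back-of-the-envelope replication bandwidth for agents sharing one box.
# Assumptions (for illustration only): decimal units, no protocol overhead.

GBE_MB_PER_SEC = 1000 / 8  # theoretical ceiling of 1Gb ethernet: 125MB/sec


def replication_mb_per_sec(tps, kb_per_txn, agents=1):
    """Outbound replication traffic in MB/sec for `agents` primaries on one box."""
    return tps * kb_per_txn * agents / 1000


one_agent = replication_mb_per_sec(2000, 20)             # 40.0
two_agents = replication_mb_per_sec(2000, 20, agents=2)  # 80.0

for label, mb in (("1 agent", one_agent), ("2 agents", two_agents)):
    share = mb / GBE_MB_PER_SEC
    print(f"{label}: {mb:.0f}MB/sec outbound, {share:.0%} of a Gb link")
```

Under these assumptions two agents already take roughly two thirds of the link's theoretical ceiling, and that is before any inbound backup traffic or client requests are counted.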

So, unless the data changes rarely, it should be clear that the network is a bottleneck, even with one agent per box. Each box should be able to be a primary for its own data and a backup for one other agent, which requires 80MB/sec (using the above rate) of network bandwidth for full speed operation. Let's install these agents in a bunch of servers. Each box has a Gb ethernet connection and all boxes are plugged in to a switch. That switch needs to be a very powerful one, capable of sustaining 80MB/sec on every port at once. That kind of switch is expensive. IBM's older blades had only 4Gb links between chassis, where each chassis had 14 blades with 2 x 1Gb ethernet each. Clearly, the 4Gb/sec (about 500MB/sec) means that throughput will suffer: if all 14 blades ran at full speed, we'd need 14 x 80MB/sec or about 1.1GB/sec of bandwidth between two chassis.
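The blade example can be worked through the same way. This is a rough sketch that assumes decimal units, every blade running flat out, and all replication traffic crossing the chassis uplink, which is the worst case rather than a measured figure.

```python
# Rough check of inter-chassis bandwidth for the blade example.
# Assumptions (for illustration only): decimal units, every blade runs flat
# out, and all replication traffic crosses the chassis uplink.

BLADES_PER_CHASSIS = 14
MB_PER_SEC_PER_BLADE = 80      # 40MB/sec as primary + 40MB/sec as backup
UPLINK_GBIT_PER_SEC = 4

needed = BLADES_PER_CHASSIS * MB_PER_SEC_PER_BLADE   # 1120MB/sec
available = UPLINK_GBIT_PER_SEC * 1000 / 8           # 500MB/sec
oversubscription = needed / available                # 2.24

print(f"needed {needed}MB/sec, uplink {available:.0f}MB/sec, "
      f"{oversubscription:.2f}x oversubscribed")
```

In this worst case the uplink is asked for a bit more than twice what it can carry.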

This is only really an issue if the DataGrid is changing data many times per second. Clearly, multi-core based servers could have a big problem here, as all that CPU power is bottlenecked by the bandwidth from the server to the network. Theoretically, an 8 core chip could do 8k tps, but it'd never reach that rate in practice because of network bandwidth issues.
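To put a number on that ceiling, here is a small sketch. It reuses the 20KB per transaction figure from the earlier example and assumes a single 1Gb link at its theoretical 125MB/sec, counting only outbound replication traffic. The function name is hypothetical.

```python
# How many transactions per second can one Gb link actually replicate?
# Assumptions (for illustration only): 20KB changed per transaction, decimal
# units, and only outbound replication traffic is counted.

LINK_MB_PER_SEC = 1000 / 8   # 1Gb ethernet, theoretical
KB_PER_TXN = 20


def network_limited_tps(cpu_tps, link_mb_per_sec=LINK_MB_PER_SEC,
                        kb_per_txn=KB_PER_TXN):
    """Return the lower of what the CPUs can do and what the link can replicate."""
    link_tps = link_mb_per_sec * 1000 / kb_per_txn
    return min(cpu_tps, link_tps)


print(network_limited_tps(8000))  # 6250.0: the link, not the CPU, sets the limit
```

Even with these generous assumptions the box tops out at 6250 tps, and backup traffic and client requests would push that lower.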

Enter BlueGene as the ultimate DataGrid system

I was talking with a buddy (Jim VO) who works on IBM's BlueGene the other day in Best Buy, go figure. BlueGene turns out to be a pretty cool system for DataGrids. Each compute node in BlueGene consists of 2 Power architecture cores and 512MB of memory (more memory per node is possible). These nodes are interconnected with an extremely scalable network. This is a balanced design: both cores working flat out are unlikely to overload the node's network connection to the rest of the system. The network bandwidth of a BlueGene is extreme.

CPU power wise, it's clear that BlueGene's processors are not that fast compared with Intel/AMD/Sun. But it doesn't matter. DataGrid apps can only use as much CPU as the network connection to other CPUs allows. 32 cores are pointless unless there is a big scalable network pipe and switch connecting everything together. Applications that are mostly read only would fare better on a many-core box, so long as each chip had enough network for the incoming requests. But even here, bootstrapping and preloading data in to it from somewhere would be interesting given the network issues.

So is ObjectGrid going to run on Bluegene? Not right now. A number of challenges need to be addressed first.

  1. Is there a market for it?
  2. Each compute node has a minimal operating system, if it can even be called that. We'd need a JVM running on the node with a working network of some kind, not necessarily TCP but something like it. Once we had the JVM, we could run the OG code and application on the node. ObjectGrid could then make the entire grid fault tolerant using its self organising and self repair capabilities (through redundant replication).

This starts to look pretty interesting as a platform for running ObjectGrid or DataGrid type applications. Let's recap why:

  1. Enough network to handle a node at full CPU speed. This is important as otherwise the CPU power is wasted. CPUs with high core counts will likely have this issue unless they also incorporate a very high performance network interface.
  2. A network backplane that scales with the number of nodes attached. This is usually a weakness of blade based solutions.

Summary

So, is ObjectGrid going to run on BlueGene? I don't know. It'd be nice, as it would make a great product to run on the hardware. The hardware uses a balanced, system optimised approach (enough CPU with enough memory and enough network bandwidth). Each BlueGene rack has 1024 nodes or 2k processors. You can add 64 racks together for 128k CPUs with 512MB * 1024 * 64 = 32TB of memory, which with replication would be 16TB of usable memory.
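The capacity figures can be reproduced as follows. The only assumption beyond the numbers quoted above is that replication keeps exactly one backup copy of everything, which is what halving the memory implies.

```python
# Capacity of a full BlueGene installation, using the figures quoted above.
# Assumption (for illustration only): one backup copy of every piece of data.

NODES_PER_RACK = 1024
CORES_PER_NODE = 2
MB_PER_NODE = 512
RACKS = 64
COPIES = 2  # primary + one replica

nodes = NODES_PER_RACK * RACKS                 # 65536
cores = nodes * CORES_PER_NODE                 # 131072, i.e. 128k
total_tb = nodes * MB_PER_NODE / 1024 / 1024   # 32.0 (binary units)
usable_tb = total_tb / COPIES                  # 16.0

print(f"{nodes} nodes, {cores} cores, {total_tb:.0f}TB raw, {usable_tb:.0f}TB usable")
```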

Cost wise, I don't know how much one costs, but a blade based system on this scale would not be cheap either. The network switches alone would likely dominate the cost. Power wise and heat wise, the blade system may not even be feasible.
