May 18, 2007
InfiniBand on Linux, not having much luck
We've been trying to get InfiniBand running on some Linux boxes but aren't having much luck with stability. I have a blade chassis of JS21 blades running AIX 5.3 with InfiniBand on order, so I'll be giving up on Linux for now and will continue the performance work on the JS21 kit when it arrives. IBM ships the boxes with InfiniBand built in and AIX has GA-quality drivers for it, so testing should be smoother than on Linux. I also have another chassis coming with 10Gb Ethernet, so we'll get some interesting comparisons between 10GbE and IB.
We'll be trying a couple of things with ObjectGrid on InfiniBand, initially using TCP-over-IB emulation (SDP, the Sockets Direct Protocol) to see if we can improve on the 1Gb Ethernet latencies we've seen, and then looking into using the InfiniBand APIs directly. I'm hoping to take around 150-200us off the sync replication times, which would be a significant percentage improvement on the current, already fast numbers. Low latency is critical for many front-office applications in investment banks and other latency-sensitive environments.
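A rough sketch of the kind of measurement involved, not the actual ObjectGrid test harness: a small Python ping-pong that times round trips over an ordinary TCP socket. The function names, the 64-byte payload and the loopback address are all made up for illustration. Because SDP sits behind the normal sockets API, the idea is that a program like this would not need code changes to run over it (on Linux with OFED that is typically done by preloading libsdp, but treat that as an assumption about your setup).

```python
import socket
import statistics
import threading
import time


def echo_server(listener: socket.socket) -> None:
    """Accept one connection and echo every chunk back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        # Disable Nagle so small messages are sent immediately.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since TCP may deliver a message in pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf.extend(chunk)
    return bytes(buf)


def measure_rtt_us(host: str = "127.0.0.1", rounds: int = 1000,
                   payload_size: int = 64) -> list:
    """Return one round-trip time per round, in microseconds.

    Hypothetical helper: host, rounds and payload_size are illustrative
    defaults, not values taken from any real replication test.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # port 0 lets the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    server.start()

    payload = b"x" * payload_size
    samples = []
    with socket.create_connection((host, port)) as client:
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(rounds):
            start = time.perf_counter()
            client.sendall(payload)
            recv_exact(client, payload_size)  # wait for the full echo
            samples.append((time.perf_counter() - start) * 1e6)

    server.join(timeout=5)
    listener.close()
    return samples


if __name__ == "__main__":
    rtts = measure_rtt_us()
    print(f"median round trip: {statistics.median(rtts):.1f} us")
```

Running it once per transport (1Gb Ethernet, IPoIB, SDP) between two real hosts, rather than over loopback as here, and comparing the medians is one simple way to see how much of the 150-200us target each step buys.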
If we can have a single-vendor solution covering hardware, IB, operating system, JDK and middleware, then we should have a pretty stable setup for customers to use, rather than a mix-and-match multi-vendor approach. That said, whenever we get Linux working we will, of course, try that too.
Which Linux? Which IB package? What stability issues do you see? We run SUSE, RH, Windows all day long using OFED 1.1, IBGD or OFED 1.2. Works like a champ!
I would be very suspicious of proprietary IB stacks now that OFED is stable.
With regard to performance, you will see a slight uptick with IPoIB over straight 1Gb Ethernet, but the real win will come with loading the SDP module. For true speed, get out of IP and move calls to RDMA or OS bypass. The interrupts and context switches of Ethernet/IP will never scale for large-volume, high-speed messaging.
In addition, since you are riding the IB wave, throw on a few IB SRP storage devices and see up to 600GB/sec disk access. That will get the data moving.
Lastly, 10Gb Ethernet will give you some boost, but why settle for 10Gb when DDR IB gives you 20Gb and 40Gb is on the horizon?
Posted by: Kevin Moran | May 19, 2007 1:53:19 PM
Not sure exactly what level of the SW we had, but we were running SLES 10 x86_64 right off the DVD and had issues with the build that was provided to us WRT the configuration of all things. It seemed that it wasn't creating the routes correctly, and we couldn't configure it to use SDP-only connections. We've since been provided a fix but haven't had a chance to try it.
I was also told that the updated kernel is needed for SLES 10 to get expected performance. The GA kernel is only good for ~4.5Gb/s. We'll be upgrading that too to quantify the difference.
Posted by: Rob Wisniewski | Jun 4, 2007 1:34:59 PM