
July 25, 2006

10Gb ethernet and virtualization


I was reading InfoStor (Vol. 10, Issue 7), where Saqib Jang said that 10Gb Ethernet will help virtualization take off. He talked about I/O virtualization, but I don't think the network was ever the problem going from a VMware image to a hardware card in the same box. If the image were somehow reaching the card over the network then cool, it'd be faster. Things like iSCSI can obviously gain a lot from having a 10GbE backbone.

The area where it should help a lot is migrating virtual machines from one box to another. This doesn't work very well now. Imagine you have a VMware virtual machine using 8GB of memory running on box A. Now we want to hot migrate it to box B.

  • Copy the processor state across
  • Copy the page tables for the image across
  • Set the page tables for the virtual image to fault to the virtualization manager for all pages.
  • Make sure the filesystems etc. are available on the second box, which may mean reassigning disks on a SAN.
  • Move all virtual devices (network cards etc.) across to the target box.
  • Start a parallel stream to copy all 8GB of memory to the target box. As each page is copied across, update the page tables to return the newly copied page.
  • Start running the virtual machine on the target box.
  • If the target box hits a memory page that hasn't been copied yet (remember, we modded the page table to catch this), copy it across and then resume running.
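To make the interplay between the background copy stream and the fault-on-demand path concrete, here is a toy simulation of the steps above. It is only a sketch: the function name, the tick-based timing, and the random access pattern are all my assumptions for illustration, not how VMware or any real hypervisor implements migration.

```python
import random

def simulate_migration(num_pages, accesses_per_tick, pages_copied_per_tick, seed=0):
    """Toy model: background stream copy plus fault-in of missing pages.

    All parameters are hypothetical knobs. Returns (ticks, faults): how many
    ticks until every page is on the target, and how many guest accesses hit
    a page that had not been copied yet.
    """
    rng = random.Random(seed)
    present = [False] * num_pages  # "present" bit per page on the target
    next_bg = 0                    # next page the background stream considers
    copied = 0
    faults = 0
    ticks = 0
    while copied < num_pages:
        ticks += 1
        # Background stream: send the next batch of pages not yet on the target.
        sent = 0
        while sent < pages_copied_per_tick and next_bg < num_pages:
            if not present[next_bg]:
                present[next_bg] = True
                copied += 1
                sent += 1
            next_bg += 1
        # The guest, already running on the target, touches random pages.
        for _ in range(accesses_per_tick):
            page = rng.randrange(num_pages)
            if not present[page]:
                # Fault: stall, fetch the page from the source, then resume.
                faults += 1
                present[page] = True
                copied += 1
    return ticks, faults

if __name__ == "__main__":
    for rate in (10, 100):  # slow link vs. a link ten times faster
        ticks, faults = simulate_migration(1000, 10, rate)
        print(f"{rate} pages/tick: done in {ticks} ticks, {faults} faults")
```

Each fault stands in for a stall while a page is fetched over the wire, so the fault count is a rough proxy for how sluggish the guest feels during the copy. A faster link shortens the window in which faults can happen.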

Clearly, while we can quickly get the virtual machine running on the target box, it'll be very slow whenever it hits pages that haven't been copied over. 8GB of memory takes a while to copy over GbE. Let's say we can get 70MB/sec; that's still 114 seconds, nearly two minutes, before the virtual machine is fully over on the target. 10GbE lowers this to 10-15 seconds so long as both sides can keep up. You'd better have a very good switch too, because otherwise this traffic will be murder on the network.
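The arithmetic is easy to check. The sketch below treats 8GB as 8000MB, which is what gives 114 seconds at 70MB/sec. The 700MB/sec figure for 10GbE is purely my assumption (the 1Gb number scaled by ten), not a measurement.

```python
def transfer_seconds(memory_gb, throughput_mb_per_sec):
    """Seconds to copy memory_gb of RAM at a sustained throughput (1GB = 1000MB)."""
    return memory_gb * 1000 / throughput_mb_per_sec

if __name__ == "__main__":
    print(f"8GB at 70MB/sec:  {transfer_seconds(8, 70):.0f} seconds")
    print(f"8GB at 700MB/sec: {transfer_seconds(8, 700):.1f} seconds")
```

Real links rarely sustain their best-case rate while the guest is also faulting pages across the same wire, so treat these as lower bounds.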

But the bottom line is that with GbE, i.e. normal 1Gb Ethernet, the migrated virtual machine's performance will be woeful initially and will only gradually improve until the majority of the memory has been copied across, which may take well over a minute. In the meantime, the customers using this virtual machine will not be happy. If you're running a performance monitoring solution on the virtual machine during this time, expect plenty of alerts saying it's running very slowly. If you are running software in the image that heartbeats, expect chaos as the timing will be all wrong. Products like Veritas cluster manager, Oracle RAC cluster manager or even WebSphere 6.0's HAManager will all be confused by the sudden drop in performance, by orders of magnitude, while the memory is copied to the target as the target is running.

10GbE, or 10Gbit Ethernet, can make a pretty big dent in this as it can potentially copy the image across in 10 or 20 seconds. But imagine larger images: a virtual machine running a database with a 32GB image is going to take a while even with 10GbE.

So, while this virtual machine migration is cool, and while the vendors claim very fast image restart times on the target, if you think about it the whole process takes considerably longer to complete, and until you get quite far along that path the performance of the new image is going to be awful. 10Gb Ethernet helps, but as virtual machine memory sizes get larger (and 64-bit is going to encourage this), even 10GbE will feel the pain. We may then see trunking, with multiple 10GbE pipes moving the data, if the two endpoints can keep up.
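As a rough feel for when trunking starts to matter, this sketch estimates how many 10GbE links you would need to move an image within a target time. The 700MB/sec per-link rate, the 15 second target, and the assumption that trunked links scale perfectly are all mine, for illustration only.

```python
import math

def links_needed(memory_gb, target_seconds, per_link_mb_per_sec=700):
    """Links required to copy memory_gb within target_seconds.

    Assumes each link sustains per_link_mb_per_sec and that trunked links
    scale linearly, which real endpoints may not manage.
    """
    required_mb_per_sec = memory_gb * 1000 / target_seconds
    return max(1, math.ceil(required_mb_per_sec / per_link_mb_per_sec))

if __name__ == "__main__":
    for size_gb in (8, 32, 64):
        print(f"{size_gb}GB in 15 seconds: {links_needed(size_gb, 15)} link(s)")
```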




