
October 19, 2005

Design: Making WebSphere components for WAS Z/OS

Control Region and Servant Region architecture

WebSphere Z/OS has an unusual architecture: it builds a virtual JVM out of multiple JVM processes. There is a controller JVM called the Control Region (CR) and one or more Servant Regions (SRs).

The CR is the only public interface to this collection of JVMs. All incoming work flows through the CR, and the CR moves the work to one of the servants for actual processing, so the CR acts like a router/dispatcher. The CR is also privileged and is the only JVM allowed to accept incoming connections from the outside world. Servants are not allowed to listen for incoming connections; that's the Control Region's job.

This makes the servants anonymous and invisible to the outside world, as all communication is with the Control Region. The idea is that the Z/OS operating system varies the number of servants based on load: under low load there is one servant, and under high load multiple servants are started automatically.

Issues from a middleware perspective

When we make components that run on a traditional application server, e.g. one running in a JVM on Windows, this isn't typically the way you think. If you want to listen on a socket, you just do it. The clients attach directly to your JVM, and the threads receiving the work hand it to a thread pool, from which the work is finally dispatched to the application.

If this code is to work on WebSphere Z/OS then it's not so easy. You need to design your component in two pieces. The first piece runs in the control region and listens on the socket, or whatever protocol brings in the request. The other piece runs in the servant and forwards any received work to the application. The CR piece needs to be designed so that multiple servants can be present and so that they can be started and stopped as needed by Z/OS based on load. As a result, servants should be stateless, since they can be started and stopped continually. If your component needs in-memory state, that state really needs to be kept only in the CR piece, given that the SRs can be cycled by Z/OS.
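To make the two-piece split concrete, here is a minimal sketch in plain Java. Everything in it is an assumption for illustration: the class names (`ControlPiece`, `ServantPiece`, `Work`) are made up and are not WebSphere APIs, and the servants are ordinary threads sharing an in-process queue, whereas on Z/OS they would be separate JVM processes. The point is only the shape of the design: the control piece accepts requests and owns the state, and the servant piece is stateless and can be stopped at any time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical sketch of a component split into a control piece and a servant piece. */
class SplitComponentSketch {

    /** A unit of work handed from the control piece to a servant. */
    static final class Work {
        final int id;
        final String payload;
        Work(int id, String payload) { this.id = id; this.payload = payload; }
    }

    /** Control piece: the only part that accepts requests and keeps in-memory state. */
    static final class ControlPiece {
        private final BlockingQueue<Work> queue = new LinkedBlockingQueue<>();
        private final AtomicInteger accepted = new AtomicInteger();

        void accept(String payload) {
            // The queue is unbounded, so add() does not block or fail here.
            queue.add(new Work(accepted.incrementAndGet(), payload));
        }
        int acceptedCount() { return accepted.get(); }
        BlockingQueue<Work> queue() { return queue; }
    }

    /** Servant piece: holds no state; it takes work and hands it to the application. */
    static final class ServantPiece implements Runnable {
        private final BlockingQueue<Work> queue;
        private final Consumer<Work> application;

        ServantPiece(BlockingQueue<Work> queue, Consumer<Work> application) {
            this.queue = queue;
            this.application = application;
        }

        @Override
        public void run() {
            try {
                while (true) {
                    application.accept(queue.take());
                }
            } catch (InterruptedException e) {
                // The servant is being stopped; nothing to save because it is stateless.
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Starts the given number of servants, submits requests, and returns how many were processed. */
    static int runDemo(int servantCount, int requests) {
        ControlPiece control = new ControlPiece();
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        List<Thread> servants = new ArrayList<>();
        for (int i = 0; i < servantCount; i++) {
            Thread t = new Thread(new ServantPiece(control.queue(), w -> {
                processed.incrementAndGet();
                done.countDown();
            }), "servant-" + i);
            t.setDaemon(true);
            t.start();
            servants.add(t);
        }
        for (int i = 0; i < requests; i++) {
            control.accept("request-" + i);
        }
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        for (Thread t : servants) {
            t.interrupt(); // servants can be stopped at any time
        }
        return processed.get();
    }
}
```

Calling `SplitComponentSketch.runDemo(3, 20)` runs three servants against twenty requests. Because the servants hold nothing between requests, adding or removing one changes only throughput, not correctness.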

One way to think about this is as follows. The servants are application code containers with thread pools, and they contain the actual applications. They are ready for the controller to send work their way. The controller's job is to receive incoming tasks and then send them for execution to one or more servants (think servant = isolated thread pool). So, logically, to someone outside, the combination of control region and servants looks like a single logical JVM. The difference is that this JVM can scale to large amounts of memory as well as large numbers of threads, and it can contain a failure to the servant that has the problem.

So we need to take care when designing the middleware so that the code we write can be split like this easily when WebSphere runs on Z/OS. Otherwise, making it work could mean a rewrite, and there is a big push to make sure that the same code works on both conventional WebSphere and WebSphere Z/OS.

Summary

Why is it like this? It is a clever way to make a very scalable server and to isolate application code from the 'server' mechanics. It allows a single logical JVM to have very large heaps, and when GC occurs in one servant it has no impact on the other servants. If a servant has an issue then it can be stopped and its work routed to the other servants. The controller also effectively has a much larger thread pool than normal, in that it's the sum of all the thread pools in the servants. The design also helped work around limitations on heap size and number of threads per JVM on Z/OS.

I guess what's odd about it is that the SRs don't start or stop immediately. The SRs take time to start up, just like on conventional WebSphere, and they also take time to stop. This adds load to the box while one is starting or stopping, and it adds latency: it takes some time (maybe a couple of minutes) between Z/OS deciding it needs more servants and a new servant actually becoming ready for work.

The controller could be thought of as a single point of failure and a bottleneck. There is no application code running in the controller, so in theory it should be very reliable, but if the controller fails then all servants are killed as well. It's not a bottleneck because the link between the CR and the SRs is basically a queue in shared memory, so forwarding requests to the servants is very fast and efficient.

The workload is throttled by Z/OS starting and stopping the servants that listen to that queue. More servants means more throughput until the box saturates. If multiple applications are assigned to the logical JVM then Z/OS can vary the ratio of servants receiving work for application A versus application B to make sure service level agreements are met for both applications.
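The throttling idea can be illustrated with a toy calculation. This is not how Z/OS actually decides; the post doesn't describe that algorithm. The method name, the notion of a fixed per-servant capacity, and the min/max bounds are all assumptions made up for the sketch. It only shows the general shape: more queued work asks for more servants, up to a cap where the box would saturate.

```java
/** Hypothetical sizing heuristic; not the real Z/OS workload management logic. */
class ServantScalingSketch {

    /**
     * Returns how many servants to run for the given backlog.
     *
     * @param queuedRequests     requests currently waiting in the shared queue
     * @param requestsPerServant assumed number of requests one servant can absorb
     * @param minServants        lower bound (at least one servant stays up)
     * @param maxServants        upper bound, standing in for the point where the box saturates
     */
    static int desiredServants(int queuedRequests, int requestsPerServant,
                               int minServants, int maxServants) {
        if (requestsPerServant <= 0 || minServants < 1 || maxServants < minServants) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        int backlog = Math.max(0, queuedRequests);
        // Ceiling division: servants needed to cover the backlog.
        int needed = (backlog + requestsPerServant - 1) / requestsPerServant;
        // Clamp between the minimum and the saturation cap.
        return Math.max(minServants, Math.min(maxServants, needed));
    }
}
```

With a capacity of 10 requests per servant and bounds of 1 to 4, an empty queue keeps one servant, a backlog of 25 asks for three, and a backlog of 1000 is capped at four.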

WebSphere XD has a similar architecture

WebSphere XD has a similar architecture to this. We have an on demand router (ODR) sitting in front of the node group. It can be thought of as a control region. The servants are JVMs hosting a single application that can be started and stopped on any node in the node group. All incoming work goes to the on demand router. There is a controller that monitors the response times as measured by the ODR and it also monitors the CPU burn on the nodes.

The controller uses this information, coupled with the response times specified by the customer, to vary the number of JVMs running each application in the node group. This is very similar to the Z/OS architecture. The ODR/controller can monitor individual servant JVMs and cycle them every X requests, or when a response time exceeds X seconds, or when the heap is above 85% for 5 minutes (probably a memory leak).

I guess the Z/OS guys have had this architecture for quite some time now, and with XD a very similar approach is making its way onto distributed platforms.
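The cycling rules are easy to picture as a small policy check. The sketch below is illustrative only: the class, field, and enum names are assumptions and not WebSphere XD APIs. It also assumes some monitor already tracks how long a servant's heap has stayed above the threshold (the 85% figure from the text) and reports that as a duration.

```java
import java.time.Duration;

/** Hypothetical policy for deciding when to cycle a servant JVM; not an XD API. */
class ServantRecyclePolicy {

    enum Reason { NONE, REQUEST_COUNT, SLOW_RESPONSE, HEAP_PRESSURE }

    /** What a monitor is assumed to report about one servant JVM. */
    static final class Snapshot {
        final long requestsServed;
        final Duration slowestResponse;
        final Duration heapAboveThresholdFor; // time heap has stayed above the threshold, e.g. 85%

        Snapshot(long requestsServed, Duration slowestResponse, Duration heapAboveThresholdFor) {
            this.requestsServed = requestsServed;
            this.slowestResponse = slowestResponse;
            this.heapAboveThresholdFor = heapAboveThresholdFor;
        }
    }

    private final long maxRequests;          // cycle every X requests
    private final Duration maxResponse;      // cycle when a response exceeds X seconds
    private final Duration maxHeapHighFor;   // cycle when heap stays high this long, e.g. 5 minutes

    ServantRecyclePolicy(long maxRequests, Duration maxResponse, Duration maxHeapHighFor) {
        this.maxRequests = maxRequests;
        this.maxResponse = maxResponse;
        this.maxHeapHighFor = maxHeapHighFor;
    }

    /** Returns the first rule that says the servant should be cycled, or NONE. */
    Reason evaluate(Snapshot s) {
        if (s.requestsServed >= maxRequests) {
            return Reason.REQUEST_COUNT;
        }
        if (s.slowestResponse.compareTo(maxResponse) > 0) {
            return Reason.SLOW_RESPONSE;
        }
        if (s.heapAboveThresholdFor.compareTo(maxHeapHighFor) >= 0) {
            return Reason.HEAP_PRESSURE; // probably a memory leak
        }
        return Reason.NONE;
    }
}
```

For example, a policy built with `new ServantRecyclePolicy(10_000, Duration.ofSeconds(5), Duration.ofMinutes(5))` would flag a servant whose heap has been above the threshold for six minutes with `HEAP_PRESSURE`, even if its request count and response times look healthy.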


Comments

Hello,
I want to know how I can run Java code in the CR, and how an SR can access the CR's JVM objects?
Regards

Posted by: noureddine | Dec 13, 2005 8:08:58 AM
