Monday, May 2, 2011

UltraESB implements clustering

You may already know from Asankha's blog or from the news item that we released version 1.4.0 of the UltraESB a few hours ago. I have been waiting and waiting to write about the awesome clustering implementation that I did for UltraESB 1.4.0.

The story begins with the decision to use ZooKeeper for the clustering implementation, which is what Asankha initially suggested to me. I did a few feasibility studies and found it to be just right for the clustering implementation that I had been dreaming of for years. I say dreaming because most of the ESB implementations out there have not recognized that the 2 main concerns of clustering are different:


  • Group coordination

  • State replication
As a result they failed to keep those 2 concerns nicely separated. I believe these 2 concerns need to be managed separately, and hence implemented with the right amount of isolation while still integrating to work well together. It is like spice or salt in food: you need it, but only in just the right amount; too little or too much gives you trouble. I have explained this in more detail in my Clustering Part I article.

As explained in that article, UltraESB cleanly separates the group coordination and cooperative control aspect of clustering, and implements it using ZooKeeper. On top of this we have written a command framework to command the complete cluster in one go, which enabled us to implement several sets of cluster-wide management controls.

The first and most important cluster management command available on the UltraESB is the round-robin graceful restart of the nodes in the cluster. One of the major concerns with modern ESB clusters has been managing maintenance restarts without affecting availability. I haven't come across any ESB out there, including commercial ones, that provides a single control to do a maintenance restart with zero downtime of the service and zero message loss. UltraESB guarantees this with the above operation. Because the restart is graceful, no node restarts before serving all the messages it has accepted. Because the nodes are restarted in a round-robin fashion, the system never reaches a state where there is no live node to dispatch a message to, even if the cluster has just 2 nodes.
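The two guarantees described above can be illustrated with a minimal in-memory sketch. This is not UltraESB code: `SketchNode`, `RoundRobinRestart` and their methods are hypothetical names invented for the illustration, and a real cluster would coordinate the sequence through ZooKeeper rather than a local loop.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a cluster node; not an UltraESB class. */
class SketchNode {
    final String name;
    boolean live = true;
    int inFlight;   // messages accepted but not yet served
    int served;     // messages fully served

    SketchNode(String name, int inFlight) {
        this.name = name;
        this.inFlight = inFlight;
    }

    /** Graceful part: serve everything already accepted, then go down. */
    void drainAndStop() {
        while (inFlight > 0) {
            inFlight--;
            served++;
        }
        live = false;
    }

    void start() {
        live = true;
    }
}

class RoundRobinRestart {
    /** Restarts nodes one at a time and returns the lowest live-node count seen. */
    static int restartAll(List<SketchNode> cluster) {
        int minLive = cluster.size();
        for (SketchNode node : cluster) {
            node.drainAndStop();
            minLive = Math.min(minLive, liveCount(cluster));
            node.start();   // the next node is touched only after this one is back
        }
        return minLive;
    }

    static int liveCount(List<SketchNode> cluster) {
        int count = 0;
        for (SketchNode n : cluster) {
            if (n.live) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<SketchNode> cluster = new ArrayList<>();
        cluster.add(new SketchNode("node1", 3));
        cluster.add(new SketchNode("node2", 5));
        System.out.println("minimum live nodes during restart: " + restartAll(cluster));
    }
}
```

With 2 nodes, the lowest live count during the whole sequence is 1, and every message that was in flight ends up counted as served.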

One key feature of the new 1.4.0 release is its web-based management console, which I will leave for a separate blog post, but I had to mention it here because it is one of the 2 options available for trying this out in action. The first option is to use the Cluster Control Panel in the console.

The other option is to use jconsole, which leads me to say a word on the improvements that we have made to the JMX management aspect of the ESB. We are now using all-new MXBeans instead of MBeans, and that helped us a lot in improving the management. I will talk about all of that in a separate article on the console and its implementation.
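For readers who have not used MXBeans, here is a minimal sketch using only the standard `javax.management` API. The interface, its attributes and the `sketch.cluster` object name are hypothetical and are not the management beans that UltraESB actually exposes. An MXBean interface is recognized by the `MXBean` suffix on its name, and its types are mapped to open types, so a generic client such as jconsole can display them without any custom classes on its classpath.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class MxBeanSketch {
    /** Hypothetical management interface; the "MXBean" suffix marks it as an MXBean. */
    public interface NodeControlMXBean {
        boolean isAcceptingMessages();   // exposed as attribute "AcceptingMessages"
        int getInFlightCount();          // exposed as attribute "InFlightCount"
        void stopAccepting();            // exposed as an operation
    }

    /** Hypothetical implementation backing the interface above. */
    public static class NodeControl implements NodeControlMXBean {
        private volatile boolean accepting = true;
        private volatile int inFlight;

        @Override
        public boolean isAcceptingMessages() {
            return accepting;
        }

        @Override
        public int getInFlightCount() {
            return inFlight;
        }

        @Override
        public void stopAccepting() {
            accepting = false;
        }
    }

    /** Registers the bean with the platform MBean server under an assumed name. */
    static ObjectName register(NodeControl control, String nodeName) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("sketch.cluster:type=NodeControl,name=" + nodeName);
        server.registerMBean(control, name);
        return name;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = register(new NodeControl(), "node1");
        // Invoke the operation the same way a JMX client such as jconsole would.
        server.invoke(name, "stopAccepting", new Object[0], new String[0]);
        System.out.println("accepting: " + server.getAttribute(name, "AcceptingMessages"));
    }
}
```

While the JVM is running, a bean registered this way shows up in jconsole's MBeans tab under the `sketch.cluster` domain.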

Apart from that, this command framework lets you turn sequences, endpoints and proxy services on or off across the complete cluster in one go. Not just that: you can also write your own control commands for the UltraESB cluster and use them in your solution to command the complete cluster.
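The idea of a cluster-wide command can be sketched as follows. All of the names here (`ClusterCommand`, `CommandNode`, `SetSequenceState`, `CommandBroadcaster`) are hypothetical and do not reflect the actual UltraESB command API, and the in-memory loop merely stands in for the ZooKeeper-based delivery of the command to every node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical command contract: one unit of work applied on every node. */
interface ClusterCommand {
    void execute(CommandNode node);
}

/** Hypothetical node holding the enabled/disabled state of its sequences. */
class CommandNode {
    final String name;
    final Map<String, Boolean> sequenceEnabled = new HashMap<>();

    CommandNode(String name, String... sequences) {
        this.name = name;
        for (String sequence : sequences) {
            sequenceEnabled.put(sequence, Boolean.TRUE);   // everything starts enabled
        }
    }
}

/** Built-in style command: turn a named sequence on or off. */
class SetSequenceState implements ClusterCommand {
    private final String sequence;
    private final boolean enabled;

    SetSequenceState(String sequence, boolean enabled) {
        this.sequence = sequence;
        this.enabled = enabled;
    }

    @Override
    public void execute(CommandNode node) {
        // Only touch nodes that actually deploy the sequence.
        if (node.sequenceEnabled.containsKey(sequence)) {
            node.sequenceEnabled.put(sequence, enabled);
        }
    }
}

class CommandBroadcaster {
    /** Applies the command to every node and returns how many nodes were commanded. */
    static int broadcast(List<CommandNode> cluster, ClusterCommand command) {
        int commanded = 0;
        for (CommandNode node : cluster) {
            command.execute(node);
            commanded++;
        }
        return commanded;
    }

    public static void main(String[] args) {
        List<CommandNode> cluster = new ArrayList<>();
        cluster.add(new CommandNode("node1", "audit-seq"));
        cluster.add(new CommandNode("node2", "audit-seq"));
        broadcast(cluster, new SetSequenceState("audit-seq", false));
        // A custom command only needs to implement ClusterCommand.
        broadcast(cluster, node -> System.out.println(node.name + ": " + node.sequenceEnabled));
    }
}
```

Since the contract has a single method, a user-defined command in this sketch can be as small as a lambda, which mirrors the point that custom commands plug into the same framework as the built-in ones.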

If you want to try it out, "Clustered deployment of UltraESB and control commands" is all you need to go through. You can also run the console using uconsole.sh or uconsole.bat and see this in action there.

That was about group coordination. I then evaluated several frameworks for implementing state replication, as it is a significant part of a good clustering implementation, and aspects like scalability and performance have to be taken into account when developing such a replication framework. We found ehCache to be a well-fitting distributed cache that can be used to implement state replication. However, it used multicast-based peer discovery to find its cache peers, and we didn't want to rely on that, as it causes trouble in the cloud and may not scale in a considerably big cluster. Here we used the right amount of salt: the underlying ZooKeeper-based coordination discovers the cache peers of the distributed ehCache. This was possible because ehCache has a flexible API and its peer discovery is pluggable. I am planning to write a complete article on this aspect of clustering later as Clustering Part II, and once I am done with it, I will share the link as a comment too.

That brings me to the end of the story. Stay tuned to hear more about UltraESB 1.4.0 and its newly born uconsole.