Friday, April 25, 2008

Are people afraid of group communication, or is fault tolerance not important nowadays?

March was a hell of a month, and April started exactly like March: almost seven days a week and 12 hours a day on a replication plug-in for MySQL. I hope this hard work ends with a prize: a trip to California for the MySQL Users Conference and enough euros to buy me a boat. Despite this strenuous but delightful effort, my friend Eduardo has kept asking me to write something on the blog. I can still hear his words: "We should write frequently... Luis has written something, I have... So, it is your turn". So I took my spare time, from 3 a.m. to 7 a.m., to start writing something...

In fact, I have already started writing three different posts, but I have not had time to finish any of them. One of them is joint work with Luis, and the subject is quite good. Wait and see. This post, however, I started from scratch, and it is about something that has been bothering me for a while and was raised again today: "Why don't people use the main concepts of group communication?"

Roughly, a group communication toolkit provides a set of primitives to send messages to, and receive messages from, a group of peers (i.e. hosts). It also manages which peers are in the group, notifying the application when a member joins or leaves, whether voluntarily or due to a crash.
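
As a rough illustration of that definition, here is a minimal in-memory sketch in Python. The names (Group, Member, View, multicast) are hypothetical and not taken from any real toolkit, and everything runs in a single process with no network and no failures. It only shows the shape of the interface: multicast to the group, plus a notification whenever the membership changes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class View:
    """A numbered snapshot of the group membership (hypothetical type)."""
    view_id: int
    members: tuple


@dataclass
class Member:
    name: str
    inbox: list = field(default_factory=list)  # messages delivered to this member
    views: list = field(default_factory=list)  # view changes this member was told about

    def on_message(self, sender, payload):
        self.inbox.append((sender, payload))

    def on_view_change(self, view):
        self.views.append(view)


class Group:
    """Toy single-process group: no network, no crashes, no real guarantees."""

    def __init__(self):
        self._members = {}
        self._view_id = 0

    def _install_view(self):
        # Every membership change produces a new numbered view,
        # which is announced to all current members.
        self._view_id += 1
        view = View(self._view_id, tuple(sorted(self._members)))
        for member in self._members.values():
            member.on_view_change(view)
        return view

    def join(self, member):
        self._members[member.name] = member
        return self._install_view()

    def leave(self, name):
        # In a real toolkit a detected crash would trigger the same path.
        self._members.pop(name, None)
        return self._install_view()

    def multicast(self, sender, payload):
        for member in self._members.values():
            member.on_message(sender, payload)


group = Group()
a, b = Member("a"), Member("b")
group.join(a)
group.join(b)
group.multicast("a", "hello")
last_view = group.leave("b")
```

The hard part, of course, is everything this toy leaves out: making all surviving members agree on the same sequence of views when messages are delayed and peers crash.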

Let us, however, narrow the scope of this question, as I don't have much time: "Why don't people take consensus, and group membership algorithms in particular, into account when they design fault-tolerant applications?" Don't think I am a lousy writer... I don't know... Maybe I am... But as this is my spare time, I am writing while drinking a bottle of wine, and I am getting more and more relaxed...

Of course, the distributed systems community uses group communication and knows that consensus is a fundamental problem. I am asking about other communities, such as those building databases and parallel systems. These communities usually try to come up with fault-tolerant applications or high-availability solutions that don't properly take group membership, for instance, into account. Don't take this statement as mere personal opinion: I have seen several such cases. When these communities need functionality that a group communication infrastructure could provide, such as group membership, they develop their own without taking into account important concepts such as consensus. Unfortunately, this leads to bugs that are hard to trace, because there are several corner cases (e.g. failures while handling previous failures) that must be properly handled.

This is probably the time for an explanation of consensus and group membership. Why is group membership important for fault-tolerant applications?

A group membership service monitors which members are active in a group, and designing it naively may have unfortunate consequences.

For instance, building a group membership service based solely on a heartbeat approach, which waits for periodic messages from peers, is not a good idea: a burst of network traffic may delay those messages and lead peers to think that others are dead. Of course, a "smart developer" has already thought about that and, before assuming that a peer is dead, he/she would try to contact it again. But for how long should he/she keep trying? Should he/she wait for a TCP/IP connection timeout before giving up? Most likely this is not a good idea, because it may take hours, depending on the type of failure. Moreover, due to congestion, one peer may think that another is dead while the others do not. In this case, the same "smart developer" may come up with a voting protocol that collects information from other peers before kicking one out. This may work in most cases, but suppose that the peer responsible for collecting the votes and deciding fails during the voting process. What now?
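
To make the disagreement concrete, here is a small simulated sketch in Python of a timeout-based heartbeat detector. Everything in it is an assumption for illustration: the class name HeartbeatDetector, the 2.5 second timeout, and the hand-written arrival times. There is no real network; time is just a number passed in by the caller.

```python
class HeartbeatDetector:
    """Suspects a peer when no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, peer, now):
        # Record the arrival time of the latest heartbeat from `peer`.
        self.last_seen[peer] = now

    def suspects(self, now):
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}


def run_observer(arrival_times, timeout, check_at):
    """Replay the heartbeats of peer 'c' that reached one observer by `check_at`."""
    detector = HeartbeatDetector(timeout)
    for t in arrival_times:
        if t <= check_at:
            detector.heartbeat("c", t)
    return detector.suspects(check_at)


# Peer "c" is alive and sends a heartbeat every second (t = 0, 1, 2, 3, 4).
# Observer "a" receives them on time. Observer "b" sits behind a congested
# link, so the heartbeats sent at t = 2, 3, 4 only arrive at t = 6.
seen_by_a = [0.0, 1.0, 2.0, 3.0, 4.0]
seen_by_b = [0.0, 1.0, 6.0, 6.0, 6.0]

a_suspects = run_observer(seen_by_a, timeout=2.5, check_at=4.5)
b_suspects = run_observer(seen_by_b, timeout=2.5, check_at=4.5)
b_later = run_observer(seen_by_b, timeout=2.5, check_at=6.5)
```

At t = 4.5, observer "a" considers "c" alive while observer "b" suspects it, and by t = 6.5 "b" has changed its mind again. If each observer acted on its own suspicion, the two would end up with different memberships even though nobody crashed.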

To properly handle these corner cases, one will eventually end up with a group membership solution that needs a consensus protocol to decide which peers are operational. Any naive attempt to work around these problems may miss a case that was not taken into account and therefore have unfortunate consequences: bugs and bugs and bugs that are hard to trace.
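
One ingredient that consensus-based membership protocols typically rely on is a majority quorum. The Python sketch below shows only that ingredient, under loud assumptions: the function name decide_next_view is hypothetical, acknowledgements are handed in as a plain set, and there are no rounds, retries, or recovery from a coordinator that crashes mid-decision. It is not a consensus protocol; a real one (Paxos, for example) adds numbered rounds on top of the same quorum rule precisely to survive that crash.

```python
def decide_next_view(current_view, proposal, acks):
    """Return the proposed view if a strict majority of the CURRENT view
    acknowledged it; otherwise return None (no decision can be made)."""
    voters = set(acks) & set(current_view)  # ignore acks from non-members
    if len(voters) * 2 > len(current_view):
        return frozenset(proposal)
    return None


# Assumed scenario: a network partition splits {a, b, c, d, e} in two.
current = {"a", "b", "c", "d", "e"}

# Minority side: a and b can only reach each other and want to drop the rest.
minority = decide_next_view(current, {"a", "b"}, acks={"a", "b"})

# Majority side: c, d and e reach each other and want to drop a and b.
majority = decide_next_view(current, {"c", "d", "e"}, acks={"c", "d", "e"})

# An ack from a process outside the current view does not count.
outsider = decide_next_view(current, {"a", "b"}, acks={"a", "b", "z"})
```

Because any two strict majorities of the same view share at least one member, the two sides of a partition can never both install a new view. The naive voting scheme described above has no such guarantee.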

Does this happen because the distributed systems community is not good enough at disseminating what it knows? Does this happen because other communities are only looking at their own bellies? Does this happen because group communication is still complex after all these years? Yeah, group communication is not a simple subject, but neither are B-trees, and nobody considers developing a database (other than a main-memory or in-memory database) without taking into account B-trees and their complex algorithms for concurrency and recovery.

Most likely, such theories are not applied in practice because there is no group communication toolkit ready to be deployed. I agree that there are none, but at the same time I believe that different applications require group communication protocols tailored to their needs, and in my opinion developing a group communication toolkit is simpler than developing a full-fledged B-tree implementation. At least this is my biased opinion.

So I have this question: if group communication is so great, why don't other communities give enough attention to this subject? Please, give me answers.


Ronaldo said...

Well, I don't have all that knowledge of GC but, as far as I know, there are plenty of GC subsystems on the market, from the old Isis to Horus and BCG, JavaGroup, Eva, Adam, etc., that deal with these problems. They use all the cool things you just described, and they are ready! So I think the DB community doesn't give a damn about this stuff because it doesn't need to. They just have to take these algorithms and implement them in their code. Or just use good old black-box reuse and call the libraries that implement that stuff. I think that's the reason.

Alfrânio Júnior said...

First of all, there are a lot of prototypes out there. But they are just prototypes, and as such they are not ready for production. They all suck, my friend. They do all the cool stuff, but they do not do anything well. Most likely there are reliable implementations inside the industry, in products such as storage systems, but they are not open source or publicly available.

And the database and parallel systems communities need this kind of thing. No matter how smart you are, you will end up building a subset of a group communication infrastructure to circumvent the sort of problems that I mentioned.

Thus, as there are no group communication toolkits ready to be used, it is fair that each community builds its own stuff. Yeah, this is true.

But the problem lies in the fact that they try to build something that does not take into account what was learned by the distributed systems community, and their solutions have several flaws.

Ronaldo said...

OK, I agree with you, they all suck. I may have a tool to help solve this problem. I designed a GC framework and implemented it. It can really help to test different GC algorithms, including consensus algorithms. So, if anyone wants the code, just say so and I'll mail it. I have docs for it too. I don't know if it can help, but it never hurts to try ;)

Alfrânio Júnior said...

Put it on SourceForge and wait to see what happens.

Good Luck.

Luís Soares said...

Hmm... in people's minds, GC sits right next to the expression "Pandora's Box" or the term "mysticism". This is mostly FUD playing tricks. I know little about GC usage in the enterprise world; I believe that at least Isis, or some variant, is still in production (correct me if I am wrong). The general feeling is that GC is still too complex after all these years, and that there is no broad-range, production-ready toolkit available. As such, investors are reluctant to invest in a GC product, and people who need one will design their own "GC-less" system, because (i) they don't want to open "Pandora's Box" and (ii) the existing solutions are mostly academic, paper-oriented, proof-of-concept toolkits. It is getting quite similar to a chicken-and-egg scenario. No one invests, because all the FUD hovering around might scare customers away, and no one buys, because no one is selling a rock-solid toolkit (with the ever-needed support, customization, and know-how assistance).

For a start, what I would really like to know is how much the existing GC toolkits suck. Put numbers where the mouth is. I would love to read an unbiased survey of the existing toolkits, with some proper benchmarking. (Should you know of an interesting one, please point me to it.)

With respect to Ronaldo's framework, as Alfrânio has pointed out, open it up. I would suggest hosting it on Launchpad, though...