March was a hell of a month and April started exactly like March: almost seven days a week and 12 hours a day on a replication plug-in for MySQL. I hope that this hard work ends with a prize: a trip to California for the MySQL Users Conference and enough euros to buy me a boat. Despite this strenuous but delightful effort, my friend Eduardo has kept asking me to write something for the blog. I can still hear his words: "We should write frequently... Luis has written something, I have... So, it is your turn". So, I took my spare time, from 3 a.m. to 7 a.m., to start writing something...
In fact, I have already started writing three different posts, but I have not had time to finish any of them. One of them is joint work with Luis, and the subject is quite good. Wait and see. This post, however, I started writing from scratch, and it is about something that has been bothering me for a while and that came up again today: "Why don't people use the main concepts of group communication?"
Roughly, a group communication toolkit provides a set of primitives to send messages to and receive messages from a group of peers (i.e., hosts), and it manages which peers are in the group, notifying the application when a member joins the group or leaves it, either voluntarily or due to a crash.
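To make that description a bit more concrete, here is a toy sketch of such an interface. It runs inside a single process and every name in it (`Group`, `Member`, `on_view`, `on_message`) is made up for illustration; it is not the API of any real toolkit, and it ignores everything that makes the real problem hard (networks, crashes, ordering).

```python
class Group:
    """Toy in-process group: multicast plus membership (view) notifications."""

    def __init__(self):
        self.members = {}  # name -> Member
        self.view_id = 0

    def join(self, member):
        self.members[member.name] = member
        self._install_view()

    def leave(self, name):
        # Used both for a voluntary leave and for a detected crash.
        self.members.pop(name, None)
        self._install_view()

    def send(self, sender, payload):
        # Deliver the message to every current member, sender included.
        for m in self.members.values():
            m.on_message(sender, payload)

    def _install_view(self):
        # A "view" is a numbered snapshot of the current membership.
        self.view_id += 1
        view = (self.view_id, sorted(self.members))
        for m in self.members.values():
            m.on_view(view)


class Member:
    def __init__(self, name):
        self.name = name
        self.views = []  # views delivered to this member, in order
        self.inbox = []  # (sender, payload) pairs delivered to this member

    def on_view(self, view):
        self.views.append(view)

    def on_message(self, sender, payload):
        self.inbox.append((sender, payload))


group = Group()
a, b = Member("a"), Member("b")
group.join(a)
group.join(b)
group.send("a", "hello")
group.leave("b")
```

The application only sees messages and views; how the toolkit decides that a member is gone is hidden behind `leave`, and that hidden part is what the rest of this post is about.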
Let us, however, reduce the scope of this question as I don't have much time: "Why don't people take into account consensus, and in particular group membership algorithms, when they design fault-tolerant applications?" Don't think I am a lousy writer... I don't know... Maybe I am... But as this is my spare time, I am writing and at the same time drinking a bottle of wine, and I am getting more and more relaxed...
Of course, the distributed systems community uses group communication and knows that consensus is a fundamental problem. I am asking about other communities, such as those involved in building databases and parallel systems. Such communities usually try to come up with fault-tolerant applications or high-availability solutions that don't properly take group membership into account, for instance. Don't take this statement as a mere personal opinion: I have seen several such cases. When these communities need functionality that could be provided by a group communication infrastructure, such as group membership, they develop their own stuff without taking into account important concepts such as consensus. Unfortunately, not doing so means bugs that are hard to trace, as there are several corner cases (e.g., failures while dealing with previous failures) that must be properly handled.
Most likely, this is the time for an explanation of consensus and group membership. Why is group membership important for fault-tolerant applications?
A group membership service tracks which members are active in a group, and designing it in a naive way may have unfortunate consequences.
For instance, building a group membership service based solely on a heartbeat approach that waits for periodic messages from peers is not a good idea, as a burst of traffic in the network may lead peers to think that others are dead. Of course, a "smart developer" has already thought about that, and before assuming that a peer is dead he/she would try to contact it again. But for how long should he/she keep trying? Should he/she wait for a TCP/IP connection timeout before giving up? Most likely this is not a good idea, because it may take hours depending on the type of failure. Due to congestion problems, one peer may think that another is dead while the others do not. In this case, the same "smart developer" may come up with a voting protocol that collects information from other peers before kicking one out. This may work in most cases, but suppose that, during the voting process, the peer that was responsible for collecting votes and deciding fails. What now?
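Here is a minimal sketch of the naive heartbeat approach, just to show how two peers can end up disagreeing. The `HeartbeatDetector` class, the 3-second timeout and the scenario are all assumptions made up for this example, and time is passed in by hand so that the outcome is easy to follow.

```python
class HeartbeatDetector:
    """Naive failure detector: a peer is suspected once no heartbeat
    has been seen from it for longer than `timeout` seconds."""

    def __init__(self, peers, timeout, now=0.0):
        self.timeout = timeout
        self.last_seen = {p: now for p in peers}

    def heartbeat(self, peer, now):
        # Record the arrival time of a heartbeat from `peer`.
        self.last_seen[peer] = now

    def suspected(self, now):
        # Peers whose last heartbeat is older than the timeout.
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}


# Two observers, A and B, both monitor peer C with a 3-second timeout.
a = HeartbeatDetector(["C"], timeout=3.0)
b = HeartbeatDetector(["C"], timeout=3.0)

# C sends one heartbeat per second. B receives all of them, but
# congestion drops every heartbeat addressed to A after t = 1.
for t in (1.0, 2.0, 3.0, 4.0, 5.0):
    b.heartbeat("C", now=t)
    if t <= 1.0:
        a.heartbeat("C", now=t)

suspects_a = a.suspected(now=5.0)  # A last heard from C at t = 1
suspects_b = b.suspected(now=5.0)  # B last heard from C at t = 5
```

At time 5, A suspects C while B does not, although C never failed. If each peer acts on its own suspicion, A and B no longer agree on who is in the group.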
To properly handle these corner cases, one will eventually end up with a group membership solution that needs a consensus protocol to decide which peers are operational. Any naive attempt to work around these problems may run into a case that was not taken into account and may therefore have unfortunate consequences: bugs and bugs and bugs that are hard to trace.
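As a small illustration, here is one ingredient that membership protocols commonly rely on: a majority rule, under which a new view is accepted only if it keeps a strict majority of the previous one. This is only a sketch under that assumption, with a made-up function name (`can_install`). It is not a consensus protocol: the surviving peers still have to agree on the new view, and they have to cope with the failure of whoever is coordinating that agreement.

```python
def can_install(previous_view, proposed_view):
    """Accept a new view only if it keeps a strict majority of the
    members of the previous view (the majority rule)."""
    survivors = set(previous_view) & set(proposed_view)
    return len(survivors) > len(previous_view) / 2


previous = {"A", "B", "C", "D", "E"}

# A network partition splits the group in two. Each side believes
# that the peers on the other side are dead.
side_one = {"A", "B", "C"}
side_two = {"D", "E"}

one_proceeds = can_install(previous, side_one)  # 3 of 5 survive
two_proceeds = can_install(previous, side_two)  # 2 of 5 survive
```

Since two disjoint subsets cannot both hold a strict majority, at most one side of a partition goes on as "the group". The price is that a group split into pieces with no majority makes no progress at all until enough peers can talk to each other again.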
Does this happen because the distributed systems community is not good enough at disseminating what it knows? Does this happen because other communities are only looking at their own navels? Does this happen because group communication is still complex after all these years? Yeah, group communication is not a simple subject, but neither are B-trees, and nobody considers developing a database (other than a main-memory or in-memory database) without taking into account B-trees and their complex algorithms for concurrency and recovery.
Most likely, such theories are not applied in practice because there is no group communication toolkit ready to be deployed. I agree that there are none, but at the same time I believe that different applications require group communication protocols tailored to their needs, and in my opinion developing a group communication toolkit is simpler than developing a full-fledged B-tree implementation. At least this is my biased opinion.
So I have this question: if group communication is so great, why don't other communities give enough attention to this subject? Please, give me answers.