Parallel Universe: April 2008

Friday, April 25, 2008

MySQL Users Conference 2008

Yeah !!! We win a trip to California after working too hard on a replication plugin for MySQL.

It was a 30 hours journey to arrive in California. The goal was to attend the MySQL Users Conference and show ideas developed in the context of the GORDA Project.

Mission accomplished my friends.

Me and my Friend Luis met a lot of interesting people and had passioned discussions on replication. Unfortunately, there was no much time to attend talks even the one given by Werner Vogels. There were more than 2000 people thinking on MySQL, learning a little bit more on it, doing business, doing contacts and hiring people. I was not expecting such atmosphere: everyone was breathing MySQL. Even the competition was attend the conference. There were many guys from Microsoft, IBM and Oracle.

In our spare time, we went to San Francisco to see the Golden Gate and Alcatraz. While driving through the street of Santa Clara, Palo Alto, etc, etc we came across buildings from important enterprises such as Google, IBM, Yahoo, Oracle, Microsoft and from important universities such as Berkley and Standford.

Nice trip, but I am afraid that I got excited for driving a Mustang as I did not pay attention to the traffic lights. Most likely my next credit card bill will give me a stroke.

Cheers.

Are people afraid of group communication or fault tolerance is not important nowadays?

March was a hell of a month and April started exactly as March: almost 7 per 7 and 12 hours per day on a replication plug-in for MySQL. I hope that this hard work ends up with a prize: a trip to California for the MySQL Users Conference and enough euros to buy me a boat. Despite this strenuous but delightful effort, my friend Eduardo has kept asking me to write something in the blog. I still can hear his words: "We should write frequently... Luis have written something, I have... So, it is your turn". So, I took my spear time from 3 a.m from 7 a.m to start writing something...

In fact, I have already started writing three different posts but I have not had time to finish none. One of them is a joint work with Luis, and the subject is quite good. Wait and see. But this post, I started writing from scratch and is about something that is bothering me for a while and today the issue was raised again: "Why people don't use the main concepts on group communication?"

Roughly, a group communication toolkit provides a set of primitives to send messages to and receive from a group of peers (i.e. hosts) and manages which peers are in the group, thus providing information when a member joins or leaves the group spontaneously or due to a crash.

Let us however reduce the scope of this question as I don't have much time: "Why people don't take into account consensus and in particular group membership algorithms when they design fault tolerant applications?" Don't think I am lousy writer... I don't know... Maybe I am... But as this is my spear time, I am writing and at the same time drinking a bottle of wine and I am getting more and more relaxed...

Of course, the distributed system community uses group communication and knows that consensus is a fundamental problem. I am asking about other communities such as those involved in building database and parallel system. Such communities usually try to come up with fault tolerant applications or high availability solutions that don't properly take into account group membership, for instance. Don't understand this statement as a personal opinion. I've been seeing different cases. When these communities need functionalities that might be provided by a group communication infra-structure, such as group membership, they develop their own stuff without taking into account important concepts such as consensus. Unfortunately, not doing that means bugs that are hard to trace as there are several corner cases (e.g. failures while dealing with previous failures) that must be properly handled.

Most likely, this is time for an explanation on consensus and group membership. Why group membership is important for fault tolerant applications?

The group membership monitors which members are active in a group and designing it in a naive way may have unfortunate consequences.

For instance, build a group membership service solely based on a heart beat approach that waits for periodic messages from peers is not a good idea as a burst in the network may lead peers to think that others are dead. Of course, a "smart developer" has already thought about that and before assuming that a pear is dead he/she would try to contact it again. But for how long should he/she keep trying to? Does he/she should wait for a TCP-IP timeout connection before giving up ? Most likely this is not a good idea because it may take hours regarding the type of failure. Due to congestion problems a peer may think that one is dead and the others may not. In this case, the same "smart developer" may come up with a voting protocol that collects information from other peers before kicking one out. This may work in most cases but suppose that during the voting process the pear that was responsible for collecting votes and deciding fails? And now?

To properly circumvent these corn cases, one will eventually end up with a group membership solution which needs a consensus protocol to achieve decisions on which peers are operational. Any naive attempt to circumvent these problems may end up with a case that was not taking into account and therefore may generate unfortunate consequences: bugs and bugs and bugs that are hard to trace.

Does this happen because the distributed system community is not good enough to disseminate what their know? Does this happen because other communities are only looking at their belies? Does this happen because group communication is still complex after all these years? Yeah, group communication is not a simple subject but neither b-trees and nobody consider developing a database (not a main-memory or in-memory database) without taking into account b-trees and their complex algorithms for concurrency and recovery.

Most likely, such theories are not applied in practice because there is no group communication toolkit ready to be deployed. I agree that are none but at the same time I believe that different applications require group communication protocols tailored for their needs and in my opinion developing a group communication toolkit is simpler that developing a full-fledged b-tree implementation. At least this is my biased opinion.

So I have this question if group communication is so great why other communities don't give enough attention to this subject? Please, give me answers.

Wednesday, April 23, 2008

Anouncements: JOIN 2008 cancelled and talk at UAB

It is really a pity and unfortunate that the JOIN 2008 event was canceled. I was preparing a talk with the idea of having content and fun, something different than traditional so it would not be boring and, most of all, put students to think about and, if possible, discuss the topic.

I was having lots of fun preparing it. The idea was it to be presented like a fairy tail (suited to the title). The index was called "Once upon a time..." and the first theme was "Chapter One: When the princess became a frog" where I would talk about my first experience with software engineering in a huge software development where all the aspects of engineering were imposed by the contract and our consulting company had no engineering culture in the software field. Well, the result is that instead of being helped by the process I felt we would be far more productive without it (or with a different use of it). The software engineering princess (I know in the tale it is a prince but I rather prefer a princess myself) became a heavy frog :)

Anyway. The event as well as the talk are canceled. I wish the organization board more luck next time.

In the meantime I was invited for a talk at Universidad Autonoma of Barcelona about Post-Doc. The idea is to talk about my experience and how I see a post-doc. Title is:

Postdoc: useful, necessary ... or a waste of time?

It will take place in the Computer Architecture and Operating Systems department at UAB, May 9th at 12:30. More information here.

Friday, April 4, 2008

Anouncement: Talk at JOIN 2008

Hi there, I will be giving a talk at JOIN ("Jornadas de Informática") in Braga, Portugal on May 1st. If you are around and want to "JOIN", be my guest (don't forget to talk with the organization team before :-) ).

The talk will be on Software Engineering.

This will be a great challenge for me once I am not an expert in software engineering although I have been deploying software for at least 10 years now.

I decided to take the risk and talk about my personal experience. I think students have their professors to talk about theories but experience is different, especially in software engineering that is so "not" used everywhere.

The title of the talk is:

Werewolves, little red riding hood, software engineering and other fairy tales: my personal experience in 10 years at the “software development” field.

There it goes a short abstract.

Over the last 10 years I worked on the developing of database applications in an IT Consulting Company in Brazil; managing the development of an ERP system; warehouses and ETL systems in the IT department of a Communication Group in Brazil; deploying scientific applications for my PhD and post-doc in the different cultural environments of Spain and Germany. Nowadays I work at HP Labs doing research on the development of a full-system execution-driven simulator for massive clusters of massive cores. Where and how did software engineering interact with my personal work? How close is software engineering to the traditional engineering? Or is it a long and elaborate fairy tale? What is my personal opinion about it? What am I using now and where I would like to get? This talk is about a one-man experience and his believes of software engineering, computer systems and a bit more.

I hope it will be interesting and fun.

Parallel Universe