Friday, March 28, 2008

“To infinity and beyond”[1], or: Who on Earth needs several cores on his desktop?

In 1965 Gordon Moore published his paper "Cramming more components onto integrated circuits" and became famous for what is now called Moore’s law. Moore’s law states that the number of transistors on a chip doubles roughly every two years[2]. In the same paper he predicted that integration would bring a whole revolution in computing and communication, such as the advent of home computers and cell phones (the picture above, taken from his paper, illustrates this):
"Integrated circuits will lead to such wonders as home computers – or at least terminals connected to a central computer – automatic controls for automobiles, and personal portable communications equipment."

Moore's statement proved very influential, especially because the number of transistors is related to almost every other computer metric, such as processor performance. If I had read this article in 1965, I would have done the math and concluded that this pace of integration could not be sustained for long. Moore thought so too: his prediction covered only the following 10 years, that is, until 1975.

To understand the implications of Moore’s law, all you need to do is the math. If the transistor count doubles every two years, in 10 years (five doublings, 2^5) we would reach 32 times our current “performance”, and in 20 years (ten doublings, 2^10 = 1,024) we would have a processor, believe it or not, more than 1,000 times better.
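A quick way to check these numbers, including the 40-year figure that comes up in the imaginary conversation, is to compute 2^(years / 2). The Python sketch below assumes an idealized, perfectly regular doubling every two years; the function name `growth_factor` is just for illustration.

```python
def growth_factor(years: float, doubling_period: float = 2.0) -> float:
    """Growth factor after `years` if the count doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)

for years in (10, 20, 40):
    # Format with thousands separators and no decimals.
    print(f"{years} years -> x{growth_factor(years):,.0f}")
# 10 years -> x32
# 20 years -> x1,024
# 40 years -> x1,048,576
```

Forty years means twenty doublings, which is where the "1,000,000 times" estimate comes from.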

I say “better” because processor performance did grow in step with integration, not just because of faster clock rates but mainly because the extra transistors went into remarkable architectural improvements: pipelining, superscalar execution, branch prediction, and so on.

But what Moore did not imagine (or at least I believe he did not) is that his law would still hold almost 40 years after Intel launched the world's first microprocessor, the 4004, in 1971.

I can imagine myself in the late 1960s, talking to some “mates”:

- What do you think of Gordon’s paper? – I would have asked

- Well, in 10 years we may reach 32 times more transistors. It seems feasible – somebody would have said.

- What if we can keep this up for, let’s say, the next 40 years at least? What if computers could really be found on every desk? – I wish I had said this :-)

- Wow – someone would say – but this would mean 1,000,000 times more transistors. And, of course, tons of computers.

- Imagine the performance growth, the memory growth, …

- But – there is always a skeptic – who on Earth would need that much performance at his desk?

So, from the time Moore stated his “law” up to today, microprocessor integration has indeed grown by the amazing factor of 1,000,000, as you can see in the graphic below. And we all want the new processors on our desktops and, believe it or not, we sometimes want even more performance.

The fact is that computers got faster and applications got far more complex, not necessarily in that order. Today we use computers for applications that would have been unthinkable decades ago, simply because they could only arise now that the right conditions exist.

By now you might be asking yourself where I am going with this. Well, my dear reader, for a number of reasons (power limitations, heat, …) the processor industry decided to use the extra transistors to put more than one processor, or core, on a chip. Multi-cores are now the fashion in computer architecture and everybody is discussing them. It is certainly a major change of direction.

The number of cores per processor is growing (2, 4, 6, 8, 16 …) and advancing to the desktop. Some people have started asking: for how long? How many cores can we use on the desktop? Who on Earth will need so many cores at his desk?

And this is exactly the point. Most of today’s applications are not really able to take advantage of several cores. Does this mean we have reached the point where we no longer need to buy new computers? Will I be happy for several years with my 6-core computer? Will the computer industry finally see its sales slow down?

Some people think so, but I disagree. I believe it is just a matter of time before current applications become multi-threaded everywhere and put our cores to use. But this, in my opinion, will not be the reason we buy next-generation hardware. I believe a whole set of new applications will arise just because we now have all this processing power “for free”. Some of these applications we can already imagine; others we cannot yet conceive. Really good speech transcription (I would not need to type this post), image recognition, and photo and video search. What about asking the computer for all the videos, pictures and documents in which your child appears? What about the ones where the day was sunny? Your laptop will drive your car, based on traffic information, while you dictate a memo.

Either multi-cores will slow down the computer industry and we will be satisfied with our desktop computers for longer, or the race will keep going and amazing new hardware and software will emerge. A third possibility is the advent of a totally different computer business model. Choose your future scenario and tell me your opinion.

I am not good at all at “futurology”, but I believe cores are here to stay and programmers will find ways to use them, giving us new kinds of functionality we did not even know we wanted but will not be able to live without. And this technology race will keep going.

For how long? For at least a couple of decades, after which the business might change.

Where is the limit? “To infinity and beyond”.

[1] Buzz Lightyear in Toy Story.

[2] Initially he said every year, but he later revised his estimate and stated it would double every two years.


Alfrânio Júnior said...

Very interesting post. I started reading at 4:00 a.m. and since then I cannot stop thinking about different applications... But before getting into applications, I have some fundamental questions.

1 - It is common sense that threads are bad, in the sense that we should reduce the number of threads in a program. Thus, a good design puts things in queues to exchange information among the different parts of the program, using one thread (or a limited number of threads) per part. Any comments?

2 - Putting things in a central computer does not circumvent scalability problems. Most likely, in the future, current applications that are only feasible on clusters will run perfectly well on a super-powerful multi-core. But at the same time, new applications will arise that need super-powerful multi-cores in a distributed environment. So why not invest in clusters, etc.?

3 - It is hard to program multi-threaded applications. Do you think it is possible to exploit multi-cores efficiently? I mean, if I had a multi-threaded application and a machine with several cores, would it run faster?

Don't get me wrong... I completely agree with you, but I want to foster discussion. Most likely, these topics deserve their own posts, but feel free to answer the questions here. Or, if you want, we can write something together. I promise I will find time at the end of this week.

Cheers, Alfranio.

Edu said...

I do not have all the answers, and indeed each of these questions could be explored better in its own post. But I will try to give some insights.

1) I don't agree that threads are bad. Why should I reduce the number of threads? If I have 100 different searches, why not start 100 threads? Because the service layers work badly? Because I am consuming too many resources?

2) You are right about scalability. The post was about our single desktop computer with one processor (which might have multiple cores). For heavy processing demands, clusters will still be the best alternative, and I believe a mix of threads and message passing is the way to go. At what level should this mix happen? Nowadays the programmer does all the work, but we need to develop "intelligent", sensible middleware to do it. For me, this is a very important research topic for the future.

3) What is harder: multi-threading or message passing? What is more natural? I think a multi-threaded application would run faster, and the main problem with programming threads is controlling access to the shared memory areas (locks, etc.). Maybe transactional memory will solve this problem, simplifying programmers' lives.

The research path in this area is quite open, and that is the good thing: lots of challenges and opportunities.
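To make the shared-memory problem from item 3 concrete, here is a minimal Python sketch (all names are hypothetical): several threads increment one shared counter, and a `threading.Lock` turns each read-modify-write into a critical section so no increment is lost. Transactional memory aims to replace this kind of explicit locking; it is not shown here.

```python
import threading

counter = 0                      # shared memory area touched by every thread
counter_lock = threading.Lock()  # guards every access to `counter`

def add_many(n: int) -> None:
    global counter
    for _ in range(n):
        # `counter += 1` is a read-modify-write, not an atomic step;
        # holding the lock ensures no other thread interleaves with it.
        with counter_lock:
            counter += 1

def run(num_threads: int = 8, increments: int = 10_000) -> int:
    """Start `num_threads` threads, wait for all of them, return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(increments,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter

print(run())  # 80000
```

The lock keeps the result correct, but it also serializes the increments, which is exactly the kind of contention that makes threaded code hard to scale.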


Alfrânio Júnior said...

I completely agree with you on items 2 and 3. And I truly believe that transactional memory is the future.

Unfortunately, I don't buy item 1.
To show my point of view, let us consider a typical DBMS.
First, when a request arrives from a client, it is put into a queue and processed only when a thread becomes idle. In other words, the number of threads is not necessarily equal to the number of concurrent clients. This avoids increasing the number of threads and, as a consequence, memory usage and context switching. Second, when a thread is about to access a device, the request is delegated to a coordinator that reads or writes on behalf of several other threads. This avoids blocking one thread per I/O request, thus increasing CPU usage and, to some extent, parallelism.

Both designs are used in web servers and file systems, to cite a few. Basically, they show a concern about using threads and, at the same time, that more than threads is necessary to achieve the scalability some applications promise.
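The two DBMS designs described above can be sketched in Python as follows. This is a minimal illustration under simplifying assumptions, not how any particular DBMS is implemented: a fixed pool of `NUM_WORKERS` threads drains a request queue (so the thread count does not grow with the number of clients), and workers delegate device writes to a single coordinator thread through a second queue, with a Python list standing in for the device. All names are hypothetical.

```python
import queue
import threading

NUM_WORKERS = 4        # hypothetical pool size, independent of the client count
STOP = object()        # sentinel that tells a thread to exit

requests = queue.Queue()     # client requests wait here until a worker is idle
io_requests = queue.Queue()  # device writes delegated to the coordinator
log = []                     # stands in for a device (e.g. a log file on disk)

def worker() -> None:
    """Take client requests from the queue, one at a time, while idle."""
    while True:
        req = requests.get()
        if req is STOP:
            break
        result = f"processed {req}"
        # Instead of blocking on I/O itself, hand the write to the coordinator.
        io_requests.put(result)

def io_coordinator() -> None:
    """Single thread that performs device writes on behalf of all workers."""
    while True:
        item = io_requests.get()
        if item is STOP:
            break
        log.append(item)

def serve(client_requests: list) -> list:
    log.clear()
    coordinator = threading.Thread(target=io_coordinator)
    workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    coordinator.start()
    for w in workers:
        w.start()
    for r in client_requests:    # many clients, few threads
        requests.put(r)
    for _ in workers:            # FIFO: every STOP comes after every request
        requests.put(STOP)
    for w in workers:
        w.join()
    io_requests.put(STOP)        # all workers are done, so no more I/O arrives
    coordinator.join()
    return log

print(sorted(serve([f"req{i}" for i in range(10)])))
```

Because the queues are FIFO and the stop sentinels are enqueued last, every request is processed and written before the threads shut down.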

Take a look at this link:

Edu said...

All right. I was thinking more about highly parallel applications on clustered systems.
Lots of threads do not make sense when they compete for the same resource. To compute 1000 tasks on a single processor, launching 1000 threads will probably be worse. It is better to have one process with an execution queue.
But if I have 10 processors, launching a number of threads slightly larger than 10 might work wonderfully.
I think the same applies to the disk and any other shared device.
This is why a smaller pool of threads (sized according to the number of resources that can be accessed in parallel) with a queue of requests may be the best design.
What if you have 64 cores and one of them is dedicated to the OS? Could this kind of policy be generalized to the OS (or to middleware)?
OK... OK... I might be getting off the main topic.

Nice discussion anyway.
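The policy in the comment above (a pool sized by the number of resources that can work in parallel, plus a queue of requests) maps naturally onto Python's `concurrent.futures.ThreadPoolExecutor`, whose `map` hands tasks to an internal work queue served by at most `max_workers` threads. The `pool_size` rule (cores plus a small margin) and the `search` function are hypothetical placeholders, not a recommended tuning.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def pool_size(parallel_resources: int, extra: int = 2) -> int:
    """Hypothetical sizing rule: a few more threads than parallel resources."""
    return max(1, parallel_resources) + extra

def search(term: str) -> str:
    # Placeholder for one of the "100 different searches"; a real search
    # would do I/O or computation here.
    return term.upper()

def run_searches(terms: list) -> list:
    cores = os.cpu_count() or 1          # os.cpu_count() may return None
    with ThreadPoolExecutor(max_workers=pool_size(cores)) as pool:
        # 100 tasks share a small, fixed set of threads; map keeps input order.
        return list(pool.map(search, terms))

results = run_searches([f"term{i}" for i in range(100)])
print(len(results))  # 100
```

In standard CPython, threads suit tasks that mostly wait on I/O; for CPU-bound work the global interpreter lock keeps threads from running Python code in parallel, and `concurrent.futures.ProcessPoolExecutor` would be the usual substitute.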


Alfrânio Júnior said...

Now I buy your idea...

There is research on autonomic computing, based on control theory, that tries to deal with such things. But again, this is out of the scope of this post. Let's stick to cores and related stuff.

Indeed a nice discussion. :)