Why the Opteron still rocks

I am not impressed. Yes, the new Intel Nehalem technology is a milestone in chip technology. The quad-core, eight-thread Core i7 920 at 2.66 GHz reaches 94 GFlops using SSE3 in the synthetic SiSoft Sandra Whetstone test. That is a hammer [Source] compared to a meager 11.4 GFlops for a dual-CPU (two single-core chips) 2.8 GHz AMD Opteron 254 system. But with older programs the Opteron is still OK, and is actually doing quite well.

In ScienceMark (which tends to favor AMD a bit) the Core i7 at 2.66 GHz scores only a little higher (total score 1591, blue) than the roughly four-year-old Opteron at 2.8 GHz, which is doing quite well (total score 1573, red). The current benchmark was run in a Microsoft virtual machine (overhead 5-10%), and other sources on the web cite a score between 1600 and 1700 for the Core i7 920 [WEB]. Well, that is still not 3000 or 6000, which is what double or four times the Opteron's score would look like.

Core i7 920 at 2.66GHz
Opteron 254 at 2.80 GHz

So what's the deal? Well, first of all the Opteron 254 was released in 2005, that's almost 4 years ago, and the Core i7 920 was introduced only a few months ago. Older applications are mostly single-threaded and certainly do not use the 4 cores of the Core i7. So older applications will not run much faster (see above); GHz still count. The gap is bigger in integer performance: in SuperPi (mod 1.5), for example, the Core i7 at 2.66 GHz runs the 1M Pi calculation in 15 seconds, while the Opteron 254 at 2.8 GHz needs 30 seconds.
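To put the numbers quoted in this post side by side, here is a small Python sketch that turns them into speedup factors. The figures are the ones cited above; the function and key names are my own and only there for illustration.

```python
# Figures quoted in this post: (Core i7 920, Opteron 254 system).
HIGHER_IS_BETTER = {
    "whetstone_gflops": (94.0, 11.4),     # Sandra Whetstone, SSE3
    "sciencemark_score": (1591.0, 1573.0),
}
LOWER_IS_BETTER = {
    "superpi_1m_seconds": (15.0, 30.0),   # SuperPi mod 1.5, 1M digits
}


def speedups():
    """Return how many times faster the Core i7 is in each test."""
    factors = {}
    for name, (core_i7, opteron) in HIGHER_IS_BETTER.items():
        factors[name] = core_i7 / opteron
    for name, (core_i7, opteron) in LOWER_IS_BETTER.items():
        # For timings, a smaller number is better, so invert the ratio.
        factors[name] = opteron / core_i7
    return factors


for test_name, factor in speedups().items():
    print(f"{test_name}: Core i7 is {factor:.2f}x the Opteron")
```

The spread is the whole point: a big factor in the synthetic SSE3 test, a factor of two in SuperPi, and almost nothing in ScienceMark.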

The silly fact is that a lot of new software is still single-threaded and many programmers do not spend a single thought on that; let's say only progressive programmers try to get into parallelization. But with the upcoming 6- and 8-core workstations, the future 80-core CPUs (Intel Teraflop), and GPUs with literally hundreds or thousands of streaming cores, there needs to be a change.
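Why extra cores do so little for single-threaded code can be made concrete with Amdahl's law, which is not something I measured here, just the standard textbook formula: speedup = 1 / ((1 - p) + p / n), where p is the fraction of the run time that can run in parallel and n is the number of cores. A minimal sketch, with function and parameter names of my own choosing:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup according to Amdahl's law.

    parallel_fraction: share of the run time that can be parallelized (0..1).
    cores: number of cores the parallel part is spread over (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Core counts mentioned in the post: today's quad core, upcoming 8 cores,
# and the 80-core research chip. The fractions are illustrative assumptions.
for fraction in (0.0, 0.5, 0.9):
    for cores in (4, 8, 80):
        factor = amdahl_speedup(fraction, cores)
        print(f"p={fraction:.1f}, {cores:2d} cores: {factor:.2f}x")
```

With p = 0, which is what a purely single-threaded program amounts to, the result is 1.0x no matter how many cores are in the box.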

Unfortunately, several hardware companies that wanted to bring personal cluster computing to the masses have closed down, including Orion Multisystems and SiCortex. So we have to hope that Intel, AMD or IBM can glue more cores onto a CPU, and that Microsoft, Intel and the open source compiler community can come up with better software solutions and compiler technologies for automatic threading and parallelization. Check out Dr. Dobb's Journal, which is at the forefront of reporting on this. To sum it up, I am not impressed with the parallelization capabilities of old software programs; they basically have none. Old applications should be run through an auto-parallelizing compiler and recompiled with the newer SSE2, SSE3 and SSE4 instruction sets.
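For readers who wonder what "parallelizing" a loop means in practice, here is a rough Python sketch of the basic idea: split a loop with independent iterations into chunks, hand each chunk to a worker, and combine the partial results. This is not what any particular compiler emits, and all names are made up for illustration. Note also that in standard CPython the global interpreter lock keeps threads from executing Python bytecode in parallel, so this shows the decomposition only, not a real speedup; for that you would need processes or native code.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(start, stop):
    """The original serial loop: sum i*i for start <= i < stop."""
    return sum(i * i for i in range(start, stop))


def chunk_bounds(n, chunks):
    """Split range(0, n) into at most `chunks` contiguous (start, stop) pairs."""
    step = max(1, -(-n // chunks))  # ceiling division, at least 1
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]


def parallel_sum_of_squares(n, workers=4):
    """Same result as sum_of_squares(0, n), computed chunk by chunk."""
    bounds = chunk_bounds(n, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is independent, so the order of execution does not matter.
        partial_sums = pool.map(lambda b: sum_of_squares(*b), bounds)
        return sum(partial_sums)


# Both versions must agree; only the way the work is divided differs.
assert parallel_sum_of_squares(100_000) == sum_of_squares(0, 100_000)
```

The hard part for an automatic tool is proving that the iterations really are independent, which is why old code rarely gets this for free.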