Making Multicore Work With Minimal Pain Points

Two multicore benchmarks demonstrate that dividing the total work among multiple threads yields higher performance, but interdependencies between threads hold the speedup gained from each additional core below that core's full capacity, as predicted by Amdahl's law. Measurements report the performance of the PERC Ultra SMP virtual machine running Java threads on Dell PowerEdge 1900, Dual Quad Core Xeon E5310, 1.6-GHz hardware.
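To make the Amdahl's law limit concrete, the following is a minimal sketch of the formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the run time that can be spread across n cores. The class name, the method name, and the 90 percent parallel fraction are assumptions chosen for illustration; they are not taken from the benchmarks described here.

```java
public class AmdahlSketch {
    // Speedup predicted by Amdahl's law when parallelFraction (0.0 to 1.0)
    // of the run time can be divided evenly across the given number of cores.
    public static double speedup(double parallelFraction, int cores) {
        double serialFraction = 1.0 - parallelFraction;
        return 1.0 / (serialFraction + parallelFraction / cores);
    }

    public static void main(String[] args) {
        double p = 0.90; // assumed parallel fraction, for illustration only
        for (int cores = 1; cores <= 8; cores *= 2) {
            System.out.printf("%d cores: %.2fx%n", cores, speedup(p, cores));
        }
    }
}
```

Under that assumed 90 percent figure, eight cores yield a speedup of roughly 4.7 rather than 8, which is the shape of result the law predicts whenever some portion of the work remains serial.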
Declaring a Java variable volatile has a similar effect. No instructions may be reordered around a fetch or store of a volatile variable. Further, all cached copies of non-local variables are refreshed from shared memory at the moment of each volatile read, and all cached values are committed to shared memory at the moment of each volatile write. These built-in language features make it straightforward to develop portable and maintainable code that runs reliably on a variety of multiprocessor configurations. Because these guarantees constrain both the compiler and the memory system, the constructs should be used judiciously.
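The visibility guarantee described above can be sketched with a volatile flag that publishes an ordinary field from one thread to another. This is a minimal illustration rather than a recommended design; the class, field, and method names are hypothetical.

```java
public class VolatileFlagSketch {
    // Ordinary field: written before the flag is set, read after it is seen.
    private int payload;
    // Volatile flag: its write commits payload, its read refreshes payload.
    private volatile boolean ready;

    void publish(int value) {
        payload = value; // ordinary write
        ready = true;    // volatile write
    }

    int awaitPayload() {
        while (!ready) { // volatile read, repeated until the flag is set
            Thread.yield();
        }
        return payload;  // sees the value written before the flag was set
    }

    // Hypothetical helper: hands one value from the calling thread to a reader.
    public static int demo(int value) {
        VolatileFlagSketch shared = new VolatileFlagSketch();
        int[] seen = new int[1];
        Thread reader = new Thread(() -> seen[0] = shared.awaitPayload());
        reader.start();
        shared.publish(value);
        try {
            reader.join(); // join makes the reader's write to seen[0] visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo(42));
    }
}
```

Only the flag is declared volatile. The ordinary payload field relies on the ordering between its own write and the volatile write that follows it, so the cost of volatile access is paid on one variable rather than on every shared field.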

By comparison, support for multiple threads was added to C and C++ after the original language definitions. POSIX libraries let a program start new threads and enforce mutual exclusion with semaphores and other locking mechanisms. For variables declared with the volatile keyword, the compiler promises not to reorder assignments and fetches with respect to other volatile variables. However, it does not guarantee the absence of reordering with respect to variables that are not declared volatile. Developers who want to prevent the compiler from reordering accesses to several related variables must declare all of those variables volatile, which makes every access to any of them more expensive than a normal access.

Another difficulty with C and C++ is that the languages provide no control over the reordering of instructions, or over memory barriers, with respect to invocations of semaphore operations and other POSIX services. Certain C compilers (e.g., GNU gcc) provide special directives to enforce memory barriers, but these are non-standard and non-portable. A proposed C++0x revision of the C++ language provides new mechanisms to improve memory barrier abstractions in C++ code. Of course, existing C++ code has not been written to use this new standard, and it may be a while before C++ compilers are updated to support it.

In summary, the greatest challenge of moving to multicore is structuring the workload so that it can be divided efficiently among multiple independent threads. Once the workload is so partitioned, implementing the design is a matter of mapping the desired communication and coordination activities to the chosen programming language. Languages like Java provide multiprocessor programming notations that are portable and efficient. Legacy languages like C and C++ can support multiprocessor applications, but currently require the use of non-standard and non-portable compiler features.
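As a small illustration of partitioning a workload among independent threads, the sketch below splits an array sum into chunks that share no data until the partial results are combined. It assumes the standard java.util.concurrent executor framework; the class name, the method name, and the choice of four workers are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PartitionedSumSketch {
    // Sums data by giving each worker thread its own contiguous chunk.
    public static long parallelSum(int[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (data.length + workers - 1) / workers; // round up
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(data.length, w * chunk);
                final int to = Math.min(data.length, from + chunk);
                Callable<Long> task = () -> {
                    long sum = 0; // local to this task, so no locking is needed
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // the only point where threads coordinate
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4));
    }
}
```

Each task reads a disjoint slice and writes only its own local variable, so the threads interact solely when the partial sums are collected. Keeping that coordination step small is what allows the added cores to contribute most of their capacity.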

This article was written by Kelvin Nilsen, CTO for Java at Atego Systems, San Diego, CA. For more information, visit http://info.hotims.com/28060-121.

