Design engineers will soon need to bridge the growing gap between hardware reality and software capabilities in the high-performance computing (HPC) realm as the use of multicore microprocessors grows. If your software development or sourcing plans haven't anticipated this shift, your applications may have a shorter life than you had planned.

The 2006 version of technical computing "reality" is an inexpensive dual-core processor from AMD or Intel on a desktop system, or a dual- or quad-core RISC processor from Sun or IBM running on a server. In 2007, we should expect to see inexpensive quad-core processors from AMD and Intel, and processors with eight or more cores in 2008. These small symmetric multiprocessing (SMP) systems will be a far cry from the proprietary $500,000+ SMP systems of a few years ago. This technology transition has big implications for the "democratization" of computing power. On the horizon are four- to eight-core systems that cost only a few thousand dollars and sit on the desk of every design engineer.

Multicore Optimization

This illustration represents standard iterative development cycles used by the Numerical Algorithms Group to ensure software robustness and portability, and to extend the lifetime of the application to include unreleased multicore processor systems.

In stark contrast to the increasing availability of higher performance hardware is a lack of commercially available HPC applications that can take full advantage of the hardware. Why? The business model for HPC-specific application software has all but evaporated in the last decade. A recent study by the U.S. Council on Competitiveness comments that independent software vendors find that, "... (given the) relatively low revenue from the high-end part of the technical computing pyramid—the return on investment for developing highly scalable codes for HPC users usually does not justify the expenditures or risks." This means that design engineers and others relying on commercially created technical computing applications likely will face challenges exploiting more than a fraction of the problem-solving power in the readily accessible processors clearly on the horizon.

For those who develop and maintain their own applications, the answer is somewhat more complicated. If you plan to use one of the new multicore architectures, replacing calls to serial components with calls to SMP component libraries can provide dramatic improvements in performance, and performance should increase commensurately with the number of cores available over the next two years. Using OpenMP directives or POSIX threads within the application can provide additional performance, but requires greater expertise.
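As a rough illustration of the idea, and not of OpenMP or of any particular SMP library, the following Python sketch splits a computation into one chunk per core and then combines the partial results. The function names and the sum-of-squares workload are hypothetical, chosen only to keep the example small; production code would more likely call a tuned SMP library or use OpenMP directives in a compiled language.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks of near-equal size."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum_of_squares(bounds):
    """Serial kernel (hypothetical workload): sum of k*k for k in [start, stop)."""
    start, stop = bounds
    return sum(k * k for k in range(start, stop))


def sum_of_squares_serial(n):
    """Reference answer computed on a single core."""
    return partial_sum_of_squares((0, n))


def sum_of_squares_parallel(n, workers=None):
    """Run one chunk per worker process and add up the partial results."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, split_range(n, workers)))


if __name__ == "__main__":
    n = 200_000
    # Integer arithmetic is exact, so both versions must agree exactly.
    assert sum_of_squares_parallel(n) == sum_of_squares_serial(n)
    print("serial and parallel results agree")
```

The calling code changes very little: the serial call is swapped for a parallel one with the same result, and the number of workers follows the number of cores reported by the machine. Whether the parallel version is actually faster depends on the size of the problem and the overhead of starting workers.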

Porting and Accuracy

If your computing environment is a cluster of single- or dual-core systems interconnected with a high-bandwidth network, the principal option is to look for relevant component libraries that use message passing to achieve performance that scales with the number of systems in the cluster. Alternatively, you can use a message-passing tool such as MPICH and redesign the most computationally intensive parts of the code to scale with the number of processors. This course may provide the largest performance improvement, but it is also the most complex and will not be effective in some applications. Eventually, we may see tools that permit hybrid use of shared-memory parallelism (SMP) and distributed-memory parallelism (DMP), but such complexity is practical only for a few today. (The acronym SMP, which previously stood for symmetric multiprocessing, is now often used to mean shared-memory parallelism.)

Whatever your approach to harnessing the horsepower of new SMP and/or DMP systems, there is a high probability that the applications you use today will be ported one or more times to various combinations of new chip architectures, operating systems, compilers, and vendor performance libraries. As an organization, the Numerical Algorithms Group (NAG) has been developing and porting component libraries to new platforms for over 35 years and can say with confidence that porting code is rarely simple. Even without any changes to application logic, such platform changes can introduce errors into the ported application.
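One well-known way this can happen, offered here as an illustrative assumption rather than a specific NAG case, is that a new compiler, library, or parallel decomposition adds floating-point numbers in a different order. The Python sketch below sums the same three values in two orders and then checks the result against a stored reference with a tolerance. The function names and the tolerance are hypothetical.

```python
import math


def sum_left_to_right(values):
    """Add values strictly in order, as a simple serial loop would."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    """Add values by recursive halving, as a parallel reduction might."""
    n = len(values)
    if n == 0:
        return 0.0
    if n == 1:
        return values[0]
    mid = n // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


def matches_reference(result, reference, rel_tol=1e-12):
    """Porting check: accept results within a relative tolerance (assumed value)."""
    return math.isclose(result, reference, rel_tol=rel_tol)


if __name__ == "__main__":
    values = [0.1, 0.2, 0.3]
    serial = sum_left_to_right(values)   # (0.1 + 0.2) + 0.3
    reordered = sum_pairwise(values)     # 0.1 + (0.2 + 0.3)
    print(serial == reordered)                    # False: the last bit differs
    print(matches_reference(reordered, serial))   # True: within tolerance
```

A difference in the last bit is harmless here, but the same effect in an ill-conditioned calculation can grow into a visibly different answer, which is why ported code should be checked against reference results rather than assumed correct.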

Whether your code is commercial, open-source, or internally developed, you should realistically assess the challenge of designing it for long life. It's not enough to have fast software. Speed without sufficient accuracy is a "non-starter," or as NAG developers often put it, "How fast do you want the wrong answer?"

Designing for SMP

While there are a number of things you can do to extend the life of an application, the following are a few that are particularly appropriate to computationally intensive HPC applications.

For the computational "core" of the application, pay extra attention to the underlying algorithms used in code you write or acquire from others. Often there is more than one algorithm that can solve a particular problem and some will be faster than others, although a method that is computationally faster might not be very robust in handling extreme cases of data or ill-posed problems. A method that breaks easily is likely to be frustrating to users and potentially more troublesome when the application is ported to a new platform. The best advice is to err on the side of robustness since new machines will continue to make code faster.
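The trade-off can be seen in a small Python sketch, chosen here purely as an illustration. It computes a sample variance two ways: the "textbook" formula, which needs less arithmetic per data point, and Welford's update method, which is more robust when the data have a large mean. The data set and function names are assumptions made for the example.

```python
def variance_textbook(xs):
    """Sample variance from the sum of squares minus the squared sum.

    Cheap per data point, but subtracts two huge, nearly equal numbers
    when the mean is large compared with the spread.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    total = 0.0
    total_sq = 0.0
    for x in xs:
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def variance_welford(xs):
    """Sample variance by Welford's method: updates a running mean instead."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in xs:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("need at least two data points")
    return m2 / (n - 1)


if __name__ == "__main__":
    # Small spread around a very large mean; the exact sample variance is 30.
    data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
    print("textbook:", variance_textbook(data))
    print("welford: ", variance_welford(data))
```

On well-scaled data the two methods agree. On the data above, the textbook formula loses the answer to rounding in double precision while Welford's method recovers 30, which is the kind of extreme case that tends to surface only after an application reaches users or a new platform.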

Next, avoid proprietary language extensions. Rather, write the algorithmic code under standards that emphasize portability. The work you save now by adopting proprietary extensions may be swamped by the effort of rewriting code for a new platform that doesn't support the language extension. Research and use automated software development tools that check for common errors (e.g., mismatched argument lists and memory "leaks") and enforce standards-compliant coding. A good standards-compliant compiler with various checking features can save significant work when the next platform comes along. Don't skimp on quality assurance, though it may be tempting to do so. An independent peer review of the core code and interfaces, along with a proofreading of the documentation, should confirm that the developer adhered to coding standards, ran the required tools, and properly documented the code.

Write test programs that exercise all error exits and code paths for a broad range of problems, including edge cases where key algorithms begin to fail. In our experience, these test programs often are longer than the code they test, but the payback is portable code that inspires user confidence.
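A minimal Python sketch of such a test program follows. The routine under test, real_roots, is a hypothetical quadratic solver written for this example, not a NAG routine; the point is that the tests visit every error exit and the edge cases where a simpler formula would lose accuracy.

```python
import math


def real_roots(a, b, c):
    """Return the real roots (low, high) of a*x**2 + b*x + c = 0."""
    if not all(math.isfinite(v) for v in (a, b, c)):
        raise ValueError("coefficients must be finite")      # error exit 1
    if a == 0:
        raise ValueError("not a quadratic: a == 0")          # error exit 2
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ArithmeticError("no real roots")               # error exit 3
    # Add terms of like sign to avoid cancellation when b*b >> 4*a*c.
    q = -0.5 * (b + math.copysign(math.sqrt(disc), b))
    if q == 0:
        return (0.0, 0.0)  # b == 0 and c == 0: double root at zero
    r1, r2 = q / a, c / q
    return (min(r1, r2), max(r1, r2))


def raises(exc_type, func, *args):
    """Return True if func(*args) raises exc_type."""
    try:
        func(*args)
    except exc_type:
        return True
    return False


def run_edge_case_tests():
    # Ordinary cases.
    assert real_roots(1.0, -3.0, 2.0) == (1.0, 2.0)
    assert real_roots(1.0, 0.0, -1.0) == (-1.0, 1.0)
    # Double roots, including the q == 0 branch.
    assert real_roots(1.0, -2.0, 1.0) == (1.0, 1.0)
    assert real_roots(1.0, 0.0, 0.0) == (0.0, 0.0)
    # Edge case: widely separated roots, where a naive formula loses digits.
    low, high = real_roots(1.0, -1e8, 1.0)
    assert math.isclose(low, 1e-8, rel_tol=1e-12)
    assert math.isclose(high, 1e8, rel_tol=1e-12)
    # Every error exit.
    assert raises(ValueError, real_roots, float("nan"), 1.0, 1.0)
    assert raises(ValueError, real_roots, 1.0, float("inf"), 1.0)
    assert raises(ValueError, real_roots, 0.0, 1.0, 1.0)
    assert raises(ArithmeticError, real_roots, 1.0, 0.0, 1.0)


if __name__ == "__main__":
    run_edge_case_tests()
    print("all edge-case tests passed")
```

Even in this toy example the tests are about as long as the routine itself. Rerunning the same tests unchanged on each new compiler or platform is what turns them into a porting safety net.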

Sound exhausting? If your goal is application longevity, there really aren't any shortcuts. Code lifecycle costs are kept in check precisely by this upfront attention to debugging and quality checks. Whether your solutions come from an in-house developer, an open source project, published sources, or supported commercial libraries, either they will have gone through the rigors described above or they will send you back to the drawing board when you port to the HPC platforms on the horizon.

This article was written by Rob Meyer, CEO, at Numerical Algorithms Group, Downers Grove, IL.

Embedded Technology Magazine

This article first appeared in the December 2006 issue of Embedded Technology Magazine.
