Ames Research Center
As Engineering Branch Chief for NASA’s Advanced Supercomputing Division, Bill Thigpen led the team that built and deployed the 10,240-processor Columbia supercomputer in just 120 days. Listed as one of the world’s fastest and most powerful supercomputers, Columbia is just one of the computing resources currently managed by Mr. Thigpen.
NASA Tech Briefs: As chief of the engineering branch for NASA’s Advanced Supercomputing Division, what are your primary responsibilities?
Bill Thigpen: My primary responsibilities are to make sure that the systems are up and running, and that they’re being used effectively by the scientists and engineers at NASA.
NTB: One of your most significant achievements to date at NASA has been leading the team that built and deployed the Columbia supercomputer. Tell us about that project and some of the challenges you faced managing it.
Thigpen: Well, the first thing is that it was done in 120 days, which was far faster than any other system of its size as far as deployment time goes. It normally takes several years to put in a system this big. It was 120 days from the time that we gave the order to SGI (Silicon Graphics Inc.) until we were fully operational. That was done on an operational floor, so we didn’t bring the existing systems down until we had enough of Columbia built to provide the users with more capability than they had prior to Columbia.
We say 120 days, but the first users actually went on the system in the first week of July. The order went to SGI in the middle of June 2004, and by the first week of July we started putting users on the first nodes of Columbia that came in. Then we built the system as it went across.
As for challenges, there were several. One was keeping the floor operational while we were bringing in that much of a system. The system actually filled our entire floor. We were building this system basically from one side of the computer floor to the other side, and we were taking out all of the existing systems that were there. In one 10-day period we actually got nine 512-processor nodes in. Each of those nodes is more than 3 teraflops, which is more compute capability than a lot of high-end computing centers have. So the electrical upgrades had to happen, all the networking had to happen, and we had plumbing that went into some parts of the floor because, of the 20 nodes that went in, 8 needed liquid cooling. It was a very intense time.
Basically, we had a standup meeting every day to make sure that everything was on track and everything was going well.
NTB: What types of projects are typically run on Columbia?
Thigpen: Oh man! It’s a high-end computing resource for the agency overall, so all four mission directorates are running on this system. And the mission directorates are all doing different types of work on it.