Notes from the meeting of SciComp, the SP Scientific User's Group
in BoF Session, at SC2000, on 8 Nov, 2000

as recorded by T. M. DeBoni, secretary.

Preliminaries:

Technical difficulties with projection and presentation equipment were ironed out.

Agenda Items

Review of attendee feedback from the San Diego meeting, August 2000.

The following were the main feedback points:

  1. General happiness with the meeting, and with Jay Boisseau for organizing it.
  2. More exercises were desired in tutorials; this implies some hands-on time and resources to support it. Discussion did not cover these points.
  3. More user applications and algorithms were desired in presentations.
  4. More AIX talks were desired, and it was also seen as desirable to continue the IBM presentations. I think all would agree to the latter point.
  5. There were good accommodations and food associated with the meeting. San Diego gets this praise.
  6. The meeting rooms were too warm; and, the meeting rooms were too cold. This is just the nature of things, I think.
    (NOTE: The beach was nice, on the one afternoon I played hooky and went out there. Perhaps some talks could be held there in future meetings...)
  7. It was generally agreed that the IBM roadmap talk was the best. I think the IBM talks were all quite good.
  8. One user talk that was applauded had to do with early experiences on NAVO's 2 TFLOP machine. Several others were popular, as well.

SciComp Draft Bylaws

A set of articles was proposed based on the CUG bylaws, and discussion ensued. The outcomes are as follows:

The next US meeting will be in Knoxville, TN, at the Radisson Summit Hill, on October 9-12, 2001.

SciComp 2000 technical concerns and IBM responses

Seven technical concerns were solicited from the attendees of the 2000 meeting. They were written up by the officers and formally sent to IBM officials David Turek, Peter Ungaro, and John Levesque.

The concerns so documented were discussed, and the IBM responses solicited from J. Levesque were as follows:

  1. Running N user threads, processes, or tasks on N-CPU SMP nodes causes performance degradation and variability. This has been noted reliably and repeatedly, and documented in numerous contexts. This is considered crippling to the use of SP systems.

    Possible fix: bump up priority to starve system daemons.

    IBM Response: Bob Davis of IBM offers settings from a well-tuned system in New York. RAS daemons may be at fault. Time-of-day related changes of behavior also seem to occur.

  2. Abnormally terminated jobs are often not cleaned up completely and automatically. Orphaned processes and shared memory segments are often left behind, which causes trouble for subsequent use of the affected nodes.

    Possible fix: job postscript and periodic daemons.

    IBM Response: IBM will advise on fixes. An official response is needed, as IBM sometimes disparages ad hoc fixes applied by users to such problems.

  3. DPCL supports dynamic instrumentation of parallel jobs, an idea many like. IBM initially claimed it would be open source. Lately, this claim has come to be doubted. Users want DPCL to be open source.

    IBM Response: It will be open source, as was formally announced at the IBM SC2000 booth by Ted Hoover. This was applauded.

  4. Thread stack overflow errors are easy to introduce in correct programs and can depend on environment variable settings. A program can jump across the read-only boundary-guard page, corrupt data, and cause a mysterious segmentation fault somewhere else as a result. This is considered a very big problem, equal in magnitude to (1), above.

    Possible fix: signal and detect all such overflows. Because this could result in large performance degradation, it should possibly be done only in a special debug mode, although such a mode would not guarantee an appropriate response to all such faults. There should also be support for detection of such faults in debuggers.

    IBM Response: No response is ready at present, but one will be provided forthwith.

  5. Power 3 is a 64-bit processor, but MPI is not yet released in 64-bit mode. This is considered serious for large systems. The release schedule should be accelerated.

    IBM Response: No response is ready at present, but one will be provided forthwith.

  6. Increasing SMP node CPU counts will cause time-sharing as well as space-sharing. LoadLeveler (LL) does not support specifications for CPU counts or thread counts, and it does not enforce memory specifications. This can lead to significant performance degradation on time-shared nodes.

    IBM Response: No response is ready at present, but one will be provided forthwith.

  7. The Colony switch and adapters will provide more adapters per node and higher hardware bandwidth. The existing HAL-based MPI implementation requires unnecessary memory copies, and limits single-task bandwidth to well below the hardware limit (approximately 50% of it, with NightHawk II nodes). A zero-copy user-level protocol is needed, similar to the KLAPI implementation used by GPFS.

    IBM Response: No response is ready at present, but one will be provided forthwith.
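
As an illustration of concern 4, the following is a minimal sketch, in Python rather than AIX pthreads code, of requesting a thread's stack size explicitly instead of inheriting whatever default the environment supplies. The helper names run_with_stack and depth_reached are hypothetical, and the sketch does not detect overflows or reproduce the guard-page behavior described above; it only shows one way to make the stack size an explicit, visible program parameter.

```python
import sys
import threading


def depth_reached(n):
    """Recurse n levels deep and return the depth actually reached."""
    return 0 if n == 0 else 1 + depth_reached(n - 1)


def run_with_stack(stack_bytes, depth):
    """Run depth_reached(depth) in a new thread with an explicit stack size.

    Hypothetical helper for illustration only.
    """
    result = {}
    # stack_size() applies to threads created after the call and returns
    # the previous setting (0 means "platform default").
    old_size = threading.stack_size(stack_bytes)
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(max(old_limit, depth + 100))
    try:
        def worker():
            result["depth"] = depth_reached(depth)

        t = threading.Thread(target=worker)
        t.start()
        t.join()
    finally:
        # Restore the previous settings so later threads are unaffected.
        threading.stack_size(old_size)
        sys.setrecursionlimit(old_limit)
    return result.get("depth")


if __name__ == "__main__":
    # 64 MiB is an arbitrary, generous value chosen for this sketch.
    print(run_with_stack(64 * 1024 * 1024, 5000))
```

With the size stated in the program, a run no longer depends on a per-user or per-site default. The choice of a safe value for a given application remains the hard part, which is why detection support was requested.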

IBM SP Roadmap Update

This talk contained such a wealth of detail that I could not capture more than a small fraction of it. Also, there's the matter of IBM proprietary and possible NDA-protected information. Therefore, the following notes represent only the tip of the iceberg.

IBM will implement fundamental technical improvements that will prevent CMOS from "topping out" in the near or mid-term (3 to 4 years) future, in terms of processor or system performance, or the functionality that can be put on a single chip. Insulated copper on-chip conductor paths will prevent inter-conductor crosstalk, so higher frequencies can be used. Distributed clocks operating at different frequencies, for different parts of the chip, will also allow higher frequencies to be used. The first generation of processors to benefit from these will be the Power 4 chips, which will be rolled out in the third or fourth quarter of 2001. These chips will represent a merging of the now-separate Power 3 technical and commercial product lines. They will be 2-way SMPs, running at 9 GF peak. They will have architectural hooks for very high performance, and to allow efficient interconnection into larger aggregations. The chips will contain a very large L-2 cache shared between the processors, along with L-3 cache controllers and directories. Initial system offerings will be 32-way SMPs, but larger (smaller?) systems will follow. They will be usable as message passing or NUMA systems, as whole or partitioned machines.

Numerous software and utility enhancements are planned for RAS and user convenience and usability.

Interconnects will evolve to pure hardware, eliminating embedded processors for control and switching. Bandwidth will increase, and latency decrease, accordingly. (This also should enhance their reliability, although they will be harder to debug on the floor.)

RS/6000 SPs running AIX will continue to be IBM's primary HPC platform. However, Linux has the potential to become the volume application development environment. There will be a strong affinity between AIX and Linux, and they will both be made available across a variety of Intel and Power platforms. IBM will work with the Linux community to infuse AIX technology into the Linux kernel. IBM will also deliver robust Linux cluster solutions.


On matters pertaining to the information herein, send email to Thomas M. DeBoni at TMDeBoni@LBL.GOV, or call 510-486-8617.