Philip K. McKinley
Lionel M. Ni
Department of Computer Science
Michigan State University
The ComPaSS library components are organized according to a hierarchy. The Low-Level Communication Services (LLCS) module contains operations that can be invoked either directly by message-passing applications or indirectly through calls generated by a compiler for a data parallel language. The High-Level Communication Services (HLCS) module offers an assortment of data manipulation and control operations to SPMD applications. These operations include both those related to data decomposition and realignment and those that implement application-specific functions, for example, linear algebra routines. Finally, the project studied how specific applications can benefit from ComPaSS services.
The hierarchical structure of ComPaSS supports both the message-passing programming paradigm and programming in data-parallel languages. ComPaSS complements related projects in the field by providing high-performance implementations of operations needed in both these programming paradigms, while providing interfaces compatible with existing and emerging industry standards. For example, the interfaces to message-passing operations are consistent with other packages and the emerging Message Passing Interface (MPI) standard, and the operations for data parallel programming are consistent with the needs of High Performance Fortran (HPF).
ComPaSS provides global communication operations for both data manipulation and process control, many of which are based upon a small set of low-level communication primitives. In particular, the one-to-many, or multicast, operation distributes a single value to multiple nodes, and the multireceive operation is used to read input from multiple nodes. In the ComPaSS library, these low-level operations are optimized for specific classes of commercial scalable parallel computers, such as the nCUBE-2 and IBM SP-1/2. Moreover, we have used these operations to construct efficient implementations of other collective communication operations, such as global reduction and all-to-all broadcast, for these architectures.
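To illustrate the flavor of such hypercube-oriented collective operations, the following sketch (our own simulation, not the ComPaSS implementation) computes the communication schedule for a recursive-doubling multicast from node 0 on a hypercube of the kind found in the nCUBE-2; the function name and structure are illustrative assumptions.

```python
# Hypothetical sketch: recursive-doubling multicast on a d-dimensional
# hypercube. In each step, every node that already holds the message
# forwards it to the neighbor obtained by flipping one address bit, so
# all 2**d nodes receive the message in d steps.

def hypercube_multicast_schedule(dim):
    """Return, per step, the (sender, receiver) pairs that deliver a
    message from node 0 to all 2**dim nodes in dim steps."""
    have = {0}          # nodes currently holding the message
    schedule = []
    for d in range(dim):
        step = []
        for node in sorted(have):
            partner = node ^ (1 << d)   # flip address bit d
            step.append((node, partner))
        have |= {recv for _, recv in step}
        schedule.append(step)
    return schedule

if __name__ == "__main__":
    # On a 3-cube (8 nodes), the message reaches everyone in 3 steps.
    for i, step in enumerate(hypercube_multicast_schedule(3)):
        print(f"step {i}: {step}")
```

The same pairwise-exchange pattern, run in reverse, yields a multireceive (gather) schedule, which is one reason a small set of low-level primitives can support many higher-level collective operations.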
We have also developed efficient communication interfaces for workstation clusters, including those based on FDDI and Asynchronous Transfer Mode (ATM) interconnects. In the case of ATM networks, we have developed a set of collective operations that use an underlying virtual topology composed of ATM virtual multicast channels. We have analyzed, prototyped, and tested implementations on an ATM testbed, and have shown how several of ATM's salient features, such as full-duplex I/O, high bit rate, concurrent sends, and hardware multicast, are very useful in reducing the turnaround time of such operations.
At the higher level, we have developed an efficient data redistribution library for HPF, which has been implemented on both the IBM SP-1/2 and FDDI-based workstation clusters. We have proposed efficient data distribution and alignment algorithms that help programmers and compilers achieve better load balancing and reduce interprocessor communication. In addition, we have used the ComPaSS library routines to improve the performance of parallel applications, namely, parallel eigenvalue and singular value algorithms, and parallel data clustering and image segmentation algorithms.
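As a concrete example of the kind of bookkeeping an HPF redistribution library must perform, the sketch below (hypothetical helper functions, not the ComPaSS API) computes which elements of a one-dimensional array must move between processors when the array is redistributed from a BLOCK to a CYCLIC distribution.

```python
# Hedged sketch of HPF-style redistribution planning. Under BLOCK, each
# processor owns one contiguous chunk of ceiling(n/p) elements; under
# CYCLIC, element i belongs to processor i mod p. The plan lists, for
# each (source, destination) pair, the elements that must be sent.

def block_owner(i, n, p):
    """Owner of element i under a BLOCK distribution of n elements."""
    block = (n + p - 1) // p   # ceiling block size
    return i // block

def cyclic_owner(i, p):
    """Owner of element i under a CYCLIC distribution."""
    return i % p

def redistribution_plan(n, p):
    """Map (src, dst) -> indices that src must send to dst when going
    from BLOCK to CYCLIC."""
    plan = {}
    for i in range(n):
        src, dst = block_owner(i, n, p), cyclic_owner(i, p)
        if src != dst:
            plan.setdefault((src, dst), []).append(i)
    return plan

if __name__ == "__main__":
    # 8 elements on 2 processors: BLOCK gives {0..3} to proc 0 and
    # {4..7} to proc 1; CYCLIC gives even indices to 0, odd to 1.
    print(redistribution_plan(8, 2))
```

In an actual library, each (src, dst) entry would be packed into a single message, so the planning step also determines the communication pattern whose cost the redistribution algorithms try to minimize.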