Research Projects

My primary research interests lie in compilers, programming languages, and systems for high-performance computing. In my Ph.D. dissertation, I explored techniques for an optimizing compiler that presents programmers with a sequentially consistent view of explicitly parallel programs while still performing the correct optimizations expected of a compiler. Since then, I have been focusing on two research projects. The first, an extension of my Ph.D. thesis, is building an optimizing compiler for relaxed memory consistency models, such as the Java memory model; it was recently funded by the NSF Information Technology Research (ITR) program (with Prof. David Padua at the University of Illinois at Urbana-Champaign and Dr. Samuel Midkiff at the IBM T. J. Watson Research Center). The second is compilation techniques for processor-in-memory architectures (FlexRAM), a collaborative project with Prof. Josep Torrellas at the University of Illinois at Urbana-Champaign.

An Optimizing Compiler for Languages with Programmable Memory Models

While memory consistency models have long been a topic of interest in the computer architecture community, the programming languages and compilers community has paid little attention to the issues of correctness and ease of programming with respect to the underlying memory consistency model. The influence of the memory consistency model on the programming model has become more important because of the widespread use of SMPs and the growing use of multi-threaded programming environments, such as Java, OpenMP, and Pthreads.

Unfortunately, the usability of memory consistency models, their impact on performance, and the compiler technology needed to optimize parallel programs are poorly understood. A concrete example is the current Java memory model, which is a relaxed memory consistency model. Java is the first widespread concurrent programming language to explicitly specify a memory model at the language level. However, the Java memory model is very difficult for programmers to understand, and it admits several conflicting interpretations. In addition, as in most languages that follow the shared-memory parallel programming model, Java concurrent programs can exhibit non-deterministic behavior due to data races. The combined effect of the non-intuitive Java memory model and this non-determinism makes it difficult to write correct and efficient Java concurrent programs. More generally, relaxed memory consistency models make programming and porting more difficult, even though most current shared-memory multiprocessors adopt such models to enable a wide variety of hardware-level optimizations. Programmers expect the memory to behave as it would on a uniprocessor executing several tasks concurrently. From this point of view, sequential consistency is arguably the most intuitive and natural memory consistency model for programmers.
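
To make the surprise concrete, here is a minimal Java example of a data race (illustrative only, not taken from the project) whose outcome is intuitive under sequential consistency but not under a relaxed model:

    // Classic two-thread reordering example. Under sequential consistency,
    // a reader that sees ready == true must also see data == 42. Under a
    // relaxed model (neither field is volatile), the compiler or hardware
    // may reorder the two writes, or hoist the read of ready out of the
    // loop, so the reader may spin forever or print 0.
    public class Reordering {
        static int data = 0;
        static boolean ready = false;

        public static void main(String[] args) throws InterruptedException {
            Thread writer = new Thread(() -> {
                data = 42;      // (1)
                ready = true;   // (2) may become visible before (1)
            });
            Thread reader = new Thread(() -> {
                while (!ready) { }          // wait until (2) is visible
                System.out.println(data);   // 42 under SC; possibly 0 otherwise
            });
            writer.start();
            reader.start();
            writer.join();
            reader.join();
        }
    }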

This research focuses on building an optimizing compiler for explicitly parallel shared-memory programs that hides the underlying relaxed memory consistency model. The compiler presents an intuitive and natural memory consistency model (e.g., sequential consistency) to the programmer. It shifts the burden of reasoning about the underlying machine architecture from the programmer to the compiler, making programming and debugging easier. Moreover, it correctly performs optimizations on explicitly parallel programs that conventional compilers do not handle. The compiler uses Shasha and Snir's delay set analysis and the Concurrent Static Single Assignment (CSSA) program representation to distinguish the effects of different memory consistency models on compiler optimization and analysis techniques. In addition, the compiler will serve as a testbed for prototyping new memory consistency models at the language level and for measuring the effects of different memory models on program performance. Because of its widespread deployment and its use in general-purpose, commercially important applications, the project targets Java or a Java-like language.
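
As a rough sketch of what such a compiler does (the actual compiler works on an intermediate representation, not on source code), delay set analysis finds cycles among conflicting accesses and places delay edges that the code generator must enforce. Rewriting the writer and reader threads from the example above by hand, with VarHandle.fullFence() standing in for compiler-inserted barriers:

    import java.lang.invoke.VarHandle;

    // Hand-written approximation of the SC-preserving code such a compiler
    // could generate for the example above. Delay set analysis finds a
    // cycle among the conflicting accesses to data and ready, so delay
    // edges order (1) before (2) in the writer and the read of ready
    // before the read of data in the reader. Each edge is enforced here
    // with a full fence; a real compiler would pick the cheapest barrier
    // the target memory model requires and leave delay-free accesses
    // unordered.
    public class ReorderingSC {
        static int data = 0;
        static boolean ready = false;

        static void writer() {
            data = 42;                    // (1)
            VarHandle.fullFence();        // delay edge: (1) before (2)
            ready = true;                 // (2)
        }

        static void reader() {
            while (!ready) {
                VarHandle.fullFence();    // force a fresh read of ready
            }
            VarHandle.fullFence();        // delay edge: read ready before data
            System.out.println(data);     // always 42 with these orderings
        }
    }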

Relevant Publications

[1] Jaejin Lee and David A. Padua. ``Hiding Relaxed Memory Consistency with a Compiler'', an extended version of [5], IEEE Transactions on Computers: Special Issue on Parallel Architectures and Compilation Techniques, 50(8), August 2001.

[2] Samuel P. Midkiff, Jaejin Lee, and David A. Padua. ``A Compiler for Multiple Memory Models'', The 9th Workshop on Compilers for Parallel Computers (CPC 2001), June 2001.

[3] Jaejin Lee. ``Hiding the Java Memory Model with Compilers'', Technical Report MSU-CSE-00-29 (presented at the OOPSLA workshop Revising the Java Thread Specification), Department of Computer Science and Engineering, Michigan State University, Dec. 2000. (pdf)

[4] Jaejin Lee, David A. Padua, and Samuel P. Midkiff. ``Basic Compiler Algorithms for Explicitly Parallel Programs'', an extended version of [7], submitted to ACM Transactions on Programming Languages and Systems, Oct. 2000.

[5] Jaejin Lee and David A. Padua. ``Hiding Relaxed Memory Consistency with Compilers'', Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques (PACT), 2000. (pdf)

[6] Jaejin Lee. ``Compilation Techniques for Explicitly Parallel Programs'', Ph.D. Thesis, Technical Report UIUCDCS-R-93-1814, Department of Computer Science, University of Illinois at Urbana-Champaign, Oct. 1999. (pdf)

[7] Jaejin Lee, David A. Padua, and Samuel P. Midkiff. ``Basic Compiler Algorithms for Parallel Programs'', Proceedings of the 7th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Atlanta, Georgia, pp. 1-12, May 1999. (pdf)

[8] Jaejin Lee, Samuel P. Midkiff, and David A. Padua. ``A Constant Propagation Algorithm for Explicitly Parallel Programs'', International Journal of Parallel Programming, Vol. 26, No. 5, pp. 563-589, Dec. 1998. (pdf)

[9] Jaejin Lee, Samuel P. Midkiff, and David A. Padua. ``Concurrent Static Single Assignment Form and Constant Propagation for Explicitly Parallel Programs'', Proceedings of the 10th International Workshop on Languages and Compilers for Parallel Computing (LCPC), No. 1366 in Lecture Notes in Computer Science, Springer, pp. 114-130, Twin Cities, Minnesota, Aug. 1997. (pdf)

[10] Jaejin Lee and David A. Padua. ``Parallel Static Single Assignment Form and Constant Propagation for Explicitly Parallel Programs'', Proceedings of the 2nd HPCA Workshop on Interaction between Compilers and Computer Architectures, San Antonio, Texas, Feb. 1997. (pdf)

Compilers for Processor in Memory Architectures

Processor-in-memory (PIM) technology integrates processor logic and DRAM on the same chip. One interesting use of such chips is to replace the main memory chips of a workstation or a server. In this configuration, the PIM chips act as co-processors in memory that execute code when signaled by the host (main) processor. The target intelligent memory architecture thus has a heterogeneous mix of processors: host and memory processors. Typically, the host processor is a wide-issue superscalar with a deep cache hierarchy and a long memory access latency, while a memory processor is a simple, narrow-issue superscalar with only a small cache and a low memory access latency. Cache coherence between the two is guaranteed by the compiler.
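
The following sketch illustrates the kind of coherence actions the compiler inserts around an offloaded code section; all four runtime hooks are hypothetical stand-ins for the host/memory-processor interface, not an actual API:

    // Sketch of compiler-inserted coherence actions around an offloaded
    // section. writeBackRange, invalidateRange, signalMemoryProcessor, and
    // waitForMemoryProcessor are hypothetical stubs. Because the host and
    // memory processors are not kept coherent by hardware, the compiler
    // writes back the host's dirty lines for the data the section touches
    // before starting the memory processor, and invalidates them
    // afterwards so the host re-reads the results from DRAM.
    public class CoherenceSketch {
        static void writeBackRange(double[] a) { /* flush host cache lines */ }
        static void invalidateRange(double[] a) { /* drop stale host copies */ }
        static void signalMemoryProcessor(int sectionId) { /* start section */ }
        static void waitForMemoryProcessor(int sectionId) { /* await completion */ }

        static void offloadSection(int sectionId, double[] data) {
            writeBackRange(data);                // host updates reach DRAM
            signalMemoryProcessor(sectionId);    // memory processor runs section
            waitForMemoryProcessor(sectionId);
            invalidateRange(data);               // host re-reads updated DRAM
        }
    }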

While other compiler approaches for processor-in-memory architectures often differ little from compiling for systems in which the processors in the memory chips are the main processors, our approach automatically partitions the code into sections and schedules each section on the processor best suited to it. We exploit the heterogeneity of the system in addition to the parallelism provided by the architecture. To do so, we use adaptive execution techniques with dynamic feedback in addition to performance prediction. When the intelligent memory architecture has multiple host processors and/or memory processors, the communication overhead between processors is expected to be large, and compiler-managed cache coherence and memory consistency become more complicated. In this case, the compiler inserts code that makes the program execute adaptively in order to reduce these overheads, as illustrated in the sketch below. To fully exploit the parallelism provided by the architecture, we are also exploring speculative execution techniques on the intelligent memory architecture.
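
A toy sketch of adaptive execution with dynamic feedback follows; the names (AdaptiveScheduler, hostVersion, memVersion) are hypothetical, the real compiler generates this logic itself, and a real runtime would also fold in static performance prediction. The first invocations of a code section are timed once on each processor, and later invocations are routed to whichever was faster:

    // Toy sketch of adaptive section scheduling with dynamic feedback.
    // hostVersion and memVersion stand in for the code compiled for the
    // host processor and for the memory processor, respectively. The
    // first two invocations probe one processor each; later invocations
    // run the faster version.
    public class AdaptiveScheduler {
        private long hostTimeNs = -1;   // -1 means "not yet measured"
        private long memTimeNs = -1;

        public void invoke(Runnable hostVersion, Runnable memVersion) {
            if (hostTimeNs < 0) {
                hostTimeNs = time(hostVersion);    // probe host processor
            } else if (memTimeNs < 0) {
                memTimeNs = time(memVersion);      // probe memory processor
            } else if (hostTimeNs <= memTimeNs) {
                hostVersion.run();                 // host version is faster
            } else {
                memVersion.run();                  // memory version is faster
            }
        }

        private static long time(Runnable r) {
            long start = System.nanoTime();
            r.run();
            return System.nanoTime() - start;
        }
    }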

The target applications include floating-point, integer, multimedia, and object-oriented applications. By building a compiler, we measure the effectiveness of exploiting the heterogeneity of the intelligent memory architecture, in terms of both performance and power consumption, and the effectiveness of adaptive and speculative execution techniques on the intelligent memory architecture.

Relevant Publications

[11] Yan Solihin, Jaejin Lee, and Josep Torrellas. ``Memory-Side Correlation Prefetching Using a General-Purpose Core in Memory'', to appear in the 29th Annual International Symposium on Computer Architecture (ISCA 2002), Anchorage, Alaska, May 2002.

[12] Yan Solihin, Jaejin Lee, and Josep Torrellas. ``Prefetching in an Intelligent Memory Architecture Using a Helper Thread'', Proceedings of the Workshop on Multithreaded Execution, Architecture and Compilation (MTEAC-5), December 2001. (pdf)

[13] Yan Solihin, Jaejin Lee, and Josep Torrellas. ``Automatic Code Mapping on an Intelligent Memory Architecture'', an extended version of [14], IEEE Transactions on Computers: Special Issue on Advances in High Performance Memory Systems, 50(11), November 2001.

[14] Jaejin Lee, Yan Solihin, and Josep Torrellas. ``Automatically Mapping Code on an Intelligent Memory Architecture'', Proceedings of the 7th International Symposium on High Performance Computer Architecture (HPCA), Jan. 2001. (pdf)

[15] Yan Solihin, Jaejin Lee, and Josep Torrellas. ``Adaptively Mapping Code in an Intelligent Memory Architecture'', Proceedings of the 2nd Workshop on Intelligent Memory Systems, Nov. 2000. (pdf)