Time & Place. Mon & Wed, 3:00-4:20 PM, in person, Wells Hall A220.
Instructor. Parisa Kordjamshidi (Office hours: Mon & Wed, 4:20-5:00 PM, or by appointment).
"Selected Topics on Large Language Models" is a graduate-level course exploring the technical foundations of both statistical and neural language modeling, with a primary focus on the latest generation of large language models. These models have significantly impacted not only natural language processing but many other AI domains, contributing to the development of intelligent systems capable of processing multiple modalities such as language, vision, speech, and code.

The course begins with the classical concept of language modeling based on n-grams and statistical methods. It then covers distributed word representations and extends these to context-dependent representations. The basics of the attention mechanism and transformers are discussed, leading to an exploration of encoder-based language models, including the pre-training and fine-tuning paradigms. The course also addresses encoder-decoder and generative language models, touching on concepts such as prompt engineering, soft prompt learning, and the emergence of in-context learning in generative auto-regressive language models.

After establishing these foundations, the course turns to state-of-the-art literature that leverages language models to address various AI challenges, with a particular emphasis on multi-modal language models. The analysis covers the capabilities of Large Language Models (LLMs) from both theoretical and experimental perspectives, examining different facets of intelligence, including various types of reasoning such as mathematical reasoning, compositional reasoning, and reasoning over spatial and temporal information. As part of the course, students engage in a hands-on project involving a variety of language models, including members of the BERT family and older generative models such as T5 and GPT-2, as well as newer versions of LLaMA and GPT and multimodal models such as LLaVA and Qwen-VL.
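As a taste of the classical n-gram material covered at the start of the course, the sketch below estimates bigram probabilities by maximum likelihood on a toy corpus (the corpus and function names here are illustrative, not course materials):

```python
from collections import Counter

# Toy corpus; the course would use real text collections.
corpus = "the cat sat on the mat the cat ate".split()

# Count unigrams and adjacent word pairs (bigrams).
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1) = count(w1, w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("the", "cat"))  # 2 of the 3 occurrences of "the" are followed by "cat"
```

In practice such counts are smoothed (e.g., with add-one or Kneser-Ney smoothing) so that unseen bigrams do not receive zero probability, a limitation that motivates the neural models covered later in the course.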
Computational resources dedicated to this class are provided to help students successfully complete the project.
Grading scheme:
Students are responsible for adhering to the MSU Academic Honesty policy (see link); any violation will result in a grade of 0 for the course. Furthermore, copying, paraphrasing, or plagiarizing someone else's work, including their code, or allowing your own work to be copied or paraphrased, even in part, is not allowed and will result in an automatic grade of 0 for the copied assignment. Using websites that let you get help from or pay someone else to write your code or solve the homework assignments is also prohibited and will result in a grade of 0 for the assignment. If students collaborate on an assignment, i.e., discuss the assignment in any way, the names of the students and what was discussed must be included in the homework write-up.