CSE842: Natural Language Processing

Spring 2019


2205 Engineering Building


Tuesday and Thursday, 12:40-2:00 pm 


Joyce Chai, 2138 Engineering Building, (517)432-9239, jchai AT cse DOT msu DOT edu

Office Hours: 

Wednesday: 1:00-3:00pm, or by appointment

Course Description: 

The field of Natural Language Processing (NLP) is primarily concerned with computational models and algorithms for processing human languages, for example, automatically interpreting, generating, and learning natural language. In the past twenty years, the rise of the world wide web, mobile devices, and social media has created tremendous opportunities for exciting NLP applications. Advances in machine learning have also paved the way to tackle many real-world NLP problems. This course provides an introduction to the state of the art in modern NLP technologies. In particular, the topics to be discussed include syntax, semantics, discourse, and their applications in information extraction, machine translation, and sentiment analysis. These topics will be examined through reading, discussion, and hands-on experience with NLP systems.

This course focuses on text-based language processing. Topics related to spoken language processing, dialogue, and language-based human-agent communication are covered in CSE843: Language and Interaction.

Textbooks:


(1) Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, third edition (draft), by Daniel Jurafsky and James Martin. Prentice Hall. (JM)

(2) Foundations of Statistical Natural Language Processing, by Christopher D. Manning and Hinrich Schütze. MIT Press. ISBN 0-262-13360-1.

(3) Neural Network Methods for Natural Language Processing, by Yoav Goldberg. Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers.


Course Grades: 

The work in this course consists of three programming assignments, a midterm written exam, a paper presentation/discussion, and a final project.

Homework assignments: 45% (15% each)

Midterm exam: 15%

Paper presentation and discussion: 10%

Final Project: 30% 

Tentative Schedule for Homework Assignments and Final Project:

                             Assigned date    Due date
Homework 1                   Jan. 17          Feb. 3
Homework 2                   Feb. 7           March 1
Homework 3                   March 1          March 21
Project Progress Report                       April 12
Project Final Report                          May 3

Assignments are due at 11:59 pm on the due date (submitted through the handin facility). No late homework will be accepted.

Tentative Schedule of Topics

Jan 8
    JM: Chapter 2

Jan 10 - Language Modeling with N-Grams
    JM: Chapter 3
    S. Katz. Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer.

Jan 15 - Classification and Sentiment Analysis
    JM: Chapter 4

Jan 17 - Logistic Regression
    JM: Chapter 5

Jan 22 - Vector Semantics
    JM: Chapter 6
    Elements of Information Theory, Chapter 2, by Cover and Thomas.

Jan 24 - Neural Networks and Neural Language Models
    JM: Chapter 7
    Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A Neural Probabilistic Language Model. Journal of Machine Learning Research 3 (2003), 1137-1155.
    T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient Estimation of Word Representations in Vector Space.

Jan 29 - Recurrent Networks
    JM: Chapter 9

Jan 31 - Introduction to Programming with PyTorch, etc.

Feb 5 - Part-of-Speech Tagging and HMMs
    JM: Chapter 8 and Appendix A
    Lawrence R. Rabiner, 1989. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE 77(2), pp. 257-286.

Feb 7 - Formal Grammars of English
    JM: Chapter 10

Feb 12 - Syntactic Parsing
    JM: Chapter 11

Feb 14 - Statistical Parsing
    JM: Chapter 12

Feb 19 - Dependency Parsing
    JM: Chapter 13

Feb 21 - Meaning Representation
    JM: Chapter 14

Feb 26 - Semantic Parsing and Semantic Role Labeling
    JM: Chapter 18

Mar 5 - Spring Break, no class

Mar 7 - Spring Break, no class

Mar 12 - Midterm Exam (open notes)

Mar 14 - Discourse Processing

Mar 19 - Recent Advances (1): Paper presentation and discussion (Session 1)

Mar 21 - Recent Advances (2): Paper presentation and discussion (Session 2)

Mar 26 - Information Extraction
    JM: Chapter 17

Mar 28 - Question Answering
    JM: Chapter 23

Apr 2 - Recent Advances (3): Paper presentation and discussion (Session 3)

Apr 4 - Recent Advances (4): Paper presentation and discussion (Session 4)

Apr 9 - Generation and Summarization

Apr 11 - Recent Advances (5): Paper presentation and discussion (Session 5)

Apr 16 - Introduction to Machine Translation
    Papineni, K., et al. BLEU: A Method for Automatic Evaluation of Machine Translation. ACL 2002.
    P. Brown et al. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2), 1993.

Apr 18 - Neural Machine Translation

Apr 23 - Final Project

Apr 25 - Final Project

May 1 - Final Project Presentations, 10:00 am-12:00 pm, 2205 Engineering Building

Final Project Report Due: May 3, 11:59 pm


Academic Honesty:

It is your responsibility to follow MSU's policy on academic integrity. Copying or paraphrasing someone else's work (code included), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed and will result in an automatic grade of 0 for the entire assignment in which the copying or paraphrasing was done. Violation of the academic integrity policy will result in a grade of F in the course.

Alternative Testing:

Alternative testing is available to those with a documented disability affecting performance on tests. Students with documented disabilities requiring some form of accommodation receive a Verified Individualized Services and Accommodations (VISA) document, which displays verified testing accommodations when appropriate. Please consult the Alternative Testing Guidelines if this applies to you.

Notes: The instructor reserves the right to modify course policies and the course calendar according to the progress and needs of the class.