CSE842: Natural Language Processing

Spring 2017

Location:  1260 Anthony Hall
Time:  Monday and Wednesday, 3:00-4:20pm
Professor: Joyce Chai, 2138 Engineering Building, (517)432-9239, jchai AT cse DOT msu DOT edu
Office Hours: Tuesdays, 2:00-4:00pm, or by appointment

Course Description:

The field of Natural Language Processing (NLP) is primarily concerned with computational models and algorithms for processing human languages, for example, automatically interpreting, generating, and learning natural language. Over the past twenty years, the rise of the World Wide Web, mobile devices, and social media has created tremendous opportunities for exciting NLP applications. This course provides an introduction to state-of-the-art NLP technologies. In particular, the topics include syntax, semantics, and discourse, and their applications in information extraction, machine translation, and sentiment analysis. These topics will be examined through reading, discussion, and hands-on experience with NLP systems.

This course focuses mainly on text-based language processing. Topics related to spoken language processing, dialogue, and language-based human-agent communication are covered in CSE843: Language and Interaction.

Textbooks:

Required: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, second edition, by Daniel Jurafsky and James Martin, Prentice Hall. ISBN-13: 978-0131873216.

We will also use draft chapters of the third edition: http://web.stanford.edu/~jurafsky/slp3/

Optional: Foundations of Statistical Natural Language Processing, by Christopher D. Manning and Hinrich Schütze, MIT Press. ISBN 0-262-13360-1



Course Grades:


Homework Assignments 50%

Paper discussion and presentation 10%

Final Project  40%

Homework and Final Project:

The coursework consists of three homework assignments, each with a written part and a programming part, and a final project. The written part must be turned in at the beginning of the lecture on the due date. The programming part is due at 11:59pm on the due date and must be submitted through the handin facility. No late homework will be accepted. A set of topics will be provided for the final projects.

Assignment          Assigned   Due
Homework 1          Jan. 18    Feb. 8
Homework 2          Feb. 8     Feb. 27
Homework 3          Feb. 27    March 15
Project Proposal    -          March 22
Project Report      -          May 8

Tentative Schedule of Topics

Week Class Date Topic Readings
Week 1
  Jan 9   Introduction and Basic Text Processing
  Jan 11  Morphological Parsing
          Readings: Chapters 2 & 3

Week 2
  Jan 16  Martin Luther King Day, no class
  Jan 18  Readings: Chapter 4; S. Katz. Estimation of probabilities from sparse data for the language model component of a speech recogniser.
          Homework 1 assigned

Week 3
  Jan 23  Classification and Sentiment Analysis
  Jan 25  Linear Regression, Logistic Regression, and Neural Networks

Week 4
  Jan 30  Neural Networks and Neural Probabilistic Language Models
          Readings: Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A Neural Probabilistic Language Model. Journal of Machine Learning Research 3 (2003), pp. 1137-1155.
  Feb 1   Hidden Markov Models
          Readings: Chapter 6; Lawrence R. Rabiner. 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), pp. 257-286.

Week 5
  Feb 6   POS Tagging
          Readings: Chapter 5
  Feb 8   Context-Free Grammars
          Homework 1 due, Homework 2 assigned

Week 6
  Feb 13  Parsing
  Feb 15  Probabilistic Parsing and Dependency Parsing
          Readings: Michael Collins. Three Generative, Lexicalised Models for Statistical Parsing; Dan Klein and Christopher D. Manning. 2003. Accurate Unlexicalized Parsing. ACL 2003, pp. 423-430.

Week 7
  Feb 20  Expectation Maximization
  Feb 22  Meaning Representation
          Readings: Chapter 17

Week 8
  Feb 27  Semantic Analysis
          Readings: Chapter 18
          Homework 2 due, Homework 3 assigned
  Mar 1   Lexical Semantics
          Readings: http://web.stanford.edu/~jurafsky/slp3/15.pdf

Week 9
  Mar 6   Spring Break, no class
  Mar 8   Spring Break, no class

Week 10
  Mar 13  Computational Lexical Semantics
          Readings: Cover and Thomas, Elements of Information Theory, Chapter 2
  Mar 15  Distributional Semantics: Dense Vectors, Skip-grams
          Readings: T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient Estimation of Word Representations in Vector Space.
          Homework 3 due

Week 11
  Mar 20  Semantic Role Labeling
          Readings: Daniel Gildea and Daniel Jurafsky. Automatic Labeling of Semantic Roles.
  Mar 22  Discourse Processing
          Readings: Chapter 21
          Final project proposal due

Week 12
  Mar 27  Information Extraction
          Readings: Chapter 22
  Mar 29  No lecture

Week 13
  Apr 3   QA and Summarization
          Readings: Chapter 23
  Apr 5   Machine Translation I
          Readings: Chapter 25; K. Papineni et al. BLEU: A Method for Automatic Evaluation of Machine Translation. ACL 2002.

Week 14
  Apr 10  Machine Translation II, NMT
          Readings: P. Brown et al. The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, 19(2), 1993.
  Apr 12  Recent Advances in NLP: paper presentation and discussion (Session 1)

Week 15
  Apr 17  Recent Advances in NLP: paper presentation and discussion (Session 2)
  Apr 19  Recent Advances in NLP: paper presentation and discussion (Session 3)

Week 16
  Apr 24  Final Project Presentation
  Apr 26  Final Project Presentation

Week 17
  May 3   Final Project Presentation, 3:00-5:00pm
  May 5   Final Project Report due (May 5, 11:59pm)

Academic Honesty:

It is your responsibility to follow MSU's policy on academic integrity. Copying or paraphrasing someone else's work (including code), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed and will result in an automatic grade of 0 for the entire assignment in which the copying or paraphrasing occurred. Violations of the academic integrity policy will result in a grade of F for the course.

Alternative Testing:

Alternative testing is available to students with a documented disability affecting performance on tests. Students with documented disabilities requiring some form of accommodation receive a Verified Individualized Services and Accommodations (VISA) document, which lists verified testing accommodations when appropriate. If this applies to you, please see the Alternative Testing Guidelines.

Notes: The instructor reserves the right to modify course policies and the course calendar according to the progress and needs of the class.