CSE842: Natural Language Processing

Spring 2019

Location: 2205 Engineering Building

Time: Tuesday and Thursday, 12:40-2:00 pm

Professor: Joyce Chai, 2138 Engineering Building, (517) 432-9239, jchai AT cse DOT msu DOT edu

Office Hours: Wednesday, 1:00-3:00 pm, or by appointment


Course Description: 

The field of Natural Language Processing (NLP) is primarily concerned with computational models and algorithms for processing human languages, for example, automatically interpreting, generating, and learning natural language. In the past twenty years, the rise of the world wide web, mobile devices, and social media has created tremendous opportunities for exciting NLP applications. Advances in machine learning have also paved the way for tackling many real-world NLP problems. This course provides an introduction to the state of the art in modern NLP technologies. In particular, the topics to be discussed include syntax, semantics, discourse, and their applications in information extraction, machine translation, and sentiment analysis. These topics will be examined through reading, discussion, and hands-on experience with NLP systems.

This course focuses on text-based language processing. Topics related to spoken language processing, dialogue, and language-based human-agent communication are covered in CSE843: Language and Interaction.
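
For a flavor of the hands-on component, below is a minimal sketch (illustrative only, not course code) of one of the first models covered: a bigram language model with add-one smoothing, written in plain Python with a made-up toy corpus.

    # Minimal bigram language model with add-one (Laplace) smoothing.
    # The corpus and vocabulary here are toy examples, not course data.
    from collections import Counter

    corpus = ["the cat sat", "the dog sat", "the cat ran"]
    tokens = [["<s>"] + line.split() + ["</s>"] for line in corpus]

    unigrams = Counter(w for sent in tokens for w in sent)
    bigrams = Counter(pair for sent in tokens for pair in zip(sent, sent[1:]))
    V = len(unigrams)  # vocabulary size, used in the smoothing denominator

    def prob(w1, w2):
        """P(w2 | w1) with add-one smoothing."""
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

    print(prob("the", "cat"))  # relatively high: "the cat" occurs twice
    print(prob("the", "ran"))  # low: "ran" never follows "the" in the corpus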

Textbooks (all optional):

(1)  Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, third edition (draft), by Daniel Jurafsky and James Martin, Prentice Hall. (JM)

(2)  Foundations of Statistical Natural Language Processing, by Christopher D. Manning and Hinrich Schütze, MIT Press. ISBN 0-262-13360-1

(3)  Neural Network Methods for Natural Language Processing, by Yoav Goldberg, Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers. 

 

Course Grades: 

The work in this course consists of three programming assignments, a written midterm exam, a paper presentation/discussion, and a final project.

Homework assignments: 45% (15% each)

Midterm exam: 15%

Paper presentation and discussion: 10%

Final Project: 30% 


Tentative Schedule for Homework Assignments and Final Project:


                          Assigned date    Due date
Homework 1                Jan. 17          Feb. 3
Homework 2                Feb. 7           March 1
Homework 3                March 1          March 21
Project Progress Report   -                April 12
Project Final Report      -                May 3


Assignments are due at 11:59 pm on the due date (submitted through the handin facility). No late homework will be accepted.

Tentative Schedule of Topics

Week 1
  Jan 8:  Introduction
      Suggested readings: JM Chapter 2
  Jan 10: Language Modeling: N-Grams
      Suggested readings: JM Chapter 3; S. Katz, Estimation of probabilities from sparse data for the language model component of a speech recognizer

Week 2
  Jan 15: Classification and Sentiment Analysis
      Suggested readings: JM Chapter 4
  Jan 17: Logistic Regression
      Suggested readings: JM Chapter 5

Week 3
  Jan 22: Vector Semantics
      Suggested readings: JM Chapter 6; T. Cover and J. Thomas, Elements of Information Theory, Chapter 2
  Jan 24: Neural Networks and Neural Language Models
      Suggested readings: JM Chapter 7; Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, A Neural Probabilistic Language Model, Journal of Machine Learning Research 3 (2003), 1137-1155; T. Mikolov, K. Chen, G. Corrado, and J. Dean, Efficient Estimation of Word Representations in Vector Space

Week 4
  Jan 29: Recurrent Networks
      Suggested readings: JM Chapter 9
  Jan 31: Introduction to Programming Using PyTorch, etc. (see the sketch following this schedule)

Week 5
  Feb 5:  Part-of-Speech Tagging and HMMs
      Suggested readings: JM Chapter 8 and Appendix A; Lawrence R. Rabiner, 1989, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77(2), pp. 257-286
  Feb 7:  Formal Grammars of English
      Suggested readings: JM Chapter 10

Week 6
  Feb 12: Syntactic Parsing
      Suggested readings: JM Chapter 11
  Feb 14: Statistical Parsing
      Suggested readings: JM Chapter 12

Week 7
  Feb 19: Dependency Parsing
      Suggested readings: JM Chapter 13
  Feb 21: Meaning Representation
      Suggested readings: JM Chapter 14

Week 8
  Feb 26: Semantic Parsing
  Feb 28: Semantic Role Labeling
      Suggested readings: JM Chapter 18

Week 9
  Mar 5:  Spring Break, no class
  Mar 7:  Spring Break, no class

Week 10
  Mar 12: Midterm (open notes)
  Mar 14: Discourse Processing

Week 11
  Mar 19: Recent Advances (1): paper presentation and discussion (Session 1)
  Mar 21: Recent Advances (2): paper presentation and discussion (Session 2)

Week 12
  Mar 26: Information Extraction
      Suggested readings: JM Chapter 17
  Mar 28: Question Answering
      Suggested readings: JM Chapter 23

Week 13
  Apr 2:  Recent Advances (3): paper presentation and discussion (Session 3)
  Apr 4:  Recent Advances (4): paper presentation and discussion (Session 4)

Week 14
  Apr 9:  Generation and Summarization
  Apr 11: Recent Advances (5): paper presentation and discussion (Session 5)

Week 15
  Apr 16: Introduction to Machine Translation
      Suggested readings: K. Papineni et al., BLEU: A Method for Automatic Evaluation of Machine Translation, ACL 2002; P. Brown et al., The Mathematics of Statistical Machine Translation: Parameter Estimation, Computational Linguistics, 19(2), 1993
  Apr 18: Neural Machine Translation

Week 16
  Apr 23: Final Project
  Apr 25: Final Project

Week 17
  May 1:  Final Project Presentation, 10:00 am-12:00 pm, 2205 Engineering Building

Final Project Report due May 3, 11:59 pm.
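
As noted in the Jan 31 entry above, the programming sessions use PyTorch. Below is a minimal, hypothetical sketch (not course-provided code) of the kind of model built there: a bag-of-embeddings text classifier trained for a few gradient steps. All sizes, token ids, and labels are made up for illustration.

    # A tiny PyTorch text classifier: average word embeddings, then a linear layer.
    import torch
    import torch.nn as nn

    VOCAB_SIZE, EMBED_DIM, NUM_CLASSES = 100, 16, 2

    class BagOfEmbeddings(nn.Module):
        def __init__(self):
            super().__init__()
            # EmbeddingBag with the default 'mean' mode averages each document's vectors
            self.embed = nn.EmbeddingBag(VOCAB_SIZE, EMBED_DIM)
            self.out = nn.Linear(EMBED_DIM, NUM_CLASSES)

        def forward(self, token_ids, offsets):
            return self.out(self.embed(token_ids, offsets))

    model = BagOfEmbeddings()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()

    # Two toy "documents" packed into one flat tensor; offsets mark where each starts.
    tokens = torch.tensor([1, 5, 7, 2, 9])   # doc 1 = [1, 5, 7], doc 2 = [2, 9]
    offsets = torch.tensor([0, 3])
    labels = torch.tensor([0, 1])

    for _ in range(5):                        # a few training steps
        optimizer.zero_grad()
        loss = loss_fn(model(tokens, offsets), labels)
        loss.backward()
        optimizer.step()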

 

Academic Honesty:

It is your responsibility to follow MSU's policy on academic integrity. Copying or paraphrasing someone else's work (including code), or permitting your own work to be copied or paraphrased, even if only in part, is not allowed and will result in an automatic grade of 0 for the entire assignment in which the copying or paraphrasing occurred. Violation of the academic integrity policy will result in a grade of F in the course.

Alternative Testing:

Alternative testing is available to students with a documented disability affecting performance on tests. Students with documented disabilities requiring some form of accommodation receive a Verified Individualized Services and Accommodations (VISA) document, which lists verified testing accommodations when appropriate. Please consult the Alternative Testing Guidelines if this applies to you.

Note: The instructor reserves the right to modify course policies and the course calendar according to the progress and needs of the class.