CSE842: Natural Language Processing
Spring 2019
Location: 2205 Engineering Building
Time: Tuesday and Thursday, 12:40-2:00 pm
Professor: Joyce Chai, 2138 Engineering Building, (517) 432-9239, jchai AT cse DOT msu DOT edu
Office Hours: Wednesday, 1:00-3:00 pm, or by appointment
Course Description:
The field of Natural Language Processing (NLP) is primarily concerned with computational models and algorithms for processing human languages, for example, automatically interpreting, generating, and learning natural language. In the past twenty years, the rise of the world wide web, mobile devices, and social media has created tremendous opportunities for exciting NLP applications. Advances in machine learning have also paved the way for tackling many real-world NLP problems. This course provides an introduction to the state of the art in modern NLP technologies. In particular, the topics to be discussed include syntax, semantics, and discourse, and their applications in information extraction, machine translation, and sentiment analysis. These topics will be examined through readings, discussion, and hands-on experience with NLP systems.
This course focuses on text-based language processing. Topics related to spoken language processing, dialogue, and language-based human-agent communication are covered in CSE843: Language and Interaction.
Textbooks (all optional):
(1) Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, third edition (draft), by Daniel Jurafsky and James H. Martin. Prentice Hall. (JM)
(2) Foundations of Statistical Natural Language Processing, by Christopher D. Manning and Hinrich Schütze. MIT Press. ISBN 0-262-13360-1.
(3) Neural Network Methods for Natural Language Processing, by Yoav Goldberg. Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers.
Course Grades:
The work in this course consists of three programming assignments, a midterm written exam, a paper presentation/discussion, and a final project.
Homework assignments: 45% (15% each)
Midterm exam: 15%
Paper presentation and discussion: 10%
Final Project: 30%
Tentative Schedule for Homework Assignment and Final Project:
|                         | Assigned date | Due date |
| Homework 1              | Jan. 17       | Feb. 3   |
| Homework 2              | Feb. 7        | March 1  |
| Homework 3              | March 1       | March 21 |
| Project Progress Report | -             | April 12 |
| Project Final Report    | -             | May 3    |
Assignments are due at 11:59 pm on the due date and must be submitted through the handin facility. No late homework will be accepted.
Tentative Schedule of Topics
| Week | Class Date | Topic | Suggested Readings |
| 1    | Jan 8  | Introduction | |
|      | Jan 10 | Language Modeling: N-Grams | S. Katz, Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer. |
| 2    | Jan 15 | Classification and Sentiment Analysis | |
|      | Jan 17 | Logistic Regression | |
| 3    | Jan 22 | Vector Semantics | Elements of Information Theory, Chapter 2, by Cover and Thomas. |
|      | Jan 24 | Neural Networks and Neural Language Models | Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A Neural Probabilistic Language Model. Journal of Machine Learning Research 3 (2003), 1137-1155; T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient Estimation of Word Representations in Vector Space. |
| 4    | Jan 29 | Recurrent Networks | |
|      | Jan 31 | Introduction to Programming with PyTorch, etc. | |
| 5    | Feb 5  | Part-of-Speech Tagging and HMMs | Lawrence R. Rabiner, 1989. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE 77(2), pp. 257-286. |
|      | Feb 7  | Formal Grammars of English | |
| 6    | Feb 12 | Syntactic Parsing | |
|      | Feb 14 | Statistical Parsing | |
| 7    | Feb 19 | Dependency Parsing | |
|      | Feb 21 | Meaning Representation | |
| 8    | Feb 26 | Semantic Parsing | |
|      | Feb 28 | Semantic Role Labeling | |
| 9    | Mar 5  | Spring Break, no class | |
|      | Mar 7  | Spring Break, no class | |
| 10   | Mar 12 | Midterm | Open notes. |
|      | Mar 14 | Discourse Processing | |
| 11   | Mar 19 | Recent Advances (1) | Paper presentation and discussion (Session 1). |
|      | Mar 21 | Recent Advances (2) | Paper presentation and discussion (Session 2). |
| 12   | Mar 26 | Information Extraction | |
|      | Mar 28 | Question Answering | |
| 13   | Apr 2  | Recent Advances (3) | Paper presentation and discussion (Session 3). |
|      | Apr 4  | Recent Advances (4) | Paper presentation and discussion (Session 4). |
| 14   | Apr 9  | Generation and Summarization | |
|      | Apr 11 | Recent Advances (5) | Paper presentation and discussion (Session 5). |
| 15   | Apr 16 | Intro to MT | K. Papineni et al., BLEU: A Method for Automatic Evaluation of Machine Translation, ACL 2002; P. Brown et al., The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics 19(2), 1993. |
|      | Apr 18 | Neural MT | |
| 16   | Apr 23 | Final Project | |
|      | Apr 25 | Final Project | |
| 17   | May 1  | Final Project Presentation | 10:00-12:00 pm, 2205 Engineering Building. |
|      |        | Final Project Report Due (May 3, 11:59 pm) | |
Academic Honesty:
It is your responsibility to follow MSU's policy on academic integrity. Copying or paraphrasing someone else's work (including code), or permitting your own work to be copied or paraphrased, even in part, is not allowed and will result in an automatic grade of 0 for the entire assignment involved. A violation of the academic integrity policy will result in a grade of F for the course.
Alternative Testing:
Alternative testing is available to students with a documented disability affecting test performance. Students with documented disabilities requiring some form of accommodation receive a Verified Individualized Services and Accommodations (VISA) document, which lists verified testing accommodations when appropriate. Please consult the Alternative Testing Guidelines if applicable.
Notes: The instructor reserves the right to modify course policies and the course calendar according to the progress and needs of the class.