TENTATIVE SCHEDULE (may change [a bit])
(Lectures and exercises take place in room CM 5.)
Except for week 1 and weeks 10 to 14, all the other classes (weeks 2 to 9) are "flipped" and organized as follows:
- we expect you to watch the video of the lecture before the class (thus before Wednesday 08:00);
- we'll do a two-hour (8:15-10:00) "review" of the lecture content, consisting first of some review slides, then some "hands-on" work (which can be seen as a kind of preparation for the quizzes);
- this is followed by 2 hours of either practical sessions (PS) or exercises/"office hours" (= free questions).
When there is a quiz (weeks 5, 7, 9, 14), everything is shifted by 1 hour. (Note, by the way, that the quizzes are not meant to be a preparation for the exam; the exercises are. Quizzes are only comprehension tests meant to provide feedback.)
Week 1 being a "welcome week", it starts with a 2-hour regular lecture, followed by a 2-hour introductory practical session.
Starting from week 10, Pr. Antoine Bosselut will give a series of three lectures on modern NLP (Deep Learning). These lectures are not pre-recorded and will be taught live; the recordings will be made available some time afterwards.
Texts in bold (updated weekly) are links to the corresponding course material.
Week | Date | Wednesday 08h15 - 09h00 | Wednesday 09h15 - 10h00 | Wednesday 10h15 - 11h00 | Wednesday 11h15 - 12h00
1 | 11/09/2024 | Introduction to NLP & Linguistic Levels in Natural Language Processing (MR) | PS: Machine Translation (MR+DB)
2 | 18/09/2024 | Evaluation in NLP (MR): [slides] [video] [review slides] | PS: Evaluation (MR+DB)
3 | 25/09/2024 | Evaluation in NLP 2 (MR): [slides] [video] [review slides] | Hands-on evaluation (MR) | Exercises (exam preparation) & QAS/"office hours" (MR)
4 | 02/10/2024 | Words, tokens, n-grams and Language Models (JCC): [slides] [video] [review slides] | Hands-on n-grams/language models (JCC) | Exercises & QAS (JCC)
5 | 09/10/2024 | Quiz: NLP and evaluation (weeks 1-3, online, 4%) | Tagging (JCC): [slides] [video] [review slides] | Hands-on POS tagging (1) (JCC) | Exercises + QAS (JCC)
6 | 16/10/2024 | HMM decoding and learning (Viterbi and EM) (JCC): [slides] [video] [review slides] | Hands-on POS tagging (2) (JCC) | PS: PoS Tagging (JCC+DB)
- | 23/10/2024 | Fall break
7 | 30/10/2024 | Quiz: n-grams + Tagging (weeks 2-5, online, 4%) | Textual data analysis and classification (JCC): [slides] [video] [review slides] | Hands-on textual classification (JCC) | Exercises + QAS (JCC)
8 | 06/11/2024 | Vector space Semantics (and Information Retrieval) (JCC): [slides] [video] [review slides] | Hands-on information retrieval (JCC) | PS: Text classification (JCC+DB)
9 | 13/11/2024 | Quiz: up to classification (weeks 1-7, online, 4%) | Semantics (MR): [slides] [video] [review slides] | Hands-on semantics (MR) | Exercises + QAS (MR)
10 | 20/11/2024 | Deep Learning for NLP (no pre-recording) (Pr. A. Bosselut): [slides] [video] | Transformers Attention Visualization (Pr. A. Bosselut)
11 | 27/11/2024 | PS: Semantics with LLMs (1/2) (MR+DB)
12 | 04/12/2024 | Generation (no pre-recording) (Pr. A. Bosselut): [slides] [video] | Coding assignment (Pr. A. Bosselut)
13 | 11/12/2024 | PS: Semantics with LLMs (2/2) (MR+DB)
14 | 18/12/2024 | Quiz: from classification to Generation (incl.) (weeks 7-13, online, 4%) | Ethics in NLP (no pre-recording) (Pr. A. Bosselut): [slides] [video] | Future of NLP (no pre-recording) (Pr. A. Bosselut): [slides] [video]
Videos of the lectures
- Week 1: Introduction to the course and to NLP (2024 version) [01:32:24]
  - 00:00:00 - General presentation of the course (admin, grading, ...)
  - 00:05:00 - What is Natural Language Processing (NLP)?
  - 00:14:05 - Is this course worth it? (modern vs traditional approaches)
  - 00:30:29 - Applications and constraints in NLP
  - 00:45:37 - [keypoint] Natural vs. formal languages
  - 00:53:10 - Natural language functions
  - 01:00:52 - [keypoint] Why is NLP difficult? 1. Importance of appropriate resources
  - 01:04:01 - 2. Power laws
  - 01:07:32 - 3. & 4. other reasons why NLP is difficult
  - 01:12:26 - simplified NLP architecture
  - 01:19:18 - [keypoint] Linguistic Processing Levels
- Week 2: Evaluation in NLP [01:46:24]
  - 00:00:00 - Introduction
  - 00:01:29 - [keypoint] NLP evaluation protocol
  - 00:02:41 - Evaluation campaigns
  - 00:04:31 - Example task: classification of linguistic entities
  - 00:12:04 - Reference (Gold standard)
  - 00:21:10 - [keypoint] Gold standard creation process
  - 00:27:17 - Assess the quality of the reference: Inter-annotator agreement
  - 00:35:15 - [keypoint] Dealing with chance agreement -- Cohen's kappa
  - 00:41:55 - Inter-annotator agreement -- practices
  - 00:44:14 - NLP Systems evaluation: evaluation metrics
  - 00:58:47 - [keypoint] Precision and Recall
  - 01:14:00 - Example of NLP evaluation on parsing
  - 01:25:32 - Other NLP evaluation metrics
  - 01:26:34 - [keypoint] Discuss the result: variability of the results
  - 01:28:42 - [keypoint] Separating the data: training, validation and test sets
  - 01:37:13 - [keypoint] Statistical significance
  - 01:43:22 - Conclusion
- Week 4: Lexicons, n-grams and Language Models [01:37:46]
  - 00:00:00 - Introduction
  - 00:01:22 - Word or token?
  - 00:19:21 - n-grams -- introduction
  - 00:26:51 - Probabilities (reminder)
  - 00:31:36 - [keypoint] the n-gram approach
  - 00:35:28 - Example
  - 00:39:33 - Caveat!
  - 00:43:12 - Parameter estimation
  - 00:48:37 - [keypoint] Smoothing
  - 00:52:47 - Additive smoothing = Dirichlet prior
  - 01:03:38 - [keypoint] Additive smoothing = Dirichlet prior (summary)
  - 01:06:29 - Examples (of Dirichlet distributions)
  - 01:14:51 - Language Identification
  - 01:21:24 - Spelling Error Correction
  - 01:33:14 - [keypoint] Summary of the keypoints
- Week 5: PoS tagging (and sequence labeling) [01:30:13]
  - 00:00:00 - Introduction
  - 00:04:43 - [keypoint] Lemmatization (definition)
  - 00:12:54 - [keypoint] PoS tagging (definition)
  - 00:19:37 - PoS tagging (formalization and examples)
  - 00:30:18 - Other sequence labeling tasks
  - 00:33:56 - Probabilistic PoS tagging
  - 00:47:43 - [keypoint] Probabilistic PoS tagging (simplifying hypotheses, HMM)
  - 01:08:59 - HMM PoS tagging (example)
  - 01:14:43 - HMMs (algorithms and parameter estimation)
  - 01:21:00 - Other approaches and performance
  - 01:25:56 - [keypoint] Summary of the keypoints
- Week 6: Hidden Markov Model (HMM) Primer [01:12:06]
  - 00:00:00 - Introduction
  - 00:00:39 - Recap example
  - 00:04:43 - definition of Markov Models
  - 00:09:35 - [keypoint] definition of HMM
  - 00:11:45 - examples
  - 00:18:28 - [keypoint] The 3 basic problems for HMMs
  - 00:27:10 - Presentation of the 1st problem: compute P(w)
  - 00:29:32 - [keypoint] Forward-Backward algorithms (solution to the first problem)
  - 00:40:31 - Presentation of the 2nd problem: compute argmax_T P(T|w)
  - 00:43:11 - [keypoint] Viterbi algorithm (solution to the second problem)
  - 00:49:05 - example (Viterbi algorithm)
  - 00:54:49 - Presentation of the 3rd problem: unsupervised learning: compute argmax_params P(params|w)
  - 00:54:55 - [keypoint] Expectation-Maximization
  - 01:03:04 - Baum-Welch Algorithm (presentation)
  - 01:07:01 - [keypoint] Baum-Welch Algorithm (summary)
  - 01:09:08 - Conclusion: other models
  - 01:10:43 - [keypoint] Summary of the keypoints
  - 01:11:42 - Appendix
- Week 7: Textual Data Analysis and Classification [01:11:36]
  - 00:00:00 - Introduction
  - 00:02:09 - Classification: framework
  - 00:07:26 - [keypoint] Textual Data Classification
  - 00:12:24 - Dissimilarity metric
  - 00:13:18 - [keypoint] Usual metrics/similarities
  - 00:16:17 - Classification: complexity
  - 00:17:13 - Classification methods
  - 00:20:13 - [keypoint] Classification methods: how to choose?
  - 00:23:47 - Classification methods: Bayesian approach
  - 00:25:39 - [keypoint] Naive Bayes classifier
  - 00:28:12 - Logistic regression
  - 00:31:37 - K nearest neighbors - Parzen window
  - 00:33:10 - [keypoint] Dendrograms
  - 00:40:10 - [keypoint] K-means
  - 00:45:58 - about Word embeddings and Deep Learning
  - 00:47:22 - Classification: evaluation
  - 00:50:53 - Dimensionality reduction: framework
  - 00:56:28 - [keypoint] Principal Components Analysis
  - 01:02:26 - Non-linear projection
  - 01:04:03 - Projection Pursuit
  - 01:04:58 - [keypoint] Mapping: Multidimensional Scaling (+ t-SNE + UMAP)
  - 01:09:03 - [keypoint] Summary of the keypoints
- Week 8: Information Retrieval [01:00:50]
  - 00:00:00 - Introduction
  - 00:01:37 - [keypoint] Vector-space model
  - 00:03:12 - [keypoint] Indexing
  - 00:07:20 - Indexing: choice
  - 00:11:08 - Bag of words and weighting schemes
  - 00:14:10 - [keypoint] tf-idf
  - 00:17:15 - [keypoint] cosine similarity
  - 00:20:45 - Information Retrieval
  - 00:22:12 - Relevance?
  - 00:27:26 - Okapi BM25
  - 00:28:40 - Queries
  - 00:32:21 - [keypoint] Precision and Recall
  - 00:41:23 - Limitations of the Vector Space model
  - 00:43:21 - Topic-based models
  - 00:47:10 - Word embeddings
  - 00:57:10 - Evolution of NLP
  - 00:59:52 - [keypoint] Summary of the keypoints
- Week 9: Lexical Semantics [01:21:12]
  - 00:00:00 - Introduction
  - 00:01:15 - [keypoint] Lexical vs. Compositional Semantics
  - 00:02:10 - Compositional Semantics
  - 00:03:44 - Usual representations
  - 00:09:32 - Lexical Semantics
  - 00:12:30 - [keypoint] Word sense
  - 00:14:42 - Lexemes
  - 00:19:08 - [keypoint] Semantic Relations
  - 00:20:19 - Homonymy, homophony, homography
  - 00:22:40 - Polysemy
  - 00:28:30 - Synonymy
  - 00:31:19 - Hyponymy/Hypernymy
  - 00:34:46 - Meronymy/Holonymy
  - 00:41:10 - Defining word senses with semantic relations
  - 00:47:57 - Resources for Lexical Semantics: WordNet
  - 00:53:23 - [keypoint] WordNet Synsets
  - 01:02:25 - Application of lexical semantics in language engineering
  - 01:18:54 - [keypoint] Summary of the keypoints
- Week 10: Deep Learning for NLP [02:00:32]
  - 00:00:00 - Introduction
  - 00:09:29 - Neural Word Embeddings
  - 00:43:28 - Recurrent Neural Networks for Sequence Modeling
  - 01:23:53 - Attentive Neural Modeling with Transformers
- Week 12: Natural Language Generation [02:07:41]
  - 00:00:00 - Introduction
  - 00:09:27 - Text Generation Task
  - 00:27:09 - Decoding
  - 01:36:15 - Evaluation
- Week 14a: Ethics in NLP [01:26:42]
- Week 14b: What comes next? [00:36:24]
Annotated slides
Last modified: Thu Dec 5, 2024