TENTATIVE SCHEDULE (may change [a bit])
(Lectures and exercises take place in room CM 5.)
Except for weeks 1 and 10 to 14, all classes (weeks 2 to 9) are "flipped" and organized as follows:
- we expect you to watch the video of the lecture before the class (thus before Wednesday 08:00);
- we'll do a 2-hour (8:15-10:00) "review" of the lecture content, consisting first of some review slides, then some "hands-on" (which can be seen as a kind of preparation for the quizzes);
- then 2 hours of either practical sessions (PS) or exercises/"office hours" (= free questions) will follow.
When there is a quiz (weeks 5, 7, 9, 14), everything is shifted by 1 hour. (Note, by the way, that the quizzes are not meant to be a preparation for the exam; the exercises are. Quizzes are only a comprehension test that provides feedback.)
Week 1 being a "welcome week", it starts with 2 hours of usual lecture, followed by 2 hours of introductory practical session.
Starting from week 10, Pr. Antoine Bosselut will give a series of three lectures on Modern NLP (Deep Learning). These lectures are not pre-recorded and will be taught live. They will be recorded during the class and made available some time afterwards.
Text in bold (updated weekly) links to the corresponding course material.
Week | Date | Wednesday 08h15 - 09h00 | Wednesday 09h15 - 10h00 | Wednesday 10h15 - 11h00 | Wednesday 11h15 - 12h00 |
1 | 11/09/2024 | Introduction to NLP; Linguistic Levels in Natural Language Processing (MR) | PS: Machine Translation (MR+DB) |
2 | 18/09/2024 | Evaluation in NLP (MR): [slides] [video] [review slides] | PS: Evaluation (MR+DB) |
3 | 25/09/2024 | Evaluation in NLP 2 (MR): [slides] [video] [review slides] | Hands-on evaluation (MR) | Exercises (exam preparation) & QAS/"office hours" (MR) |
4 | 02/10/2024 | Words, tokens, n-grams and Language Models (JCC): [slides] [video] [review slides] | Hands-on n-grams/language models (JCC) | Exercises & QAS (JCC) |
5 | 09/10/2024 | Quiz: NLP and evaluation (weeks 1-3, online, 4%) | Tagging (JCC): [slides] [video] [review slides] | Hands-on POS tagging (1) (JCC) | Exercises + QAS (JCC) |
6 | 16/10/2024 | HMM decoding and learning (Viterbi and EM) (JCC): [slides] [video] [review slides] | Hands-on POS tagging (2) (JCC) | PS: PoS Tagging (JCC+DB) |
- | 23/10/2024 | Fall break |
7 | 30/10/2024 | Quiz: n-grams + Tagging (weeks 2-5, online, 4%) | Textual data analysis and classification (JCC): [slides] [video] [review slides] | Hands-on textual classification (JCC) | Exercises + QAS (JCC) |
8 | 06/11/2024 | Vector space Semantics (and Information Retrieval) (JCC): [slides] [video] [review slides] | Hands-on information retrieval (JCC) | PS: Text classification (JCC+DB) |
9 | 13/11/2024 | Quiz: up to classification (weeks 1-7, online, 4%) | Semantics (MR): [slides] [video] [review slides] | Hands-on semantics (MR) | Exercises + QAS (MR) |
10 | 20/11/2024 | Deep Learning for NLP (no pre-recording) (Pr. A. Bosselut): [slides] [video] | Transformers Attention Visualization (Pr. A. Bosselut) |
11 | 27/11/2024 | PS: Semantics with LLMs (1/2) (MR+DB) |
12 | 04/12/2024 | Generation (no pre-recording) (Pr. A. Bosselut): [slides] [video] | Coding assignment (Pr. A. Bosselut) |
13 | 11/12/2024 | PS: Semantics with LLMs (2/2) (MR+DB) |
14 | 18/12/2024 | Quiz: from classification to Generation (incl.) (weeks 7-13, online, 4%) | Ethics in NLP (no pre-recording) (Pr. A. Bosselut): [slides] [video] | Future of NLP (no pre-recording) (Pr. A. Bosselut): [slides] [video] |
Videos of the lectures
Week 1: Introduction to the course and to NLP (2024 version) [01:32:24]
00:00:00 - General presentation of the course (admin, grading, ...)
00:05:00 - What is Natural Language Processing (NLP)?
00:14:05 - Is this course worth it? (modern vs traditional approaches)
00:30:29 - Applications and constraints in NLP
00:45:37 - [keypoint] Natural vs. formal languages
00:53:10 - Natural language functions
01:00:52 - [keypoint] Why is NLP difficult? 1. Importance of appropriate resources
01:04:01 - 2. Power laws
01:07:32 - 3. & 4. other reasons why NLP is difficult
01:12:26 - simplified NLP architecture
01:19:18 - [keypoint] Linguistic Processing Levels
Week 2: Evaluation in NLP [01:46:24]
00:00:00 - Introduction
00:01:29 - [keypoint] NLP evaluation protocol
00:02:41 - Evaluation campaigns
00:04:31 - Example task: classification of linguistic entities
00:12:04 - Reference (Gold standard)
00:21:10 - [keypoint] Gold standard creation process
00:27:17 - Assess the quality of the reference: Inter-annotator agreement
00:35:15 - [keypoint] Dealing with chance agreement -- Cohen’s kappa
00:41:55 - Inter-annotator agreement -- practices
00:44:14 - NLP Systems evaluation: evaluation metrics
00:58:47 - [keypoint] Precision and Recall
01:14:00 - Example of NLP evaluation on parsing
01:25:32 - Other NLP evaluation metrics
01:26:34 - [keypoint] Discuss the result: variability of the results
01:28:42 - [keypoint] Separating the data: training, validation and test sets
01:37:13 - [keypoint] Statistical significance
01:43:22 - Conclusion
Week 4: Lexicons, n-grams and Language Models [01:37:46]
00:00:00 - Introduction
00:01:22 - Word or token?
00:19:21 - n-grams -- introduction
00:26:51 - Probabilities (reminder)
00:31:36 - [keypoint] the n-gram approach
00:35:28 - Example
00:39:33 - Caveat!
00:43:12 - Parameter estimation
00:48:37 - [keypoint] Smoothing
00:52:47 - Additive smoothing = Dirichlet prior
01:03:38 - [keypoint] Additive smoothing = Dirichlet prior (summary)
01:06:29 - Examples (of Dirichlet distributions)
01:14:51 - Language Identification
01:21:24 - Spelling Error Correction
01:33:14 - [keypoint] Summary of the keypoints
Week 5: PoS tagging (and sequence labeling) [01:30:13]
00:00:00 - Introduction
00:04:43 - [keypoint] Lemmatization (definition)
00:12:54 - [keypoint] PoS tagging (definition)
00:19:37 - PoS tagging (formalization and examples)
00:30:18 - Other sequence labeling tasks
00:33:56 - Probabilistic PoS tagging
00:47:43 - [keypoint] Probabilistic PoS tagging (simplifying hypotheses, HMM)
01:08:59 - HMM PoS tagging (example)
01:14:43 - HMMs (algorithms and parameter estimation)
01:21:00 - Other approaches and performance
01:25:56 - [keypoint] Summary of the keypoints
Week 6: Hidden Markov Model (HMM) Primer [01:12:06]
00:00:00 - Introduction
00:00:39 - Recap example
00:04:43 - definition of Markov Models
00:09:35 - [keypoint] definition of HMM
00:11:45 - examples
00:18:28 - [keypoint] The 3 basic problems for HMMs
00:27:10 - Presentation of the 1st problem: compute P(w)
00:29:32 - [keypoint] Forward-Backward algorithms (solution to the first problem)
00:40:31 - Presentation of the 2nd problem: compute argmax_T P(T|w)
00:43:11 - [keypoint] Viterbi algorithm (solution to the second problem)
00:49:05 - example (Viterbi algorithm)
00:54:49 - Presentation of the 3rd problem: unsupervised learning: compute argmax_params P(params|w)
00:54:55 - [keypoint] Expectation-Maximization
01:03:04 - Baum-Welch Algorithm (presentation)
01:07:01 - [keypoint] Baum-Welch Algorithm (summary)
01:09:08 - Conclusion: other models
01:10:43 - [keypoint] Summary of the keypoints
01:11:42 - Appendix
Week 7: Textual Data Analysis and Classification [01:11:36]
00:00:00 - Introduction
00:02:09 - Classification: framework
00:07:26 - [keypoint] Textual Data Classification
00:12:24 - Dissimilarity matrix
00:13:18 - [keypoint] Usual metrics/similarities
00:16:17 - Classification: complexity
00:17:13 - Classification methods
00:20:13 - [keypoint] Classification methods: how to choose?
00:23:47 - Classification methods: Bayesian approach
00:25:39 - [keypoint] Naive Bayes classifier
00:28:12 - Logistic regression
00:31:37 - K nearest neighbors - Parzen window
00:33:10 - [keypoint] Dendrograms
00:40:10 - [keypoint] K-means
00:45:58 - about Word embeddings and Deep Learning
00:47:22 - Classification: evaluation
00:50:53 - Dimensionality reduction: framework
00:56:28 - [keypoint] Principal Components Analysis
01:02:26 - Non-linear projection
01:04:03 - Projection Pursuit
01:04:58 - [keypoint] Mapping: Multidimensional Scaling (+ t-SNE + UMAP)
01:09:03 - [keypoint] Summary of the keypoints
Week 8: Information Retrieval [01:00:50]
00:00:00 - Introduction
00:01:37 - [keypoint] Vector-space model
00:03:12 - [keypoint] Indexing
00:07:20 - Indexing: choice
00:11:08 - Bag of words and weighting schemes
00:14:10 - [keypoint] tf-idf
00:17:15 - [keypoint] cosine similarity
00:20:45 - Information Retrieval
00:22:12 - Relevance?
00:27:26 - Okapi BM25
00:28:40 - Queries
00:32:21 - [keypoint] Precision and Recall
00:41:23 - Limitations of the Vector Space model
00:43:21 - Topic-based models
00:47:10 - Word embeddings
00:57:10 - Evolution of NLP
00:59:52 - [keypoint] Summary of the keypoints
Week 9: Lexical Semantics [01:21:12]
00:00:00 - Introduction
00:01:15 - [keypoint] Lexical vs. Compositional Semantics
00:02:10 - Compositional Semantics
00:03:44 - Usual representations
00:09:32 - Lexical Semantics
00:12:30 - [keypoint] Word sense
00:14:42 - Lexemes
00:19:08 - [keypoint] Semantic Relations
00:20:19 - Homonymy, homophony, homography
00:22:40 - Polysemy
00:28:30 - Synonymy
00:31:19 - Hyponymy/Hypernymy
00:34:46 - Meronymy/Holonymy
00:41:10 - Defining word senses with semantic relations
00:47:57 - Resources for Lexical Semantics: WordNet
00:53:23 - [keypoint] WordNet Synsets
01:02:25 - Application of lexical semantics in language engineering
01:18:54 - [keypoint] Summary of the keypoints
Week 10: Deep-Learning for NLP [02:00:32]
00:00:00 - Introduction
00:09:29 - Neural Word Embeddings
00:43:28 - Recurrent Neural Networks for Sequence Modeling
01:23:53 - Attentive Neural Modeling with Transformers
Week 12: Natural Language Generation [02:07:41]
00:00:00 - Introduction
00:09:27 - Text Generation Task
00:27:09 - Decoding
01:36:15 - Evaluation
Week 14a: Ethics in NLP [01:26:421]
Week 14b: What comes next? [00:36:24]
Annotated slides
Last modified: Thu Dec 5, 2024