TENTATIVE SCHEDULE (may change [a bit])
(Lectures and exercises take place in room CM 5.)
Except for week 1 and weeks 10 to 14, all the other classes (weeks 2 to 9) are "flipped" and organized as follows:
- we expect you to watch the video of the lecture before the class (thus before Wednesday 08:00);
- we'll do a two-hour (8:15-10:00) "review" of the lecture content, consisting first of some review slides, then some "hands-on" work (which can be seen as a kind of preparation for the quizzes);
- this is followed by 2 hours of either practical sessions (PS) or exercises/"office hours" (= free questions).
When there is a quiz (weeks 5, 7, 9, 14), everything is shifted by 1 hour. (Note, by the way, that the quizzes are not meant to be a preparation for the exam; the exercises are. Quizzes are only comprehension tests meant to provide feedback.)
Week 1 being a "welcome week", it starts with a 2-hour regular lecture, followed by a 2-hour introductory practical session.
Starting from week 10, Pr. Antoine Bosselut will give a series of three lectures on modern NLP (Deep Learning). These lectures are not pre-recorded and will be taught live; the recordings will be made available some time afterwards.
Texts in bold (updated weekly) are links to the corresponding course material.
Week | Date | Wednesday 08h15 - 09h00 | Wednesday 09h15 - 10h00 | Wednesday 10h15 - 11h00 | Wednesday 11h15 - 12h00
1 | 11/09/2024 | Introduction to NLP & Linguistic Levels in Natural Language Processing (MR) | PS: Machine Translation (MR+DB)
2 | 18/09/2024 | Evaluation in NLP (MR): [slides] [video] [review slides] | PS: Evaluation (MR+DB)
3 | 25/09/2024 | Evaluation in NLP 2 (MR): [slides] [video] [review slides] | Hands-on evaluation (MR) | Exercises (exam preparation) & QAS/"office hours" (MR)
4 | 02/10/2024 | Words, tokens, n-grams and Language Models (JCC): [slides] [video] [review slides] | Hands-on n-grams/language models (JCC) | Exercises & QAS (JCC)
5 | 09/10/2024 | Quiz: NLP and evaluation (weeks 1-3, online, 4%) | Tagging (JCC): [slides] [video] [review slides] | Hands-on POS tagging (1) (JCC) | Exercises + QAS (JCC)
6 | 16/10/2024 | HMM decoding and learning (Viterbi and EM) (JCC): [slides] [video] [review slides] | Hands-on POS tagging (2) (JCC) | PS: PoS Tagging (JCC+DB)
- | 23/10/2024 | Fall break
7 | 30/10/2024 | Quiz: n-grams + Tagging (weeks 2-5, online, 4%) | Textual data analysis and classification (JCC): [slides] [video] [review slides] | Hands-on textual classification (JCC) | Exercises + QAS (JCC)
8 | 06/11/2024 | Vector space Semantics (and Information Retrieval) (JCC): [slides] [video] [review slides] | Hands-on information retrieval (JCC) | PS: Text classification (JCC+DB)
9 | 13/11/2024 | Quiz: up to classification (weeks 1-7, online, 4%) | Semantics (MR): [slides] [video] [review slides] | Hands-on semantics (MR) | Exercises + QAS (MR)
10 | 20/11/2024 | Deep Learning for NLP (no pre-recording) (Pr. A. Bosselut): [slides] [video] | Transformers Attention Visualization (Pr. A. Bosselut)
11 | 27/11/2024 | PS: Semantics with LLMs (1/2) (MR+DB)
12 | 04/12/2024 | Generation (no pre-recording) (Pr. A. Bosselut): [slides] [video] | Coding assignment (Pr. A. Bosselut)
13 | 11/12/2024 | PS: Semantics with LLMs (2/2) (MR+DB)
14 | 18/12/2024 | Quiz: from classification to Generation (incl.) (weeks 7-13, online, 4%) | Ethics in NLP (no pre-recording) (Pr. A. Bosselut): [slides] [video] | Future of NLP (no pre-recording) (Pr. A. Bosselut): [slides] [video]
Videos of the lectures
- Week 1: Introduction to the course and to NLP (2024 version) [01:32:24]
  - 00:00:00 - General presentation of the course (admin, grading, ...)
  - 00:05:00 - What is Natural Language Processing (NLP)?
  - 00:14:05 - Is this course worth it? (modern vs traditional approaches)
  - 00:30:29 - Applications and constraints in NLP
  - 00:45:37 - [keypoint] Natural vs. formal languages
  - 00:53:10 - Natural language functions
  - 01:00:52 - [keypoint] Why is NLP difficult? 1. Importance of appropriate resources
  - 01:04:01 - 2. Power laws
  - 01:07:32 - 3. & 4. other reasons why NLP is difficult
  - 01:12:26 - simplified NLP architecture
  - 01:19:18 - [keypoint] Linguistic Processing Levels
- Week 2: Evaluation in NLP [01:46:24]
  - 00:00:00 - Introduction
  - 00:01:29 - [keypoint] NLP evaluation protocol
  - 00:02:41 - Evaluation campaigns
  - 00:04:31 - Example task: classification of linguistic entities
  - 00:12:04 - Reference (Gold standard)
  - 00:21:10 - [keypoint] Gold standard creation process
  - 00:27:17 - Assess the quality of the reference: Inter-annotator agreement
  - 00:35:15 - [keypoint] Dealing with chance agreement -- Cohen's kappa
  - 00:41:55 - Inter-annotator agreement -- practices
  - 00:44:14 - NLP Systems evaluation: evaluation metrics
  - 00:58:47 - [keypoint] Precision and Recall
  - 01:14:00 - Example of NLP evaluation on parsing
  - 01:25:32 - Other NLP evaluation metrics
  - 01:26:34 - [keypoint] Discuss the result: variability of the results
  - 01:28:42 - [keypoint] Separating the data: training, validation and test sets
  - 01:37:13 - [keypoint] Statistical significance
  - 01:43:22 - Conclusion
- Week 4: Lexicons, n-grams and Language Models [01:37:46]
  - 00:00:00 - Introduction
  - 00:01:22 - Word or token?
  - 00:19:21 - n-grams -- introduction
  - 00:26:51 - Probabilities (reminder)
  - 00:31:36 - [keypoint] the n-gram approach
  - 00:35:28 - Example
  - 00:39:33 - Caveat!
  - 00:43:12 - Parameter estimation
  - 00:48:37 - [keypoint] Smoothing
  - 00:52:47 - Additive smoothing = Dirichlet prior
  - 01:03:38 - [keypoint] Additive smoothing = Dirichlet prior (summary)
  - 01:06:29 - Examples (of Dirichlet distributions)
  - 01:14:51 - Language Identification
  - 01:21:24 - Spelling Error Correction
  - 01:33:14 - [keypoint] Summary of the keypoints
- Week 5: PoS tagging (and sequence labeling) [01:30:13]
  - 00:00:00 - Introduction
  - 00:04:43 - [keypoint] Lemmatization (definition)
  - 00:12:54 - [keypoint] PoS tagging (definition)
  - 00:19:37 - PoS tagging (formalization and examples)
  - 00:30:18 - Other sequence labeling tasks
  - 00:33:56 - Probabilistic PoS tagging
  - 00:47:43 - [keypoint] Probabilistic PoS tagging (simplifying hypotheses, HMM)
  - 01:08:59 - HMM PoS tagging (example)
  - 01:14:43 - HMMs (algorithms and parameter estimation)
  - 01:21:00 - Other approaches and performance
  - 01:25:56 - [keypoint] Summary of the keypoints
- Week 6: Hidden Markov Model (HMM) Primer [01:12:06]
  - 00:00:00 - Introduction
  - 00:00:39 - Recap example
  - 00:04:43 - definition of Markov Models
  - 00:09:35 - [keypoint] definition of HMM
  - 00:11:45 - examples
  - 00:18:28 - [keypoint] The 3 basic problems for HMMs
  - 00:27:10 - Presentation of the 1st problem: compute P(w)
  - 00:29:32 - [keypoint] Forward-Backward algorithms (solution to the first problem)
  - 00:40:31 - Presentation of the 2nd problem: compute argmax_T P(T|w)
  - 00:43:11 - [keypoint] Viterbi algorithm (solution to the second problem)
  - 00:49:05 - example (Viterbi algorithm)
  - 00:54:49 - Presentation of the 3rd problem: unsupervised learning: compute argmax_params P(params|w)
  - 00:54:55 - [keypoint] Expectation-Maximization
  - 01:03:04 - Baum-Welch Algorithm (presentation)
  - 01:07:01 - [keypoint] Baum-Welch Algorithm (summary)
  - 01:09:08 - Conclusion: other models
  - 01:10:43 - [keypoint] Summary of the keypoints
  - 01:11:42 - Appendix
- Week 7: Textual Data Analysis and Classification [01:11:36]
  - 00:00:00 - Introduction
  - 00:02:09 - Classification: framework
  - 00:07:26 - [keypoint] Textual Data Classification
  - 00:12:24 - Dissimilarity metric
  - 00:13:18 - [keypoint] Usual metrics/similarities
  - 00:16:17 - Classification: complexity
  - 00:17:13 - Classification methods
  - 00:20:13 - [keypoint] Classification methods: how to choose?
  - 00:23:47 - Classification methods: Bayesian approach
  - 00:25:39 - [keypoint] Naive Bayes classifier
  - 00:28:12 - Logistic regression
  - 00:31:37 - K nearest neighbors - Parzen window
  - 00:33:10 - [keypoint] Dendrograms
  - 00:40:10 - [keypoint] K-means
  - 00:45:58 - about Word embeddings and Deep Learning
  - 00:47:22 - Classification: evaluation
  - 00:50:53 - Dimensionality reduction: framework
  - 00:56:28 - [keypoint] Principal Components Analysis
  - 01:02:26 - Non-linear projection
  - 01:04:03 - Projection Pursuit
  - 01:04:58 - [keypoint] Mapping: Multidimensional Scaling (+ t-SNE + UMAP)
  - 01:09:03 - [keypoint] Summary of the keypoints
- Week 8: Information Retrieval [01:00:50]
  - 00:00:00 - Introduction
  - 00:01:37 - [keypoint] Vector-space model
  - 00:03:12 - [keypoint] Indexing
  - 00:07:20 - Indexing: choice
  - 00:11:08 - Bag of words and weighting schemes
  - 00:14:10 - [keypoint] tf-idf
  - 00:17:15 - [keypoint] cosine similarity
  - 00:20:45 - Information Retrieval
  - 00:22:12 - Relevance?
  - 00:27:26 - Okapi BM25
  - 00:28:40 - Queries
  - 00:32:21 - [keypoint] Precision and Recall
  - 00:41:23 - Limitations of the Vector Space model
  - 00:43:21 - Topic-based models
  - 00:47:10 - Word embeddings
  - 00:57:10 - Evolution of NLP
  - 00:59:52 - [keypoint] Summary of the keypoints
- Week 9: Lexical Semantics [01:21:12]
  - 00:00:00 - Introduction
  - 00:01:15 - [keypoint] Lexical vs. Compositional Semantics
  - 00:02:10 - Compositional Semantics
  - 00:03:44 - Usual representations
  - 00:09:32 - Lexical Semantics
  - 00:12:30 - [keypoint] Word sense
  - 00:14:42 - Lexemes
  - 00:19:08 - [keypoint] Semantic Relations
  - 00:20:19 - Homonymy, homophony, homography
  - 00:22:40 - Polysemy
  - 00:28:30 - Synonymy
  - 00:31:19 - Hyponymy/Hypernymy
  - 00:34:46 - Meronymy/Holonymy
  - 00:41:10 - Defining word senses with semantic relations
  - 00:47:57 - Resources for Lexical Semantics: WordNet
  - 00:53:23 - [keypoint] WordNet Synsets
  - 01:02:25 - Application of lexical semantics in language engineering
  - 01:18:54 - [keypoint] Summary of the keypoints
- Week 10: Deep Learning for NLP [02:00:32]
  - 00:00:00 - Introduction
  - 00:09:29 - Neural Word Embeddings
  - 00:43:28 - Recurrent Neural Networks for Sequence Modeling
  - 01:23:53 - Attentive Neural Modeling with Transformers
- Week 12: Natural Language Generation [02:07:41]
  - 00:00:00 - Introduction
  - 00:09:27 - Text Generation Task
  - 00:27:09 - Decoding
  - 01:36:15 - Evaluation
- Week 14a: Ethics in NLP [01:26:42]
- Week 14b: What comes next? [00:36:24]
Annotated slides
Last modified: Thu Dec 5, 2024