Spring 2026 @ Fudan University
This course covers the foundations and modern frontiers of Natural Language Processing (NLP), with a heavy emphasis on Large Language Models (LLMs). You will learn the full pipeline for building effective LLMs, from basic tokenization through training, fine-tuning, and deployment.
All standard homework assignments are completed by Week 12. The final month (Weeks 13–16) is dedicated exclusively to the Course Project. Please submit your homework at https://elearning.fudan.edu.cn/
In our first lecture, we introduce text preprocessing, including tokenization (BPE/WordPiece) and vocabulary design.
- Thu., 1:30 pm – 4:10 pm, 03/05/2026
- Slides: Lecture 01 slides
- Readings:
- Exercise: lecture-01-exercise-tokenization.ipynb
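As a rough illustration of the BPE tokenization covered in this lecture, the sketch below learns merge rules from a tiny corpus by repeatedly merging the most frequent adjacent symbol pair. The function names, the whitespace pre-tokenization, and the `</w>` end-of-word marker are assumptions made for this example, not the course's reference implementation.

```python
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts = Counter()
    for symbols, freq in words.items():
        for pair in zip(symbols, symbols[1:]):
            counts[pair] += freq
    return counts

def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    words = dict(Counter(tuple(w) + ("</w>",) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(words)
        if not counts:
            break
        # Ties are broken by first occurrence (dict insertion order).
        best = max(counts, key=counts.get)
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words

merges, words = learn_bpe("low low low lower lowest", num_merges=3)
print(merges)
```

The number of merges controls the vocabulary size: more merges yield longer subword units and shorter token sequences.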
Release Assignment 1
In this lecture, we introduce language modeling basics: maximum likelihood estimation (MLE), smoothing, and perplexity.
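To make these three ideas concrete, here is a minimal sketch of a bigram language model: counts give the MLE estimate, add-k smoothing reserves probability mass for unseen bigrams, and perplexity is the exponentiated average negative log-likelihood. The function names and the `<s>`/`</s>` boundary tokens are assumptions for this example, and it assumes evaluation text uses only training vocabulary.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Collect context counts, bigram counts, and the prediction vocabulary."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(tokens[:-1])             # counts of each context token
        bigrams.update(zip(tokens, tokens[1:]))  # counts of (prev, word)
    vocab = {t for s in sentences for t in s.split()} | {"</s>"}
    return unigrams, bigrams, vocab

def bigram_prob(prev, word, unigrams, bigrams, vocab_size, k=1.0):
    """Add-k smoothed P(word | prev); k=0 recovers the plain MLE estimate."""
    return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * vocab_size)

def perplexity(sentences, unigrams, bigrams, vocab, k=1.0):
    """exp of the average negative log-probability per predicted token."""
    log_prob, n = 0.0, 0
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            p = bigram_prob(prev, word, unigrams, bigrams, len(vocab), k)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)

train = ["the cat sat", "the dog sat"]
unigrams, bigrams, vocab = train_bigram(train)
print(perplexity(["the dog sat"], unigrams, bigrams, vocab, k=1.0))
```

With `k=0`, any bigram that never occurred in training has probability zero and `math.log` raises an error, which is the practical motivation for smoothing.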
In this lecture, we introduce text classification, Word2Vec, the distributional hypothesis, and intrinsic/extrinsic evaluation.
- Slides: Lecture 03 slides
- Readings:
- Exercise: lecture-03-embeddings.ipynb
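The distributional hypothesis (words that appear in similar contexts tend to have similar meanings) can be illustrated without training Word2Vec at all. The sketch below is a count-based stand-in, not Word2Vec itself: it builds sparse co-occurrence vectors and compares them with cosine similarity. The function names, window size, and toy sentences are assumptions for this example.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a sparse count vector of its neighbours."""
    vectors = defaultdict(Counter)
    for s in sentences:
        tokens = s.split()
        for i, center in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[center][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

sentences = [
    "the cat drinks milk",
    "the dog drinks water",
    "stocks fell sharply today",
]
vecs = cooccurrence_vectors(sentences)
print(cosine(vecs["cat"], vecs["dog"]))     # shared contexts: "the", "drinks"
print(cosine(vecs["cat"], vecs["stocks"]))  # no shared contexts
```

Comparing similarity scores like these against human judgments is an example of intrinsic evaluation; measuring how much the vectors help a downstream classifier is extrinsic evaluation.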
In this lecture, we introduce neural networks and how to build neural models for sequence learning problems. We discuss classic models such as the LSTM, how encoder-decoder models developed, and why attention is an effective addition to the encoder-decoder model.
- Slides: Lecture 04 slides
- Readings:
- Exercise: lecture-04-neural-lms.ipynb
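One way to see why attention helps an encoder-decoder model: instead of compressing the whole source sequence into a single vector, the decoder computes a weighted average of all encoder states at each step. The sketch below uses plain dot-product scoring on Python lists; the function names and toy vectors are assumptions for this example, and the original attention papers use learned scoring functions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, encoder_states):
    """Return attention weights and the context vector for one decoder query."""
    # Score each encoder state by its dot product with the query.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: attention-weighted sum of the encoder states.
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context

encoder_states = [[5.0, 0.0], [0.0, 5.0], [0.0, 0.0]]
weights, context = attend([1.0, 0.0], encoder_states)
print(weights)  # most weight falls on the first state, which aligns with the query
```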
In this lecture, we introduce the Transformer architecture.
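Because self-attention is order-agnostic, the Transformer needs an explicit position signal. As one small, self-contained piece of the architecture, the sketch below computes the sinusoidal positional encodings from the original Transformer paper. The function name is an assumption for this example, and many recent models use learned or rotary position schemes instead.

```python
import math

def sinusoidal_positions(num_positions, d_model):
    """Return a num_positions x d_model table of sinusoidal encodings.

    Even dimensions use sin, odd dimensions use cos, and each sin/cos pair
    shares one wavelength that grows geometrically with the dimension index.
    """
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

table = sinusoidal_positions(num_positions=3, d_model=4)
for row in table:
    print([round(x, 3) for x in row])
```

These vectors are added to the token embeddings before the first attention layer.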
In this lecture, we introduce typical pretrained LLMs such as the GPT series.
- Slides: Lecture 06 slides
- Readings:
- Exercise: lecture-06-gpts.ipynb
In this lecture, we introduce evaluation datasets and tasks for LLMs.
- Slides: Lecture 07 slides
- Readings:
- Exercise: lecture-07-benchmarks.ipynb
Project Proposal Due
In this lecture, we introduce BERT and the bidirectional Transformer encoder architecture trained via masked language modeling. Unlike causal LMs, BERT produces contextual embeddings that can be fine-tuned for downstream tasks such as classification and named entity recognition.
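The masked language modeling objective can be sketched as a data-corruption step. Following the recipe in the BERT paper, roughly 15% of positions are selected as prediction targets; of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. The function name, the `seed` parameter, and the use of `None` to mark non-target positions are assumptions for this example.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Corrupt `tokens` for MLM; return (inputs, labels).

    labels[i] holds the original token at selected positions and None
    elsewhere, so the loss is computed only on selected positions.
    """
    rng = random.Random(seed)
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected as a prediction target
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = "[MASK]"           # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels

tokens = "the cat sat on the mat".split()
inputs, labels = mask_tokens(tokens, vocab=sorted(set(tokens)), mask_prob=0.5)
print(inputs)
print(labels)
```

Because the model sees context on both sides of a corrupted position, the encoder learns bidirectional representations, in contrast to a causal LM that only conditions on the left context.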
In this lecture, we introduce post-training techniques including supervised fine-tuning (SFT), reward modeling (RM), and reinforcement learning from human feedback via PPO. We also cover alignment, instruction tuning, and test-time compute scaling.
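For the reward modeling step, a common formulation trains the reward model on human preference pairs with a pairwise (Bradley-Terry style) loss: the reward of the preferred response should exceed that of the rejected one. The sketch below computes this loss for a single pair of scalar rewards; the function name is an assumption for this example, and a real reward model would produce these scalars from a neural network.

```python
import math

def pairwise_reward_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)), computed in a stable form."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

print(pairwise_reward_loss(2.0, 0.0))  # small loss: ranking agrees with the label
print(pairwise_reward_loss(0.0, 2.0))  # larger loss: ranking disagrees
```

The loss depends only on the reward difference, so the reward model's absolute scale is not pinned down by preference data alone.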
In this lecture, we introduce dense retrieval and Retrieval-Augmented Generation (RAG). RAG augments a language model with a non-parametric memory — a retrievable document index — so that answers can be grounded in up-to-date, verifiable sources without retraining the model.
- Slides: Lecture 10 slides (PDF)
- Readings:
- Exercise: lecture-10-rag.ipynb
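The retrieve-then-generate flow of RAG can be sketched in a few lines. In the example below, `embed` is a toy bag-of-words stand-in for a dense encoder (a real system would use a trained embedding model and a vector index), and the prompt template, function names, and documents are all assumptions for illustration. The final call to a language model is left out.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a dense encoder: lowercase bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, passages):
    """Place retrieved passages in the prompt so the answer can cite them."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the passages below.\n"
        f"{context}\n"
        f"Question: {query}\nAnswer:"
    )

documents = [
    "the final report is due in week 17",
    "tokenization splits text into subword units",
    "quantization stores weights in int8",
]
query = "when is the final report due"
print(build_prompt(query, retrieve(query, documents, k=1)))
```

Updating the knowledge available to the system only requires changing `documents`, which is the sense in which the memory is non-parametric.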
Each group presents preliminary results in class on 05/21/2026.
- Format: up to 8 minutes per group — 7 min presentation + 1 min Q&A
- Content: related work investigation, your approach, and preliminary results
- Full project details
Final Report — Due 06/25/2026, 23:59 (Week 17)
- Written in English or Chinese
- Maximum 7 pages of main content (excluding references)
- Format: ACL template
- Submit via elearning
Grading breakdown
- Proposal (5%): clarity, feasibility, relevance, innovation
- Presentation (20%): clarity, related work coverage
- Programming & algorithm (25%): reasonableness and soundness
- Performance (20%): results and analysis
- Report (30%): organization, analysis, discussion
RLHF (PPO/DPO), Safety barriers, Red-teaming
KV Caching, Quantization (Int8/FP4), Latency/Throughput
Multimodal LLMs, Diffusion LMs, Future Directions