NLP & LLMs

Outline

Course Overview
- Introduction
- Basic Info
Assignments and Course Project
- Assignment 1. Foundations of Text
- Course Project
Coursework
Resources and References
GPU Resources
Weekly Schedule

Course Overview

Introduction

This course covers the foundations and current frontiers of Natural Language Processing (NLP), with a heavy emphasis on Large Language Models (LLMs). You will learn the full pipeline for building effective LLMs, from basic tokenization to training, fine-tuning, and deploying current LLM architectures.

Prerequisites: No formal prerequisites. For Fudan students, the course is suitable from the second year onward.

Basic Info

Course ID: CS40008.01: NLP and LLMs
Course Info: Spring 2026, Fudan University
Instructor: Baojian Zhou, Email: bjzhou AT fudan.edu.cn
TAs:
- Binbin Huang
- Runze Wang
Time: 03/05/2026–06/18/2026, Thu., 1:30 pm – 4:10 pm, Fudan Calendar
Location: HGX104
Office Hours: Wed., 10:00 am – 1:00 pm, C611

Assignments and Course Project

All standard homework assignments are completed by Week 12. The final month (Weeks 13–16) is dedicated exclusively to the Course Project. Please submit your homework at https://elearning.fudan.edu.cn/

Assignment 1. Foundations of Text

Due: Week 5 (04/02/2026, 23:59)
Deliverable: Please check it out on elearning.

Course Project

Timeline: Weeks 10–16 (Dedicated time)
Teams: 3-5 Students (3 members are strongly recommended).
Deliverables:
- Week 7: Finalize your team members.
- Week 9: 1-page Project Proposal (Problem, Dataset, Baselines).
- Week 12: Status Check / Preliminary Results (Presentation).
- Week 17: Final Report / Code submission
Course project details

Coursework

Participation (5%)
Exercise (15%)
Assignments (40-45%)
Final Project (35-40%)

Resources and References

Textbook & Readings
- Speech and Language Processing (Jurafsky & Martin, 3rd Ed. Draft)
- Natural Language Processing: Neural Networks and Large Language Models
Computing & Communication
- University Cluster: https://cfff.fudan.edu.cn/home
- We will use Qizhi: http://qz.cfff.fudan.edu.cn/
- Course Repo: baojian/llm-26
- Academic Integrity: Please check our policies.
AI Policy: Use of AI coding assistants is permitted. However, you must explicitly attribute any use of AI in your assignments.

GPU Resources

Please register for an account at Fudan CFFF.

Weekly Schedule

Week 1 Introduction to LLMs

In our first lecture, we introduce text preprocessing, including tokenization (BPE/WordPiece) and vocabulary design.
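As a rough illustration of the BPE idea mentioned above, the sketch below learns merge rules from a tiny corpus by repeatedly merging the most frequent adjacent symbol pair. The function names (`get_pair_counts`, `merge_pair`, `learn_bpe`), the `</w>` end-of-word marker, and the toy corpus are assumptions made for this example, not the course's reference implementation.

```python
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Rewrite every word so that each occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    words = dict(Counter(tuple(w) + ("</w>",) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words

# Hypothetical toy corpus; three merges are enough to build the token "low</w>".
merges, vocab = learn_bpe("low low low lower lowest", 3)
print(merges)
```

The number of merges controls the vocabulary size, which is the main design knob behind the vocabulary-design discussion in this lecture.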

Thu., 1:30 pm – 4:10 pm, 03/05/2026

Slides: Lecture 01 slides

Readings:

JM Book Chapter 2

Advances in NLP

Human Language Understanding and Reasoning

Scaling Laws with Vocabulary (2407.13623v1)

Getting the most out of your tokenizer for pre-training

Exercise: lecture-01-exercise-tokenization.ipynb

Release Assignment 1

Week 2 N-Gram Language Models

In this lecture, we introduce the basics of language modeling: maximum likelihood estimation (MLE), smoothing, and perplexity.
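To make these three ideas concrete, here is a minimal sketch of a bigram model. With `k=0` the estimate is plain MLE, and with `k>0` it is add-k smoothing. The helper names, the `<s>` and `</s>` boundary tokens, and the three-sentence corpus are illustrative assumptions.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Collect context counts, bigram counts, and the set of predictable tokens."""
    contexts, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {t for s in sentences for t in s.split()} | {"</s>"}
    return contexts, bigrams, vocab

def prob(word, context, contexts, bigrams, vocab, k=1.0):
    """P(word | context) with add-k smoothing; k=0 gives the MLE estimate."""
    return (bigrams[(context, word)] + k) / (contexts[context] + k * len(vocab))

def perplexity(sentence, contexts, bigrams, vocab, k=1.0):
    """Exponentiated average negative log-probability per predicted token."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_prob = sum(
        math.log(prob(w, c, contexts, bigrams, vocab, k))
        for c, w in zip(tokens, tokens[1:])
    )
    return math.exp(-log_prob / (len(tokens) - 1))

# Hypothetical toy corpus.
corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
contexts, bigrams, vocab = train_bigram(corpus)
mle = prob("am", "I", contexts, bigrams, vocab, k=0.0)  # count(I am) / count(I) = 2/3
ppl = perplexity("I am Sam", contexts, bigrams, vocab)
```

Smoothing matters because the MLE estimate assigns zero probability to any unseen bigram, which makes the perplexity of a test sentence containing it undefined.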

Slides: Lecture 02 slides

Readings:

JM Book Chapter 3

N-gram smoothing

First version of Google Translate (2006)

Progress of language modeling (2009)

Application of Infinity-gram model

Exercise: lecture-02-ngram-lms.ipynb

Week 3 Word Embeddings

In this lecture, we introduce text classification, the distributional hypothesis, Word2Vec, and intrinsic/extrinsic evaluation of embeddings.
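The distributional hypothesis can be illustrated without any training. The sketch below uses count-based co-occurrence vectors as a simple stand-in for learned Word2Vec embeddings, so words that appear in similar contexts end up with similar vectors. The function names, the window size, and the toy corpus are assumptions for illustration only.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for s in sentences:
        tokens = s.split()
        for i, center in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[center][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0

# Hypothetical toy corpus: "cat" and "dog" share contexts, "car" does not.
corpus = ["the cat drinks milk", "the dog drinks milk", "a car needs fuel"]
vecs = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_car = cosine(vecs["cat"], vecs["car"])
```

Comparing such similarity scores against human judgments is one example of an intrinsic evaluation, while measuring accuracy on a downstream classifier is an extrinsic one.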

Slides: Lecture 03 slides

Readings:

JM Book Chapter 4-5

Word2vec (2013)

CBOW (2013)

Qwen Embeddings (2025)

Gemini Embeddings (2025)

Exercise: lecture-03-embeddings.ipynb

Week 4 Neural LMs

In this lecture, we introduce neural networks and how to build neural models for sequence learning problems. We discuss classic models such as the LSTM, how encoder-decoder models developed, and why attention is an effective addition to the encoder-decoder architecture.
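One way to see why attention helps is that the decoder no longer relies on a single fixed summary vector: at each step it builds a context vector as a weighted average of all encoder states. The sketch below uses dot-product scoring for brevity, which is an assumption of this example (the Attention RNN (2015) reading uses a learned additive score). The state values are made up.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, encoder_states):
    """Return attention weights and the context vector for one decoder step."""
    # Score each encoder state by its dot product with the decoder state.
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # Context vector: weighted average of encoder states.
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context

# Hypothetical 2-dimensional encoder states and decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [0.0, 5.0]
weights, context = attend(decoder_state, encoder_states)
```

Here the decoder state aligns with the second and third encoder states, so they receive almost all of the weight and dominate the context vector.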

Slides: Lecture 04 slides

Readings:

JM Book Chapter 6,13

NPLM (2003)

Revisiting NPLM (2021)

LSTM (1997)

Attention RNN (2015)

Exercise: lecture-04-neural-lms.ipynb

Week 5 & 6 Attention Mechanisms and Transformer

In this lecture, we introduce the Transformer architecture.

Slides: Lecture 05 slides

Readings:

JM Book Chapter 7,8

Decomposable Attention (2017)

Transformer (2017)

Transformer in Equation (2019)

Exercise: lecture-05-transformers.ipynb

Week 7 LLM Pretraining (GPT)

In this lecture, we introduce typical pretrained LLMs such as the GPT series.

Slides: Lecture 06 slides

Readings:

JM Book Chapter 7

GPT1 (2018)

GPT2 (2019)

GPT3 (2020)

GPT4 (2023)

GPT5 (2025)

Exercise: lecture-06-gpts.ipynb

Week 8 Evaluations and Benchmarks

In this lecture, we introduce evaluation datasets and tasks for LLMs.

Slides: Lecture 07 slides

Readings:

JM Book Chapter 7

Large-Scale Adversarial Dataset (SWAG) (2018)

Conversational QA (CoQA) (2018)

General Language Understanding Evaluation benchmark (GLUE) (2018)

SuperGLUE (2019)

HellaSwag Dataset (2019)

Measuring Massive Multitask Language Understanding (MMLU) (2020)

Benchmarks Survey (2025)

Exercise: lecture-07-benchmarks.ipynb

Project Proposal Due

Week 9 BERT and Post-training

In this lecture, we introduce BERT and the bidirectional Transformer encoder architecture trained via masked language modeling. Unlike causal LMs, BERT produces contextual embeddings that can be fine-tuned for downstream tasks such as classification and named entity recognition.
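The masked language modeling objective can be sketched as a data-corruption step. The example below follows the recipe described in the BERT paper: roughly 15% of tokens are selected as prediction targets, and of those 80% become `[MASK]`, 10% become a random token, and 10% stay unchanged. The function name `mask_tokens`, the seed handling, and the toy vocabulary are assumptions for illustration.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Corrupt `tokens` for MLM training; labels are None where no loss is computed."""
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                     # the model must predict this token
            r = rng.random()
            if r < 0.8:
                inputs.append("[MASK]")            # 80%: replace with the mask token
            elif r < 0.9:
                inputs.append(rng.choice(vocab))   # 10%: replace with a random token
            else:
                inputs.append(tok)                 # 10%: keep the original token
        else:
            labels.append(None)                    # not a prediction target
            inputs.append(tok)
    return inputs, labels

# Hypothetical toy sentence and vocabulary.
tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))
inputs, labels = mask_tokens(tokens, vocab, mask_prob=0.5)
```

Because the model sees context on both sides of each corrupted position, the resulting representations are bidirectional, unlike those of a causal LM.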

Readings:

JM Book Chapter 9: Masked Language Models

BERT (2018)

SpanBERT (2020)

Exercise: lecture-07-bert-benchmarks.ipynb

Week 10 Post-training (SFT, RM, and PPO)

In this lecture, we introduce post-training techniques including supervised fine-tuning (SFT), reward modeling (RM), and reinforcement learning from human feedback via PPO. We also cover alignment, instruction tuning, and test-time compute scaling.
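For the reward modeling step, a common formulation (used in the InstructGPT reading) trains the reward model on pairs of responses with the loss -log σ(r_chosen − r_rejected), which is small when the preferred response scores higher. The sketch below computes that loss for scalar rewards. The function name and the example reward values are assumptions.

```python
import math

def pairwise_rm_loss(r_chosen, r_rejected):
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected))."""
    d = r_chosen - r_rejected
    # -log(sigmoid(d)) = log(1 + exp(-d)); branch to avoid overflow for large |d|.
    if d > 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))

# Hypothetical scalar rewards for a (chosen, rejected) response pair.
loss_correct_order = pairwise_rm_loss(2.0, 0.0)  # reward model agrees with the label
loss_wrong_order = pairwise_rm_loss(0.0, 2.0)    # reward model disagrees
loss_tie = pairwise_rm_loss(1.0, 1.0)            # equals log(2)
```

Only the difference between the two rewards matters, so the reward scale is arbitrary up to an additive constant.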

Readings:

JM Book Chapter 10: Post-training: Instruction Tuning, Alignment, and Test-Time Compute

Chain-of-Thought Prompting (2022)

InstructGPT (2022)

Direct Preference Optimization (DPO) (2023)

DeepSeekMath (2024)

DeepSeek-R1 (2025)

Exercise: lecture-08-sft-rm-ppo.ipynb

Week 11 Information Retrieval and Retrieval-Augmented Generation

In this lecture, we introduce dense retrieval and Retrieval-Augmented Generation (RAG). RAG augments a language model with a non-parametric memory — a retrievable document index — so that answers can be grounded in up-to-date, verifiable sources without retraining the model.
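The retrieve-then-generate flow can be sketched in a few lines. The example below is a toy: `embed` is a bag-of-words count vector standing in for a real dense encoder, the document index is a plain list, and the final prompt would be passed to a language model that is not shown. All names and documents are assumptions for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a dense encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query, docs, k=1):
    """Ground the question in retrieved text before generation."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# Hypothetical document index and query.
docs = [
    "BPE merges frequent symbol pairs into subword tokens",
    "Perplexity measures how well a language model predicts text",
    "KV caching stores attention keys and values during decoding",
]
query = "what does perplexity measure"
prompt = build_prompt(query, docs, k=1)
```

Updating the model's knowledge then only requires editing `docs`, which is the sense in which the index acts as non-parametric memory.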

Slides: Lecture 10 slides (PDF)

Readings:

JM Book Chapter 11: Information Retrieval

Exercise: lecture-10-rag.ipynb

Week 12 Course Project Presentation

Each group presents preliminary results in class on 05/21/2026.

Format: up to 8 minutes per group — 7 min presentation + 1 min Q&A

Content: related work investigation, your approach, and preliminary results

Full project details

Final Report — Due 06/25/2026, 23:59 (Week 17)

Written in English or Chinese

Maximum 7 pages of main content (excluding references)

Format: ACL template

Submit via elearning

Grading breakdown

Proposal (5%): clarity, feasibility, relevance, innovation

Presentation (20%): clarity, related work coverage

Programming & algorithm (25%): reasonableness and soundness

Performance (20%): results and analysis

Report (30%): organization, analysis, discussion

Week 13 Diffusion Language Models

Week 14 Alignment & Safety

RLHF (PPO/DPO), Safety barriers, Red-teaming

Week 15 Efficiency & Systems

KV Caching, Quantization (Int8/FP4), Latency/Throughput
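As a small preview of the quantization topic, the sketch below shows symmetric absmax Int8 quantization, one simple scheme among several: values are divided by a per-tensor scale, rounded to integers in [-127, 127], and multiplied by the scale again when used. The function names and the example weights are assumptions for illustration.

```python
def quantize_int8(values):
    """Symmetric absmax quantization of a list of floats to ints in [-127, 127]."""
    # Fall back to a scale of 1.0 when every value is zero.
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to approximate float values."""
    return [x * scale for x in q]

# Hypothetical weight values.
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The rounding error per value is at most half the scale, so a single large outlier inflates the scale and degrades precision for every other value in the tensor.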

Week 16 Agents and Frontiers

Multimodal LLMs, Diffusion LMs, Future Directions