Transformers Deep Dive — AI Engineering Course

01 Why Transformers — The Problems with RNNs

02 Self-Attention from Scratch

03 Multi-Head Attention

04 Positional Encoding — Sinusoidal, RoPE, ALiBi

05 The Full Transformer — Encoder + Decoder

06 BERT — Masked Language Modeling

07 GPT — Causal Language Modeling

08 T5, BART — Encoder-Decoder Models

09 Vision Transformers (ViT)

10 Audio Transformers — Whisper Architecture

11 Mixture of Experts (MoE)

12 KV Cache, Flash Attention & Inference Optimization

13 Scaling Laws

14 Build a Transformer from Scratch — The Capstone

15 Attention Variants — Sliding Window, Sparse, Differential

16 Speculative Decoding — Draft, Verify, Repeat