Transformers Deep Dive
16 lessons
01 Why Transformers — The Problems with RNNs
02 Self-Attention from Scratch
03 Multi-Head Attention
04 Positional Encoding — Sinusoidal, RoPE, ALiBi
05 The Full Transformer — Encoder + Decoder
06 BERT — Masked Language Modeling
07 GPT — Causal Language Modeling
08 T5, BART — Encoder-Decoder Models
09 Vision Transformers (ViT)
10 Audio Transformers — Whisper Architecture
11 Mixture of Experts (MoE)
12 KV Cache, Flash Attention & Inference Optimization
13 Scaling Laws
14 Build a Transformer from Scratch — The Capstone
15 Attention Variants — Sliding Window, Sparse, Differential
16 Speculative Decoding — Draft, Verify, Repeat