Reinforcement Learning
12 lessons
01 MDPs, States, Actions & Rewards
02 Dynamic Programming — Policy Iteration & Value Iteration
03 Monte Carlo Methods — Learning from Complete Episodes
04 Temporal Difference — Q-Learning & SARSA
05 Deep Q-Networks (DQN)
06 Policy Gradient — REINFORCE from Scratch
07 Actor-Critic — A2C and A3C
08 Proximal Policy Optimization (PPO)
09 Reward Modeling & RLHF
10 Multi-Agent RL
11 Sim-to-Real Transfer
12 RL for Games — AlphaZero, MuZero, and the LLM-Reasoning Era