Posts tagged rl
-
Sutton RL: Chapter 6 - Temporal-Difference Learning
TLDR: TD learning updates from partial experience by bootstrapping from current value estimates, combining Monte Carlo sampling with dynamic-programming-style updates.
-
Sutton RL: Chapter 5 - Monte Carlo Methods
TLDR: Monte Carlo methods learn values from complete sampled episodes, gaining model-free simplicity at the cost of delayed updates and high-variance returns.
-
Sutton RL: Day 2 - Multi-Armed Bandits
TLDR: Multi-armed bandits isolate the exploration/exploitation problem by removing state transitions and putting action-value estimation at the center.
-
Sutton RL: Day 3 - Dynamic Programming
TLDR: Dynamic programming is the model-based starting point of reinforcement learning: with known MDP dynamics, Bellman equations become iterative value and policy update rules.
-
Sutton RL: Day 1 - RL Problem and MDP Basics
TLDR: RL is learning from interaction to maximize long-term reward: the policy chooses actions, the reward gives feedback, the value function estimates future return, and the Bellman equations connect the pieces.