Category: learning
-
Sutton RL Chapter 6: Temporal-Difference Learning
TLDR: TD learning updates from partial experience by bootstrapping from current value estimates, combining Monte Carlo sampling with dynamic-programming-style updates.
-
Sutton RL Chapter 5: Monte Carlo Methods
TLDR: Monte Carlo methods learn values from complete sampled episodes, trading model-free simplicity for delayed updates and high-variance returns.
-
Sutton RL Day 2: Multi-Armed Bandits
TLDR: Multi-armed bandits isolate the exploration/exploitation problem by removing state transitions and putting action-value estimation at the center.
-
Sutton RL Day 3: Dynamic Programming
TLDR: Dynamic programming turns known MDP dynamics into iterative policy evaluation and improvement through Bellman updates.
-
Sutton RL Day 1: The RL Problem and MDP Basics
TLDR: RL is learning from interaction to maximize long-term reward: the policy chooses actions, the reward gives feedback, the value function estimates future return, and the Bellman equations connect the pieces.
-
Crafting Interpreters Chapter 2 Notes: A Map of the Territory
TLDR: This note maps the interpreter pipeline from source text through tokens, parsing, semantic analysis, code generation, and runtime choices.
-
Crafting Interpreters Chapter 3 Notes: The Lox Language
TLDR: Lox is the small language that carries the book: expressive enough for classes, closures, and control flow, but compact enough to implement twice.
-
Crafting Interpreters Chapter 4 Notes: Scanning
TLDR: Scanning is the first hard boundary in an interpreter: raw characters become tokens, and the rest of the language pipeline finally has structure to work with.