Category: systems

Crafting Interpreters: Chapter 4 - Scanning
Scanning is the first structural boundary in an interpreter: raw characters become tokens, so the parser can work with language units instead of individual characters.
5min read · May 25, 2026
2026 · crafting-interpreters · interpreters · scanning · learning · systems
CS336: Lecture 3 - LM Architecture and Hyperparameters
TLDR: LM architecture is a stack of trade-offs across normalization, activations, attention, positional encoding, hyperparameters, stability, and inference cost.
18min read · May 22, 2026
2026 · cs336 · lm-architecture · learning · systems
CS336: Lecture 4 - Mixture of Experts
TLDR: MoE scales parameter count through sparse expert routing, but the real work is balancing tokens, capacity, communication cost, and specialization.
26min read · May 22, 2026
2026 · cs336 · moe · learning · systems
Compression Is All You Need: measuring mathematical progress
TLDR: A mathematical abstraction is valuable when it compresses downstream work: proofs become shorter, repeated patterns disappear, and the library becomes easier to extend.
3min read · May 21, 2026
2026 · mathematical-progress · evaluation · reading · systems
Heuristic Learning: maintaining a learning system in code
TLDR: Heuristic Learning treats iterative agent work as maintaining a verifiable software system. Feedback updates code, tests, rules, state representations, and memory rather than neural network weights.
4min read · May 21, 2026
2026 · heuristic-learning · learning-systems · reading · systems
CS336: Lecture 1 - Language Modeling as Engineering
TLDR: Modern LM work is easiest to understand by building the stack yourself, because tokenization, data, compute, and evaluation are all leaky engineering choices.
2min read · May 18, 2026
2026 · cs336 · language-modeling · learning · systems
CS336: Lecture 2 - PyTorch and resource accounting
Lecture 2 is about making training cost concrete: tensors, dtypes, memory, FLOPs, autograd, optimizers, data loading, checkpoints, and mixed precision all have resource prices.
8min read · May 18, 2026
2026 · cs336 · resource-accounting · learning · systems
AMP: automatic mixed precision as a dispatch policy
TLDR: AMP is not "turn the model into half precision." It is a runtime policy that runs safe, high-throughput ops in lower precision while protecting numerically sensitive paths.
4min read · May 18, 2026
2026 · mixed-precision · gpu-systems · reading · systems