Posts tagged cs336

CS336: Lecture 3 - LM Architecture and Hyperparameters
TLDR: LM architecture is a stack of trade-offs across normalization, activations, attention, positional encoding, hyperparameters, stability, and inference cost.
18min read · May 22, 2026
2026 · cs336 · lm-architecture · learning · systems
CS336: Lecture 4 - Mixture of Experts
TLDR: MoE scales parameter count through sparse expert routing, but the real work is balancing tokens, capacity, communication cost, and specialization.
26min read · May 22, 2026
2026 · cs336 · moe · learning · systems
CS336: Lecture 1 - Language Modeling as Engineering
TLDR: Modern LM work is easiest to understand by building the stack yourself, because tokenization, data, compute, and evaluation are all leaky engineering choices.
2min read · May 18, 2026
2026 · cs336 · language-modeling · learning · systems
CS336: Lecture 2 - PyTorch and Resource Accounting
TLDR: Lecture 2 makes training cost concrete, since tensors, dtypes, memory, FLOPs, autograd, optimizers, data loading, checkpoints, and mixed precision all have resource prices.
8min read · May 18, 2026
2026 · cs336 · resource-accounting · learning · systems