Posts tagged cs336
-
CS336 Lecture 3: LM Architecture and Hyperparameters
TLDR: LM architecture is a stack of trade-offs across normalization, activations, attention, positional encoding, hyperparameters, stability, and inference cost.
-
CS336 Lecture 4: Mixture of Experts
TLDR: MoE scales parameter count through sparse expert routing, but the real work is balancing tokens, capacity, communication cost, and specialization.
-
CS336 Lecture 1: Language Modeling as Engineering
TLDR: Modern LM work is easiest to understand by building the stack yourself, because tokenization, data, compute, and evaluation are all leaky engineering choices.
-
CS336 Lecture 2: PyTorch and Resource Accounting
TLDR: Before training a model, PyTorch tensors, memory, FLOPs, and profiling have to become concrete enough that architecture choices have real resource prices.