Posts tagged lm-architecture
CS336: Lecture 3 - LM Architecture and Hyperparameters
TLDR: LM architecture is a stack of trade-offs across normalization, activations, attention, positional encoding, hyperparameters, stability, and inference cost.