I wrote this doc in December 2021, while working at Redwood Research. It summarizes a handful of observations about GPT-2’s weights — mostly the embedding matrix, but also the LayerNorm gain parameters — that I found while doing some open-ended investigation of the model. I wanted to see how much I could learn by studying just those parameters, without looking at the attention layers, MLP layers, or activations.
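For concreteness, here is a minimal sketch of how one might pull out just those parameters — the token embedding matrix and the LayerNorm gain vectors — using the HuggingFace `transformers` library. This tooling choice is my assumption for illustration; the original investigation may well have used something else.

```python
from transformers import GPT2Model

# Load the smallest GPT-2 checkpoint (12 blocks, 768-dim residual stream).
model = GPT2Model.from_pretrained("gpt2")

# Token embedding matrix: one row per vocabulary item.
embedding = model.wte.weight  # shape: (50257, 768)

# LayerNorm gain (scale) parameters: two per transformer block
# (before attention and before the MLP), plus the final LayerNorm.
ln_gains = [block.ln_1.weight for block in model.h]
ln_gains += [block.ln_2.weight for block in model.h]
ln_gains.append(model.ln_f.weight)

print(embedding.shape)                   # torch.Size([50257, 768])
print(len(ln_gains), ln_gains[0].shape)  # 25 torch.Size([768])
```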
The rest of this post is available on the Alignment Forum and LessWrong.