An exploration of GPT-2’s embedding weights

I wrote this doc in December 2021, while working at Redwood Research. It summarizes a handful of observations about GPT-2’s weights — mostly the embedding matrix, but also the LayerNorm gain parameters — that I found while doing some open-ended investigation of the model. I wanted to see how much I could learn by studying just those parameters, without looking at the attention layers, MLP layers, or activations.
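For readers who want to poke at the same parameters themselves, here is a minimal sketch of how one might load just these weights with the HuggingFace `transformers` library. This is purely illustrative and not necessarily the tooling used in the original investigation; the `"gpt2"` checkpoint (GPT-2 small) is used as an example.

```python
from transformers import GPT2Model

# Load the smallest GPT-2 checkpoint as an example (d_model = 768).
model = GPT2Model.from_pretrained("gpt2")

# Token embedding matrix, shape (50257, 768).
wte = model.wte.weight.detach()

# Learned positional embedding matrix, shape (1024, 768).
wpe = model.wpe.weight.detach()

# LayerNorm gain (scale) parameters: ln_1 and ln_2 in each block, plus the final ln_f.
ln_gains = {
    name: param.detach()
    for name, param in model.named_parameters()
    if "ln_" in name and name.endswith("weight")
}
```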

The rest of this post is available on the Alignment Forum and on LessWrong.
