I wrote this doc in December 2021, while working at Redwood Research. It summarizes a handful of observations about GPT-2’s weights — mostly the embedding matrix, but also the LayerNorm gain parameters — that I found while doing some open-ended investigation of the model. I wanted to see how much I could learn by studying just those parameters, without looking at the attention layers, MLP layers, or activations.
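For concreteness, here is a minimal sketch of how one might pull out just those parameters — the token embedding matrix and the LayerNorm gain vectors — using the HuggingFace `transformers` library. This tooling choice is my assumption for illustration; the original investigation may well have used something else.

```python
from transformers import GPT2Model

# Load the smallest GPT-2 checkpoint (12 blocks, 768-dim residual stream).
model = GPT2Model.from_pretrained("gpt2")

# Token embedding matrix: one row per vocabulary item.
embedding = model.wte.weight  # shape: (50257, 768)

# LayerNorm gain (scale) parameters: two per transformer block
# (before attention and before the MLP), plus the final LayerNorm.
ln_gains = [block.ln_1.weight for block in model.h]
ln_gains += [block.ln_2.weight for block in model.h]
ln_gains.append(model.ln_f.weight)

print(embedding.shape)                   # torch.Size([50257, 768])
print(len(ln_gains), ln_gains[0].shape)  # 25 torch.Size([768])
```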
The rest of this post is available on the Alignment Forum and LessWrong.