[Linkpost] Gradient Hacking via Schelling Goals

This is a somewhat technical, context-heavy AI alignment post:

https://www.alignmentforum.org/posts/A9eAPjpFjPwNW2rku/gradient-hacking-via-schelling-goals

There are some comments on the mirrored post on LessWrong:

https://www.lesswrong.com/posts/A9eAPjpFjPwNW2rku/gradient-hacking-via-schelling-goals