Adam Scherlis

Parameter Space: The Final Frontier


[Linkpost] Gradient Hacking via Schelling Goals

Posted on December 28, 2021

This is a somewhat technical / context-heavy AI alignment post:

https://www.alignmentforum.org/posts/A9eAPjpFjPwNW2rku/gradient-hacking-via-schelling-goals

There are some comments on the mirrored post on LessWrong:

https://www.lesswrong.com/posts/A9eAPjpFjPwNW2rku/gradient-hacking-via-schelling-goals


This entry was posted in Uncategorized and tagged alignment, ML.

