I came up with the following puzzle the other day:
Q: Solve the puzzle: 63 = x = 65536 A: x =
The intended answer is in the form of a number.
text-davinci-003 guesses my intended answer at 11.8% probability, which is the second-highest probability for any answer.
(This is somewhat cherry-picked; small changes to the phrasing give worse results. ChatGPT gave the intended answer the third time I asked it, but this appears to have been dumb luck. The true rate for ChatGPT is probably below 10%, and maybe below 5%.)
So far, friends have found it fairly difficult. About two dozen people made at least one guess, and at least six spent a while on it. So far, two people have figured it out, in both cases after being told that GPT-3.5 could do it.
For hints, the answer, and an explanation of why GPT is better at this than people are, see the LessWrong version of this post.
(WordPress doesn’t have spoiler tags.)