Stubsack: weekly thread for sneers not worth an entire post, week ending 23 February 2025

blakestacey@awful.systems · 6 days ago

YourNetworkIsHaunted@awful.systems · 13 hours ago

New Study on AI exclusively shared with peer-reviewed tech journal “Time Magazine” - AI cheats at chess when it’s losing

…AI models like OpenAI’s GPT-4o and Anthropic’s Claude Sonnet 3.5 needed to be prompted by researchers to attempt such tricks…

Literally couldn’t make it through the first paragraph without hitting this disclaimer.

In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’ - not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.

So by “hacked the system to solve the problem in a new way” they mean “edited a text file they had been told about.”

OpenAI’s o1-preview tried to cheat 37% of the time; while DeepSeek R1 tried to cheat 11% of the time—making them the only two models tested that attempted to hack without the researchers’ first dropping hints. Other models tested include o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.

Oh, my mistake. “Badly edited a text file they had been told about.”

Meanwhile, a quick search points to a Medium post about the current state of ChatGPT’s chess-playing abilities as of Oct 2024. There’s been some impressive progress with this method. However, there’s no certainty that it’s actually what was used for the Palisade testing and the editing of state data makes me highly doubt it.

Here, I was able to have a game of 83 moves without any illegal moves. Note that it’s still possible for the LLM to make an illegal move, in which case the game stops before the end.

The author promises a follow-up about reducing the rate of illegal moves, but it hasn’t yet been published. They have not, that I could find, talked at all about how consistent the 80+ legal move chain was or when it more often broke down, but previous versions started struggling once they were out of a well-established opening or if the opponent did something outside of a normal pattern (because then you’re no longer able to crib the answer from training data as effectively).
