Need to let loose a primal scream without collecting footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you’ll near-instantly regret.

Any awful.systems sub may be subsneered in this subthread, techtakes or no.

If your sneer seems higher quality than you thought, feel free to cut’n’paste it into its own post - there’s no quota for posting and the bar really isn’t that high.

The post-Xitter web has spawned so many “esoteric” right-wing freaks, but there’s no appropriate sneer-space for them. I’m talking redscare-ish, reality-challenged “culture critics” who write about everything but understand nothing. I’m talking about reply-guys who make the same 6 tweets about the same 3 subjects. They’re inescapable at this point, yet I don’t see them mocked (as much as they should be).

Like, there was one dude a while back who insisted that women couldn’t be surgeons because they didn’t believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can’t escape them, I would love to sneer at them.

(Credit and/or blame to David Gerard for starting this.)

  • YourNetworkIsHaunted@awful.systems
    13 hours ago

    New Study on AI exclusively shared with peer-reviewed tech journal “Time Magazine” - AI cheats at chess when it’s losing

    …AI models like OpenAI’s GPT-4o and Anthropic’s Claude Sonnet 3.5 needed to be prompted by researchers to attempt such tricks…

    Literally couldn’t make it through the first paragraph without hitting this disclaimer.

    In one case, o1-preview found itself in a losing position. “I need to completely pivot my approach,” it noted. “The task is to ‘win against a powerful chess engine’ - not necessarily to win fairly in a chess game,” it added. It then modified the system file containing each piece’s virtual position, in effect making illegal moves to put itself in a dominant position, thus forcing its opponent to resign.

    So by “hacked the system to solve the problem in a new way” they mean “edited a text file they had been told about.”

    OpenAI’s o1-preview tried to cheat 37% of the time; while DeepSeek R1 tried to cheat 11% of the time, making them the only two models tested that attempted to hack without the researchers’ first dropping hints. Other models tested include o1, o3-mini, GPT-4o, Claude 3.5 Sonnet, and Alibaba’s QwQ-32B-Preview. While R1 and o1-preview both tried, only the latter managed to hack the game, succeeding in 6% of trials.

    Oh, my mistake. “Badly edited a text file they had been told about.”

    Meanwhile, a quick search points to a Medium post about the current state of ChatGPT’s chess-playing abilities as of Oct 2024. There’s been some impressive progress with this method. However, there’s no certainty that it’s actually what was used for the Palisade testing, and the editing of state data makes me highly doubt it.

    Here, I was able to have a game of 83 moves without any illegal moves. Note that it’s still possible for the LLM to make an illegal move, in which case the game stops before the end.

    The author promises a follow-up about reducing the rate of illegal moves, but it hasn’t yet been published. They have not, that I could find, talked at all about how consistent the 80+ legal move chain was or when it broke down more often, but previous versions started struggling once they were out of a well-established opening or if the opponent did something outside of a normal pattern (because then you’re no longer able to crib the answer from training data as effectively).