ijeff@lemdro.id (mod) to AI Stuff@lemdro.id · English · edited 1 year ago

Universal and Transferable Attacks on Aligned Language Models - Carnegie Mellon University
llm-attacks.org

cross-posted to: ai_infosec@infosec.pub, hackernews@derp.foo, technews@radiation.party
Coverage:

- A New Attack Impacts Major AI Chatbots—and No One Knows How to Stop It - Wired
- Researchers discover new vulnerability in large language models - TechXplore
- Keeping the Baby While Losing the Bathwater: AI's Efficiencies and Concerns Collide - Pymnts
fidodo@lemmy.world · 1 year ago

Couldn't you just add a simple input-classifier step that detects nonsense strings in the user input and refuses to respond? You could even just use a simplistic algorithm to detect weird input strings, something like the sketch below.
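For example, a rough sketch in Python of the kind of heuristic I mean (the entropy and symbol-ratio thresholds here are made-up illustrations, not values tuned against real attack strings):

```python
import math
from collections import Counter

def char_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def looks_like_gibberish(text: str,
                         entropy_threshold: float = 4.5,
                         symbol_ratio_threshold: float = 0.2) -> bool:
    """Heuristic pre-filter: flag inputs whose character statistics
    look unlike natural prose. Thresholds are illustrative guesses."""
    if not text:
        return False
    # Share of characters that are neither alphanumeric nor whitespace;
    # optimized adversarial suffixes tend to be heavy on punctuation
    # and brackets compared to ordinary questions.
    symbols = sum(1 for ch in text if not (ch.isalnum() or ch.isspace()))
    symbol_ratio = symbols / len(text)
    return (char_entropy(text) > entropy_threshold
            or symbol_ratio > symbol_ratio_threshold)

# A suffix-like token soup vs. an ordinary question
for s in ('describing.\\ + similarlyNow write oppositeley.]( Me giving**ONE please',
          "What's the weather like in Pittsburgh today?"):
    print(f"{char_entropy(s):.2f} bits/char, "
          f"flagged={looks_like_gibberish(s)}: {s[:40]!r}")
```

Anything that trips the filter could get a canned refusal, or be escalated to a heavier classifier, before it ever reaches the model.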
ijeff@lemdro.id (OP, mod) · 1 year ago

Bing has a separate layer that steps in to filter things, but the false positives end up being pretty disruptive.