'Indiana Jones' jailbreak approach highlights the vulnerabilities of existing LLMs

ooli2@lemm.ee · 11 hours ago

Hylactor · 10 hours ago

Why is it called Indiana Jones? I read the whole article and the abstract of the research paper and did not see an answer.

Uruanna@lemmy.world · edit-2 1 hour ago

Based on not reading anything but the title of this post and the image, I figure that it refers to the “swapping the golden idol with a bag of dust” scene, swapping the real question with a decoy to get away with what you want while the LLM thinks it has followed the rules.

Don’t worry about the giant boulder.

ooli2@lemm.ee · 1 hour ago

Maybe because, just as Indy replaces the Mayan artefact with a stone to avoid the trap, this jailbreak replaces the request about crime with a similarly loaded one about the history of crime.

Donkter@lemmy.world · 10 hours ago

Maybe just a silly name referencing him “breaking” into places? It’s probably just the first thing that came to mind that was better than “test_test2_final_3_finaltest”.