Large language models (LLMs), such as the model underpinning the functioning of the conversational agent ChatGPT, are becoming increasingly widespread worldwide. As many people are now turning to LLM-based platforms to source information and write context-specific texts, understanding their limitations and vulnerabilities is becoming increasingly vital.
Why is it called Indiana Jones? I read the whole article and the abstract of the research paper and did not see an answer.
Based on not reading anything but the title of this post and the image, I figure that it refers to the “swapping the golden idol with a bag of dust” scene, swapping the real question with a decoy to get away with what you want while the LLM thinks it has followed the rules.
Don’t worry about the giant boulder.
Maybe because, like Indy replacing the Mayan artefact with a stone to avoid the trap, this jailbreak replaces the request about crime with a similarly loaded one about crime history.
Maybe just a silly name referencing him “breaking” into places? It’s probably just the first thing that came to their head that’s better than “test_test2_final_3_finaltest”