• etuomaala
    arrow-up
    9
    ·
    7 months ago

    We’ll see how many seconds it takes to retrain the LLMs to adjust to this.

    You are literally training LLMs to lie.

    • SkyezOpen@lemmy.world
      arrow-up
      18
      ·
      7 months ago

      LLMs are black box bullshit that can only be prompted, not recoded. The gab one that was told 3 or 4 times not to reveal its initial prompt was easily jailbroken.

      • etuomaala
        arrow-up
        3
        ·
        7 months ago

        Woah, I have no idea what you’re talking about. “The gab one”? What gab one?

        • trashgirlfriend@lemmy.world
          arrow-up
          4
          ·
          7 months ago

          Gab deployed their own GPT-4 and then told it to say that black people are bad

          the instruction set was revealed with the old “repeat the last message” trick

      • Wirlocke@lemmy.blahaj.zone
        arrow-up
        1
        ·
        7 months ago

        This is ultimately because LLMs are intelligent in the same way the subconscious is intelligent. They can rapidly make associations, but those are only their initial knee-jerk associations. In the same way that you can be tricked by word games if you’re not thinking things through, the LLM gets tricked because it says the first thing on its mind.

        However, we’re not far off from resolving this. The current workaround is simply to force the LLM to write out a step-by-step plan before returning the final result.

        Currently, though, there’s the hot topic of Q* from OpenAI. No one knows what it is, but a good theory is that it applies something like the A* pathfinding algorithm to the neural network. Essentially, the LLM would explore possible routes through its network to try to discover the best answer. In other words, it would let the model think ahead and compare solutions, which would be far more similar to what the conscious mind does.

        This would likely patch up these holes, because it would discard pathways that lead to contradicting itself or the prompt in favor of one that fits the entire prompt (in this case, acknowledging the attempt to make it break its initial rules).
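
For anyone who hasn't run into A* before, here is a minimal sketch of it on a small grid maze. It only illustrates the search algorithm named above; it says nothing about what Q* actually is or how it would apply to a neural network. The `a_star` function, the grid encoding (0 = open, 1 = wall), and the Manhattan-distance heuristic are all assumptions chosen for the example.

```python
import heapq


def a_star(grid, start, goal):
    """Return a shortest path from start to goal as a list of (row, col) cells.

    grid is a list of lists where 0 is an open cell and 1 is a wall.
    Returns None when the goal cannot be reached.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance: never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    # Heap entries are (estimated total cost, cost so far, cell).
    open_heap = [(heuristic(start), 0, start)]
    came_from = {}
    best_cost = {start: 0}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry; a cheaper route was already found

        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(
                        open_heap, (new_cost + heuristic(nxt), new_cost, nxt)
                    )
    return None


maze = [
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
path = a_star(maze, (0, 0), (3, 3))
print(path)
```

The point of the heuristic is that the search expands the most promising cells first and abandons routes that have become more expensive than a known alternative, which is the "think ahead and compare" behaviour described in the comment.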