• MentalEdge
    2 years ago

    On this I agree entirely. The potential for corporate espionage because of unwitting employees using an LLM through unofficial means is huge.

    At the very least, the corporation itself, not the employee, would have to be the customer, so that watertight terms can be negotiated.

    • Ulu-Mulu-no-die@lemmy.world
      2 years ago

      I don’t think being a customer would work either. Language models are still being trained, and no one knows exactly how users’ queries are used. That’s a big no-no for any company that has to protect its secrets.

      A self-hosted instance is a much better solution, if not the only “safe” one from that point of view. We’ll get there.

      • MentalEdge
        2 years ago

        Interaction data does not become training data, unless you want it to.

        I know that the inner workings of software created using machine learning are effectively unknowable, but training data and interaction data are not the same thing. ChatGPT in particular is designed to be restored to a known-good start state, using query data only for context awareness within a given session, not to train itself.

        Each query simply includes all previous queries, for context. That’s part of why it becomes increasingly erratic the longer a session goes on.

        And unless you do train with a given piece of data, that data is not entered into the LLM in any way, not even in that undefined, unknowable way.
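
To make the "each query includes all previous queries" point concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `frozen_model` is a hypothetical stand-in for a model with fixed weights, and `ChatSession` is an invented name, not any vendor's actual API. It only shows the general pattern of a session resending its history as context while the model itself keeps no state.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def frozen_model(prompt: str) -> str:
    """Hypothetical stand-in for a model with fixed weights.

    It is a pure function of the prompt: nothing is stored between calls.
    """
    return f"reply based on {len(prompt.splitlines())} line(s) of context"


@dataclass
class ChatSession:
    """Illustrative session: the history lives here, not in the model."""
    model: Callable[[str], str]
    history: List[str] = field(default_factory=list)

    def ask(self, query: str) -> str:
        # Resend everything said so far, plus the new query, as one prompt.
        self.history.append(f"user: {query}")
        prompt = "\n".join(self.history)
        answer = self.model(prompt)
        self.history.append(f"assistant: {answer}")
        return answer


session = ChatSession(frozen_model)
first = session.ask("What is our release date?")
second = session.ask("And who approved it?")

# A new session starts from the same clean state; the earlier chat is gone.
fresh = ChatSession(frozen_model)
fresh_first = fresh.ask("What is our release date?")

print(first)
print(second)
print(fresh_first)
```

In this sketch the prompt grows with every turn, which is the mechanism described above, and the fresh session gets the same answer the first session got on its first turn because the model never changed. Whether a provider separately logs queries or later trains on them is a policy matter that this sketch does not address.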