• barsquid@lemmy.world
    7 months ago

    I don’t think the license will do anything legally, but I hope the inclusion of the license poisons some data for LLM training. Unfortunately, it is all really uniform across all the people doing it and all their comments, so it will be easy to strip out.
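
    To illustrate why a uniform footer is easy to strip, here is a minimal sketch. The footer text and link format in `LICENSE_FOOTER` are assumptions made up for this example (the thread does not say which license text people append), and `strip_footer` is a hypothetical helper, not anything a real training pipeline is known to use.

```python
import re

# Assumed footer format: a bare or markdown-linked license name at the end
# of a comment. This pattern is hypothetical, chosen only for illustration.
LICENSE_FOOTER = re.compile(r"\n*\[?CC BY-NC-SA 4\.0\]?(\([^)]*\))?\s*$")

def strip_footer(comment: str) -> str:
    """Remove a trailing license footer if it matches the assumed pattern."""
    return LICENSE_FOOTER.sub("", comment)

comments = [
    "Great point about federation.\n\nCC BY-NC-SA 4.0",
    "I disagree, see the docs.\n\n[CC BY-NC-SA 4.0](https://example.org/license)",
    "No footer here.",
]

# Every comment using the same footer is cleaned by the same single pattern.
cleaned = [strip_footer(c) for c in comments]
```

    One pattern covers every comment that uses the same footer; footers that differ from comment to comment would each need their own rule.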

      • ninpnin
        7 months ago

        It’s like not throwing garbage on the streets in a polluted city. It doesn’t really change the big picture, but if everybody did it, it would make a difference.

        See: users avoiding their content getting shadow banned using alternate spellings like s3x.

    • ninpnin
      7 months ago

      Any individual action can be combatted easily. A million different signatures and headers are a whole different story.

      Mind you, LLM training data is polluted with anything and everything, including other languages. Recently, the best performance has been reached using higher-quality data.