A long-form response to the concerns, comments, and general principles many people raised in the post about authors suing companies creating LLMs.

  • flyingowlfox@beehaw.org
    1 year ago

    Regardless of intent, let’s not pretend that the scale at which LLMs “process” information to generate new content is comparable to a human’s. Human-scale processing is obviously what copyright laws were written for (so far).

    • Sas [she/her]@beehaw.org
      1 year ago

      We don’t need to pretend, though. People with speed-reading skills are also faster than most humans and can read a lot more books.

      It’s very probable that you have read at least one writer’s entire body of work, even one as large as Terry Pratchett’s. That will always be possible with human-written books, since writing them takes longer than reading them.

      Obviously those stories have to be acquired legally, and no actual passages should be stored in the model, but the amount of data processed should have no bearing on whether it can be used.

      And, as others here have written, making copyright law stricter puts big corps at an advantage, because they have big legal teams and the money to just pay the copyright fee, while your regular user would not be able to.

    • Peanut
      1 year ago

      It’s comparing a bird to a plane, but I still think the process constitutes “learning.” That may sound anthropomorphic to some, but I don’t think we have a more accurate word. I think the plane is flying even if its wings aren’t flapping and it doesn’t do anything else birds do. I think LLMs, while different, reflect the subconscious aspect of human speech, and reflect the concept of learning from the data more than “copying” the data. It’s not copying and selling content, unless you count being prompted into repeating something it was trained on heavily enough for accurate verbatim reconstruction. To me, that’s no more worrying than Disney being able to hire writers who have memorized some of their favorite material and can reconstruct it on demand. If you ask your intern to reproduce something verbatim with the intent of selling it, I still don’t think the training or “learning” is the issue.

      To accurately address the differences, we probably need new language and ideals for the specific situations that arise in the building of neural nets, but I still consider much of the backlash completely removed from any understanding of what has been done with the “copyrighted material.”

      I tend to think about this in terms of training these machines on real-world content in the future. Should a neural net built to act in the real world be sued if an image of a Coca-Cola can was somewhere in the training data, and some of the machines end up being used to make cans for a competitor?

      How many layers of abstraction, or how much mixing with other training data, do you need before that bit of information is no longer comparable to the crime of someone intentionally and directly creating an identical logo and product to sell?

      Copyright laws already needed an overhaul before AI.

      It’s no coincidence that Warner and Disney are so giant right now, own so much of other people’s ideas, and have the money to control which ideas get funded. How long has Walt Disney been dead? More than half a century. So why does his business own the rights to the work of so many artists who came after?

      I don’t think the copyright system is ready to handle the complexity of artificial minds at any stage, whether it is the pareidolic aspect of retrieving visual concepts of images in diffusion models, or the complex abilities that arise from current-scale LLMs, which, again, I believe resemble the subconscious aspect of word prediction that exists in our minds.

      We can’t even get people to confidently legislate a simple ethical issue like letting people have consensual relationships with partners of whatever gender they choose. I don’t have hope that we can accurately adjust at each stage of development of a technology so complex we don’t even have the language to properly describe how it functions. I just believe that limiting our future and an important technology for such grotesquely misdirected egoism would do far more harm than good.

      The greater focus should be on guaranteeing that technological or creative developments benefit the common people, not just the rich. This should have been the focus for the past half century. People refuse this conceptually because they’ve been convinced that any economic re-balancing is evil when it benefits the poor. Those with the ability to change anything are only incentivized to help themselves.

      But everyone is just mad at the machine because “what if it learned from my property?”

      I think the article even promotes Adobe as the ethical alternative. Congrats, you’ve limited the environment so that only the existing owners of everything can advance. I don’t want to pay Adobe a subscription for the rest of my life for the right to create on par with wealthier individuals. How is this helping the world or the creation of art?