George R.R. Martin and other authors sue OpenAI for copyright infringement

Voyager@psychedelia.ink · 1 year ago

just another dev@lemmy.my-box.dev · 1 year ago

Because that is far harder to prove than showing OpenAI used his IP without permission.

In my opinion, it should not be allowed to train a generative model on data without permission of the rights holder. So at the very least, OpenAI should publish (references to) the training data they have used so far, and probably restrict the dataset to public-domain and opt-in works for future models.

Aezora@lemm.ee · edit-2 1 year ago

I don’t see why they (authors/copyright holders) have any right to prevent use of their product beyond purchasing. If I legally own a copy of Game of Thrones, I should be able to do whatever the crap I want with it.

And basically, I can. I can quote parts of it, I can give it to a friend to read, I can rip out a page and tape it to the wall, I can teach my kid how to read with it.

Why should I not be allowed to train my AI with it? Why do you think it’s unethical?

Anonymousllama@lemmy.world · 1 year ago

Next, if you come up with some ideas for your own fantasy environment after watching Game of Thrones, they’ll want to chase you down, considering they didn’t give you express permission to be “inspired” by their work 🙄

just another dev@lemmy.my-box.dev · edit-2 1 year ago

And basically, I can. I can quote parts of it, I can give it to a friend to read, I can rip out a page and tape it to the wall, I can teach my kid how to read with it.

These are things you’re allowed to do with your copy of the book. But you are not allowed to, for example, create a copy of it and give that to a friend, or create a play or a movie out of it. You don’t own the story, you own a copy of it on a specific medium.

As to why it’s unethical, see my comment here.

koljarhr@discuss.tchncs.de · 1 year ago

I agree, the ownership is not absolute.

However, just as a person does not own the work of an author, the authors do not own words, grammar, sentences or even their own style. Similarly, they do not own the names of the characters in their books or the universe in which the plot is happening. They do not even “own” their own name.

So the only question remaining is whether AI is allowed to “read” a book. In the future authors might prohibit it, but hey, we’re just going to end up with a slightly more archaic-speaking GPT over time because it will not train on new releases. And that’s fine by me.

just another dev@lemmy.my-box.dev · 1 year ago

I think that in the end it should be a matter of licensing. The author might give you the right to train a model on it, if you pay them for it. Just like you’d have to get permission if you want to turn their work into a play or a show.

I don’t think the argument (not yours, but often seen in discussions like these) about “humans can be inspired by a work, so a computer should be allowed to be as well” holds any ground. It would take a human much more time to make a style their own, as well as to recreate large amounts of it. For an AI model the same is a matter of minutes and seconds, respectively. So any comparison is moot, imho.

Aezora@lemm.ee · 1 year ago

But the thing is, it’s not similar to turning their work into a play or a TV show. You aren’t replicating their story at all, they put words in a logical order and you are using that to teach the AI what the next word logically could be.

As for humans taking much more time to properly mimic style, of course that’s true (assuming untrained). But an AI requires far more memory and data to do that. A human can replicate a style with just examples of that style, given time. An AI needs to scrape basically the entire internet (and label it, which takes quite some time) to be able to do so. They may need different things, but it’s ridiculous to say that they’re completely incomparable. Besides, you make it sound like AI is its own entity that wasn’t created, trained, and used by humans in the first place.

just another dev@lemmy.my-box.dev · edit-2 1 year ago

It’s not the same as turning it into a play, but it’s doing something with it beyond its intended purpose, specifically with the intention to produce derivatives of it at an enormous scale.

Whether or not a computer needs more or less of it than a human is not a factor, in my opinion. Actually, the fact that more input is required than for a human only makes it worse, since more of the creators’ work has to be used without their permission.

Again, the reason why I think it’s incomparable is that when a human learns to do this, the damage is relatively limited. Even the best writer can only produce so many pages per day. But when a model learns to do it, the ability to apply it is effectively unlimited. The scale of the infraction is so exponentially more extreme, that I don’t think it’s reasonable to compare them.

Lastly, if I made it sound like that, I apologise, that was not my intention. I don’t think it’s the model’s fault, but that of the people who decided to (directly or indirectly by not vetting their input data) take somebody’s copyrighted work and train an LLM on it.

Aezora@lemm.ee · 1 year ago

I don’t think the potential difference in how much damage can be caused is a reasonable argument. After all, economic damage to writers from others copying or plagiarizing their work, style, or world is limited not because it’s hard for humans to do so, but because we made it illegal to make something so similar to another person’s copyrighted work.

For example, Harry Potter has absolutely been copied to the extent legally allowed, but no one cares about any of those books because they’re not so similar that they affect the sales of Harry Potter at all. And that’s also true for AI. It doesn’t matter how closely it can replicate someone’s style or story if that replication can never be used or sold due to copyright infringement, which is already the case right now. Sure you can use it to generate thousands of books that are just different enough to not get struck down, but that wouldn’t affect the original book at all.

Now, to be fair, with art you can be more similar to others’ art, because of how art works. But also, to be fair, the art market was never about how good an artist was, it was about how expensive the rich people who bought your art wanted it to be for tax purposes. And I doubt AI art is valuable for that.

koljarhr@discuss.tchncs.de · 1 year ago

Ownership is never absolute. Just like with music - you are not allowed to use it commercially i.e. in your restaurant, club, beauty salon, etc. without paying extra. You are also not allowed to do the same with books - for example, you shouldn’t share scans online, although it’s “your” book.

However, it is not clear how AI infringes on the rights of authors in this case, because a human may legally read a book and produce a similar book in the same style.

koljarhr@discuss.tchncs.de · 1 year ago

Assuming that books used for GPT training were indeed purchased, not pirated, and since “AI training” was not prohibited at the time of the purchase, the engineers had every right to use them. Maybe authors in the future could prohibit “AI training” but for the books purchased before they do, “AI training” is a fair usage.

just another dev@lemmy.my-box.dev · edit-2 1 year ago

I think we’ll find out whether or not that is true in a trial like this.

Grimy@lemmy.world · 1 year ago

Okay, the problem is there are only about three companies with either enough data or enough money to buy it. Any open source or small time AI model is completely dead in the water. Since our economy is quickly moving towards being AI driven, it would basically guarantee our economy is completely owned by a handful of companies like Getty Images.

Any artist with less weight than GRR and Taylor Swift is still screwed, they might get a peanut or two at most.

I’d rather get an explosion of culture, even if it means GRR doesn’t get a last fat paycheck and Hollywood loses control of its monopoly.

just another dev@lemmy.my-box.dev · 1 year ago

I get it. I download movies without paying for them too. It’s super convenient, and much cheaper than doing the right thing.

But I don’t pretend it’s ethical. And I certainly don’t charge other people money to benefit from it.

Either there are plenty of people who are fine with their work being used for AI purposes (especially in an open source model), or they don’t agree to it, in which case it would be unethical to do so.

Just because something is practical, doesn’t mean it’s right.

Grimy@lemmy.world · 1 year ago

There’s so much more at stake, it’s not remotely the same as pirating. AI is poised to take over any kind of job that requires only a computer and a telephone. I’d rather have robust open source options than a handful of companies exerting a subscription tax on half the economy.

Any overt legislation will only hurt us, the consumers, while 99.9% of the actual artists and contributors won’t see any benefit whatsoever.

Short of aggressively nationalizing any kind of AI endeavour, making it as free and accessible as possible is the best option imo.

Touching_Grass@lemmy.world · 1 year ago

We could get Elon Musk to develop a corpus and train all AI on that instead of training AI on a corpus from scraping websites.

Fraylor@lemm.ee · 1 year ago

Elon can’t develop shit.