Then we should be able to charge AI (the developers moreso) for the same disgusting crime, and shut AI down.
Camera-makers, too. And people who make pencils. Lock the whole lot up, the sickos.
Camera makers and pencil makers (and the users of those devices) aren’t making massive server farms that spy on every drop of information they can get ahold of.
If AI has the means to generate inappropriate material, then that means the developers have allowed it to train from inappropriate material.
Now when that’s the case, well where did the devs get the training data?.. 🤔
That’s not how generative AI works. It’s capable of creating images that include novel elements that weren’t in the training set.
Go ahead and ask one to generate a bonkers image description that doesn’t exist in its training data and there’s a good chance it’ll be able to make one for you. The classic example is an “avocado chair”, which an early image generator was able to produce many plausible images of despite only having been trained on images of avocados and chairs. It understood the two general concepts and was able to figure out how to meld them into a common depiction.
Yes, I’ve tried similar silly things. I’ve asked AI to render an image of Mr. Bean hugging Pennywise the clown. And it delivered something randomly silly looking, but still not far off base.
But when it comes to inappropriate material, well the AI shouldn’t be able to generate any such thing in the first place, unless the developers have allowed it to train from inappropriate sources…
The trainers didn’t train the image generator on images of Mr. Bean hugging Pennywise, and yet it’s able to generate images of Mr. Bean hugging Pennywise. Yet you insist that it can’t generate inappropriate images without having been specifically trained on inappropriate images? Why is that suddenly different?
The trainers taught it what Mr. Bean looks like and what Pennywise looks like - it took those concepts and combined them to create your image. To make CSAM it was, unfortunately, trained on CSAM: https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse
3,226 suspected images out of 5.8 billion. About 0.00006%. And probably mislabeled to boot, or it would have been caught earlier. I doubt it had any significant impact on the model’s capabilities.
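For anyone who wants to check that percentage, here is a quick sketch. The variable names are mine; the two counts are the ones quoted in the comment above, and nothing else is assumed.

```python
# Counts as quoted in the thread: suspected images vs. total dataset size.
suspected_images = 3_226
total_images = 5_800_000_000

# Share of the dataset, expressed as a percentage.
share_percent = suspected_images / total_images * 100

# Rounded to five decimal places, matching the "about 0.00006%" figure.
print(f"{share_percent:.5f}%")
```

Unrounded, the share is closer to 0.0000556%, so "about 0.00006%" is a fair rounding.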
Who is responsible then? Cuz the devs basically gotta let the AI go to town on many websites and documents for any sort of training set.
So you mean to say, you can’t blame the developers, because they just made a tool (one that scrapes data from everywhere possible), can’t blame the tool (don’t mind that AI is scraping all your data), and can’t blame the end users, because some dirty minded people search or post inappropriate things…?
So where’s the blame go?
First, you need to figure out exactly what it is that the “blame” is for.
If the problem is the abuse of children, well, none of that actually happened in this case so there’s no blame to begin with.
If the problem is possession of CSAM, then that’s on the guy who generated them, since they didn’t exist at any point before then. The trainers wouldn’t have needed to have any of that in the training set, so if you want to blame them you’re going to need to do a completely separate investigation into that. The ability of the AI to generate images like that doesn’t prove anything.
If the problem is the creation of CSAM, then again, it’s the guy who generated them.
If it’s the provision of general-purpose art tools that were later used to create CSAM, then sure, the AI trainers are in trouble. As are the camera makers and the pencil makers, as I mentioned sarcastically in my first comment.
You obviously don’t understand squat about AI.
AI only knows what has gone through its training data, both from the developers and the end users.
Hell, back in 2003 I wrote an adaptive AI for optical character recognition (OCR). I designed it for English, but also with a crude ability to learn.
I could have taught that thing hieroglyphics if I wanted to. But AI will never generate things that it’s never seen before.
Funny that AI has an easier time rendering inappropriate material than it does human hands…
…no
That’d be like outlawing hammers because someone figured out they make a great murder weapon.
Just because you can use a tool for crime, doesn’t mean that tool was designed/intended for crime.
Sadly that’s what most of the gun laws are designed around. Book bans and anti-abortion laws both limit tools because of what a small minority choose to do with them.
AI image generation shouldn’t be considered in obscenity laws. His distribution of pornography to a minor should be the issue, because not everyone stuck with that disease should be deprived of tools that can be used to keep them away from hurting others.
Using AI images to increase charges is wrong. A pedophile contacting and distributing pornography to children should be all that it takes to charge a person. This will just set up a new precedent that is beyond the scope of the judiciary.
It would be more like outlawing ivory grand pianos because they require dead elephants to make - the AI models under question here were trained on abuse.
A person (the arrested software engineer from the article) acquired a tool (a copy of Stable Diffusion, available on GitHub) and used it to commit a crime (trained it to generate CSAM + used it to generate CSAM).
That has nothing to do with the developer of the AI, and everything to do with the person using it. (hence the arrest…)
I stand by my analogy.
Unfortunately the developer trained it on some CSAM which I think means they’re not free of guilt - we really need to rebuild these models from the ground up to be free of that taint.
Reading that article:
Given it’s a public dataset not owned or maintained by the developers of Stable Diffusion, I wouldn’t consider that their fault either.
I think it’s reasonable to expect a dataset like that should have had screening measures to prevent that kind of data being imported in the first place. It shouldn’t be on users (here meaning the devs of Stable Diffusion) of that data to ensure there’s no illegal content within the billions of images in a public dataset.
That’s a different story now that users have been informed of the content within this particular data, but I don’t think it should have been assumed to be their responsibility from the beginning.
Sounds to me it would be more like outlawing grand pianos because of all of the dead elephants - while some people are claiming that it is possible to make a grand piano without killing elephants.
There’s CSAM in the training set[1] used for these models so some elephants have been murdered to make this piano.
3,226 suspected images out of 5.8 billion. About 0.00006%. And probably mislabeled to boot, or it would have been caught earlier. I doubt it had any significant impact on the model’s capabilities.
I know. So to confirm, you’re saying that you’re okay with AI generated CSAM as long as the training data for the model didn’t include any CSAM?
No, I’m not - I still have ethical objections and I don’t believe CSAM could be generated without some CSAM in the training set. I think it’s generally problematic to sexually fantasize about underage persons though I know that’s an extremely unpopular opinion here.
So why are you posting all over this thread about how CSAM was included in the training set if that is in your opinion ultimately irrelevant with regards to the topic of the post and discussion, the morality of using AI to generate CSAM?
Because all over this thread are claims that AI CSAM doesn’t need actual CSAM to generate. We currently don’t have AI CSAM that is taint free and it’s unlikely we ever will due to how generative AI works.
That’s not the point. You don’t train a hammer from millions of user inputs.
You gotta ask, if the AI can produce inappropriate material, then where did the developers get the training data, and what exactly did they train those AI models for?
Do… Do you really think the creators/developers of Stable Diffusion (the AI art tool in question here) trained it on CSAM before distributing it to the public?
Or are you arguing that we should be allowed to do what’s been done in the article? (arrest and charge the individual responsible for training their copy of an AI model to generate CSAM)
One, AI image generators can and will spit out content vastly different than anything in the training dataset (this ofc can be influenced greatly by user input). This can be fed back into the training data to push the model towards the desired outcome. Examples of the desired outcome are not required at all. (IE you don’t have to feed it CSAM to get CSAM, you just have to consistently push it more and more towards that goal)
Two, anyone can host an AI model; it’s not reserved for big corporations and their server farms. You can host your own copy and train it however you’d like on whatever material you’ve got. (that’s literally how Stable Diffusion is used) This kind of explicit material is being created by individuals using AI software they’ve downloaded/purchased/stolen and then trained themselves. They aren’t buying a CSAM generator ready to use off the open market… (nor are they getting this material from publicly operating AI models)
They are acquiring a tool and moulding it into a weapon of their own volition.
Some tools you can just use immediately, others have a setup process first. AI is just a tool, like a hammer. It can be used appropriately, or not. The developer isn’t responsible for how you decide to use it.
Then that settles it. It’s whoever allows bad data into the training data.
Yes. Because they did (not intentionally, though):
https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse
3,226 suspected images out of 5.8 billion. About 0.00006%. And probably mislabeled to boot, or it would have been caught earlier. I doubt it had any significant impact on the model’s capabilities.
I think that’s a bit of a stretch. If it was being marketed as “make your fantasy, no matter how illegal it is,” then yeah. But just because I use a tool someone else made doesn’t mean they should be held liable.
Check my other comments. My thought was compared to a hammer.
Hammers aren’t trained to act or respond on their own from millions of user inputs.
Image AIs also don’t act or respond on their own. You have to prompt them.
And if I prompted AI for something inappropriate, and it gave me a relevant image, then that means the AI had inappropriate material in its training data.
No, you keep repeating this but it remains untrue no matter how many times you say it. An image generator is able to create novel images that are not directly taken from its training data. That’s the whole point of image AIs.
I just want to clarify that you’ve bought the Silicon Valley hype for AI, but that is very much not the truth. It can create nothing novel - it can merely combine concepts and themes and styles in an incredibly complex manner… but it can never create anything novel.
What it’s able and intended to do is beside the point, if it’s also capable of generating inappropriate material.
Let me spell it more clearly. AI wouldn’t know what a pussy looked like if it was never exposed to that sort of data set. It wouldn’t know other inappropriate things if it wasn’t exposed to that data set either.
Do you see where I’m going with this? AI only knows what people allow it to learn…
You realize that there are perfectly legal photographs of female genitals out there? I’ve heard it’s actually a rather popular photography subject on the Internet.
Yes, but the point here is that the AI doesn’t need to learn from any actually illegal images. You can train it on perfectly legal images of adults in pornographic situations, and also perfectly legal images of children in non-pornographic situations, and then when you ask it to generate child porn it has all the concepts it needs to generate novel images of child porn for you. The fact that it’s capable of that does not in any way imply that the trainers fed it child porn in the training set, or had any intention of it being used in that specific way.
As others have analogized in this thread, if you murder someone with a hammer that doesn’t make the people who manufactured the hammer guilty of anything. Hammers are perfectly legal. It’s how you used it that is illegal.
Yes, I get all that, duh. Did you read the original post title? CSAM?
I thought you could catch a clue when I said inappropriate.
I learned how to write by reading. The AI did the same, more or less, no?
The AI didn’t learn to draw or generate photos from blind words though…
Oh, it learned from art? Like how human artists learn?
AI hasn’t exactly kicked out a Picasso with a naked young girl missing an ear yet has it?
I sure hope not!
But if it can, then that seriously indicates it must have some bad training data in the system…
I won’t be testing these hypotheses.
It in fact does have bad training data! https://cyber.fsi.stanford.edu/news/investigation-finds-ai-image-generation-models-trained-child-abuse
Thank you for posting a relevant link. It’s disappointing that such data is any part of any public AI systems… ☹️
Can we do guns next?
I’d rather not fart bullets, but thank you for inviting me to the party.
I’m not sure why you’re picking this situation for an anti-AI rant. Of course there are a lot of ways that large companies will try to use AI that will harm society. But this is a situation where we already have laws on the books to lock up the people who are specifically doing terrible things. Good.
If you want to try to stand up and tell us about how AI is going to damage society, pick an area where people are using it legally and show us the harms there. Find something that’s legal but immoral and unethical, and then you’ll get a lot of support.
Totally dismissing inappropriate usage, AI can be funny and entertaining, but on the flip side it’s also taking people’s jobs.
It shouldn’t take a book, let alone 3 seconds of common sense thought, to realize that.