China has a huge advantage in AI models because of how lax it is on intellectual property rights. US companies are fighting over API licensing costs, while China is just going to scrape everything and use it for free.
The US has a lead now, but I don't think it can maintain that lead without giving up on ethical training. Then again, it may not matter whether the US models are ethical if everyone eventually just uses the superior, unethically trained Chinese models instead.
lolwat
did corporate provide you with these talking points?
I mean, they are right. Set aside whether we can even make meaningfully better models just by using LLMs and more data, what the future of AI will look like, and whether it's ethical to steal the data in the first place. It is still quite possible that OpenAI and the like will get into legal trouble because of the methods they use to acquire data, while Chinese companies won't have to worry about that. If more data = better models, then China has an obvious advantage.
OpenAI and the like aren't going to get into trouble anytime soon. They already provide their latest tech to the US gov and military.
OpenAI is like the goose that lays golden eggs; they need to fuck up really, really badly to face any consequences.
I doubt any of these US government- and oligarch-backed companies are gonna get into any trouble. They essentially robbed the commons and got away with it. But sure, Sam Altman has to pay spez some money for my shitposts… the horror, what a hurdle!
Quickly give them more taxpayer money so they can compete with china!
The US companies already scraped the data while they could. If anything, data scraping is far, far more difficult now for everyone, for technical reasons.
Most of the new models are trained on synthetic data, on higher-quality data, or with RLHF. The reason DeepSeek is able to perform so well is likely that LLMs are very, very new, so there is still a lot of low-hanging fruit. It's no longer just about the data; we hit that limit quite some time ago.
Honestly, even from the beginning it was pretty obvious that scraped data was going to have a ton of issues. There's too much nonsense out there, both misinformation and people who just can't communicate.
That’s before you get into the ethical aspects of stealing other people’s content and the way these things are being misused.