Excerpt from a message I just posted in a #diaspora team internal forum category. The context here is that I recently got pinged about slowness/load spikes on the diaspora* project web infrastructure (Discourse, Wiki, the project website, ...), and looking at the traffic logs made me impressively angry.
In the last 60 days, the diaspora* web assets received 11.3 million requests. That averages out to 2.19 req/s - which honestly isn't that much. I mean, it's more than your average personal blog, but nothing my infrastructure shouldn't be able to handle.
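(A quick sanity check on that average, in Python - the last digit differs because the 11.3 million figure is itself rounded:)

```python
# 11.3M requests over 60 days, expressed as requests per second.
requests = 11_300_000
seconds = 60 * 24 * 60 * 60  # 60 days
print(f"{requests / seconds:.2f} req/s")  # 2.18; 2.19 comes from the unrounded count
```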
However, here's what's grinding my fucking gears. Looking at the top user agent statistics, these are the leaders:
2.78 million requests - or 24.6% of all traffic - came from Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot).
1.69 million requests - 14.9% - from Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonb...
["ideas guy" tier shite]
If I had a site in 2024 like I had in 2004, I’d almost certainly set up some honeypot against those.
It would be a single page. The content would be, funnily enough, generated by ChatGPT; the text would be the output of the prompt “provide me eight paragraphs of grammatically correct but meaningless babble”. If you access that page, you lose access to every other resource on my site.
And that page would be linked from every single other page of my site, in a way that humans can’t reasonably click it but bots would all the time. Perhaps white text on a white background at the start of the real text? A rough sketch of the idea follows.
[/"ideas guy" tier shite]
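To be concrete, here's a minimal sketch of that honeypot as a standalone Flask app. Everything in it - the /trap route, the in-memory ban set, the invisible link markup - is hypothetical illustration, not anything actually running on the diaspora* infrastructure:

```python
# Hypothetical honeypot sketch (Flask). The /trap route, BANNED_IPS set
# and TRAP_LINK markup are all made up for illustration.
from flask import Flask, abort, request

app = Flask(__name__)

# A real setup would persist this somewhere (Redis, an nginx deny list,
# a firewall); an in-memory set is enough for a sketch.
BANNED_IPS: set[str] = set()

# Pre-generated once - fittingly, by ChatGPT - from the prompt quoted above.
BABBLE = "<p>grammatically correct but meaningless babble ...</p>" * 8

# Injected at the start of every real page: white-on-white, so humans
# won't reasonably click it, but crawlers follow the href anyway.
TRAP_LINK = '<a href="/trap" style="color:#fff;background:#fff">.</a>'


@app.before_request
def reject_banned_clients():
    # Once an IP has touched the trap, nothing else is served to it.
    if request.remote_addr in BANNED_IPS:
        abort(403)


@app.route("/trap")
def trap():
    # Accessing this page is the tripwire: ban the client, serve the babble.
    BANNED_IPS.add(request.remote_addr)
    return BABBLE


@app.route("/")
def index():
    return TRAP_LINK + "<p>The actual page content.</p>"
```

A well-behaved crawler would never even reach the trap, because you'd also disallow /trap in robots.txt - which is the whole point: the only clients that get banned are the ones ignoring robots.txt in the first place.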
Don’t get me wrong, I’m not opposed to generative A"I" technology itself. My issue is the “might makes right” mindset that permeates the companies behind it - “we have GoOD InTeNsHuNs, so this filth complaining about us being a burden might lick a cactus lol lmao haha”.
From the comments in “Reddit LARPs as h4x0rz News”:
Bingo - forbid them from accessing your content.
And in the meantime make robots.txt legally enforceable across multiple countries.
This looks fun. I wish I’d thought of it before writing my shitty idea.
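For what it's worth, the "forbid them" part is a one-rule change in most stacks. Here's a minimal sketch in the same hypothetical Flask setup as above - in reality you'd do this in the reverse proxy or a WAF in front of Discourse and the wiki, and note that "Amazonbot" is my assumption, since the user agent in the logs is truncated at "(Amazonb...":

```python
# Hypothetical user-agent block list; in production this belongs in the
# reverse proxy (nginx/haproxy), not in application code.
from flask import Flask, abort, request

app = Flask(__name__)

# "GPTBot" is straight from the logs above; "Amazonbot" is an assumption
# based on the truncated "(Amazonb..." user agent.
BLOCKED_AGENTS = ("GPTBot", "Amazonbot")


@app.before_request
def block_ai_crawlers():
    ua = request.headers.get("User-Agent", "")
    if any(bot in ua for bot in BLOCKED_AGENTS):
        abort(403)
```

The robots.txt equivalent - `User-agent: GPTBot` followed by `Disallow: /` - only works if the crawler decides to honor it, which is exactly why the second commenter wants it legally enforceable.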