- cross-posted to:
- technology@beehaw.org
- privacy@programming.dev
This video shows that Reddit refused to delete all comments and posts of its users when they close their account via a CCPA / GDPR request.
The creator of tildes.net is a former Reddit backend developer, and believes this behavior is likely due to how Reddit caching works (or doesn’t work), rather than an intentional subversion of user intent:
Yes, this is almost certainly a technical issue. The way reddit caches things probably isn’t the standard way you’re thinking of, like a short-term cache that expires and refreshes itself. There are multiple layers of “cached” listings and items for almost everything, and a lot of these caches are actually data that’s stored permanently and kept up to date individually.
For example, when you view your comments page, Reddit uses a cached (permanent) list of which comments are in that page. There is a separate list stored for each sorting method. For example, maybe you’d have something like this with some made-up comment IDs:
Deimos’s comments by new: 948, 238, 153
Deimos’s comments by hot: 238, 153, 948
Deimos’s comments by controversial: 153, 238, 948
If I post a new comment, it will go through each list and add the new ID in the right spot (for example, in the “new” list it always just goes at the start). If I delete a comment, it goes through every list, and removes the ID if it can find it in there.
One of the problems with this system (which is probably what’s causing @phedre’s issues, and affecting many other people trying to delete their whole history) is that all of these listings are capped at 1000 items. If you already have more than 1000 comments and you post a new one, the 1000th comment currently in the new list gets “pushed off the end”. The comment still exists, but you won’t be able to see it by looking through your comments page, because it’s no longer in that listing.
Deleting comments also doesn’t cause previously “pushed off” ones to get re-added. If you have 5000 comments, your listing will only include 1000 of them. If you delete 50 of the ones in the listing, your listing now has 950 comments in it. If you delete all 1000 from the listing, your comments page will appear empty, but you actually still have 4000 comments that will be visible in the comments pages they were posted in.
And this is only one aspect of it. There are also multiple other places and ways that comments are cached—comment trees are cached (order and nesting of comments on a comments page, for all the different sorting methods), rendered HTML versions of comments are cached, API data is probably cached, and so on.
All of these issues are probably just some combination of all of your posts being difficult to find and access due to the listing limits or certain cached representations of posts not being cleared or updated properly.
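The listing behaviour Deimos describes can be sketched in a few lines of Python. This is only an illustration of the description above, not Reddit's actual code: the class name `CappedListing`, the `simulate` helper and the integer comment IDs are all made up for the example, and only the "new" sort order is modelled.

```python
from typing import List, Tuple

LISTING_CAP = 1000  # cap on each listing, as described in the quote above


class CappedListing:
    """Hypothetical model of one permanently stored listing (newest first)."""

    def __init__(self, cap: int = LISTING_CAP) -> None:
        self.cap = cap
        self.ids: List[int] = []

    def add(self, comment_id: int) -> None:
        # New comments go at the start; anything past the cap is pushed off the end.
        self.ids.insert(0, comment_id)
        del self.ids[self.cap:]

    def remove(self, comment_id: int) -> None:
        # Deletion only removes the ID if it is still in the listing.
        # Nothing that was pushed off earlier gets re-added.
        if comment_id in self.ids:
            self.ids.remove(comment_id)


def simulate(total_comments: int = 5000, cap: int = LISTING_CAP) -> Tuple[int, int]:
    """Post `total_comments`, then delete everything the profile page shows."""
    all_comments = set()
    listing = CappedListing(cap)
    for comment_id in range(total_comments):
        all_comments.add(comment_id)
        listing.add(comment_id)
    for comment_id in list(listing.ids):
        listing.remove(comment_id)
        all_comments.discard(comment_id)
    return len(listing.ids), len(all_comments)


visible, remaining = simulate()
print(visible, remaining)  # 0 4000
```

With 5000 comments, the profile listing ends up empty while 4000 comments still exist, which matches the numbers in the quoted explanation.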
Luckily GDPR deletion requests don’t care about how they are implemented. And failures to comply en masse tend to get really expensive.
Yup. I’m waiting for Reddit to come back with my GDPR data request (which has a time limit of 30 days, after which, I believe, they can give their excuses and extend it by another 30 days), and assuming they have not reversed the API decision I’m ordering them to delete it all afterwards. And they even now have a handy list, the one they just gave me, of everything they have to purge - if they didn’t, it wouldn’t be on that list in the first place :)
Still waiting for the GDPR request I made at the start of this shitshow. It will be funny to witness the mass GDPR deletion requests of accounts at the start of July.
It’s been 3-4 weeks since I submitted my CCPA request, and I still haven’t gotten my data yet. CCPA has a time limit of 45 days.
That’s what’s so awful about this. Prices were announced May 31st, so for a CCPA request that was done that very instant, they can delay until mid July, when the API changes will make it much more difficult to delete your data, and there’s no recourse.
Even for GDPR, maybe you’d get it the day before, given the shorter 30 day limit. But a delay of a day or even a few hours could easily mean you’ve gone past the cutoff, and the API is a problem for you too.
This is some messed up timing, mates.
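The timing complaint above is easy to check with a little date arithmetic. A rough sketch, assuming the year is 2023 (the thread only gives days and months) and taking the 30 and 45 day limits exactly as the commenters state them rather than from the legal texts:

```python
from datetime import date, timedelta

# Dates from the thread; the year 2023 is an assumption for this example.
ANNOUNCED = date(2023, 5, 31)   # day the API prices were announced
API_CUTOFF = date(2023, 7, 1)   # day the API changes take effect


def response_deadline(requested: date, limit_days: int) -> date:
    """Last day a reply is due for a request made on `requested`."""
    return requested + timedelta(days=limit_days)


gdpr_deadline = response_deadline(ANNOUNCED, 30)  # limit as quoted in the thread
ccpa_deadline = response_deadline(ANNOUNCED, 45)  # limit as quoted in the thread

print(gdpr_deadline, ccpa_deadline)  # 2023-06-30 2023-07-15
print(gdpr_deadline < API_CUTOFF < ccpa_deadline)  # True
```

So even a request filed on the day of the announcement could be answered one day before the cutoff under the 30 day limit, and two weeks after it under the 45 day limit.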
I would hope that someone reaching out to press from ModCoord would pass these concerns on to journalists. A persistent journalist can uncover the extent of compliance with the GDPR and CCPA through proper questions. “Have you seen an increase in GDPR/CCPA requests since the controversy started? What percent of those have you completed? What about reports that users are unable to delete their data?” etc. (only better because I’m not a journalist and probably oversimplifying).
Reddit stopped answering requests for comment from objective journalists.
People just need to start filing complaints with their Data Protection Authority. Then the mainstream media will be forced to cover the stories to get the clicks.
Based on this, I’d say that Reddit fully deserves to be banned in Europe and California, and fined into potential bankruptcy. Having deeply flawed technology that prevents them from ever being in compliance with a very serious law is no excuse.
This sounds like malicious incompetence…
Not necessarily, although Reddit can definitely choose to play it that way.
A lot of systems made in the pre-GDPR era (which is most of them) were not designed with the capability to decouple and erase content at a moment’s notice.
Btw incompetence won’t hold up as a valid defence for violating GDPR. At most it can give them some stalling room.
Oh God. Somewhat unrelated, but I felt like I knew the name “Deimos” from somewhere. Couldn’t put my finger on it. Finally realized who he was.
Greek god of dread and terror. Also, the smaller and outermost of the two satellites of Mars, named after said god.
Also an infested planet torn apart by family drama in Warframe.
This is one of the many legal issues Reddit now has.
Reddit is very clearly eyeing an IPO, but who really wants to invest in this dumpster fire? I’m not an investor, but I personally wouldn’t invest in a website as shortsighted as Reddit.
In an industry as cutthroat as social media, having a site as active as Reddit for 18 years should be celebrated.
In this world, where platforms live and die in the span of single years, why would Reddit throw away a formula that has worked for nearly 20 years?
As an investor, I can say with near certainty that the objective is extremely “short” sided.
Easy question to answer: they aren’t profitable and the free money of years of near 0 or 0% interest rates is over. The constant VC dried up and the website is insolvent. They have a massively bloated staff roster. They’re going to die if they don’t make a major change.
And at the same time, all “traditional” monetization strategies for websites like these just… don’t work with the way Reddit works. Making the changes they need to make will kill the site.
They never cooked up a monetization strategy that would work for them. They procrastinated. They felt free money would continue forever and underestimated how reliant their site was on volunteer labor. They got distracted by stupid side projects instead of refining the core product.
Reddit will absolutely survive all this. I expect it to still exist, at the end of the day. But it’ll be smaller, and what remains will be a soulless shit hole. And it’ll still be borderline insolvent.
If I could get a controlling interest for fifty bucks I’d chip in on that.
I wonder at what point people start taking them to court. It seems like the usual idiot tech bro excuse of thinking terms of service/use somehow override the law which is hilariously naive.
You cannot override the law in a TOS.
Like, if they wrote into their TOS that they were allowed to murder you and then proceeded to murder you, they’d still go to jail for murder.
Username’s research checks out.
(Sorry, I know people are kind of sick of funny tropes that were common on Reddit, but I couldn’t resist. I’ll see myself out now…)
- reddit in violation of privacy laws
- spez a pedophile
- subs closing
reddit is doomed
Wow, their legal department shot themselves in the foot putting that in writing. Idiots.
I submitted a CCPA request weeks ago and have yet to hear from them. They also restored tons of content I deleted.
Time for a class action yet?
Time for a massive fine from the EU. Something large enough to bankrupt them.
Sadly probably not. The GDPR fine can be “up to €20 million, or up to 4% of the annual worldwide turnover of the preceding financial year, whichever is greater” which would be around 26 million based on their 2022 revenue. The company has gathered over $1.3 billion in funding and was “valued” at around $10 billion quite recently.
And that’s only around what a year of API calls would have cost for Apollo so clearly by discontinuing the API they are going to save that amount back in no time!
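For anyone who wants to check the 26 million figure, the quoted rule is just "the greater of a fixed floor and 4% of turnover". A quick sketch, where the 650 million turnover is an assumption back-derived from the commenter's own estimate (the thread gives no revenue figure and mixes € and $), so treat the result as a ballpark only:

```python
GDPR_FIXED_CAP = 20_000_000      # the "up to 20 million" part of the quoted rule
GDPR_TURNOVER_PERCENT = 4        # the "up to 4% of annual worldwide turnover" part

# Assumption for illustration only: a turnover that yields the commenter's
# "around 26 million". It is not a figure taken from any filing.
ASSUMED_TURNOVER = 650_000_000


def max_gdpr_fine(annual_turnover: int) -> int:
    """Upper bound on the fine: whichever of the two caps is greater."""
    turnover_cap = annual_turnover * GDPR_TURNOVER_PERCENT // 100
    return max(GDPR_FIXED_CAP, turnover_cap)


print(max_gdpr_fine(ASSUMED_TURNOVER))  # 26000000
print(max_gdpr_fine(100_000_000))       # 20000000 (the fixed cap wins)
```

Integer arithmetic is used on purpose so the percentages come out exact.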
Yes, but a fine does not exempt you from compliance. If they are unable or unwilling to comply, the EU can ban them.
And I’m not talking about cutting off EU user access; I mean cutting off their money dealings with EU customers, advertisers, etc.
Their explanation for restored content will likely be something about the nature of how their CDN works.
Granted, this excuse won’t hold up much, but it’s probably true and will limit their liability in the sense that it isn’t intentional.
I’ve deleted my comments multiple times with PowerDeleteSuite and had things come back, a couple times over. However now I’m going through with shreddit (github version) using my GDPR files. It’s taking a long time because things panic every so many comments (I’m backing up everything, on file 75 so far but still 46,000 lines left from a 75,000 line file, however it’s panicking less now that the comments are more recent) and I haven’t had it restore any of the links I’ve checked from that process.
Reddit changed the way they display comments in the profile a few months back. Now, you only see a limited number of comments under New, Hot, Top & Controversial. These are the lists that most deletion services access. So, if you use PowerDeleteSuite or any other service it will likely miss things. In particular, I opened up links to my older Top comments, ran the script, then found it had completely ignored replies underneath my comment that had low but positive karma - these wouldn’t have appeared on the lists. My new list only went back about 3 months (although I think it’s about number of comments rather than time).
You really need to use the GDPR files to get everything. These contain CSV files with links to every single post and comment you have. However, it seems that reddit are delaying following through with most requests until after 1 July, when API requests (such as those that shreddit uses) will be blocked.
Also PSA don’t use the shreddit website, they charge you $15. The github version is free and will take CSV files with the appropriate tag. But, again, in my experience it panics and hangs fairly often, so it will take a lot of work to use. I’ve had to run it, back up the terminal output, use the last link and delete everything in posts.csv and comments.csv before the one it stuck on, then resume with amended files.
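The manual "delete everything before the one it stuck on" step could be scripted. A minimal sketch, assuming the export has a header row and a column holding the link; the column name `permalink` and the helper names are guesses for illustration, so check the header of your own comments.csv before using anything like this:

```python
import csv
from typing import Dict, List


def rows_from(rows: List[Dict[str, str]], stuck_link: str,
              link_column: str = "permalink") -> List[Dict[str, str]]:
    """Return the rows starting at the one whose link matches `stuck_link`."""
    for index, row in enumerate(rows):
        if row[link_column] == stuck_link:
            return rows[index:]
    raise ValueError(f"link not found in column {link_column!r}: {stuck_link}")


def trim_csv(src_path: str, dst_path: str, stuck_link: str,
             link_column: str = "permalink") -> int:
    """Write a copy of the CSV without the rows before `stuck_link`."""
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = list(reader)
    kept = rows_from(rows, stuck_link, link_column)
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)


# In-memory example with made-up links:
sample = [{"permalink": "link-a"}, {"permalink": "link-b"}, {"permalink": "link-c"}]
print([row["permalink"] for row in rows_from(sample, "link-b")])  # ['link-b', 'link-c']
```

Writing to a separate `dst_path` rather than overwriting keeps the original export intact, which matters if the file is also your backup.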
Reddit really isn’t making it easy to follow through with your rights. Make records of this, then this can be used to convince local Data Protection Authorities to collectively throw down a bigger hammer than Huffman ever wielded, or even imagined.
Also another PSA, reddit’s terms do not deny you ownership of your content. So even if they try to claim ownership themselves (as Steve Huffman has frequently publicly stated) they cannot deny you the right to edit your content and restrict what they do with it. It’s your information, and reddit hasn’t even paid for it.
You can’t sell a microwave without paying for the nuts and bolts.
it seems that reddit are delaying following through with most requests until after 1 July when API requests (such as those that shreddit uses) will be blocked.
I was sooo worried about this and thought that something like that would be done, back when I saw someone warn in the save 3rd party apps sub that you should request your data. Still, I tried making a request because I thought maybe reddit had not caught on yet, or maybe because it was before the blackout there could still be a chance, but till now I never got the data. :(
Probably I’ll just leave the comments and posts. I did not post a lot.
There is a way still if you can’t wait for a GDPR request, or if you live outside a place where something like CCPA/GDPR applies and you think reddit might actually say no.
Here’s what to do in that case: https://kbin.social/m/RedditMigration/t/65260/PSA-Here-s-exactly-what-to-do-if-you-hit-the
TL;DR get the 1.6GB Pushshift torrent, then edit a script to extract your data, then edit another script to use that data to overwrite your comments.
This did not get the traction it should have. It’s probably the best of the dozen-ish methods I’ve seen.
You can still use the GDPR files to get at all your comments, you just won’t be able to use existing API methods to automate it. However, perhaps it would be possible to use the links to automate via a scraping method or something - maybe the PowerDeleteSuite method could be expanded upon.
Yeah, you are right. It’d be tough to directly modify PDS as that’s javascript in a browser and there are strict restrictions on what JS can do on a filesystem in that case.
But maybe someone can create a browser extension that does the same job. Extensions have fewer restrictions so maybe it could be fueled by a file.
Or maybe someone will come up with some kind of shell script that can read the archive and copy & paste the URLs for each of your posts and comments, one by one, into the javascript console of your browser, allowing PDS to take care of the rest (visiting each one and simulating hitting the edit and delete buttons).
The other issue is that PDS depends on old dot reddit dot com currently from what I understand. If that ever gets dropped, PDS will break until it’s updated to work with new reddit.
Yeah I’m expecting old reddit to die on 1 July.
But they promised they wouldn’t! </sarcasm>
There’s also a semi-auto delete user script that doesn’t use the API called so-long-reddit-thanks-for-all-the-fish.
You go to your comments page, click a button, and it performs the actions within the browser. Without any further interaction, you’ll see the screen scroll to the bottom, click edit on the last comment, enter the text in the script (default is a link to the script, but you can change that to anything), click save, and move on to the next comment (pretty sure it can delete, too). For best results, use a neverending Reddit script and keep scrolling until there are no more pages loaded. Also, re-sort the comments by each option (top, newest, etc.) to check for any stragglers.
You can still use your browser, though I recommend keeping the task in its own window (in case your browser or an addon unloads pages you haven’t accessed in x minutes). If you do something that makes the browser lag a little, it can cause the script to miss a comment, so you might need to run it twice. I used this on one account and it worked flawlessly for several thousand comments and skipped ~10, or so.
If you’re using the main repo for PDS then you probably have the one that doesn’t pause for 5 secs between API calls (Reddit’s limit). The first fork version has the pause and works correctly, though slowly. Just be aware that there’s a bug in PDS that stops adding to the exported file if it hits an error (If you have 100 comments and get an error on comment #15 it will continue to edit/delete, but the exported file will only have 14 comments.)
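To make the two points above concrete (pausing between calls, and not letting one error stop the export), here is a rough Python sketch of what such a loop would look like. It is not PDS code (PDS is browser JavaScript); `process_all`, `action` and the failure on comment 15 are invented for illustration, and the 5 second pause is simply the figure mentioned above.

```python
import time
from typing import Callable, Iterable, List, Tuple

PAUSE_SECONDS = 5  # pause between calls, as mentioned above


def process_all(comment_ids: Iterable[int],
                action: Callable[[int], None],
                pause: float = PAUSE_SECONDS,
                sleep: Callable[[float], None] = time.sleep
                ) -> Tuple[List[int], List[int]]:
    """Run `action` on every ID with a pause in between.

    Returns (exported, failed). An error on one comment is recorded and the
    loop keeps exporting the ones that follow.
    """
    exported: List[int] = []
    failed: List[int] = []
    for position, comment_id in enumerate(comment_ids):
        if position > 0:
            sleep(pause)  # wait before every call except the first
        try:
            action(comment_id)
        except Exception:
            failed.append(comment_id)  # remember it so it can be retried later
            continue
        exported.append(comment_id)
    return exported, failed


def flaky_action(comment_id: int) -> None:
    # Stand-in for an edit/delete call; fails on comment 15 to mimic the report.
    if comment_id == 15:
        raise RuntimeError("simulated API error")


# The pause is stubbed out here so the example finishes instantly.
exported, failed = process_all(range(1, 101), flaky_action, sleep=lambda seconds: None)
print(len(exported), failed)  # 99 [15]
```

Keeping a separate `failed` list means a second pass only has to retry the handful of comments that errored.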
This is the comment I was looking for. A class action from European citizens, for example, under the European privacy law, would really be bad news for Reddit (and good news for the Internet)
Would love to see the FOSS community take down reddit, especially if there’s legitimate merit to it.
What I noticed is that when restoring your comments they prioritize the ones with the most upvotes. Some I even deleted manually before the blackout reappeared too.
I find this shit to be likely illegal. I understand that we gave Reddit permission to use our content by agreeing to their terms of service, but if my comment was “A” and I edit it so that it displays “B”, it is wrong for Reddit to still display “A” below my username without my authorization. They can exploit the content “A” however they want, but to show it under my username as if it were what I consented to display under my name feels like a breach to me.
I have been overwriting and deleting manually and I haven’t seen anything come back yet… But it’s also a nightmare to delete old comments that they have archived and don’t show up on your profile. I just gave up
It is illegal under the EU law.
I love it, it makes their intentions so obvious. Milking our content for AI training. Nobody will read our old conversations, except for AIs.
I mean ChatGPT was already very easily influenced by just stating a username of a heavy user of r/counting.
It’s been patched out by now, but it was very funny.
Can you share some sources to do some reading about that? Never heard about it and sounds hilarious
This video from computerphile is (partially) about this: https://youtu.be/WO2X3oZEJOA
The lovely thing is that there isn’t even an option to delete data via an LGPD (Lei Geral de Proteção de Dados) request. Well, as far as I know at least.
The product owners over at Reddit are going to be surprised to learn that Brazil exists.
“Brazil? Isn’t that where the Amazon offices are?”
Ditto for the Canadian PIPEDA.
I don’t think they are actually restoring posts/comments. This whole thing is based on confusion about the blackout and many subreddits going private. Most people would think you can see all of your own posts and comments if you are logged in and go to your profile page, but if a subreddit goes private you cannot even see your own submissions in that sub.
So after the blackout ended and most subreddits went public again, people who nuked their account history are now discovering that there’s still posts remaining. They think these posts were restored, but they weren’t even deleted in the first place.
This is obviously a huge oversight in how Reddit handles your data and your profile page, but don’t attribute to malice that which is adequately explained by stupidity.
That still makes it impossible for a user to ever delete all their comments, which is the CCPA complaint
The caching issue would clear up eventually, just give it some time. The CCPA process is slower, so probably the caching issue would be resolved by the time the courts heard it.
Private is different. What if I posted in a sub like r/BasicIncomeUSA that went permanently private during the blackout and never came back? 30 days, 45 days, still private. Worse, what if it’s a sub where the mods all delete their accounts - or they are unresponsive (because they quit using reddit without deleting their accounts).
So yeah, private means that reddit has to be the one doing the deletion, as a regular user may not even have the tools to delete otherwise.
If you watch the video, you would understand that this individual is deleting specific comments, then saw the exact same comments that he deleted return some time later.
Yes, but if you look closely all of those submissions were made on the javascript subreddit. It’s entirely possible that this sub was still private on the 24th, and went public on the 25th. I don’t know for sure but that seems to be the most likely scenario.
Edit: Looking at the blackout tracker, javascript was still private on June 24th, which is the day that the OP of the video was manually deleting his submissions.
If the sub was private at that time, wouldn’t that have prevented him from deleting the comments there in the first place (because he wouldn’t be able to see them while it’s private)? In this case he was able to see them, I guess, because he was able to delete those specific ones. But I’m not sure.
Yup, I couldn’t see my own comments or posts on subs that were private. When I tried to delete them via API/script it got me an error too.
However, there’s an exception. If you are a mod or approved user for a sub, then you can see and edit/delete as normal. I have never tried this scenario, but maybe in this case, when it goes public again, any deletions are undone (because of the caching issue).
The law doesn’t care how they handle the data or if a subreddit is private. If someone requests their data to be deleted, everything must be deleted.
Correct, which is why reddit must ultimately do it. Only they would have access.
I agree, blacked-out subs are why comments are coming back on the profile page… but there is another issue: the 1000-post limit on the profile page. That means you can Google your comments but never see them on your profile.
It often happens to be both, though.
Which seems particularly likely in this case, given Spez & co.'s track record of being both malicious and stupid, more often than not at the same time…
I’m honestly not surprised at all. The content you created for them is valuable and they’re expecting individual users not to fight back or even notice. They have the power and thicker wallets on their side.
The good thing about GDPR is that the data commissioner will fight it for you.
The question would be, are comments and posts personal data which is the only thing GDPR covers?
I made the script here to overwrite AND delete comments because this move was about unpredictability as gravity
It also has options to remove submissions, up/downvotes and subscriptions
I hope he just manually deleted it for the recording and then switched to a tool to do it automatically at least. lol
Reddit’s a US company and GDPR is EU law. Why would an American company be expected to follow EU laws?
(Not a shill, just genuinely interested. It wouldn’t occur to me as a Brit to demand Reddit comply with GDPR.)
It seems that foreign companies still have to comply if they are offering goods or services to or monitoring data of people in the EU. I’m not sure if this applies to Reddit in this case but it can be necessary for American companies to comply with the GDPR.