Internet Archive
Revision as of 16:33, 28 October 2025
❗Article Status Notice: This Article is a stub
This article is underdeveloped, and needs additional work to meet the wiki's Content Guidelines and be in line with our Mission Statement for comprehensive coverage of consumer protection issues.
| Basic information | |
|---|---|
| Founded | 1996 |
| Legal structure | Private |
| Industry | Digital Library |
| Official website | https://archive.org/ |
The Internet Archive is an American non-profit digital library founded in 1996 to provide free "universal access to all knowledge" and preserve digital history.
Consumer-impact summary
The Archive can be a useful resource for consumers seeking information about discontinued products, companies that are no longer operating, and articles that have been removed from websites.
[TBA]
Incidents
Login-only items for legally dubious content (2016-present)
On January 13, 2016, Hank Bromley (hank_b) of the Internet Archive created a collection for uploads considered legally dubious, viewable only with an account.[1]
These uploads cannot be viewed by logged-out users and cannot be downloaded by anyone except administrators, which makes the content effectively inaccessible.[2]
Archived website removal
The Archive accepts DMCA takedown requests from website owners who no longer want their sites archived,[3] which makes the archived copies of certain sites inaccessible.
The Internet Archive used to hide material covered by robots.txt restrictions, but that changed on April 17, 2017.[4]
Removal of noindex on uploaded items
In 2023, the Internet Archive reportedly removed users' ability to set the noindex function, which had previously hidden items from internal search engines. Some users criticized the decision on the grounds that it may jeopardize users' rights, including privacy. When confronted about it, Jason Scott, a staff member of the Internet Archive, reportedly responded with the following:[5][6]
There is no bug or mistake in removing no-index settings for many Internet Archive items in the Community collection.
At no point was the Archive contacted to arrange a situation of no-indexing (or Darking) items with an intention of later release; the no-index setting was not documented for this use, and represented a security hole that was closed. Tens of thousands of items were found, being used for encrypted files hidden from the search engine, and represented a major problem, so many items have been removed or set noindex quickly.
A number of people have contacted us explaining situations where items might need to be made no-indexed, in a collection for later or timed release for example, but they've done it with communication and discussing their needs, not just uploading files under disposable accounts and then assuming the archive would keep them un-accessible in perpetuity. In some cases their requests have gotten arrangements so that community items that were noindex are noindex again, in separate collections.
A situation can theoretically exist where the original uploader can e-mail us from their e-mail address and discuss arrangements, but you've indicated you intentionally obfuscated your location and have disposed your addresses. If you're able to gain access again, you can mail through those addresses.
An additional situation is you can e-mail [email protected] if you want to report items at the archive (by identifier) that you believe might need to be removed from the archive; we receive a number of these requests throughout the months and respond according to policy.
Data breaches (2012-2024)
On May 19, 2017, the Archive's Development Manager published a blog post stating that anyone who had created an account before 2012 had to change their password, because the site had been breached and users' public information and lightly encrypted passwords had been leaked.[7]
On October 9, 2024, users of the Internet Archive saw pop-up notifications from the perpetrators, appearing at around 9 PM CST, announcing that the website had been hacked,[8] and an hour later Troy Hunt of HaveIBeenPwned confirmed the breach.[9]
Around 31 million users were affected, with their user IDs, email addresses, encrypted passwords, and usernames being leaked.[10]
References
- ↑ "Download & Streaming : Log In Required : Internet Archive". Internet Archive. Archived from the original on 2025-08-16. Retrieved 2025-08-16.
- ↑ "Internet Archive Forums: Log In Required, after logging in". Internet Archive. Archived from the original on 2025-08-16. Retrieved 2025-08-16.
- ↑ Bixenspan, David (2018-11-28). "When the Internet Archive Forgets". Gizmodo. Retrieved 2025-08-31.
- ↑ Graham, Mark (2017-04-17). "Robots.txt meant for search engines don't work well for web archives". Internet Archive. Archived from the original on 2017-04-17. Retrieved 2025-08-31.
- ↑ https://old.reddit.com/r/DataHoarder/comments/156s7di/the_removal_of_noindex_from_the_internet_archive/
- ↑ https://old.reddit.com/r/NoStupidQuestions/comments/142nm9h/internet_archive_ish/
- ↑ Barrett, Katie (2017-05-19). "Re: User account breach | Internet Archive Blogs". Internet Archive. Archived from the original on 2025-05-20. Retrieved 2025-08-16.
- ↑ "Dark Web Informer on X". Twitter. 2024-10-09. Archived from the original on 2024-10-12. Retrieved 2025-08-16.
- ↑ Hunt, Troy (2024-10-09). "Troy Hunt on X: "Hi folks, yes, I'm aware of this. I've been in communication with the Internet Archive over the last few days re the data breach, didn't know the site was defaced until people started flagging it with me just now. More soon." / X". Twitter. Archived from the original on 2024-08-10. Retrieved 2025-08-16.
- ↑ LeClair, Dave (2024-10-11). "31 million users impacted by Internet Archive data breach — what we know". Tom's Guide. Archived from the original on 2024-11-09. Retrieved 2025-08-16.