
Internet Archive: Difference between revisions

From Consumer Rights Wiki
Thebirdwashere (talk | contribs)
m Add sidebar and logo
Clean-up; pass on style.
 
(30 intermediate revisions by 12 users not shown)
{{StubNotice}}
{{CompanyCargo
|Description=American digital library hosting scanned books, music, videos, software, and archived websites.
|Founded=1996
|Industry=Archive, Library
|Logo=Internet Archive.png
|ParentCompany=
|Type=Non-profit
|Website=https://archive.org/
}}
The '''{{Wplink|Internet Archive}}''' is an American non-profit digital library founded in 1996 to provide free "universal access to all knowledge" and preserve digital history.
 
The archive can be a useful resource for consumers seeking information about discontinued products, companies that are no longer operating, and articles that have been removed from websites.
 
==Consumer impact summary==
{{Ph-C-CIS}}
 
==Incidents==
This is a list of all consumer-protection incidents this company is involved in. Any incidents not mentioned here can be found in the [[:Category:{{FULLPAGENAME}}|{{PAGENAME}} category]].
 
===Archived website removal===
{{Main|Internet Archive/Blocked companies}}
 
The Archive accepts [[DMCA]] takedown requests from website owners who no longer want their sites archived,<ref>{{Cite web |last=Bixenspan |first=David |date=28 Nov 2018 |title=When the Internet Archive Forgets |url=https://gizmodo.com/when-the-internet-archive-forgets-1830462131 |url-status=live |archive-url=https://web.archive.org/web/20250805030527/https://gizmodo.com/when-the-internet-archive-forgets-1830462131 |archive-date=5 Aug 2025 |access-date=31 Aug 2025 |website=Gizmodo}}</ref> which makes the affected sites inaccessible in the archive.
 
The Internet Archive ''used'' to hide archived material covered by robots.txt restrictions, but this practice was changed on 17 April 2017.<ref>{{Cite web |last=Graham |first=Mark |date=17 Apr 2017 |title=Robots.txt meant for search engines don’t work well for web archives |url=https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/ |url-status=live |archive-url=https://web.archive.org/web/20170417131508/http://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/ |archive-date=17 Apr 2017 |access-date=31 Aug 2025 |website=Internet Archive}}</ref>
 
===Login-only items for legally dubious content (2016—Present)===
On 13 January 2016, Hank Bromley (hank_b) of the Internet Archive created a collection of uploads considered legally dubious and only viewable with an account.<ref>{{Cite web |title=Download & Streaming : Log In Required : Internet Archive |url=https://archive.org/details/loggedin?tab=about |url-status=live |archive-url=https://megalodon.jp/2024-0311-0532-32/https://archive.org:443/details/loggedin?tab=about |archive-date=11 Mar 2024 |access-date=16 Aug 2025 |website=Internet Archive}}</ref>
 
These uploads cannot be viewed or downloaded by logged-out users but can be accessed by anyone with an account.<ref>{{Cite web |title=Internet Archive Forums: Log In Required, after logging in. |url=https://archive.org/post/1092552/log-in-required-after-logging-in |url-status=live |archive-url=https://web.archive.org/web/20260222222400/https://archive.org/post/1092552/log-in-required-after-logging-in |archive-date=22 Feb 2026|access-date=16 Aug 2025 |website=Internet Archive}}</ref>
 
===Removal of noindex function on uploaded items (''2023'')===
In 2023, the Internet Archive reportedly removed users' ability to set the noindex function, which had hidden items from its internal search engine, and made items whose noindex value was set to true appear in search results. The decision was criticized on the grounds that it may jeopardize users' rights, including privacy. When confronted about it, Jason Scott, a staff member of the Internet Archive at the time, reportedly responded as follows:<ref>{{Cite web |author=yopmailpublic |title=The removal of "noindex" from the Internet Archive, and associated risks. |url=https://old.reddit.com/r/DataHoarder/comments/156s7di/the_removal_of_noindex_from_the_internet_archive/ |website=[[Reddit]] |date=22 Jul 2023 |access-date=8 Jun 2026 |url-status=live |archive-url=https://web.archive.org/web/20241214121917/https://old.reddit.com/r/DataHoarder/comments/156s7di/the_removal_of_noindex_from_the_internet_archive/ |archive-date=14 Dec 2024}}</ref><ref>{{Cite web |author=Chronos-X4 |title=Internet Archive Ish |url=https://old.reddit.com/r/NoStupidQuestions/comments/142nm9h/internet_archive_ish/ |website=[[Reddit]] |date=6 Jun 2023 |access-date=8 Jun 2026 |url-status=live |archive-url=https://web.archive.org/web/20241215072041/https://old.reddit.com/r/NoStupidQuestions/comments/142nm9h/internet_archive_ish/ |archive-date=15 Dec 2024}}</ref>


<blockquote>''There is no bug or mistake in removing no-index settings for many Internet Archive items in the Community collection.''


''At no point was the Archive contacted to arrange a situation of no-indexing (or Darking) items with an intention of later release; the no-index setting was not documented for this use, and represented a security hole that was closed. Tens of thousands of items were found, being used for encrypted files hidden from the search engine, and represented a major problem, so many items have been removed or set noindex quickly.''


''A number of people have contacted us explaining situations where items might need to be made no-indexed, in a collection for later or timed release for example, but they've done it with communication and discussing their needs, not just uploading files under disposable accounts and then assuming the archive would keep them un-accessible in perpetuity. In some cases their requests have gotten arrangements so that community items that were noindex are noindex again, in separate collections.''


''A situation can theoretically exist where the original uploader can e-mail us from their e-mail address and discuss arrangements, but you've indicated you intentionally obfuscated your location and have disposed your addresses. If you're able to gain access again, you can mail through those addresses.''
 
''An additional situation is you can e-mail [email protected] if you want to report items at the archive (by identifier) that you believe might need to be removed from the archive; we receive a number of these requests throughout the months and respond according to policy.''</blockquote>
 
A user who criticized the decision shared the following pseudo-code, proposing how the Internet Archive could reinstate the noindex function for users while re-hiding all formerly noindexed items from the search engine:
 
<blockquote>noindex items if:
 
(
 
items-noindexed-by-user-in-the-past = true;
 
OR items-noindexed-by-IA-in-the-past = true);
 
 
AND (
 
items-get-reindexed-voluntarily-by-USER-before-May-2023 = false;
 
OR
 
items-get-reindexed-voluntarily-by-IA-before-May-2023 = false;
 
)</blockquote>
 
===Data breaches (''2012—2024'')===
On 19 May 2017, the Archive's Development Manager published a blog post stating that anyone who had created an account before 2012 had to change their password, because the site had been breached and users' public information and lightly encrypted passwords had been leaked.<ref>{{Cite web |last=Barrett |first=Katie |title=Re: User account breach |url=https://blog.archive.org/2017/05/19/re-user-account-breach/ |website=Internet Archive |date=19 May 2017 |access-date=16 Aug 2025 |url-status=live |archive-url=https://web.archive.org/web/20250520030556/https://blog.archive.org/2017/05/19/re-user-account-breach/ |archive-date=20 May 2025}}</ref>
 
On 9 October 2024, at around 9 PM CST, visitors to the Internet Archive saw pop-up notifications, placed by the perpetrators, announcing that the website had been hacked.<ref>{{Cite web |author=Dark Web Informer |title=Dark Web Informer on X |url=https://x.com/DarkWebInformer/status/1844123206413943274 |website=[[X]] |date=9 Oct 2024 |access-date=8 Jun 2026 |url-status=live |archive-url=https://web.archive.org/web/20260321121941/https://nitter.us.catsarch.com/DarkWebInformer/status/1844123206413943274 |archive-date=21 Mar 2026}}</ref> An hour later, Troy Hunt of HaveIBeenPwned confirmed the breach.<ref>{{Cite web |last=Hunt |first=Troy |title=Troy Hunt on X |url=https://x.com/troyhunt/status/1844136762727448644 |website=[[X]] |date=9 Oct 2024 |access-date=8 Jun 2026 |url-status=live |archive-url=https://web.archive.org/web/20260321122129/https://nitter.us.catsarch.com/troyhunt/status/1844136762727448644 |archive-date=21 Mar 2026}}</ref>
 
Around 31 million users were affected: their user IDs, email addresses, encrypted passwords, and usernames were leaked.<ref>{{Cite web |last=LeClair |first=Dave |title=31 million users impacted by Internet Archive data breach — what we know |url=https://www.tomsguide.com/computing/online-security/31-million-users-impacted-by-internet-archive-data-breach-what-we-know |website=Tom's Guide |date=11 Oct 2024 |access-date=8 Jun 2026 |url-status=live |archive-url=https://web.archive.org/web/20241109231711/https://www.tomsguide.com/computing/online-security/31-million-users-impacted-by-internet-archive-data-breach-what-we-know |archive-date=9 Nov 2024}}</ref>
 
===Website no longer usable without JavaScript (''2023—Present'')===
Until 2022, Archive.org was one of the few remaining major websites that could be browsed and searched without [[JavaScript]]. JavaScript was used mostly for features that could not be implemented otherwise, such as infinite scrolling. This approach is known as progressive enhancement.<ref name=jakearchibald>{{Cite web |last=Archibald |first=Jake |title=Progressive enhancement is still important |url=https://jakearchibald.com/2013/progressive-enhancement-still-important/ |website=Jake Archibald |date=3 Jul 2013 |access-date=8 Jun 2026 |url-status=live |archive-url=https://web.archive.org/web/20130907024529/http://jakearchibald.com/2013/progressive-enhancement-still-important/ |archive-date=7 Sep 2013}}</ref>


Since 2023, however, large parts of the Archive.org website (including the home page, collection pages, and the search engine) can no longer be browsed at all without JavaScript, because the legacy HTML-based user interface was replaced with a web app built on Google's Lit library. As of April 2026, only individual item pages remain viewable without JavaScript.<ref>Before change: [https://ghostarchive.org/archive/3vxC8 2023-06-28]. After change: [https://ghostarchive.org/archive/sdLIp 2023-09-28]</ref><!-- Editor note: I also know this from personal experience, but given that archive.org/details was excluded from the Wayback Machine and Archive Today converts everything to static HTML, there is not much of a historical record available for these changes. User account pages (archive.org/details/@...) were made JS-only in March 2024, but I'll have to find a source for this. -->

This made it impossible to browse the site on legacy systems that cannot run modern web browsers, or with minimalist web browsers that offer an alternative to the [[Google]]-[[Mozilla]] duopoly. It also slowed loading on modern web browsers, because a large amount of code has to execute before any content can appear on screen, which puts the content at the end of the rendering path.<ref>{{Cite web |author= |title=Critical rendering path – Mozilla Developer Network |url=https://developer.mozilla.org/en-US/docs/Web/Performance/Guides/Critical_rendering_path |website=[[Mozilla]] |date= |access-date=8 Jun 2026 |url-status=live |archive-url=https://web.archive.org/web/20260531101952/https://developer.mozilla.org/en-US/docs/Web/Performance/Guides/Critical_rendering_path |archive-date=31 May 2026}}</ref>

==References==
{{Reflist}}

[[Category:{{PAGENAME}}]]

Latest revision as of 00:54, 9 June 2026

This article is a stub. You can help by expanding it.


Internet Archive
Basic information
Founded 1996
Legal Structure Non-profit
Industry Archive, Library
Also known as
Official website https://archive.org/

The Internet Archive is an American non-profit digital library founded in 1996 to provide free "universal access to all knowledge" and preserve digital history.

The archive can be a useful resource for consumers seeking information about discontinued products, companies that are no longer operating, and articles that have been removed from websites.

Consumer impact summary


Overview of concerns that arise from the conduct towards users of the product (if applicable):

  • User freedom
  • User privacy
  • Business model
  • Market control



Incidents


This is a list of all consumer-protection incidents this company is involved in. Any incidents not mentioned here can be found in the Internet Archive category.

Archived website removal

Main article: Internet Archive/Blocked companies

The Archive accepts DMCA takedown requests from website owners who no longer want their sites archived,[1] which makes the affected sites inaccessible in the archive.

The Internet Archive used to hide archived material covered by robots.txt restrictions, but this practice was changed on 17 April 2017.[2]
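
As a rough illustration of how a robots.txt restriction can single out one crawler, the sketch below uses Python's standard `urllib.robotparser` module. The rule set, the `ia_archiver` user-agent name, the example URL, and the `is_allowed` helper are all assumptions made for this example; the sketch does not reproduce the Internet Archive's own logic.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes a single crawler from the whole site.
# The user-agent name is an assumption chosen for illustration.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/old-product-manual"
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "ia_archiver", page))
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "OtherBot", page))
```

Under these assumed rules, only the named crawler is excluded; agents without a matching entry are permitted by default.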

Login-only items for legally dubious content (2016—Present)


On 13 January 2016, Hank Bromley (hank_b) of the Internet Archive created a collection of uploads considered legally dubious and only viewable with an account.[3]

These uploads cannot be viewed or downloaded by logged-out users but can be accessed by anyone with an account.[4]

Removal of noindex function on uploaded items (2023)


In 2023, the Internet Archive reportedly removed users' ability to set the noindex function, which had hidden items from its internal search engine, and made items whose noindex value was set to true appear in search results. The decision was criticized on the grounds that it may jeopardize users' rights, including privacy. When confronted about it, Jason Scott, a staff member of the Internet Archive at the time, reportedly responded as follows:[5][6]

There is no bug or mistake in removing no-index settings for many Internet Archive items in the Community collection.

At no point was the Archive contacted to arrange a situation of no-indexing (or Darking) items with an intention of later release; the no-index setting was not documented for this use, and represented a security hole that was closed. Tens of thousands of items were found, being used for encrypted files hidden from the search engine, and represented a major problem, so many items have been removed or set noindex quickly.

A number of people have contacted us explaining situations where items might need to be made no-indexed, in a collection for later or timed release for example, but they've done it with communication and discussing their needs, not just uploading files under disposable accounts and then assuming the archive would keep them un-accessible in perpetuity. In some cases their requests have gotten arrangements so that community items that were noindex are noindex again, in separate collections.

A situation can theoretically exist where the original uploader can e-mail us from their e-mail address and discuss arrangements, but you've indicated you intentionally obfuscated your location and have disposed your addresses. If you're able to gain access again, you can mail through those addresses.

An additional situation is you can e-mail [email protected] if you want to report items at the archive (by identifier) that you believe might need to be removed from the archive; we receive a number of these requests throughout the months and respond according to policy.

A user who criticized the decision shared the following pseudo-code, proposing how the Internet Archive could reinstate the noindex function for users while re-hiding all formerly noindexed items from the search engine:

noindex items if:

(

items-noindexed-by-user-in-the-past = true;

OR items-noindexed-by-IA-in-the-past = true);


AND (

items-get-reindexed-voluntarily-by-USER-before-May-2023 = false;

OR

items-get-reindexed-voluntarily-by-IA-before-May-2023 = false;

)
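
The quoted rule can be read as a boolean predicate. The sketch below is one literal interpretation in Python; the `Item` record, its field names, and the `should_noindex` function are hypothetical and do not correspond to any Internet Archive API. Because the pseudo-code joins the two "reindexed" conditions with OR, this reading re-hides an item unless both the user and the Internet Archive voluntarily reindexed it before May 2023.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical record; field names mirror the quoted pseudo-code."""
    noindexed_by_user_in_past: bool
    noindexed_by_ia_in_past: bool
    reindexed_by_user_before_may_2023: bool
    reindexed_by_ia_before_may_2023: bool


def should_noindex(item: Item) -> bool:
    """Literal reading of the quoted rule: (A or B) and (not C or not D)."""
    was_noindexed = (
        item.noindexed_by_user_in_past or item.noindexed_by_ia_in_past
    )
    not_voluntarily_reindexed = (
        not item.reindexed_by_user_before_may_2023
        or not item.reindexed_by_ia_before_may_2023
    )
    return was_noindexed and not_voluntarily_reindexed


if __name__ == "__main__":
    # An item the user once hid and nobody reindexed voluntarily.
    hidden_before = Item(True, False, False, False)
    print(should_noindex(hidden_before))
```

If the author of the pseudo-code meant that a voluntary reindex by either party should keep the item visible, the second group would need AND instead of OR; the source does not say which was intended.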

Data breaches (2012—2024)


On 19 May 2017, the Archive's Development Manager published a blog post stating that anyone who had created an account before 2012 had to change their password, because the site had been breached and users' public information and lightly encrypted passwords had been leaked.[7]

On 9 October 2024, at around 9 PM CST, visitors to the Internet Archive saw pop-up notifications, placed by the perpetrators, announcing that the website had been hacked.[8] An hour later, Troy Hunt of HaveIBeenPwned confirmed the breach.[9]

Around 31 million users were affected: their user IDs, email addresses, encrypted passwords, and usernames were leaked.[10]

Website no longer usable without JavaScript (2023—Present)


Until 2022, Archive.org was one of the few remaining major websites that could be browsed and searched without JavaScript. JavaScript was used mostly for features that could not be implemented otherwise, such as infinite scrolling. This approach is known as progressive enhancement.[11]

Since 2023, however, large parts of the Archive.org website (including the home page, collection pages, and the search engine) can no longer be browsed at all without JavaScript, because the legacy HTML-based user interface was replaced with a web app built on Google's Lit library. As of April 2026, only individual item pages remain viewable without JavaScript.[12]

This made it impossible to browse the site on legacy systems that cannot run modern web browsers, or with minimalist web browsers that offer an alternative to the Google-Mozilla duopoly. It also slowed loading on modern web browsers, because a large amount of code has to execute before any content can appear on screen, which puts the content at the end of the rendering path.[13]

References

  1. Bixenspan, David (28 Nov 2018). "When the Internet Archive Forgets". Gizmodo. Archived from the original on 5 Aug 2025. Retrieved 31 Aug 2025.
  2. Graham, Mark (17 Apr 2017). "Robots.txt meant for search engines don't work well for web archives". Internet Archive. Archived from the original on 17 Apr 2017. Retrieved 31 Aug 2025.
  3. "Download & Streaming : Log In Required : Internet Archive". Internet Archive. Archived from the original on 11 Mar 2024. Retrieved 16 Aug 2025.
  4. "Internet Archive Forums: Log In Required, after logging in". Internet Archive. Archived from the original on 22 Feb 2026. Retrieved 16 Aug 2025.
  5. yopmailpublic (22 Jul 2023). "The removal of "noindex" from the Internet Archive, and associated risks". Reddit. Archived from the original on 14 Dec 2024. Retrieved 8 Jun 2026.
  6. Chronos-X4 (6 Jun 2023). "Internet Archive Ish". Reddit. Archived from the original on 15 Dec 2024. Retrieved 8 Jun 2026.
  7. Barrett, Katie (19 May 2017). "Re: User account breach". Internet Archive. Archived from the original on 20 May 2025. Retrieved 16 Aug 2025.
  8. Dark Web Informer (9 Oct 2024). "Dark Web Informer on X". X. Archived from the original on 21 Mar 2026. Retrieved 8 Jun 2026.
  9. Hunt, Troy (9 Oct 2024). "Troy Hunt on X". X. Archived from the original on 21 Mar 2026. Retrieved 8 Jun 2026.
  10. LeClair, Dave (11 Oct 2024). "31 million users impacted by Internet Archive data breach — what we know". Tom's Guide. Archived from the original on 9 Nov 2024. Retrieved 8 Jun 2026.
  11. Archibald, Jake (3 Jul 2013). "Progressive enhancement is still important". Jake Archibald. Archived from the original on 7 Sep 2013. Retrieved 8 Jun 2026.
  12. Before change: 2023-06-28. After change: 2023-09-28
  13. "Critical rendering path – Mozilla Developer Network". Mozilla. Archived from the original on 31 May 2026. Retrieved 8 Jun 2026.