Internet Archive: Difference between revisions
m Corrected information
→Website no longer usable without JavaScript (2023): + minimalist browsers
(10 intermediate revisions by 6 users not shown)
Line 1:
{{StubNotice}}
{{CompanyCargo
|Description=American digital library hosting scanned books, music, videos, software, and archived websites.
|Founded=1996
|Industry=Archive, Library
|Logo=Internet Archive.png
|ParentCompany=
|Type=Non-profit
|Website=https://archive.org/
}}
The '''{{Wplink|Internet Archive}}''' is an American non-profit digital library founded in 1996 to provide free "universal access to all knowledge" and preserve digital history.
==Consumer-impact summary==
Line 19:
===Login-only items for legally dubious content (2016-present)===
On January 13, 2016, Hank Bromley (hank_b) of the Internet Archive created a collection for uploads considered legally dubious, which can be viewed only with an account.<ref>{{Cite web |title=Download & Streaming : Log In Required : Internet Archive |url=https://archive.org/details/loggedin?tab=about |url-status=live |archive-url=https://megalodon.jp/2024-0311-0532-32/https://archive.org:443/details/loggedin?tab=about |archive-date=2024-03-11 |access-date=2025-08-16 |website=[[Internet Archive]]}}</ref>
Logged-out users cannot view or download these uploads, but anyone with an account can access them.<ref>{{Cite web |title=Internet Archive Forums: Log In Required, after logging in. |url=https://archive.org/post/1092552/log-in-required-after-logging-in |url-status=live |archive-url=https://web.archive.org/web/20260222222400/https://archive.org/post/1092552/log-in-required-after-logging-in |archive-date=2026-02-22 |access-date=2025-08-16 |website=[[Internet Archive]]}}</ref>
===Archived website removal===
{{Main|Internet Archive/Blocked companies}}
The Archive accepts DMCA takedown requests from website owners who no longer want their sites archived, which makes certain sites inaccessible.<ref>{{Cite news |last=Bixenspan |first=David |date=2018-11-28 |title=When the Internet Archive Forgets |url=https://gizmodo.com/when-the-internet-archive-forgets-1830462131 |url-status=live |archive-url=https://web.archive.org/web/20250805030527/https://gizmodo.com/when-the-internet-archive-forgets-1830462131 |archive-date=2025-08-05 |access-date=2025-08-31 |work=[[Gizmodo]]}}</ref>
The Internet Archive ''used'' to hide material covered by robots.txt restrictions, but it changed this practice on April 17, 2017.<ref>{{Cite web |last=Graham |first=Mark |date=2017-04-17 |title=Robots.txt meant for search engines don’t work well for web archives |url=https://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/ |url-status=live |archive-url=https://web.archive.org/web/20170417131508/http://blog.archive.org/2017/04/17/robots-txt-meant-for-search-engines-dont-work-well-for-web-archives/ |archive-date=2017-04-17 |access-date=2025-08-31 |website=Internet Archive}}</ref>
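Under the earlier practice, an ordinary robots.txt rule was enough to keep material out of view. The following is a minimal sketch of that kind of rule check using Python's standard <code>urllib.robotparser</code>; it is an illustration, not the Archive's replay code. The robots.txt content is made up, and the crawler name <code>ia_archiver</code> is assumed here as the user agent the Archive historically honored.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt that blocks only one crawler. "ia_archiver" is assumed
# here as the user-agent name the Archive historically honored.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if robots_txt disallows `agent` from fetching `url`."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/about.html"
    print(is_blocked(EXAMPLE_ROBOTS_TXT, page))               # True: blocked for ia_archiver
    print(is_blocked(EXAMPLE_ROBOTS_TXT, page, "Googlebot"))  # False: other crawlers unaffected
```

A rule aimed at one user agent leaves other crawlers untouched, which is why the robots.txt convention, designed for search engines, could single out an archive.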
===Removal of noindex function on uploaded items===
In 2023, the Internet Archive reportedly removed users' ability to set the noindex flag on their uploads. The flag had previously hidden items from the Archive's internal search engine; after the change, items with noindex set to true began appearing in search results. The decision was criticized on the grounds that it could jeopardize users' rights, including privacy. When challenged about it, Internet Archive staff member Jason Scott reportedly responded as follows:<ref>{{Cite web |date=2023-07-22 |title=The removal of "noindex" from the Internet Archive, and associated risks. |url=https://old.reddit.com/r/DataHoarder/comments/156s7di/the_removal_of_noindex_from_the_internet_archive/ |url-status=live |archive-url=https://web.archive.org/web/20241214121917/https://old.reddit.com/r/DataHoarder/comments/156s7di/the_removal_of_noindex_from_the_internet_archive/ |archive-date=2024-12-14 |access-date=2025-10-28 |website=Reddit}}</ref><ref>{{Cite web |date=2023-06-06 |title=Internet Archive Ish |url=https://old.reddit.com/r/NoStupidQuestions/comments/142nm9h/internet_archive_ish/ |url-status=live |archive-url=https://web.archive.org/web/20241215072041/https://old.reddit.com/r/NoStupidQuestions/comments/142nm9h/internet_archive_ish/ |archive-date=2024-12-15 |access-date=2025-10-28 |website=Reddit}}</ref>
<blockquote>''There is no bug or mistake in removing no-index settings for many Internet Archive items in the Community collection.''
''At no point was the Archive contacted to arrange a situation of no-indexing (or Darking) items with an intention of later release; the no-index setting was not documented for this use, and represented a security hole that was closed. Tens of thousands of items were found, being used for encrypted files hidden from the search engine, and represented a major problem, so many items have been removed or set noindex quickly.''
Line 40:
''A situation can theoretically exist where the original uploader can e-mail us from their e-mail address and discuss arrangements, but you've indicated you intentionally obfuscated your location and have disposed your addresses. If you're able to gain access again, you can mail through those addresses.''
''An additional situation is you can e-mail [email protected] if you want to report items at the archive (by identifier) that you believe might need to be removed from the archive; we receive a number of these requests throughout the months and respond according to policy.''</blockquote>
A user who criticized the decision shared the following pseudocode describing how the Internet Archive could reinstate users' ability to set noindex while re-hiding all formerly noindexed items from search engines:
<blockquote>noindex items if:
(
Line 61:
items-get-reindexed-voluntarily-by-IA-before-May-2023 = false;
)</blockquote>
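The effect of the 2023 change on the Archive's own search can be pictured with a toy filter. This is a hypothetical model, not the Archive's search code: <code>Item</code>, its <code>noindex</code> field, and the <code>honor_noindex</code> switch are illustrative names standing in for an item's noindex metadata and for the behavior before (<code>True</code>) and after (<code>False</code>) the change.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    """Hypothetical stand-in for an uploaded item and its noindex metadata."""
    identifier: str
    noindex: bool = False


def search(items: List[Item], query: str, honor_noindex: bool) -> List[str]:
    """Return identifiers containing `query`.

    honor_noindex=True models the behavior before the change (noindex items are
    left out of results); honor_noindex=False models the behavior after it.
    """
    return [
        item.identifier
        for item in items
        if query in item.identifier and not (honor_noindex and item.noindex)
    ]


if __name__ == "__main__":
    catalog = [Item("family-photos"), Item("family-letters", noindex=True)]
    print(search(catalog, "family", honor_noindex=True))   # ['family-photos']
    print(search(catalog, "family", honor_noindex=False))  # ['family-photos', 'family-letters']
```

In this model nothing about the item itself changes; only whether the search step consults the flag, which is why previously hidden items could surface without any action by their uploaders.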
===Data breaches (2012-2024)===
On May 19, 2017, the Archive's Development Manager published a blog post stating that anyone who had created an account before 2012 needed to change their password, because a breach had leaked users' public information and lightly encrypted passwords.<ref>{{Cite web |last=Barrett |first=Katie |date=2017-05-19 |title=Re: User account breach {{!}} Internet Archive Blogs |url=https://blog.archive.org/2017/05/19/re-user-account-breach/ |url-status=live |archive-url=https://web.archive.org/web/20250520030556/https://blog.archive.org/2017/05/19/re-user-account-breach/ |archive-date=2025-05-20 |access-date=2025-08-16 |website=[[Internet Archive]]}}</ref>
On October 9, 2024, at around 9 PM CST, Internet Archive users began seeing pop-up notifications from the perpetrators announcing that the website had been hacked,<ref>{{Cite web |date=2024-10-09 |title=Dark Web Informer on X |url=https://nitter.us.catsarch.com/DarkWebInformer/status/1844123206413943274 |url-status=live |archive-url=https://web.archive.org/web/20260321121941/https://nitter.us.catsarch.com/DarkWebInformer/status/1844123206413943274 |archive-date=2026-03-21 |access-date=2025-08-16 |website=[[Twitter]]}}</ref> and an hour later Troy Hunt of HaveIBeenPwned confirmed the breach.<ref>{{Cite web |last=Hunt |first=Troy |date=2024-10-09 |title=Troy Hunt on X |url=https://nitter.us.catsarch.com/troyhunt/status/1844136762727448644 |url-status=live |archive-url=https://web.archive.org/web/20260321122129/https://nitter.us.catsarch.com/troyhunt/status/1844136762727448644 |archive-date=2026-03-21 |access-date=2025-08-16 |website=[[Twitter]]}}</ref>
Around 31 million users were affected; their user IDs, email addresses, usernames, and encrypted passwords were leaked.<ref>{{Cite news |last=LeClair |first=Dave |date=2024-10-11 |title=31 million users impacted by Internet Archive data breach — what we know |url=https://www.tomsguide.com/computing/online-security/31-million-users-impacted-by-internet-archive-data-breach-what-we-know |url-status=live |archive-url=https://web.archive.org/web/20241109231711/https://www.tomsguide.com/computing/online-security/31-million-users-impacted-by-internet-archive-data-breach-what-we-know |archive-date=2024-11-09 |access-date=2025-08-16 |work=Tom's Guide}}</ref>
===Website no longer usable without JavaScript (2023)===
Until 2022, Archive.org was one of the few remaining major websites that could be browsed and searched without [[JavaScript]]. JavaScript was used only where necessary, for example to enable bottomless scrolling. This approach, in which pages work as plain HTML and scripts only add enhancements on top, is known as progressive enhancement.<ref name=jakearchibald>{{cite web |url=https://jakearchibald.com/2013/progressive-enhancement-still-important/ |title=Progressive enhancement is still important - JakeArchibald.com |date=2013-07-03 |access-date=2026-04-18 }}</ref>
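As an illustration of progressive enhancement, the sketch below server-renders a hypothetical search-results page: the results and an ordinary "Next page" link are part of the HTML itself, so a browser without JavaScript can read and page through them, while an optional script (a made-up <code>/infinite-scroll.js</code>) could replace the link with bottomless scrolling. The function name, markup, and file name are assumptions for illustration, not Archive.org's code.

```python
from html import escape
from typing import List


def render_results_page(titles: List[str], page: int, has_next: bool) -> str:
    """Render a results page whose content does not depend on JavaScript."""
    items = "\n".join(f"    <li>{escape(title)}</li>" for title in titles)
    # A plain link keeps pagination working when no scripts run.
    next_link = (
        f'  <a class="next" href="?page={page + 1}">Next page</a>\n' if has_next else ""
    )
    return (
        "<!DOCTYPE html>\n<html>\n<body>\n"
        f'  <ol class="results">\n{items}\n  </ol>\n'
        f"{next_link}"
        # Optional enhancement: if this script runs, it may swap the link for
        # bottomless scrolling; if it does not, the page still works.
        '  <script src="/infinite-scroll.js" defer></script>\n'
        "</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(render_results_page(["Moby Dick", "Tom & Jerry"], page=1, has_next=True))
```

The design choice is that the script is purely additive: removing the <code>&lt;script&gt;</code> line leaves a complete, navigable page.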
Since 2023, however, large parts of the Archive.org website (including the home page, collection pages, and the search engine) cannot be browsed at all without JavaScript, because the legacy HTML-based user interface was replaced with a web app built on Google's Lit library. As of April 2026, only individual item pages remain viewable without JavaScript.<ref>Before change: [https://ghostarchive.org/archive/3vxC8 2023-06-28]. After change: [https://ghostarchive.org/archive/sdLIp 2023-09-28]</ref><!-- Editor note: I also know this from personal experience, but given that archive.org/details was excluded from the Wayback Machine and Archive Today converts everything to static HTML, there is not much of a historical record available for these changes. User account pages (archive.org/details/@...) were made JS-only in March 2024, but I'll have to find a source for this. -->
This made the site impossible to browse on legacy systems that cannot run modern web browsers, and in minimalist browsers that serve as alternatives to the Google-Mozilla duopoly. It also slowed loading in modern browsers: a large amount of code must execute before any content can appear on screen, which puts the content at the end of the critical rendering path.<ref>{{cite web |title=Critical rendering path – Mozilla Developer Network |url=https://developer.mozilla.org/en-US/docs/Web/Performance/Guides/Critical_rendering_path |access-date=2026-04-18 }}</ref>
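What a browser that does not run JavaScript gets from a client-rendered page can be approximated by collecting the page's text while skipping scripts. This rough sketch uses Python's standard <code>html.parser</code> and ignores CSS and layout; the two sample pages, including the hypothetical <code>&lt;app-root&gt;</code> element loaded by <code>/app.js</code>, only imitate the general shape of server-rendered and script-rendered markup and are not Archive.org's actual HTML.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = ("script", "style")


class VisibleTextParser(HTMLParser):
    """Collect text that would be on screen if no scripts ever ran."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def text_without_js(html_doc: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html_doc)
    parser.close()
    return " ".join(parser.chunks)


# Server-rendered page: the content is in the HTML itself.
STATIC_PAGE = "<html><body><h1>Search results</h1><ol><li>Moby Dick</li></ol></body></html>"
# Client-rendered shell: the content only exists after /app.js runs.
APP_SHELL = '<html><body><app-root></app-root><script type="module" src="/app.js"></script></body></html>'

if __name__ == "__main__":
    print(repr(text_without_js(STATIC_PAGE)))  # 'Search results Moby Dick'
    print(repr(text_without_js(APP_SHELL)))    # '' (nothing to display)
```

The shell is empty until its script downloads and executes, which is the sense in which the content sits at the end of the rendering path.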
==References==