Artificial intelligence: Difference between revisions

Kirb (talk | contribs)
Plankton (talk | contribs)
Line 6: Line 6:


==Unethical website scraping==
Further Reading: [[Nonconsensual Scraping|Nonconsensual scraping]]
While "mainstream" companies such as [[OpenAI]], [[Anthropic]], and [[Meta]] appear to follow industry-standard practices for web crawlers, other operators ignore those practices, causing what amount to [[wikipedia:Denial-of-service attack|distributed denial-of-service attacks]] that degrade access to freely accessible websites. The problem is particularly acute for websites that are large or contain many dynamic links.
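The text above does not say which practices it has in mind. One widely used convention is the <code>robots.txt</code> file, and the following is a minimal sketch of how a well-behaved crawler might consult it before fetching a page, assuming Python's standard <code>urllib.robotparser</code> module. The <code>ExampleBot</code> user agent, the rules, and the <code>example.org</code> URLs are hypothetical and chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download this
# from the root of the site it intends to crawl.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /search
Crawl-delay: 10

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a reusable parser object."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if the site's rules allow this agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A polite crawler skips disallowed URLs and waits between requests.
allowed_about = may_fetch(parser, "ExampleBot", "https://example.org/about")
allowed_search = may_fetch(parser, "ExampleBot", "https://example.org/search?q=x")
delay_seconds = parser.crawl_delay("ExampleBot")
```

In this sketch the dynamic <code>/search</code> URLs are off limits to <code>ExampleBot</code> and a delay is requested between requests. A crawler that skips this check is free to request every dynamic link as fast as it can, which is the behaviour described above.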


Line 88: Line 89:


On 17 March 2025, the Git source code host SourceHut announced that its service was being disrupted by large language model crawlers. The mitigations deployed included requiring login for some areas of the service and blocking IP ranges of cloud providers, measures that also affected legitimate use of the website by its users.<ref>https://status.sr.ht/issues/2025-03-17-git.sr.ht-llms/</ref> In response to the event, SourceHut founder Drew DeVault wrote a blog post entitled "[https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html Please stop externalizing your costs directly into my face]", describing his frustration with ongoing, ever-adapting attacks that must be addressed promptly to limit disruption to legitimate SourceHut users. DeVault estimates that "20-100%" of his time is now spent addressing such attacks.
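Blocking by IP range, one of the mitigations mentioned above, can be sketched as follows. This is an illustrative example using Python's standard <code>ipaddress</code> module, not SourceHut's actual implementation. The ranges listed are reserved documentation prefixes standing in for real cloud-provider ranges, and the <code>is_blocked</code> helper is a hypothetical name.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), standing in for the
# published address ranges of cloud providers. Not a real blocklist.
BLOCKED_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(client_ip: str) -> bool:
    """Return True if the client address falls inside any blocked range."""
    address = ipaddress.ip_address(client_ip)
    # Membership tests across IP versions simply evaluate to False.
    return any(address in network for network in BLOCKED_RANGES)
```

The check cannot tell a crawler from a person: any legitimate user whose traffic originates inside a blocked range is rejected as well, which is the collateral effect described above.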


==Privacy concerns of online AI models==