<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://consumerrights.wiki/index.php?action=history&amp;feed=atom&amp;title=Artificial_intelligence%2Ftraining</id>
	<title>Artificial intelligence/training - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://consumerrights.wiki/index.php?action=history&amp;feed=atom&amp;title=Artificial_intelligence%2Ftraining"/>
	<link rel="alternate" type="text/html" href="https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;action=history"/>
	<updated>2026-04-29T05:05:17Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.44.0</generator>
	<entry>
		<id>https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=52386&amp;oldid=prev</id>
		<title>Rudxain: mention chip shortage</title>
		<link rel="alternate" type="text/html" href="https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=52386&amp;oldid=prev"/>
		<updated>2026-04-27T03:21:29Z</updated>

		<summary type="html">&lt;p&gt;mention chip shortage&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 03:21, 27 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l19&quot;&gt;Line 19:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 19:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Bandwidth abuse===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Bandwidth abuse===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Massive data needs massive bandwidth. Scraping web-pages across the entire internet requires sending millions of requests to all known servers. Some AI companies go as far as to &amp;#039;&amp;#039;repeatedly&amp;#039;&amp;#039; send requests for the same content (or several revisions of the same content) as frequent bursts in short intervals, which is indistinguishable from [[wikipedia:Denial-of-service_attack|distributed denial-of-service (DDoS) attacks]].&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Massive data needs massive bandwidth. Scraping web-pages across the entire internet requires sending millions of requests to all known servers. Some AI companies go as far as to &amp;#039;&amp;#039;repeatedly&amp;#039;&amp;#039; send requests for the same content (or several revisions of the same content) as frequent bursts in short intervals, which is indistinguishable from [[wikipedia:Denial-of-service_attack|distributed denial-of-service (DDoS) attacks]].&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;=== Chip shortage ===&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Many AI companies have pre-ordered massive amounts of computer components. So much that it doesn&#039;t even fit in their current data-centers. This is done in anticipation for &#039;&#039;more&#039;&#039; data-centers being built.{{Citation needed}} This has caused such components to become scarce, and prices to spike. The most notable being the [[wikipedia:2024–present_global_memory_supply_shortage|increase in RAM prices]].&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Examples==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Examples==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff:1.41:old-52385:rev-52386:php=table --&gt;
&lt;/table&gt;</summary>
		<author><name>Rudxain</name></author>
	</entry>
	<entry>
		<id>https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=52385&amp;oldid=prev</id>
		<title>Rudxain: link GPU</title>
		<link rel="alternate" type="text/html" href="https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=52385&amp;oldid=prev"/>
		<updated>2026-04-27T03:07:50Z</updated>

		<summary type="html">&lt;p&gt;link GPU&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 03:07, 27 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l15&quot;&gt;Line 15:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 15:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Energy use===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Energy use===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;While [[Self-hosting|self-hosted]] models can be trained with a single consumer-grade GPU, data-centers with hundreds or thousands of GPUs (known for being more power-hungry than CPUs) are used to train corporate-grade (or &quot;enterprise&quot;) models. This can worsen [[wikipedia:Climate_change|climate change]].&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;While [[Self-hosting|self-hosted]] models can be trained with a single consumer-grade &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[wikipedia:Graphics_processing_unit|&lt;/ins&gt;GPU&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;]]&lt;/ins&gt;, data-centers with hundreds or thousands of GPUs (known for being more power-hungry than CPUs) are used to train corporate-grade (or &quot;enterprise&quot;) models. This can worsen [[wikipedia:Climate_change|climate change]].&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Bandwidth abuse===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Bandwidth abuse===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff:1.41:old-52384:rev-52385:php=table --&gt;
&lt;/table&gt;</summary>
		<author><name>Rudxain</name></author>
	</entry>
	<entry>
		<id>https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=52384&amp;oldid=prev</id>
		<title>Rudxain: style-guide compliance for DDoS</title>
		<link rel="alternate" type="text/html" href="https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=52384&amp;oldid=prev"/>
		<updated>2026-04-27T03:05:26Z</updated>

		<summary type="html">&lt;p&gt;style-guide compliance for DDoS&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 03:05, 27 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l17&quot;&gt;Line 17:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 17:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;While [[Self-hosting|self-hosted]] models can be trained with a single consumer-grade GPU, data-centers with hundreds or thousands of GPUs (known for being more power-hungry than CPUs) are used to train corporate-grade (or &amp;quot;enterprise&amp;quot;) models. This can worsen [[wikipedia:Climate_change|climate change]].&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;While [[Self-hosting|self-hosted]] models can be trained with a single consumer-grade GPU, data-centers with hundreds or thousands of GPUs (known for being more power-hungry than CPUs) are used to train corporate-grade (or &amp;quot;enterprise&amp;quot;) models. This can worsen [[wikipedia:Climate_change|climate change]].&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=== Bandwidth abuse ===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Bandwidth abuse===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Massive data needs massive bandwidth. Scraping web-pages across the entire internet requires sending millions of requests to all known servers. Some AI companies go as far as to &#039;&#039;repeatedly&#039;&#039; send requests for the same content (or several revisions of the same content) as frequent bursts in short intervals, which is indistinguishable from [[wikipedia:Denial-of-service_attack|DDoS attacks]].&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Massive data needs massive bandwidth. Scraping web-pages across the entire internet requires sending millions of requests to all known servers. Some AI companies go as far as to &#039;&#039;repeatedly&#039;&#039; send requests for the same content (or several revisions of the same content) as frequent bursts in short intervals, which is indistinguishable from [[wikipedia:Denial-of-service_attack|&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;distributed denial-of-service (&lt;/ins&gt;DDoS&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;) &lt;/ins&gt;attacks]].&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Examples==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Examples==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;While &quot;mainstream&quot; companies such as [[OpenAI]], [[Anthropic]], and [[Meta]] appear to correctly follow industry-standard practice for web crawlers, others ignore them (such as [[wikipedia:Alibaba_Group|Alibaba]]&amp;lt;ref&amp;gt;{{Cite news |last=Venerandi |first=Niccolò |title=FOSS infrastructure is under attack by AI companies |url=https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies |access-date=2026-02-23 |work=LibreNews |archive-url=http://web.archive.org/web/20260217195639/https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/ |archive-date=17 Feb 2026}}&amp;lt;/ref&amp;gt;), causing &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[wikipedia:Denial-of-service attack|distributed denial of service &lt;/del&gt;attacks&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;]] &lt;/del&gt;which damage access to freely-accessible websites. This is particularly an issue for websites that are large or contain many dynamic links.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;While &quot;mainstream&quot; companies such as [[OpenAI]], [[Anthropic]], and [[Meta]] appear to correctly follow industry-standard practice for web crawlers, others ignore them (such as [[wikipedia:Alibaba_Group|Alibaba]]&amp;lt;ref&amp;gt;{{Cite news |last=Venerandi |first=Niccolò |title=FOSS infrastructure is under attack by AI companies |url=https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies |access-date=2026-02-23 |work=LibreNews |archive-url=http://web.archive.org/web/20260217195639/https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/ |archive-date=17 Feb 2026}}&amp;lt;/ref&amp;gt;), causing &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;DDoS &lt;/ins&gt;attacks which damage access to freely-accessible websites. This is particularly an issue for websites that are large or contain many dynamic links.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Ethical website scrapers, known as &amp;quot;spiders&amp;quot; that crawl the web, follow a certain set of minimum guidelines. Specifically, they follow &amp;lt;code&amp;gt;[[wikipedia:robots.txt|robots.txt]]&amp;lt;/code&amp;gt;, a text file found at the root of a domain that indicates:&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Ethical website scrapers, known as &amp;quot;spiders&amp;quot; that crawl the web, follow a certain set of minimum guidelines. Specifically, they follow &amp;lt;code&amp;gt;[[wikipedia:robots.txt|robots.txt]]&amp;lt;/code&amp;gt;, a text file found at the root of a domain that indicates:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff:1.41:old-52383:rev-52384:php=table --&gt;
&lt;/table&gt;</summary>
		<author><name>Rudxain</name></author>
	</entry>
	<entry>
		<id>https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=52383&amp;oldid=prev</id>
		<title>Rudxain: expand why-prob</title>
		<link rel="alternate" type="text/html" href="https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=52383&amp;oldid=prev"/>
		<updated>2026-04-27T03:00:34Z</updated>

		<summary type="html">&lt;p&gt;expand why-prob&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 03:00, 27 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l7&quot;&gt;Line 7:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 7:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Why it is a problem==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Why it is a problem==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=== Intellectual property laundering ===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===Intellectual property laundering===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Most, if not all, of the data used for training is copied indiscriminately, without even checking licenses or any copyright terms.{{Citation needed}} This is very controversial. Some people argue that it is &amp;quot;fair use&amp;quot; because AI systems learn in ways similar to animal and human brains, others claim it&amp;#039;s more like [[wikt:parroting|a parrot learning phrases]], others claim that it&amp;#039;s &amp;quot;transformative&amp;quot; so it&amp;#039;s still fair-use, others say it&amp;#039;s akin to [[wikipedia:Tracing_(art)|tracing images]] (this applies mostly to image models, though the analogy can work for text models).{{Citation needed|reason=too many opinions}}&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Most, if not all, of the data used for training is copied indiscriminately, without even checking licenses or any copyright terms.{{Citation needed}} This is very controversial. Some people argue that it is &amp;quot;fair use&amp;quot; because AI systems learn in ways similar to animal and human brains, others claim it&amp;#039;s more like [[wikt:parroting|a parrot learning phrases]], others claim that it&amp;#039;s &amp;quot;transformative&amp;quot; so it&amp;#039;s still fair-use, others say it&amp;#039;s akin to [[wikipedia:Tracing_(art)|tracing images]] (this applies mostly to image models, though the analogy can work for text models).{{Citation needed|reason=too many opinions}}&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l14&quot;&gt;Line 14:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 14:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Some people request that, at the very least, the sources of the training data must be publicly disclosed, for the sake of [[wikipedia:Transparency_(behavior)|transparency]] and [[wikipedia:Attribution_(copyright)|attribution]].&amp;lt;ref&amp;gt;{{Cite web |last=Tunney |first=Justine |date=2024-08-23 |title=AI Training Shouldn&amp;#039;t Erase Authorship |url=https://justine.lol/history/ |access-date=2026-04-26}}&amp;lt;/ref&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Some people request that, at the very least, the sources of the training data must be publicly disclosed, for the sake of [[wikipedia:Transparency_(behavior)|transparency]] and [[wikipedia:Attribution_(copyright)|attribution]].&amp;lt;ref&amp;gt;{{Cite web |last=Tunney |first=Justine |date=2024-08-23 |title=AI Training Shouldn&amp;#039;t Erase Authorship |url=https://justine.lol/history/ |access-date=2026-04-26}}&amp;lt;/ref&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=== &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Ecosystem damage &lt;/del&gt;===&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;===&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Energy use&lt;/ins&gt;===&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;TO&lt;/del&gt;-&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;DO&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;While [[Self&lt;/ins&gt;-&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;hosting|self-hosted]] models can be trained with a single consumer-grade GPU, data-centers with hundreds or thousands of GPUs (known for being more power-hungry than CPUs) are used to train corporate-grade (or &quot;enterprise&quot;) models. This can worsen [[wikipedia:Climate_change|climate change]].&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;=== Bandwidth abuse ===&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Massive data needs massive bandwidth. Scraping web-pages across the entire internet requires sending millions of requests to all known servers. Some AI companies go as far as to &#039;&#039;repeatedly&#039;&#039; send requests for the same content (or several revisions of the same content) as frequent bursts in short intervals, which is indistinguishable from [[wikipedia:Denial-of-service_attack|DDoS attacks]].&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Examples==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Examples==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff:1.41:old-52382:rev-52383:php=table --&gt;
&lt;/table&gt;</summary>
		<author><name>Rudxain</name></author>
	</entry>
	<entry>
		<id>https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=52382&amp;oldid=prev</id>
		<title>Rudxain: move notice to lone line, allowing users to edit intro</title>
		<link rel="alternate" type="text/html" href="https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=52382&amp;oldid=prev"/>
		<updated>2026-04-27T02:23:10Z</updated>

		<summary type="html">&lt;p&gt;move notice to lone line, allowing users to edit intro&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 02:23, 27 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;{{Incomplete}}&#039;&#039;&#039;AI training&#039;&#039;&#039; is a process by which data is fed into an AI model, in order to adjust its weights. This makes the output of the model closely match that of its input.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;{{Incomplete}}&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&#039;&#039;&#039;AI training&#039;&#039;&#039; is a process by which data is fed into an AI model, in order to adjust its weights. This makes the output of the model closely match that of its input.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==How it works==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==How it works==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff:1.41:old-52381:rev-52382:php=table --&gt;
&lt;/table&gt;</summary>
		<author><name>Rudxain</name></author>
	</entry>
	<entry>
		<id>https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=52381&amp;oldid=prev</id>
		<title>Rudxain: populate how-works and why-prob</title>
		<link rel="alternate" type="text/html" href="https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=52381&amp;oldid=prev"/>
		<updated>2026-04-27T02:22:28Z</updated>

		<summary type="html">&lt;p&gt;populate how-works and why-prob&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 02:22, 27 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;{{Incomplete}}&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;{{Incomplete}}&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&#039;&#039;&#039;AI training&#039;&#039;&#039; is a process by which data is fed into an AI model, in order to adjust its weights. This makes the output of the model closely match that of its input.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;{{Ph-T-Int}}&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==How it works==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==How it works==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;{{Ph&lt;/del&gt;-&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;T&lt;/del&gt;-&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;HIW}}&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;There are several ways to implement AI, and even more ways to train them, the most well&lt;/ins&gt;-&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;known being [[wikipedia:Backpropagation|backpropagation]]. With respect to the data&lt;/ins&gt;-&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;set, LLMs must be trained on massive amounts of data, which is a task that&#039;s only feasible via automation. This is in contrast to curated data-sets, in which both the data and the training is done in a more carefully controlled environment. Automated training on massive data-sets is typically done using internet web-sites as sources. The process of scraping is similar to how web-[[wikipedia:Search_engine|search-engines]] index and [[wikipedia:Cache_(computing)|cache]] pages.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Why it is a problem==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Why it is a problem==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;{{&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Ph&lt;/del&gt;-&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;T&lt;/del&gt;-&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;WIIAP&lt;/del&gt;}}&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;=== Intellectual property laundering ===&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Most, if not all, of the data used for training is copied indiscriminately, without even checking licenses or any copyright terms.&lt;/ins&gt;{{&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Citation needed}} This is very controversial. Some people argue that it is &quot;fair use&quot; because AI systems learn in ways similar to animal and human brains, others claim it&#039;s more like [[wikt:parroting|a parrot learning phrases]], others claim that it&#039;s &quot;transformative&quot; so it&#039;s still fair&lt;/ins&gt;-&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;use, others say it&#039;s akin to [[wikipedia:Tracing_(art)|tracing images]] (this applies mostly to image models, though the analogy can work for text models).{{Citation needed|reason=too many opinions}}&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Ultimately, it depends a lot on the technical details of how each model works, so none of those arguments are universal.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Some people request that, at the very least, the sources of the training data must be publicly disclosed, for the sake of [[wikipedia:Transparency_(behavior)|transparency]] and [[wikipedia:Attribution_(copyright)|attribution]].&amp;lt;ref&amp;gt;{{Cite web |last=Tunney |first=Justine |date=2024-08-23 |title=AI Training Shouldn&#039;t Erase Authorship |url=https://justine.lol/history/ |access-date=2026&lt;/ins&gt;-&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;04-26&lt;/ins&gt;}}&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;/ref&amp;gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt; &lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;=== Ecosystem damage ===&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;TO-DO&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Examples==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==Examples==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff:1.41:old-51006:rev-52381:php=table --&gt;
&lt;/table&gt;</summary>
		<author><name>Rudxain</name></author>
	</entry>
	<entry>
		<id>https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=51006&amp;oldid=prev</id>
		<title>Rudxain: re-link Anubis: GH -&gt; WP</title>
		<link rel="alternate" type="text/html" href="https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=51006&amp;oldid=prev"/>
		<updated>2026-04-13T04:38:19Z</updated>

		<summary type="html">&lt;p&gt;re-link Anubis: GH -&amp;gt; WP&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 04:38, 13 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l31&quot;&gt;Line 31:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 31:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;To protect against unethical crawlers, due to concerns of both intellectual property and service disruption, websites adopt practices that affect the experience of real users:&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;To protect against unethical crawlers, due to concerns of both intellectual property and service disruption, websites adopt practices that affect the experience of real users:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;*&#039;&#039;&#039;Bot check walls&#039;&#039;&#039;: The user may be required to pass a security check &quot;wall&quot;. While usually automatic for the user, this can affect legitimate bots. When a website protection service such as [[Cloudflare]] is not confident as to whether the visitor is legitimate, it may present a [[CAPTCHA]] to be manually filled out. An example is &quot;Google Sorry&quot;, a CAPTCHA wall frequently seen when using Google Search via a VPN. An example that&#039;s popular in the FOSS community is [&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;https&lt;/del&gt;:&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;//github.com/TecharoHQ/anubis &lt;/del&gt;Anubis].&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;*&#039;&#039;&#039;Bot check walls&#039;&#039;&#039;: The user may be required to pass a security check &quot;wall&quot;. While usually automatic for the user, this can affect legitimate bots. When a website protection service such as [[Cloudflare]] is not confident as to whether the visitor is legitimate, it may present a [[CAPTCHA]] to be manually filled out. An example is &quot;Google Sorry&quot;, a CAPTCHA wall frequently seen when using Google Search via a VPN. An example that&#039;s popular in the FOSS community is [&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[wikipedia&lt;/ins&gt;:&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;Anubis_(software)|&lt;/ins&gt;Anubis&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;]&lt;/ins&gt;].&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;*&amp;#039;&amp;#039;&amp;#039;Login walls&amp;#039;&amp;#039;&amp;#039;: Should bots be found to pass CAPTCHA walls, the website may advance to requiring logging in to view content. A major recent example of this is [[YouTube]]&amp;#039;s &amp;quot;Sign in to confirm you&amp;#039;re not a bot&amp;quot; messages.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;*&amp;#039;&amp;#039;&amp;#039;Login walls&amp;#039;&amp;#039;&amp;#039;: Should bots be found to pass CAPTCHA walls, the website may advance to requiring logging in to view content. A major recent example of this is [[YouTube]]&amp;#039;s &amp;quot;Sign in to confirm you&amp;#039;re not a bot&amp;quot; messages.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;*&amp;#039;&amp;#039;&amp;#039;[[JavaScript]] requirement&amp;#039;&amp;#039;&amp;#039;: Most websites do not need JavaScript to deliver their content. However, as many scrapers expect content to be found directly in the HTML, it is often an easy workaround to use JavaScript to &amp;quot;insert&amp;quot; the content after the page has loaded. This may reduce the responsiveness of the website, increasing points of failure, and preventing security-conscious users who disable JavaScript from viewing the website.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;*&amp;#039;&amp;#039;&amp;#039;[[JavaScript]] requirement&amp;#039;&amp;#039;&amp;#039;: Most websites do not need JavaScript to deliver their content. However, as many scrapers expect content to be found directly in the HTML, it is often an easy workaround to use JavaScript to &amp;quot;insert&amp;quot; the content after the page has loaded. This may reduce the responsiveness of the website, increasing points of failure, and preventing security-conscious users who disable JavaScript from viewing the website.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff:1.41:old-39885:rev-51006:php=table --&gt;
&lt;/table&gt;</summary>
		<author><name>Rudxain</name></author>
	</entry>
	<entry>
		<id>https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=39885&amp;oldid=prev</id>
		<title>Rudxain: linkify: Chrome, Windows</title>
		<link rel="alternate" type="text/html" href="https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=39885&amp;oldid=prev"/>
		<updated>2026-02-26T05:46:00Z</updated>

		<summary type="html">&lt;p&gt;linkify: Chrome, Windows&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 05:46, 26 February 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l24&quot;&gt;Line 24:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 24:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Unethical [[Artificial_intelligence|AI]] scraper bots do not follow &amp;lt;code&amp;gt;robots.txt&amp;lt;/code&amp;gt; - in fact, they may not even request this file at all. They typically completely ignore it, instead opting to start from an entry point such as the root home page (&amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt;), working its way through an exponentially growing list of links as it finds them, with little to no delay between requests. The bots use false User-Agent header strings that would correspond to real web browsers on desktop or mobile operating systems - blocking them would also block legitimate users, or at least legitimate users on VPNs.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Unethical [[Artificial_intelligence|AI]] scraper bots do not follow &amp;lt;code&amp;gt;robots.txt&amp;lt;/code&amp;gt; - in fact, they may not even request this file at all. They typically completely ignore it, instead opting to start from an entry point such as the root home page (&amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt;), working its way through an exponentially growing list of links as it finds them, with little to no delay between requests. The bots use false User-Agent header strings that would correspond to real web browsers on desktop or mobile operating systems - blocking them would also block legitimate users, or at least legitimate users on VPNs.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Some AI services opt to use separate User-Agent strings, potentially also ignoring &amp;lt;code&amp;gt;robots.txt&amp;lt;/code&amp;gt;, when a request is made through user command rather than as part of model training. For example, ChatGPT identifies itself as &amp;lt;code&amp;gt;ChatGPT-User&amp;lt;/code&amp;gt; rather than its standard &amp;lt;code&amp;gt;OpenAI&amp;lt;/code&amp;gt; when it uses the &quot;search the web&quot; command - even if searching the web was an automatic decision. In a less favorable example, Perplexity AI in this same situation falsely identifies as a standard Chrome web browser running on Windows. AI companies defend this under the belief that they are not a &quot;spider&quot;, but rather a &quot;user agent&quot; (like a web browser), when called upon by a user&#039;s request.&amp;lt;ref name=&quot;perplexity-aws&quot; /&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Some AI services opt to use separate User-Agent strings, potentially also ignoring &amp;lt;code&amp;gt;robots.txt&amp;lt;/code&amp;gt;, when a request is made through user command rather than as part of model training. For example, ChatGPT identifies itself as &amp;lt;code&amp;gt;ChatGPT-User&amp;lt;/code&amp;gt; rather than its standard &amp;lt;code&amp;gt;OpenAI&amp;lt;/code&amp;gt; when it uses the &quot;search the web&quot; command - even if searching the web was an automatic decision. In a less favorable example, Perplexity AI in this same situation falsely identifies as a standard &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[Google_Chrome|&lt;/ins&gt;Chrome&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;]] &lt;/ins&gt;web browser running on &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[Microsoft_Windows|&lt;/ins&gt;Windows&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;]]&lt;/ins&gt;. AI companies defend this under the belief that they are not a &quot;spider&quot;, but rather a &quot;user agent&quot; (like a web browser), when called upon by a user&#039;s request.&amp;lt;ref name=&quot;perplexity-aws&quot; /&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Less legitimate bots use a wide distribution of IP addresses, further reducing options for the website to protect itself. This is in a clear attempt to bypass IP-based request throttling and rate limiting the website may implement. They are also known to ignore HTTP response status codes that indicate a server error ([[wikipedia:HTTP status code#5xx server errors|5xx]]), or warnings that the client needs to slow down ([[wikipedia:HTTP status code#429|429 Too Many Requests]]) or has been entirely blocked ([[wikipedia:HTTP status code#403|403 Forbidden]]).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Less legitimate bots use a wide distribution of IP addresses, further reducing options for the website to protect itself. This is in a clear attempt to bypass IP-based request throttling and rate limiting the website may implement. They are also known to ignore HTTP response status codes that indicate a server error ([[wikipedia:HTTP status code#5xx server errors|5xx]]), or warnings that the client needs to slow down ([[wikipedia:HTTP status code#429|429 Too Many Requests]]) or has been entirely blocked ([[wikipedia:HTTP status code#403|403 Forbidden]]).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff:1.41:old-39884:rev-39885:php=table --&gt;
&lt;/table&gt;</summary>
		<author><name>Rudxain</name></author>
	</entry>
	<entry>
		<id>https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=39884&amp;oldid=prev</id>
		<title>Rudxain: style `robots.txt` as code</title>
		<link rel="alternate" type="text/html" href="https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=39884&amp;oldid=prev"/>
		<updated>2026-02-26T05:38:25Z</updated>

		<summary type="html">&lt;p&gt;style `robots.txt` as code&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 05:38, 26 February 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l11&quot;&gt;Line 11:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 11:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;While &amp;quot;mainstream&amp;quot; companies such as [[OpenAI]], [[Anthropic]], and [[Meta]] appear to correctly follow industry-standard practice for web crawlers, others ignore them (such as [[wikipedia:Alibaba_Group|Alibaba]]&amp;lt;ref&amp;gt;{{Cite news |last=Venerandi |first=Niccolò |title=FOSS infrastructure is under attack by AI companies |url=https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies |access-date=2026-02-23 |work=LibreNews |archive-url=http://web.archive.org/web/20260217195639/https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/ |archive-date=17 Feb 2026}}&amp;lt;/ref&amp;gt;), causing [[wikipedia:Denial-of-service attack|distributed denial of service attacks]] which damage access to freely-accessible websites. This is particularly an issue for websites that are large or contain many dynamic links.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;While &amp;quot;mainstream&amp;quot; companies such as [[OpenAI]], [[Anthropic]], and [[Meta]] appear to correctly follow industry-standard practice for web crawlers, others ignore them (such as [[wikipedia:Alibaba_Group|Alibaba]]&amp;lt;ref&amp;gt;{{Cite news |last=Venerandi |first=Niccolò |title=FOSS infrastructure is under attack by AI companies |url=https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies |access-date=2026-02-23 |work=LibreNews |archive-url=http://web.archive.org/web/20260217195639/https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/ |archive-date=17 Feb 2026}}&amp;lt;/ref&amp;gt;), causing [[wikipedia:Denial-of-service attack|distributed denial of service attacks]] which damage access to freely-accessible websites. This is particularly an issue for websites that are large or contain many dynamic links.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Ethical website scrapers, known as &quot;spiders&quot; that crawl the web, follow a certain set of minimum guidelines. Specifically, they follow [[wikipedia:robots.txt|robots.txt]], a text file found at the root of a domain that indicates:&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Ethical website scrapers, known as &quot;spiders&quot; that crawl the web, follow a certain set of minimum guidelines. Specifically, they follow &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;[[wikipedia:robots.txt|robots.txt]]&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;/code&amp;gt;&lt;/ins&gt;, a text file found at the root of a domain that indicates:&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;*Paths bots are allowed to index&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;*Paths bots are allowed to index&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l20&quot;&gt;Line 20:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 20:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;These rules are typically configured for all bots, with minor adjustments made to individual bots as needed. Additionally, specific web pages may use the [[wikipedia:noindex|robots meta tag]] to control use of their output.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;These rules are typically configured for all bots, with minor adjustments made to individual bots as needed. Additionally, specific web pages may use the [[wikipedia:noindex|robots meta tag]] to control use of their output.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;While it is good practice for a bot to respect robots.txt, there is no requirement for it, and there is no punishment for not following a website&#039;s wishes. It is additionally standard practice, but in no way enforced, that bots use a [[wikipedia:User-Agent header|User-Agent header]] to uniquely identify itself. This allows a website operator to observe a bot&#039;s traffic patterns, potentially blocking the bot outright if its scraping is not desirable. The header also typically contains a URL or email address that can be used to contact the operator in case of anomalies observed in its traffic.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;While it is good practice for a bot to respect &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;robots.txt&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;/code&amp;gt;&lt;/ins&gt;, there is no requirement for it, and there is no punishment for not following a website&#039;s wishes. It is additionally standard practice, but in no way enforced, that bots use a [[wikipedia:User-Agent header|User-Agent header]] to uniquely identify itself. This allows a website operator to observe a bot&#039;s traffic patterns, potentially blocking the bot outright if its scraping is not desirable. The header also typically contains a URL or email address that can be used to contact the operator in case of anomalies observed in its traffic.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Unethical [[Artificial_intelligence|AI]] scraper bots do not follow robots.txt - in fact, they may not even request this file at all. They typically completely ignore it, instead opting to start from an entry point such as the root home page (&amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt;), working its way through an exponentially growing list of links as it finds them, with little to no delay between requests. The bots use false User-Agent header strings that would correspond to real web browsers on desktop or mobile operating systems - blocking them would also block legitimate users, or at least legitimate users on VPNs.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Unethical [[Artificial_intelligence|AI]] scraper bots do not follow &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;robots.txt&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;/code&amp;gt; &lt;/ins&gt;- in fact, they may not even request this file at all. They typically completely ignore it, instead opting to start from an entry point such as the root home page (&amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt;), working its way through an exponentially growing list of links as it finds them, with little to no delay between requests. The bots use false User-Agent header strings that would correspond to real web browsers on desktop or mobile operating systems - blocking them would also block legitimate users, or at least legitimate users on VPNs.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Some AI services opt to use separate User-Agent strings, potentially also ignoring robots.txt, when a request is made through user command rather than as part of model training. For example, ChatGPT identifies itself as &amp;lt;code&amp;gt;ChatGPT-User&amp;lt;/code&amp;gt; rather than its standard &amp;lt;code&amp;gt;OpenAI&amp;lt;/code&amp;gt; when it uses the &quot;search the web&quot; command - even if searching the web was an automatic decision. In a less favorable example, Perplexity AI in this same situation falsely identifies as a standard Chrome web browser running on Windows. AI companies defend this under the belief that they are not a &quot;spider&quot;, but rather a &quot;user agent&quot; (like a web browser), when called upon by a user&#039;s request.&amp;lt;ref name=&quot;perplexity-aws&quot; /&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Some AI services opt to use separate User-Agent strings, potentially also ignoring &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;code&amp;gt;&lt;/ins&gt;robots.txt&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;/code&amp;gt;&lt;/ins&gt;, when a request is made through user command rather than as part of model training. For example, ChatGPT identifies itself as &amp;lt;code&amp;gt;ChatGPT-User&amp;lt;/code&amp;gt; rather than its standard &amp;lt;code&amp;gt;OpenAI&amp;lt;/code&amp;gt; when it uses the &quot;search the web&quot; command - even if searching the web was an automatic decision. In a less favorable example, Perplexity AI in this same situation falsely identifies as a standard Chrome web browser running on Windows. AI companies defend this under the belief that they are not a &quot;spider&quot;, but rather a &quot;user agent&quot; (like a web browser), when called upon by a user&#039;s request.&amp;lt;ref name=&quot;perplexity-aws&quot; /&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Less legitimate bots use a wide distribution of IP addresses, further reducing options for the website to protect itself. This is in a clear attempt to bypass IP-based request throttling and rate limiting the website may implement. They are also known to ignore HTTP response status codes that indicate a server error ([[wikipedia:HTTP status code#5xx server errors|5xx]]), or warnings that the client needs to slow down ([[wikipedia:HTTP status code#429|429 Too Many Requests]]) or has been entirely blocked ([[wikipedia:HTTP status code#403|403 Forbidden]]).&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Less legitimate bots use a wide distribution of IP addresses, further reducing options for the website to protect itself. This is in a clear attempt to bypass IP-based request throttling and rate limiting the website may implement. They are also known to ignore HTTP response status codes that indicate a server error ([[wikipedia:HTTP status code#5xx server errors|5xx]]), or warnings that the client needs to slow down ([[wikipedia:HTTP status code#429|429 Too Many Requests]]) or has been entirely blocked ([[wikipedia:HTTP status code#403|403 Forbidden]]).&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff:1.41:old-39883:rev-39884:php=table --&gt;
&lt;/table&gt;</summary>
		<author><name>Rudxain</name></author>
	</entry>
	<entry>
		<id>https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=39883&amp;oldid=prev</id>
		<title>Rudxain: link AI</title>
		<link rel="alternate" type="text/html" href="https://consumerrights.wiki/index.php?title=Artificial_intelligence/training&amp;diff=39883&amp;oldid=prev"/>
		<updated>2026-02-26T05:33:36Z</updated>

		<summary type="html">&lt;p&gt;link AI&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 05:33, 26 February 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l22&quot;&gt;Line 22:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 22:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;While it is good practice for a bot to respect robots.txt, there is no requirement for it, and there is no punishment for not following a website&amp;#039;s wishes. It is additionally standard practice, but in no way enforced, that bots use a [[wikipedia:User-Agent header|User-Agent header]] to uniquely identify itself. This allows a website operator to observe a bot&amp;#039;s traffic patterns, potentially blocking the bot outright if its scraping is not desirable. The header also typically contains a URL or email address that can be used to contact the operator in case of anomalies observed in its traffic.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;While it is good practice for a bot to respect robots.txt, there is no requirement for it, and there is no punishment for not following a website&amp;#039;s wishes. It is additionally standard practice, but in no way enforced, that bots use a [[wikipedia:User-Agent header|User-Agent header]] to uniquely identify itself. This allows a website operator to observe a bot&amp;#039;s traffic patterns, potentially blocking the bot outright if its scraping is not desirable. The header also typically contains a URL or email address that can be used to contact the operator in case of anomalies observed in its traffic.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Unethical AI scraper bots do not follow robots.txt - in fact, they may not even request this file at all. They typically completely ignore it, instead opting to start from an entry point such as the root home page (&amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt;), working its way through an exponentially growing list of links as it finds them, with little to no delay between requests. The bots use false User-Agent header strings that would correspond to real web browsers on desktop or mobile operating systems - blocking them would also block legitimate users, or at least legitimate users on VPNs.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Unethical &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[Artificial_intelligence|&lt;/ins&gt;AI&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;]] &lt;/ins&gt;scraper bots do not follow robots.txt - in fact, they may not even request this file at all. They typically completely ignore it, instead opting to start from an entry point such as the root home page (&amp;lt;code&amp;gt;/&amp;lt;/code&amp;gt;), working its way through an exponentially growing list of links as it finds them, with little to no delay between requests. The bots use false User-Agent header strings that would correspond to real web browsers on desktop or mobile operating systems - blocking them would also block legitimate users, or at least legitimate users on VPNs.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Some AI services opt to use separate User-Agent strings, potentially also ignoring robots.txt, when a request is made through user command rather than as part of model training. For example, ChatGPT identifies itself as &amp;lt;code&amp;gt;ChatGPT-User&amp;lt;/code&amp;gt; rather than its standard &amp;lt;code&amp;gt;OpenAI&amp;lt;/code&amp;gt; when it uses the &amp;quot;search the web&amp;quot; command - even if searching the web was an automatic decision. In a less favorable example, Perplexity AI in this same situation falsely identifies as a standard Chrome web browser running on Windows. AI companies defend this under the belief that they are not a &amp;quot;spider&amp;quot;, but rather a &amp;quot;user agent&amp;quot; (like a web browser), when called upon by a user&amp;#039;s request.&amp;lt;ref name=&amp;quot;perplexity-aws&amp;quot; /&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Some AI services opt to use separate User-Agent strings, potentially also ignoring robots.txt, when a request is made through user command rather than as part of model training. For example, ChatGPT identifies itself as &amp;lt;code&amp;gt;ChatGPT-User&amp;lt;/code&amp;gt; rather than its standard &amp;lt;code&amp;gt;OpenAI&amp;lt;/code&amp;gt; when it uses the &amp;quot;search the web&amp;quot; command - even if searching the web was an automatic decision. In a less favorable example, Perplexity AI in this same situation falsely identifies as a standard Chrome web browser running on Windows. AI companies defend this under the belief that they are not a &amp;quot;spider&amp;quot;, but rather a &amp;quot;user agent&amp;quot; (like a web browser), when called upon by a user&amp;#039;s request.&amp;lt;ref name=&amp;quot;perplexity-aws&amp;quot; /&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l91&quot;&gt;Line 91:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 91:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;On 17 March 2025, the Git source code host SourceHut announced that the service was being disrupted by large language model crawlers. Mitigations deployed to reduce disruption involved requiring login for some areas of the service, and blocking IP ranges of cloud providers, affecting legitimate use of the website by its users.&amp;lt;ref&amp;gt;{{Cite web |date=17 Mar 2025 |title=LLM crawlers continue to DDoS SourceHut |url=https://status.sr.ht/issues/2025-03-17-git.sr.ht-llms/ |website=sr.ht status |url-status=live |archive-url=http://web.archive.org/web/20251220125852/https://status.sr.ht/issues/2025-03-17-git.sr.ht-llms/ |archive-date=20 Dec 2025}}&amp;lt;/ref&amp;gt; In response to the event, SourceHut founder Drew DeVault wrote a blog post entitled &amp;quot;[https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html Please stop externalizing your costs directly into my face]&amp;quot;, discussing his frustrations with having ongoing and ever-adapting attacks that must be addressed in a timely fashion to reduce disruption to legitimate SourceHut users. DeVault estimates that between &amp;quot;20-100%&amp;quot; of his time is now spent addressing such attacks.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;On 17 March 2025, the Git source code host SourceHut announced that the service was being disrupted by large language model crawlers. Mitigations deployed to reduce disruption involved requiring login for some areas of the service, and blocking IP ranges of cloud providers, affecting legitimate use of the website by its users.&amp;lt;ref&amp;gt;{{Cite web |date=17 Mar 2025 |title=LLM crawlers continue to DDoS SourceHut |url=https://status.sr.ht/issues/2025-03-17-git.sr.ht-llms/ |website=sr.ht status |url-status=live |archive-url=http://web.archive.org/web/20251220125852/https://status.sr.ht/issues/2025-03-17-git.sr.ht-llms/ |archive-date=20 Dec 2025}}&amp;lt;/ref&amp;gt; In response to the event, SourceHut founder Drew DeVault wrote a blog post entitled &amp;quot;[https://drewdevault.com/2025/03/17/2025-03-17-Stop-externalizing-your-costs-on-me.html Please stop externalizing your costs directly into my face]&amp;quot;, discussing his frustrations with having ongoing and ever-adapting attacks that must be addressed in a timely fashion to reduce disruption to legitimate SourceHut users. DeVault estimates that between &amp;quot;20-100%&amp;quot; of his time is now spent addressing such attacks.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==References==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;==References==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key wiki:diff:1.41:old-39328:rev-39883:php=table --&gt;
&lt;/table&gt;</summary>
		<author><name>Rudxain</name></author>
	</entry>
</feed>