Rudxain (talk | contribs)


===Scraping===
{{Main|Artificial intelligence/training}}
Since the rise of large language models (LLMs), many brokers<!-- link to data brokers? --> have started offering scraping services to companies that want more training data for their AI. To that end, a growing number of [[wikipedia:Headless_browser|headless browser]] agents scrape sites (collect the information a site provides), even when the site's <code>robots.txt</code>, the common standard for telling agents not to do so, asks them to stay away. This has led many forums and websites that previously did not use JavaScript to start implementing [[CAPTCHA|CAPTCHAs]] (or [[wikipedia:Anubis_(software)|Anubis]]) to limit the added server overhead and bandwidth costs.
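As a rough illustration of how <code>robots.txt</code> is meant to work, the following Python sketch uses the standard library's <code>urllib.robotparser</code> module to check, the way a well-behaved crawler would, whether a given user agent may fetch a URL. The robots.txt contents, the crawler name <code>ExampleAIBot</code>, the helper <code>is_allowed</code>, and the example.com URLs are all hypothetical and chosen only for demonstration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: it blocks a made-up AI crawler ("ExampleAIBot")
# from the whole site and blocks every other agent only from /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("ExampleAIBot", "https://example.com/wiki/Page"),
        ("OrdinaryBrowser", "https://example.com/wiki/Page"),
        ("OrdinaryBrowser", "https://example.com/private/notes"),
    ]
    for agent, url in checks:
        print(f"{agent} -> {url}: {is_allowed(agent, url)}")
```

The check only happens if the crawler chooses to run it. A scraper that never reads <code>robots.txt</code> is not stopped by it at all, which is why sites have fallen back on measures such as CAPTCHAs or Anubis.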


==Incidents==