Artificial intelligence/training
{{Incomplete}}
'''AI training''' is a process by which data is fed into an AI model in order to adjust its weights, so that the model's outputs more closely match the training data.
==How it works==
There are several ways to implement AI systems, and even more ways to train them; the most well-known training algorithm is [[wikipedia:Backpropagation|backpropagation]]. With respect to data, LLMs must be trained on massive amounts of text, a task that is only feasible via automation. This contrasts with curated data-sets, where both the data collection and the training happen in a more carefully controlled environment. Automated training on massive data-sets typically uses internet web-sites as sources, scraped in a process similar to how web-[[wikipedia:Search_engine|search-engines]] index and [[wikipedia:Cache_(computing)|cache]] pages.
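As a rough illustration of the backpropagation idea mentioned above, the following minimal sketch trains a toy two-weight "network" to learn the function ''y'' = 2''x'' by repeatedly applying the chain rule to push the error back to each weight. Everything here (the network shape, learning rate, and data) is an invented assumption for the demo, not a description of how any real LLM is trained:

```python
import random

random.seed(0)
w1, w2 = random.random(), random.random()  # the weights the training will adjust

def forward(x):
    """Forward pass: x -> hidden value h -> prediction y."""
    h = w1 * x          # hidden activation (linear, for simplicity)
    return w2 * h, h    # prediction, plus the cached intermediate for backprop

data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]  # toy data-set for y = 2x
lr = 0.01                                        # learning rate (assumed)

for _ in range(2000):
    for x, target in data:
        y, h = forward(x)
        err = y - target        # derivative of squared error w.r.t. y (up to a factor)
        # Backpropagation: chain rule from the output back to each weight.
        grad_w2 = err * h       # dLoss/dw2
        grad_w1 = err * w2 * x  # dLoss/dw1
        w2 -= lr * grad_w2
        w1 -= lr * grad_w1

prediction, _ = forward(5.0)
print(round(prediction, 3))  # should be close to 10.0
```

Real models differ in scale (billions of weights, non-linear activations, automatic differentiation), but the weight-adjustment loop follows the same pattern: predict, measure error, propagate gradients backwards, nudge weights.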
==Why it is a problem==
=== Intellectual property laundering === | |||
Most, if not all, of the data used for training is copied indiscriminately, without checking licenses or any copyright terms.{{Citation needed}} This is very controversial, and several positions are commonly argued:{{Citation needed|reason=too many opinions}}
* that it is "fair use", because AI systems learn in ways similar to animal and human brains;
* that it is more like [[wikt:parroting|a parrot learning phrases]];
* that it is "transformative", and therefore still fair use;
* that it is akin to [[wikipedia:Tracing_(art)|tracing images]] (this applies mostly to image models, though the analogy can work for text models).
Ultimately, much depends on the technical details of how each model works, so none of these arguments applies universally.
Some people request that, at the very least, the sources of the training data must be publicly disclosed, for the sake of [[wikipedia:Transparency_(behavior)|transparency]] and [[wikipedia:Attribution_(copyright)|attribution]].<ref>{{Cite web |last=Tunney |first=Justine |date=2024-08-23 |title=AI Training Shouldn't Erase Authorship |url=https://justine.lol/history/ |access-date=2026-04-26}}</ref> | |||
=== Ecosystem damage === | |||
TO-DO | |||
==Examples==