{{Incomplete}}
'''AI training''' is the process of feeding data into an AI model in order to adjust its weights, so that the model's outputs come to closely match the training data.
{{Ph-T-Int}}


==How it works==
{{Ph-T-HIW}}
There are several ways to implement AI systems, and even more ways to train them; the most well-known training algorithm is [[wikipedia:Backpropagation|backpropagation]]. As for the data, LLMs must be trained on massive amounts of it, a task that is only feasible through automation. This contrasts with curated data-sets, where both the data collection and the training happen in a more carefully controlled environment. Automated training on massive data-sets typically uses internet web-sites as sources, scraping them in a process similar to how web-[[wikipedia:Search_engine|search-engines]] index and [[wikipedia:Cache_(computing)|cache]] pages.
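The weight-adjustment idea above can be illustrated with a minimal sketch of gradient-descent training (the one-step core of backpropagation). This is an illustrative toy, not how any production LLM is trained; the function and variable names are hypothetical:

```python
def train(samples, epochs=1000, lr=0.1):
    """Fit y ≈ w * x by gradient descent on squared error.

    Each pass nudges the single weight w so the model's output
    moves closer to the training data, as described above.
    """
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            pred = w * x
            grad = 2 * (pred - y) * x  # d/dw of (w*x - y)**2
            w -= lr * grad             # adjust the weight downhill
    return w

# Toy data-set following y = 2x; training recovers w ≈ 2.0.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
print(train(data))
```

Real models repeat the same loop over billions of weights at once, which is why training demands so much data and compute.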


==Why it is a problem==
{{Ph-T-WIIAP}}
 
=== Intellectual property laundering ===
Most, if not all, of the data used for training is copied indiscriminately, without checking licenses or other copyright terms.{{Citation needed}} This is highly controversial. Some argue it is "fair use" because AI systems learn in ways similar to animal and human brains; others compare it to [[wikt:parroting|a parrot learning phrases]]; others argue the result is "transformative" and therefore still fair use; still others say it is akin to [[wikipedia:Tracing_(art)|tracing images]] (an analogy that applies mostly to image models, though it can extend to text models).{{Citation needed|reason=too many opinions}}
 
Ultimately, the strength of each argument depends on the technical details of how a given model works, so none of them applies universally.
 
Some people request that, at the very least, the sources of the training data must be publicly disclosed, for the sake of [[wikipedia:Transparency_(behavior)|transparency]] and [[wikipedia:Attribution_(copyright)|attribution]].<ref>{{Cite web |last=Tunney |first=Justine |date=2024-08-23 |title=AI Training Shouldn't Erase Authorship |url=https://justine.lol/history/ |access-date=2026-04-26}}</ref>
 
=== Ecosystem damage ===
TO-DO


==Examples==