Generative artificial intelligence

Revision as of 07:52, 30 March 2025 by 96.232.253.207

This article is a stub. You can help by expanding it.

A moderator needs to check the page before this notice can be removed. Visit the noticeboard or the #appeals channel in either Zulip or Discord to request removal.

An article may be flagged as a stub when it is missing major elements needed to make it useful to a reader. You can help by adding missing sections, verifiable sources, relevant company policies and communications, etc. to make the article more complete.

This article uses tone or wording inconsistent with the editorial guidelines.

The wiki's voice should remain neutral and avoid loaded language or unnecessary offence. Direct attacks on individuals or companies are not permitted; malice may only be attributed through quotation and citation, never in the wiki's voice. In theme articles argumentation is allowed but should be clear, formal, and never inflammatory.

Generative AI, also referred to as GenAI or simply AI, is a class of programs that generate pieces of media from a simple prompt (e.g. "How long do I heat popcorn for in the microwave?" or "bowl of buttery popcorn, realistic, artstation, pretty"), with varied and partly random results. In the short time it has been accessible to the public, GenAI has raised significant concern across the fields in which it has been applied.

General controversies surrounding generative AI

Controversy | Brief Description | Related Article(s)/Section(s)
Training data collected without consent | Various platforms have scraped petabytes of content created by users, and potentially owned by companies, without first obtaining an adequate license to use this data. In some cases they neither requested consent nor notified users in advance that their content would be used to train AI-powered tools. |
Replacing skilled workers with AI | Because generative AI is general-purpose, experienced staff in fields ranging from digital art to writing and programming have been replaced by lower-paid (and often less experienced) employees tasked with using generative tools to do the same work. Within this wiki's scope, the relevant harm is reduced product quality for consumers, such as representatives being replaced with chatbots, or companies selling products that use poorly generated content that may harm the consumer[1]. |

Specific controversies involving generative AI

Reddit training AI on posts

In late 2024, Reddit announced the release of 'Reddit Answers,' a tool built on a large language model (LLM) that was publicly stated[2] to be trained on content created by users, without requiring prior consent or giving prior public notice.

DeviantArt DreamUp

DeviantArt initially opted all users in automatically to allowing their work to be used as training data for generative AI[4][5]. Although this is speculative, it is reasonable for users to assume[3] from that default that all content uploaded to DeviantArt was used as training data for its DreamUp tool. However, according to statements from DeviantArt CEO Moti Levy[6], DeviantArt did not plan or intend to train the tool on user-generated works, and any user-generated works that were used in the model were introduced by Stability AI.

Regardless, the introduction of DreamUp has stirred controversy on the art-sharing platform[7] and split its users into two camps[8]: those in favor of generative AI (typically holders of newer accounts) and those against it (typically users who have been on the platform far longer). Since DreamUp was introduced, the platform has seen a large influx of AI-generated images, and staff have repeatedly featured users who exclusively upload GenAI content[9][10][11] or who post content that uses generated content as a base[12]. A majority of featured creators upload AI-generated content exclusively or nearly exclusively.

LAION-5B training database

Many users have had their content scraped by LAION for its training database, and the only way for them to opt out is through a third party[13].

Stack Overflow training data

In mid-2023, Stack Overflow released OverflowAI, which uses users' questions and answers for a neural network that the company charges enterprises to use. Stack Overflow's policy on AI-generated posts prohibits users from posting AI-generated responses, even as the company uses their human-written content for profit. There is currently no official way to opt out other than manually deleting all of your posts and topics before permanently deleting your account. Even then, the company's "Get Out Clause" still allows it to use the content after it has been deleted. This has damaged user trust in the platform and the brand, both of which were built on content contributed by users.

Microsoft's GitHub Copilot training on free user repos

According to the Copilot FAQ, Copilot will not train on data from users on Pro or Enterprise plans, but the FAQ says nothing about people using GitHub under a free account. It can therefore be assumed that content from free accounts where Copilot is present is being used to further train the model. There is currently no known way to opt out other than making a repository private (from context, only public repositories appear to be affected for now), which limits the reach of an open-source project to the individuals it is shared with. In some cases this has eroded users' trust in a platform that is built around developers' content.

References