OpenAI: Difference between revisions
ChaoticDev (talk | contribs) Added in a section about the companies crawling habits and added in a missing reference. |
SinexTitan (talk | contribs) →See also: Anthropic |
| Line 25: | Line 25: | ||
This is a list of all consumer-protection incidents this company is involved in. Any incidents not mentioned here can be found in the [[:Category:{{FULLPAGENAME}}|{{PAGENAME}} category]]. | This is a list of all consumer-protection incidents this company is involved in. Any incidents not mentioned here can be found in the [[:Category:{{FULLPAGENAME}}|{{PAGENAME}} category]]. | ||
=== Web Crawlers ignoring robots.txt (2025) === | ===Web Crawlers ignoring robots.txt (2025)=== | ||
In 2025, Jonathan Bailey of PlagiarismToday published an article describing how ChatGPT's web crawlers were ignoring the site's robots.txt file.<ref>https://www.plagiarismtoday.com/2025/07/23/chatgpt-ignores-robots-txt-rehashes-my-column/</ref> PlagiarismToday had blocked OpenAI's web crawlers in August 2023, yet the latest ChatGPT model at the time returned content from articles posted on the website only the day before, even though OpenAI was not supposed to be scraping those pages. OpenAI's aggressive approach to web crawling can be a problem for smaller websites: its crawlers reportedly sent more than 29,000 requests in a single week to a wiki known as The Cutting Room Floor.<ref>https://discord.com/channels/386543982399979531/386553674932944899/1386485774220001310 (Message link from The Cutting Room Floor's official Discord server)</ref> | In 2025, Jonathan Bailey of PlagiarismToday published an article describing how ChatGPT's web crawlers were ignoring the site's robots.txt file.<ref>https://www.plagiarismtoday.com/2025/07/23/chatgpt-ignores-robots-txt-rehashes-my-column/</ref> PlagiarismToday had blocked OpenAI's web crawlers in August 2023, yet the latest ChatGPT model at the time returned content from articles posted on the website only the day before, even though OpenAI was not supposed to be scraping those pages. OpenAI's aggressive approach to web crawling can be a problem for smaller websites: its crawlers reportedly sent more than 29,000 requests in a single week to a wiki known as The Cutting Room Floor.<ref>https://discord.com/channels/386543982399979531/386553674932944899/1386485774220001310 (Message link from The Cutting Room Floor's official Discord server)</ref> | ||
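For context, robots.txt is an advisory file: it only works if the crawler checks it before fetching a page and honours the result. The sketch below illustrates what such a check might look like on the crawler's side, using Python's standard <code>urllib.robotparser</code>. The rules shown, the <code>GPTBot</code> user-agent token, and the <code>example.com</code> URL are assumptions chosen for illustration; they are not taken from PlagiarismToday's actual robots.txt or from OpenAI's crawler code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration only):
# one named crawler is blocked from the whole site, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/2025/07/some-article/"  # hypothetical URL
    for agent in ("GPTBot", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

A crawler that skips this check, or ignores its result, can still request every page on the site, which is why a robots.txt block alone does not technically prevent scraping.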
=== ChatGPT Atlas and prompt-injection vulnerability (2025) === | ===ChatGPT Atlas and prompt-injection vulnerability (2025)=== | ||
In 2025, Brave published an article about a class of vulnerabilities affecting agentic web browsers such as ChatGPT Atlas, in which an attacker hides malicious prompts in files, text, or other media. Combined with the weak safeguards of the AI agents, these prompts can cause the agents to expose and leak the user's sensitive data.<ref>https://owasp.org/www-community/attacks/PromptInjection</ref> | In 2025, Brave published an article about a class of vulnerabilities affecting agentic web browsers such as ChatGPT Atlas, in which an attacker hides malicious prompts in files, text, or other media. Combined with the weak safeguards of the AI agents, these prompts can cause the agents to expose and leak the user's sensitive data.<ref>https://owasp.org/www-community/attacks/PromptInjection</ref> | ||
| Line 39: | Line 39: | ||
==See also== | ==See also== | ||
* [[Anthropic]] | |||
==References== | ==References== | ||