SinexTitan (talk | contribs)
See also: Anthropic


===Web Crawlers ignoring robots.txt (2025)===
In 2025, Jonathan Bailey of PlagiarismToday published an article describing how ChatGPT's web crawlers were ignoring the site's robots.txt file.<ref>https://www.plagiarismtoday.com/2025/07/23/chatgpt-ignores-robots-txt-rehashes-my-column/</ref> PlagiarismToday had blocked OpenAI's web crawlers in August 2023, yet the latest ChatGPT model at the time returned information from articles that had been posted on the website only the day before, even though OpenAI was not supposed to be scraping those pages. This aggressive approach to web crawling can be especially problematic for smaller websites: in a single week, OpenAI's crawlers reportedly sent more than 29,000 requests to a wiki known as The Cutting Room Floor.<ref>https://discord.com/channels/386543982399979531/386553674932944899/1386485774220001310 (Message link from The Cutting Room Floor's official Discord server)</ref>


===ChatGPT Atlas and prompt-injection vulnerability (2025)===