OpenAI: Difference between revisions
===Web Crawlers ignoring robots.txt (2025)===
In 2025, Jonathan Bailey of PlagiarismToday published an article describing how ChatGPT's web crawlers were ignoring the site's robots.txt file.<ref>https://www.plagiarismtoday.com/2025/07/23/chatgpt-ignores-robots-txt-rehashes-my-column/ ([http://web.archive.org/web/20260106080839/https://www.plagiarismtoday.com/2025/07/23/chatgpt-ignores-robots-txt-rehashes-my-column/ Archived])</ref> PlagiarismToday had blocked OpenAI's web crawlers in August 2023, yet the latest ChatGPT model at the time returned information from articles posted on the website the day before, even though OpenAI was not supposed to be scraping those webpages. This can be problematic for smaller websites because of OpenAI's aggressive approach to web crawling: its crawlers reportedly sent more than 29 thousand requests in a single week to a wiki known as The Cutting Room Floor.
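For readers unfamiliar with robots.txt, the following is a minimal sketch of how such a block is expressed and how a compliant crawler would interpret it. It uses Python's standard <code>urllib.robotparser</code> module. The rules shown, the <code>example.org</code> URL, and the <code>ExampleBrowser</code> agent name are illustrative assumptions, not the rules PlagiarismToday actually published; <code>GPTBot</code> is the user-agent token OpenAI documents for its training crawler. Note that robots.txt is advisory only: it works only if the crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents (an assumption for illustration):
# block the GPTBot user agent everywhere, allow every other agent.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical article URL on a placeholder domain.
ARTICLE_URL = "https://example.org/2025/07/22/some-article/"

gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", ARTICLE_URL)
other_allowed = is_allowed(ROBOTS_TXT, "ExampleBrowser", ARTICLE_URL)

print("GPTBot allowed:", gptbot_allowed)
print("Other agent allowed:", other_allowed)
```

A crawler that respects these rules would skip every page on the site when identifying as <code>GPTBot</code>. The behaviour reported above corresponds to content being retrieved despite rules of this kind being in place.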
===ChatGPT Atlas and prompt-injection vulnerability (2025)===