Jump to content
Main menu
Main menu
move to sidebar
hide
Navigation
Main page
Categories
Random page
Top Contributors
Recent changes
Contribute
Create a page
How to help
Wiki policy
Adapt videos to articles
Articles in need of work
Help
Frequently asked questions
Join the discord!
Help about MediaWiki
Consumer_Action_Taskforce
Search
Search
Appearance
Create account
Log in
Personal tools
Create account
Log in
Pages for logged out editors
learn more
Contributions
Talk
Editing
Artificial intelligence
(section)
Page
Discussion
English
Read
Edit
Edit source
View history
Tools
Tools
move to sidebar
hide
Actions
Read
Edit
Edit source
View history
Purge cache
General
What links here
Related changes
Special pages
Page information
Cargo data
Appearance
move to sidebar
hide
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
===Effect on users=== To protect against unethical crawlers, due to concerns of both intellectual property and service disruption, websites adopt practices that affect the experience of real users: *'''Bot check walls''': The user may be required to pass a security check "wall". While usually automatic for the user, this can affect legitimate bots. When a website protection service such as [[Cloudflare]] is not confident as to whether the visitor is legitimate, it may present a CAPTCHA to be manually filled out. An example is "Google Sorry", a CAPTCHA wall frequently seen when using Google Search via a VPN. *'''Login walls''': Should bots be found to pass CAPTCHA walls, the website may advance to requiring logging in to view content. A major recent example of this is [[YouTube]]'s "Sign in to confirm you're not a bot" messages. *'''JavaScript requirement''': Most websites do not need JavaScript to deliver their content. However, as many scrapers expect content to be found directly in the HTML, it is often an easy workaround to use JavaScript to "insert" the content after the page has loaded. This may reduce the responsiveness of the website, increasing points of failure, and preventing security-conscious users who disable JavaScript from viewing the website. *'''IP address blocking''': Blocking IP addresses, especially by blocking entire providers via their [[wikipedia:Autonomous system (Internet)|autonomous system number]], always comes with some risk of blocking legitimate users. Particularly, this may restrict access to users making use of a VPN. *'''Heuristic blocking''': Patterns in request headers may give away that the request is being made by an unethical bot, despite attempts to act as a legitimate visitor. Heuristics are imperfect and may block legitimate users, especially those that may use less common browsers. In rare situations, a website operator may redirect detected bot traffic, such as to download speed test files hosted by ISPs containing multiple gigabytes of random garbage data. This may have the effect of disrupting the bot, but its effectiveness is unknown. The need to respond to unethical scraping also further consolidates the web into the control of a few large [[wikipedia:Web application firewall|web application firewall]] (WAF) services, most notably [[Cloudflare]], as website owners find themselves otherwise unable to protect their service from being disrupted by such traffic.
Summary:
Please note that all contributions to Consumer_Action_Taskforce are considered to be released under the Creative Commons Attribution-ShareAlike 4.0 International (see
Consumer Action Taskforce:Copyrights
for details). If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource.
Do not submit copyrighted work without permission!
To protect the wiki against automated edit spam, we kindly ask you to solve the following hCaptcha:
Cancel
Editing help
(opens in new window)