They come from web scrapers that incorrectly use Yahoo! Search results. This discovery was made by @tenants on the XenForo forum. They explain more about the implications of receiving these requests and how they handle them.
1. Do you think we should be concerned about these requests?
You don’t have to be concerned about these requests. They are just hallmarks of dumb bots, and dumb bots wander all over the Internet. These requests should not be identified as malicious based on the URL alone; they are probably innocent.
2. What are these requests trying to achieve?
They’re trying to get the contents of the page they’re requesting. They have no special effect; they are (undesired) by-products of scraping Yahoo! Search.
3. Anything we can do to stop them?
Not really: anyone is free to send whatever requests they like over the net (at least technically, social and legal aspects aside).
You can throw them away when generating reports from your logs. This is the option I chose.
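If you take the same route, the filtering can be as simple as skipping every log line whose request path contains the Yahoo! remnants. Below is a minimal Python sketch, assuming an Apache-style access log where the request line is quoted (e.g. `"GET /forum/ HTTP/1.1"`). The function names `is_yahoo_scrape` and `filter_log` and the sample lines are hypothetical, and whether the caret shows up raw or as `%5E` in your logs is an assumption that depends on the client.

```python
import re

# Lower-cased remnants of the broken Yahoo! Search links. The caret may be
# logged raw or percent-encoded as %5E (assumption: depends on the client).
YAHOO_REMNANTS = ("rk=0/rs=", "rs=^", "rs=%5e")

# Extracts the path from a quoted request line such as "GET /forum/ HTTP/1.1".
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*"')


def is_yahoo_scrape(line: str) -> bool:
    """Return True if the logged request path contains a Yahoo! remnant."""
    match = REQUEST_RE.search(line)
    if match is None:
        return False  # not a request line we understand; keep it
    path = match.group(1).lower()
    return any(remnant in path for remnant in YAHOO_REMNANTS)


def filter_log(lines):
    """Yield only the log lines that should end up in the report."""
    for line in lines:
        if not is_yahoo_scrape(line):
            yield line


if __name__ == "__main__":
    # Hypothetical sample lines in Apache common log format.
    sample = [
        '192.0.2.1 - - [01/Jan/2014:00:00:00 +0000] "GET /forum/ HTTP/1.1" 200 512',
        '198.51.100.7 - - [01/Jan/2014:00:00:05 +0000] '
        '"GET /forum/RK=0/RS=N3luhChARYe3D3ZNSAkKO3L2gXE- HTTP/1.1" 404 208',
    ]
    for kept in filter_log(sample):
        print(kept)  # prints only the /forum/ line
```

The check is case-insensitive to match the `NC` flag used in the rewrite rules; a plain substring test is enough here because the remnants are fixed strings.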
Or you can try to fix the requests so that they succeed and don’t generate error entries in your logs. Judging from what I saw on the web, this is probably what most people do. I see a flaw in this approach: while making their visitors’ experience better, they forget who those visitors are. Dumb bots. I don’t want dumb bots on my sites, so I won’t bother improving their experience.
If you want to fix the requests, you can do it with mod_rewrite, possibly from .htaccess, e.g. using the code from the XenForo forum post I mentioned above:
RewriteEngine On
# Strange-behaving bots: these are URLs scraped from Yahoo! (botters scraping for links; Yahoo! search links contain RK and RS). tenants' modification:
RewriteRule ^(.*)RK=0/RS= /$1 [L,NC,R=301]
RewriteRule ^(.*)RS=\^ /$1 [L,NC,R=301]
You may have to adjust the regexes a bit, e.g. add a slash after the (.*) if your URLs don’t end with one; otherwise the redirect target keeps the slash that precedes RK=0 or RS=.
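To see what the two rules do, and what the suggested extra slash changes, here is a rough Python equivalent. It assumes Python's `re` treats these simple patterns the same way Apache's PCRE does, and, as in .htaccess (per-directory) context, matches against the path without its leading slash. `rewrite`, `RULES` and `RULES_EXTRA_SLASH` are hypothetical names; the 301 redirect itself is not modelled. The caret is escaped (`\^`) in the sketch: left unescaped, `^` is a start-of-string anchor, so a pattern like `RS=^` can never match.

```python
import re

# Python counterparts of the two RewriteRule patterns; [NC] -> IGNORECASE.
# The caret is escaped so the second rule matches a literal "^".
RULES = [
    re.compile(r"^(.*)RK=0/RS=", re.IGNORECASE),
    re.compile(r"^(.*)RS=\^", re.IGNORECASE),
]

# Same rules with the extra slash after (.*), for URLs without a trailing slash.
RULES_EXTRA_SLASH = [
    re.compile(r"^(.*)/RK=0/RS=", re.IGNORECASE),
    re.compile(r"^(.*)/RS=\^", re.IGNORECASE),
]


def rewrite(path, rules):
    """Return the redirect target for a per-directory path, or None.

    As in .htaccess context, `path` has no leading slash. The first
    matching rule wins, mirroring the [L] flag; "/$1" becomes "/" + group 1.
    """
    for rule in rules:
        match = rule.search(path)
        if match:
            return "/" + match.group(1)
    return None


if __name__ == "__main__":
    for path in ("forum/RK=0/RS=N3luhChARYe3D3ZNSAkKO3L2gXE-",
                 "page.html/RK=0/RS=abc",
                 "page/RS=^ADAabc"):
        print(path, "->", rewrite(path, RULES), "|", rewrite(path, RULES_EXTRA_SLASH))
```

With the original rules, `page.html/RK=0/RS=abc` redirects to `/page.html/`, which probably does not exist; the extra-slash variant gives `/page.html`. Conversely, the extra slash turns `forum/RK=0/RS=...` into `/forum` rather than `/forum/`, so pick the variant that matches how your own URLs end.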
Related
Kudos to @dman’s answer on Stack Overflow and @webaware’s comment under this question for finding the XenForo forum post.
As extra information, I have seen some more connections with the following:
RK=0/RS=N3luhChARYe3D3ZNSAkKO3L2gXE-
So it doesn't follow the same pattern as the other log entries.
– user3363066 Feb 28 '14 at 08:18