WordPress Draft Crawl by Baiduspider
An entry in my Apache virtual hosts access log (below) surprised me. I saw a crawl attempt for this URL by what is supposed to be Baidu. I checked, and it was. What is so surprising is that the URL is a sentence from a DRAFT I was working on at that moment (screenshot below; notice the draft status at bottom right). This is cause for alarm in a couple of ways. That I never published the draft is one thing (OK, it is a known issue that bots can index drafts), but this is only one line from the draft, not the URL of the draft, and Baidu was now hitting every WordPress site on the server looking for this URL. Bad bots get blocked.
==> /var/log/apache2/other_vhosts_access.log <==
[11/Jul/2015:08:39:47 +0000] dealercomp.com:80 18.104.22.168 - - "GET /why-do-i-bother HTTP/1.1" 301 551 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
[11/Jul/2015:08:39:47 +0000] hartenstine.com:80 22.214.171.124 - - "GET /why-do-i-bother HTTP/1.1" 404 6797 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
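To see how widely a bot is probing the same path, something like the following could be run over the log. This is only a rough sketch: the regular expression assumes the field layout of the two lines above (timestamp first, then vhost:port, client IP, request, status, size, referer, user agent), and the names `parse_line` and `paths_probed_across_vhosts` are made up for illustration. It is not how I found the entries; I simply noticed them while tailing the log.

```python
import re
from collections import defaultdict

# Layout assumed from the two log lines in this post:
# [timestamp] vhost:port client_ip ident user "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^\[(?P<time>[^\]]+)\]\s+'
    r'(?P<vhost>[^\s:]+):(?P<port>\d+)\s+'
    r'(?P<ip>\S+)\s+\S+\s+\S+\s+'
    r'"(?P<method>\S+)\s+(?P<path>\S+)\s+[^"]*"\s+'
    r'(?P<status>\d{3})\s+(?P<size>\S+)\s+'
    r'"(?P<referer>[^"]*)"\s+"(?P<agent>[^"]*)"\s*$'
)

# Blog software often turns straight quotes and hyphens into typographic
# ones, so normalise those before matching.
TYPOGRAPHIC = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2013": "-"})


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip().translate(TYPOGRAPHIC))
    return match.groupdict() if match else None


def paths_probed_across_vhosts(lines, agent_substring="Baiduspider", min_vhosts=2):
    """Map each requested path to the vhosts where a matching agent asked for it.

    Only paths requested on at least `min_vhosts` different vhosts are returned.
    The user-agent check is a plain substring match and can be spoofed.
    """
    seen = defaultdict(set)
    for line in lines:
        entry = parse_line(line)
        if entry and agent_substring in entry["agent"]:
            seen[entry["path"]].add(entry["vhost"])
    return {path: sorted(vhosts) for path, vhosts in seen.items()
            if len(vhosts) >= min_vhosts}
```

Passing it an open handle on /var/log/apache2/other_vhosts_access.log should list each path the spider asked for on more than one site, together with the sites it tried. Lines that do not fit the assumed layout (such as the `==>` header that tail prints) are skipped.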