One of our consultant clients pointed this one out to me. It’s a good example of how vitally important it is for SEO consultants to understand how the search engine machine actually works.
While troubleshooting a potential Penguin issue for a client, she discovered that the client was using the “Bad Behavior” plugin for WordPress, which blocks bad bots, fake search engine bots, and the like. By itself, this is awesome.
She also learned that the client had recently started using Cloudflare.
Cloudflare is a pretty cool CDN & caching system which, in a lot of cases, will make your website faster and more secure. By itself, this too is pretty awesome. It’s not the only solution, but it’s a good one, and it’s easy to set up.
A little digging by our friend led her to this TechNet post by Jeremy Clark. Like bleach and ammonia (never mix them!), here we have two useful solutions that lead to disaster when combined.
Here’s what happens:
- A real, live Googlebot (or even Bingbot) shows up and requests a page.
- Because they’re using Cloudflare, that request gets passed through to WordPress from a Cloudflare IP address.
- Bad Behavior checks the IP address of this alleged Googlebot and discovers that it’s not coming from a Google IP address.
- Bad Behavior says “no content for you, bad bot!” and returns a 403 Forbidden response to Googlebot.
- Bing, Google, and other search engines can’t get any content from the site, rankings drop, bad things happen, etc.
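To make the failure concrete, here’s a minimal Python sketch of the kind of check described above. It is not Bad Behavior’s actual code (the plugin is written in PHP and does more than this); the function name is hypothetical, and the IP ranges are documentation-only placeholder blocks standing in for “Google” and “the CDN,” not real Google or Cloudflare addresses.

```python
import ipaddress

# Placeholder range (RFC 5737 documentation block), NOT a real Google range.
HYPOTHETICAL_GOOGLE_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def in_ranges(ip: str, ranges) -> bool:
    """Return True if ip falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


def hypothetical_bot_check(remote_addr: str, user_agent: str) -> int:
    """Return an HTTP status: 403 if a self-declared Googlebot connects
    from an address outside the (placeholder) Google ranges, else 200."""
    if "Googlebot" in user_agent and not in_ranges(remote_addr, HYPOTHETICAL_GOOGLE_RANGES):
        return 403  # "no content for you, bad bot!"
    return 200


UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

# Direct crawl: the server sees Googlebot's own (placeholder) address.
print(hypothetical_bot_check("192.0.2.10", UA))    # 200

# Same crawl relayed by the CDN: the server sees the CDN's (placeholder) address.
print(hypothetical_bot_check("198.51.100.7", UA))  # 403
```

The point is that nothing about Googlebot changed between the two calls; only the address the server sees did.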
This kind of problem can be especially difficult to diagnose if the timing lines up with a Google update. Like, say, Penguin, as it did in this case. That’s why we tell all of our clients that the date of a ranking or traffic drop is a hint, but not the final answer.
Since we live in a day and age when Google is dropping multiple significant change events every single month, the chances of a “false positive” are WAY too high to just assume “well, rankings dropped on April 19 or 20, must be that Panda data push.”
Think about it for a second: in April alone, we had at least 3 significant change events at Google, and each one spans 2-3 days on the calendar, depending on what time zone you’re in. So that’s potentially 9 different dates, out of 30, in the month of April.
If something happened to your site that’s completely unrelated to any of those change events, there’s still roughly a 30% chance (9 out of 30 days) that it happened on or near one of those dates.
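The arithmetic is easy to check. This Python sketch counts how many April days fall inside at least one update window. The start dates are placeholders, not the actual April update dates, and each window is assumed to span 3 calendar days. It also shows that overlapping windows shrink the count, so 9 out of 30 is the upper end for three 3-day events.

```python
from datetime import date, timedelta

DAYS_IN_APRIL = 30


def covered_days(event_starts, window_days=3):
    """Return the set of calendar days touched by at least one event window."""
    days = set()
    for start in event_starts:
        for offset in range(window_days):
            days.add(start + timedelta(days=offset))
    return days


# Hypothetical, non-overlapping event start dates.
separate = covered_days([date(2012, 4, 3), date(2012, 4, 12), date(2012, 4, 24)])
print(len(separate), f"{len(separate) / DAYS_IN_APRIL:.0%}")  # 9 30%

# Hypothetical events that start a day apart overlap, covering fewer days.
overlapping = covered_days([date(2012, 4, 3), date(2012, 4, 19), date(2012, 4, 20)])
print(len(overlapping), f"{len(overlapping) / DAYS_IN_APRIL:.0%}")  # 7 23%
```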
Fortunately for this client, the solution was simple: just turn off the Bad Behavior plugin, because Cloudflare already does the job of blocking bad bots.
UPDATE: Thanks to Matthew Prince (https://twitter.com/#!/eastdakota) for pointing out another solution. Cloudflare offers an Apache module that restores the visitor’s original IP address, which allows Bad Behavior to function with Cloudflare (http://www.cloudflare.com/wiki/Log_Files).
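Roughly speaking, the idea behind that kind of fix is to hand Bad Behavior the visitor’s real address instead of Cloudflare’s. Cloudflare passes the original client IP in the `CF-Connecting-IP` request header. Here’s a minimal Python sketch of the idea, not the Apache module’s actual code: the function name is hypothetical, and the “Cloudflare” range is a documentation-only placeholder (in practice you’d use Cloudflare’s published IP list). The header is trusted only when the connection really comes from a Cloudflare address; otherwise anyone could forge it.

```python
import ipaddress

# Placeholder range (RFC 5737 documentation block), NOT a real Cloudflare range.
HYPOTHETICAL_CLOUDFLARE_RANGES = [ipaddress.ip_network("198.51.100.0/24")]


def effective_client_ip(remote_addr: str, headers: dict) -> str:
    """Return the address a bot check should look at.

    Uses CF-Connecting-IP only when the TCP peer is inside the placeholder
    Cloudflare range. Assumes the headers dict uses this exact capitalization."""
    peer = ipaddress.ip_address(remote_addr)
    from_cloudflare = any(peer in net for net in HYPOTHETICAL_CLOUDFLARE_RANGES)
    forwarded = headers.get("CF-Connecting-IP")
    if from_cloudflare and forwarded:
        # Normalize and validate; raises ValueError on a malformed address.
        return str(ipaddress.ip_address(forwarded))
    return remote_addr


# Request relayed by the CDN: the bot check now sees Googlebot's original address.
print(effective_client_ip("198.51.100.7", {"CF-Connecting-IP": "192.0.2.10"}))  # 192.0.2.10

# Spoofed header from a non-Cloudflare peer is ignored.
print(effective_client_ip("203.0.113.5", {"CF-Connecting-IP": "192.0.2.10"}))  # 203.0.113.5
```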