Note to self: Implement some of these referrer spam blocking techniques tonight.
Update: Okay, I set up .htaccess to block anybody with a referrer matching the usual spam keywords. Let me know if it gives you any problems reaching the site. (Although that’s actually impossible, because if you can’t reach the site then you can’t read this post or comment anyway. Catch-22.)
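For the curious, here’s roughly the shape of what went into the file. This is a minimal sketch, not my actual list – the keywords below are illustrative placeholders, and the real ones come from the article linked above:

```apache
# Flag any request whose Referer header matches a spam keyword
# (placeholder keywords -- substitute your own blacklist)
SetEnvIfNoCase Referer "poker"  spam_ref
SetEnvIfNoCase Referer "casino" spam_ref

# Send flagged requests away with a 403 Forbidden
Order Allow,Deny
Allow from all
Deny from env=spam_ref
```

Anybody arriving with a matching referrer gets a 403 instead of the page, which also saves a little bandwidth.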
6 responses
Thanks for the link – some good techniques there that will, with any luck, kill most of my comment spam 🙂
A very useful link – I knew in principle that you could do this sort of thing, but clear instructions like this are a big help. I’ve been seeing several hundred attempts at comment and referrer spam per day lately, to the point where it’s hardly worth reading my referrer stats.
One failed editing attempt later – you’ve got to watch for those stray carriage returns that make Apache throw a fit – my .htaccess file is now set to reject all sorts of referrer spam. Now all I have to do is try to keep up with the new domains the spammers use, but at least I have the tools to do the job now.
I messed up my own site a few times in the process. I think it was because I already had an “Order allow,deny” block in there, and putting a second one in doesn’t work. Combining them into a single block fixed the error, though.
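For anyone hitting the same error, the combined form that worked for me looks something like this (the IP address and keyword here are placeholders, not my real entries):

```apache
# One combined Order block -- a second, separate block conflicts
# with this one rather than adding to it
SetEnvIfNoCase Referer "casino" spam_ref

Order Allow,Deny
Allow from all
Deny from 192.0.2.1      # old ban on a persistent spammer (placeholder IP)
Deny from env=spam_ref   # new referrer blacklist
```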
Maybe we should share our blacklists?
I had an allow,deny block too, but it was just blocking a few odd IP addresses that were too-persistent sources of comment spam way back in the day, so I was happy to junk it in favour of this much more useful list.
At the moment my blacklist is pretty much the one found at the site you linked to, but down the line, as I find new entries to add, I’ll probably put up a page somewhere so anyone who thinks it’ll help them out can pick up my entries.
Interestingly, one of the techniques Spam Karma for WordPress uses is a centralised blacklist at http://www.unknowngenius.com/blog/blacklist/ which SK automatically reads from every couple of days. I’m sorely tempted to pick up the various :url: entries and pour them into my .htaccess file (with ‘|’ characters as separators and stripping out the surrounding gubbins) and see what happens…
… which would probably be to bork my site, but it’s soooo tempting. All that lovely blacklist data, just waiting to be put to other uses.
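If I ever work up the nerve, I imagine the converted entries would land in something like this, with the :url: values strung together (the domains below are invented for illustration – the real ones would come from the SK blacklist):

```apache
RewriteEngine On
# Blacklist :url: entries, gubbins stripped, joined with '|'
# (made-up examples -- pour in the real list here)
RewriteCond %{HTTP_REFERER} (texas-holdem|online-casino|cheap-pills) [NC]
RewriteRule .* - [F]
```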
I didn’t want to blacklist too many words to start with – for fear of blocking real referrers – so I scrapped the big list in favor of a few keywords from the spams I always seem to get. I wasn’t sure it was working at first. I don’t bother looking at my raw logs; I’ve had the Snook write me a Perl script, called from a PHP page in my admin area, that scrapes the referrers out of the logs, culls out search engines, and presents them nicely. I was still seeing all the spammers last night, which confused me until I realized the script was scraping *everything*, even the 403s. So I had the Snook alter it to filter out the 403s, and voilà! They all disappeared. The raw logs confirm that they’re all getting the smackdown. Rodd pointed out that I could’ve just implemented the blacklist at the PHP level and hidden them from view, but it’s much more satisfying to give them the 403 and save the bandwidth, isn’t it? 🙂
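A thought for anyone who’d rather not post-filter the logs: Apache’s conditional logging can keep flagged requests out of the access log entirely. One caveat – CustomLog has to live in the main server config or a virtual host, not in .htaccess, and this sketch assumes the same spam_ref variable set by the blacklist rules above:

```apache
# Skip logging for requests flagged as referrer spam
# (assumes SetEnvIfNoCase rules elsewhere set spam_ref;
#  note CustomLog is not valid inside .htaccess)
CustomLog /var/log/apache/access.log combined env=!spam_ref
```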
Yes, I’ve just been looking at all those 403s in my logs – nice.
Next step: stop the comment spammers from generating 404s by requesting the non-existent mt-comments.cgi and mt-tb.cgi. I’m tempted to just create a couple of 1-byte files with those names, just so they waste time trying to communicate with a script that isn’t there.
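If the dummy-file trick turns out to be more trouble than it’s worth, mod_alias can answer those requests with a 410 Gone instead – a sketch, assuming the scripts are requested at the site root:

```apache
# Tell clients the MT scripts are permanently gone (HTTP 410)
# -- an alternative to parking 1-byte dummy files at these paths
Redirect gone /mt-comments.cgi
Redirect gone /mt-tb.cgi
```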