May 20, 2007

Spam filtering with Gmail

Pair's default spam filtering options are pathetic. SpamAssassin is wholly inadequate, and after two months of trying, I still couldn't get its Bayes filter to work. So I finally gave up, disabled Pair's spam filtering, and now use a Gmail account as the spam filter for all of my Pair aliases. I configured the Gmail account to forward messages to gmail@huangfamily.com. The idea is basically:

Ham → @huangfamily.com → huangfamily.com@gmail.com → gmail@huangfamily.com → Deliver

Spam → @huangfamily.com → huangfamily.com@gmail.com → Gmail Spam folder

At the top of my .procmailrc file (which handles all mail sent to @huangfamily.com), I added the following rule:

The rule says: send every new message to Gmail, and if it comes back, deliver it. Since Gmail doesn't forward spam, only clean messages return. To prevent loops, the rule checks for the X-Forwarded-For header that Gmail sets when it forwards a message; anything carrying that header is delivered rather than sent to Gmail again. Messages sent directly to gmail@huangfamily.com, without that header, are dropped.
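A minimal Python sketch of that decision, written as a stand-in for the logic rather than as the procmail rule itself. The function name `route`, the `envelope_to` parameter, and the three action strings are hypothetical; the only details taken from the post are the X-Forwarded-For check and the gmail@huangfamily.com return alias.

```python
from email import message_from_string

# Alias that Gmail forwards clean mail back to (from the post).
RETURN_ALIAS = "gmail@huangfamily.com"


def route(raw_message: str, envelope_to: str) -> str:
    """Return 'deliver', 'forward', or 'drop' for one incoming message.

    Hypothetical helper: it mirrors the rule described above, nothing more.
    """
    msg = message_from_string(raw_message)
    if msg["X-Forwarded-For"] is not None:
        # Gmail has already seen this message: deliver it and never
        # forward it again, which is what prevents a mail loop.
        return "deliver"
    if envelope_to.lower() == RETURN_ALIAS:
        # Only Gmail should write to the return alias, so a message
        # arriving there without the header is thrown away.
        return "drop"
    # Everything else is new mail: hand it to the Gmail account for filtering.
    return "forward"
```

The order of the checks matters: the header test comes first so that a returning message is never mistaken for a direct hit on the return alias.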

I suppose that a smart spam bot could fake the X-Forwarded-For header. A bot can also add bogus Received headers of its own, but it can't fake the one that Pair's mail server adds when it accepts the message. So if this starts happening, I could replace the condition with one that checks that the message was handed over by a google.com relay.
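A rough Python sketch of what such a relay check might look like. It assumes that the topmost Received header is the one stamped by the receiving server and that it has the shape `from HELO (reverse-dns-name [ip]) by ...`; real header layouts differ between mail servers, so the pattern and the function name `came_from_google` are illustrative only.

```python
import re
from email import message_from_string

# Assumed layout of the header stamped by the receiving server:
#   from HELO-NAME (REVERSE-DNS-NAME [IP]) by ...
# The HELO name is supplied by the sender, so only the name inside the
# parentheses (looked up by the receiving server) is inspected.
RELAY = re.compile(r"^from\s+\S+\s+\((\S+)\s+\[[^\]]+\]\)", re.IGNORECASE)


def came_from_google(raw_message: str) -> bool:
    """True if the newest Received header names a google.com host."""
    msg = message_from_string(raw_message)
    received = msg.get_all("Received") or []
    if not received:
        return False
    # Headers are prepended, so index 0 is the most recent hop.
    # Older entries further down could have been forged by the sender.
    match = RELAY.match(received[0].strip())
    if match is None:
        return False
    host = match.group(1).lower().rstrip(".")
    return host == "google.com" or host.endswith(".google.com")
```

Comparing against `.google.com` with the leading dot avoids accepting look-alike names such as `notgoogle.com`.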

Renaming the comment script

Bots had saved the name of the CGI script used for posting comments on Craftlog, so the name now changes every day as part of my daily Movable Type (MT) maintenance script. Make sure that the real script, mt-comments.cgi, is not executable:
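As a rough illustration of such a daily rotation (a Python sketch, not the actual maintenance script), the function below strips the execute bits from the real script, removes the previous day's copy, and installs a fresh copy under a random name. The `c-` prefix, the random naming scheme, and the function name `rotate_comment_script` are all hypothetical.

```python
import secrets
import shutil
from pathlib import Path

REAL_SCRIPT = "mt-comments.cgi"  # script name from the post
PREFIX = "c-"                    # hypothetical prefix for the daily copies


def rotate_comment_script(cgi_dir: Path) -> str:
    """Install a freshly named executable copy and return its file name."""
    real = cgi_dir / REAL_SCRIPT
    # Keep the real script readable but not executable, so the
    # well-known URL stops working for bots.
    real.chmod(0o644)
    # Remove copies left over from earlier runs.
    for old in cgi_dir.glob(f"{PREFIX}*.cgi"):
        old.unlink()
    # Pick an unpredictable name and install an executable copy under it.
    new_name = f"{PREFIX}{secrets.token_hex(8)}.cgi"
    new_path = cgi_dir / new_name
    shutil.copyfile(real, new_path)
    new_path.chmod(0o755)
    return new_name
```

The sketch stops at the file system. Movable Type would still have to be told the returned name (presumably through its comment script setting) and the pages rebuilt so that the comment forms point at the new URL.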

Daily DB cleaning

MT 3.2 only deletes junk comments from your blog when you click on the name of your blog from the Main Menu. Presumably the developers didn't want to add periodic triggers throughout the code, and assumed that most people open their blogs from the Main Menu every day. In addition, a large activity log can really slow down the DB. So I configured a cron job to clean out the DB every night:
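A minimal Python sketch of what a nightly cleanup could do, assuming the junk comments live in a table `mt_comment` flagged by `comment_junk_status = -1` and the activity log in `mt_log`. Those table and column names are assumptions to verify against the real schema, and the in-memory SQLite database in `demo_connection` is only a stand-in so the example runs on its own.

```python
import sqlite3

# Assumed Movable Type table and column names; check the real schema first.
CLEANUP_STATEMENTS = (
    "DELETE FROM mt_comment WHERE comment_junk_status = -1",  # junk comments
    "DELETE FROM mt_log",                                      # activity log
)


def clean_database(conn) -> None:
    """Run every cleanup statement, then commit once at the end."""
    cur = conn.cursor()
    for statement in CLEANUP_STATEMENTS:
        cur.execute(statement)
    conn.commit()


def demo_connection():
    """Build a throwaway in-memory database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mt_comment (comment_id INTEGER, comment_junk_status INTEGER)"
    )
    conn.execute("CREATE TABLE mt_log (log_id INTEGER, log_message TEXT)")
    conn.executemany(
        "INSERT INTO mt_comment VALUES (?, ?)", [(1, 1), (2, -1), (3, -1)]
    )
    conn.executemany(
        "INSERT INTO mt_log VALUES (?, ?)", [(1, "login"), (2, "rebuild")]
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    connection = demo_connection()
    clean_database(connection)
    print(connection.execute("SELECT COUNT(*) FROM mt_comment").fetchone()[0])
```

A script like this would be scheduled from cron to run once a night, with the connection pointed at the blog's actual database instead of the demo one.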