Dealing with Spam

If there’s one jargon term that every user new to the Internet soon becomes acquainted with, spam must be near the top of the list. Its prevalence and virtual ubiquity through many forms of online communication have generated miniature industries devoted to dealing with it, and the science of spam detection, prevention and treatment almost resembles the tactical skirmishes of biological immune systems.

Spam exists in many forms, from bogus guestbook entries to elaborate instant messaging robots, but the variety which prompted this post was that classic form – unsolicited email. The reach of spam shows in the number of countermeasures deployed as standard on the vast majority of websites, including of course authentication emails and the ever-evolving captcha. I use a small combination of plugins on this blog to block out most of the spam, and given the extreme sparsity of genuine comments, the potential for inconvenient ‘false positives’ is rather slim. Nevertheless, even the cursory inspection I tend to make of Akismet’s latest haul becomes tiresome, small as this blog is – spam comments to date outnumber genuine ones by a factor of almost 500 (and that only counts those caught and tallied by Akismet). Quite how larger, more popular blogs deal with searching for false positives, I don’t know, but the task must be fairly time-consuming.

Yet even that abysmal ratio sometimes seems quite congenial next to the level of email spam I currently receive. Whilst the common techniques for filtering out spam emails have fairly high success rates, the constantly evolving battle fought by the Bayesian filter can never ultimately separate emails into black and white, and sifting through the grey matter can be a painful experience, particularly when searching for unexpected false positives. Indeed, with some of my emails going through multiple filters (before finally ending up in a Thunderbird client and getting filtered once more), I begin to wonder how many emails have simply drifted away in that black sea of jetsam.
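To make the grey area concrete, here is a minimal sketch of a naive Bayes filter in Python. It is a toy under stated assumptions, not how Thunderbird or any real filter is implemented: the class name `TinyBayesFilter`, the whitespace tokenizer and the 0.9 and 0.1 cutoffs are all invented for illustration. The point it shows is that the filter produces a probability rather than a verdict, so anything between the two cutoffs still has to be looked at by hand.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; real filters tokenize far more carefully."""
    return text.lower().split()


class TinyBayesFilter:
    """Toy naive Bayes classifier with add-one smoothing (illustrative only)."""

    def __init__(self, spam_cutoff=0.9, ham_cutoff=0.1):
        # Both cutoffs are assumed values chosen for this example.
        self.spam_cutoff = spam_cutoff
        self.ham_cutoff = ham_cutoff
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record the words of one message labelled 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, words, label, vocab_size):
        # Smoothed log prior plus smoothed log likelihood of each word.
        total_messages = sum(self.message_counts.values())
        score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
        total_words = sum(self.word_counts[label].values())
        for word in words:
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size + 1))
        return score

    def spam_probability(self, text):
        """Return the estimated probability that the text is spam."""
        words = tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        diff = (self._log_score(words, "ham", len(vocab))
                - self._log_score(words, "spam", len(vocab)))
        diff = max(-700.0, min(700.0, diff))  # keep math.exp from overflowing
        return 1.0 / (1.0 + math.exp(diff))

    def classify(self, text):
        """Return 'spam', 'ham', or 'unsure' for the grey area in between."""
        p = self.spam_probability(text)
        if p >= self.spam_cutoff:
            return "spam"
        if p <= self.ham_cutoff:
            return "ham"
        return "unsure"
```

Trained on a handful of labelled messages, a filter like this will confidently sort the obvious cases, while a message that mixes typical spam words with typical genuine ones lands in the "unsure" band, which is exactly the pile that needs manual sifting.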

The problems of course don’t stop there. In recent days I have been reminded of another serious gripe, when my inbox was flooded with bounced messages, evidence that my address was being used by spammers (and many of those were filtered as spam on account of their message contents, despite technically being genuine messages). Since very few of the address strings at the associated domain are actually received by anybody, it stands to reason that the deluge represents merely the tip of the melting iceberg. There are many tips out there to stop spammers from harvesting your email address, but very few to prevent them using it to spoof messages elsewhere (and even to yourself). The most common piece of advice is simply to wait it out – eventually the spammers move on and utilise a new address, and indeed the bounced messages seem to come in waves.

One of the methods used to reduce spam that was highlighted through these bounced messages is Sender Address Verification. As covered by this post, the method requires people sending email to an address for the first time to verify their authenticity by fulfilling a certain task explained in an automated email reply, before the message (and future messages) may be delivered to the recipient. This bears some resemblance to the automated email verification sent out by many online accounts. However, it is not without its weaknesses. After all, spam that spoofs an already verified address will still be delivered as a genuine message, and the potential for spammers to find methods to fulfil the authentication tasks is all too clear from the variety of methods already deployed to crack online captchas.
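The flow described above can be sketched roughly as follows. This is an illustrative Python model only, not the code of any actual Sender Address Verification product: the class `SenderVerificationGate`, its method names and the use of a random token as the "task" are all assumptions made for the example.

```python
import secrets


class SenderVerificationGate:
    """Toy challenge-response gate: hold mail from unknown senders until verified."""

    def __init__(self):
        self.verified = set()   # sender addresses that have passed the challenge
        self.pending = {}       # challenge token -> sender address
        self.held = {}          # sender address -> messages awaiting verification

    def receive(self, sender, message):
        """Deliver mail from verified senders; otherwise hold it and issue a challenge."""
        sender = sender.lower()
        if sender in self.verified:
            return ("delivered", None)
        self.held.setdefault(sender, []).append(message)
        token = secrets.token_urlsafe(16)  # stands in for the task in the automated reply
        self.pending[token] = sender
        return ("challenged", token)

    def confirm(self, token):
        """Mark the sender behind a valid token as verified and release held mail."""
        sender = self.pending.pop(token, None)
        if sender is None:
            return []  # unknown or already used token
        self.verified.add(sender)
        return self.held.pop(sender, [])
```

Note that `receive` trusts the claimed sender address entirely. Once an address is in `verified`, any message carrying that address is delivered, whoever actually sent it, which is the spoofing weakness mentioned above.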

Ultimately I’m reduced to dealing with spam in the usual manner, relying on filters to do the heavy work, leaving me to label the odd message that slips through and to check now and then for false positives (and burying my head in the sand every time my addresses come up for spoofing duties).

How do others combat the spam plague? Are there other methods commonly available that I’m overlooking? And do people consider the possibility of false positives a necessary evil in the war against spam?

2 Comments

P

Sadly the only way of maintaining a vaguely clean inbox is spending quite a lot of effort tweaking your (bayesian?) filters. We should in fact probably have a chat about that at some point — if you have mails that haven't been caught by SpamAssassin on the tux in a folder (or otherwise easily identified) somewhere, then you could use those to train SA specifically for your account. If there is some type of mail that routinely gets through, you should let me know and I can add appropriate rules to SA (e.g. I'm currently contemplating slapping an extra 3 spam points on any message that mentions 123greetings).

All this takes time and effort, but I guess the benefit of not seeing any spam is worth quite a lot. As for false positives, I have always tried to err on the side of caution, disabling tests that don't make 100% sense to me and/or setting thresholds rather high. I haven't ever had (noticed?) a false positive myself, but apparently Haz had one which he fortunately noticed. If this happens, there are certainly ways to white-list 'from' addresses on the server level, and possibly also on the user level.
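The points, threshold and white-list idea can be sketched as below. This is a toy scorer in Python, not SpamAssassin's own rule syntax or engine. Only the 3 points for 123greetings come from the comment above; the other rules, the threshold of 8.0 and the white-listed address are hypothetical values picked for illustration.

```python
import re

# (pattern, points) pairs; every matching rule adds its points to the score.
RULES = [
    (re.compile(r"123greetings", re.IGNORECASE), 3.0),  # the rule contemplated above
    (re.compile(r"\bviagra\b", re.IGNORECASE), 4.0),    # hypothetical rule
    (re.compile(r"click here", re.IGNORECASE), 2.5),    # hypothetical rule
]

THRESHOLD = 8.0                      # assumed, deliberately high to avoid false positives
WHITELIST = {"trusted@example.org"}  # hypothetical white-listed 'from' address


def spam_score(body):
    """Sum the points of every rule whose pattern appears in the message body."""
    return sum(points for pattern, points in RULES if pattern.search(body))


def is_spam(sender, body):
    """White-listed senders always pass; others are spam at or above the threshold."""
    if sender.lower() in WHITELIST:
        return False
    return spam_score(body) >= THRESHOLD
```

With a high threshold, a single rule such as the 123greetings one cannot condemn a message on its own; it only tips the balance when other rules fire as well, which is the cautious behaviour described above.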

We should probably have a chat about this if it's an increasing problem. 🙂

19th September 2008
Fips

Fortunately, I'm not aware of any false positives going astray, and that to my mind is a much greater evil than dealing with the rare minnows that manage to slip through the spam filter nets.

Much more of an issue are the hundreds of bounced messages that fill the inbox on the occasions that spammers decide to spoof a real address at the domain, though even that is a rare event and one that can be dealt with in a couple of minutes.

22nd September 2008
