Of all the jargon terms that every new Internet user soon becomes acquainted with, spam must be near the top of the list. Its prevalence and near-ubiquity across many forms of online communication have spawned miniature industries devoted to dealing with it, and the science of spam detection, prevention and treatment has come to resemble the tactical skirmishes of a biological immune system.
Spam exists in many forms, from bogus guestbook entries to elaborate instant-messaging robots, but the variety that prompted this post is the classic one: unsolicited email. How deeply spam has penetrated is evident from the number of countermeasures now standard on the vast majority of websites, including, of course, authentication emails and the ever-evolving captcha. I use a small combination of plugins on this blog to block most of the spam, and given the extreme sparsity of genuine comments, the potential for inconvenient ‘false positives’ is rather slim. Nevertheless, even the cursory inspection I make of Akismet’s latest haul becomes tiresome, despite the small size of this blog: spam comments to date outnumber genuine ones by a factor of almost 500 (and that only counts those caught and tallied by Akismet). Quite how larger, more popular blogs search for false positives I don’t know, but the task must be fairly time-consuming.
Yet even that abysmal ratio sometimes seems quite congenial next to the volume of email spam I currently receive. Whilst the common techniques for filtering spam emails have fairly high success rates, the constantly evolving battle with the Bayesian filter can never ultimately sort emails into black and white, and sifting through the gray area in between can be a painful experience, particularly when hunting for unexpected false positives. Indeed, with some of my emails passing through multiple filters (before finally ending up in a Thunderbird client and being filtered once more), I begin to wonder how many emails have simply drifted away in that black sea of jetsam.
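To illustrate why a Bayesian filter leaves a gray area rather than a clean split, here is a minimal Python sketch of the kind of probability combination such filters use, loosely modelled on the rule popularised by Paul Graham’s “A Plan for Spam”. The token probabilities, the 0.4 score for unseen words and the 0.1/0.9 thresholds are all made-up assumptions for illustration; a real filter learns its token probabilities from a training corpus and adds refinements (such as only counting the most telling tokens) that are skipped here.

```python
import math

# Hypothetical per-token spam probabilities, standing in for what a trained filter would learn.
TOKEN_SPAM_PROB = {
    "viagra": 0.99,
    "winner": 0.95,
    "invoice": 0.60,
    "thanks": 0.20,
    "meeting": 0.10,
}

UNKNOWN_TOKEN_PROB = 0.4  # assumed score for words the filter has never seen


def spam_probability(text):
    """Combine per-token probabilities: P = prod(p) / (prod(p) + prod(1 - p))."""
    tokens = set(text.lower().split())
    probs = [TOKEN_SPAM_PROB.get(t, UNKNOWN_TOKEN_PROB) for t in tokens]
    # Work in log space so long messages don't underflow to zero.
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    diff = min(log_ham - log_spam, 700.0)  # clamp to avoid overflow in exp()
    return 1.0 / (1.0 + math.exp(diff))


def classify(text, ham_threshold=0.1, spam_threshold=0.9):
    """Return 'spam', 'ham', or 'uncertain' for anything between the thresholds."""
    p = spam_probability(text)
    if p >= spam_threshold:
        return "spam"
    if p <= ham_threshold:
        return "ham"
    return "uncertain"
```

Here `classify("viagra winner")` comes out as spam and `classify("thanks meeting")` as ham, but `classify("invoice")` lands in the ‘uncertain’ pile: that pile is exactly the gray area that has to be sifted by hand. Narrowing the gap between the thresholds shrinks the pile, at the cost of more false positives.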
The problems, of course, don’t stop there. In recent days I have been reminded of another serious gripe, when my inbox was flooded with bounced messages, evidence that my address was being used by spammers (and many of those bounces were themselves filtered as spam on account of their contents, despite technically being genuine messages). Since very few addresses at the associated domain are actually read by anybody, it stands to reason that the deluge represents merely the tip of the melting iceberg. There is plenty of advice out there on stopping spammers from harvesting your email address, but very little on preventing them from using it to spoof messages elsewhere (and even to yourself). The most common piece of advice is simply to wait it out: eventually the spammers move on to a new address, and indeed the bounced messages do seem to come in waves.
One of the spam-reduction methods highlighted by these bounced messages is Sender Address Verification. As covered by this post, the method requires anyone emailing an address for the first time to prove they are genuine by completing a task explained in an automated reply, before the message (and any future messages) is delivered to the recipient. This bears some resemblance to the verification emails sent out when signing up for many online accounts. However, it is not without its weaknesses. After all, spam sent with a spoofed but already-verified sender address will still be delivered as a genuine message, and the potential for spammers to automate the verification tasks is all too clear from the variety of methods already deployed to crack online captchas.
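As a rough illustration of how Sender Address Verification works, and of the spoofing weakness, here is a minimal Python sketch of a challenge-response filter. The class and method names are hypothetical, and the “task” is reduced to echoing back a random token; real systems email the challenge out and often ask for something like a captcha instead.

```python
import secrets


class ChallengeResponseFilter:
    """Hypothetical sketch of Sender Address Verification (challenge-response)."""

    def __init__(self):
        self.verified = set()  # senders who have completed a challenge
        self.pending = {}      # sender -> (challenge token, list of held messages)
        self.inbox = []        # (sender, message) pairs actually delivered

    def receive(self, sender, message):
        """Deliver mail from verified senders; otherwise hold it and return a challenge token."""
        if sender in self.verified:
            self.inbox.append((sender, message))
            return None
        token, held = self.pending.get(sender, (None, []))
        if token is None:
            token = secrets.token_urlsafe(16)
        held.append(message)
        self.pending[sender] = (token, held)
        # In a real system the token would be emailed back to `sender` here.
        return token

    def respond(self, sender, token):
        """Called when the sender completes the challenge; releases any held mail."""
        entry = self.pending.get(sender)
        if entry is None or not secrets.compare_digest(entry[0], token):
            return False
        self.verified.add(sender)
        for message in entry[1]:
            self.inbox.append((sender, message))
        del self.pending[sender]
        return True


if __name__ == "__main__":
    f = ChallengeResponseFilter()
    token = f.receive("alice@example.org", "Hi, long time no see!")
    print(f.inbox)  # → [] (held until Alice answers the challenge)
    f.respond("alice@example.org", token)
    print(f.inbox)  # Alice's message is now delivered
    # A spammer spoofing Alice's already-verified address sails straight through:
    f.receive("alice@example.org", "Cheap watches!")
    print(len(f.inbox))  # → 2
```

The last three lines show the weakness in miniature: the filter trusts the sender address, and the sender address is exactly what a spammer forges.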
Ultimately I’m reduced to dealing with spam in the usual manner: relying on filters to do the heavy lifting, labelling the odd message they miss, and now and then doing my own sweep for false positives (whilst burying my head in the sand every time my addresses are called up for spoofing duties).
How do others combat the spam plague? Are there other methods commonly available that I’m overlooking? And do people consider the possibility of false positives a necessary evil in the war against spam?