Oct. 29th, 2003

I find it pretty damn pathetic that procmail and spamassassin now send about 80% of my inbound mail to junk mail folders, or throw it away entirely. Four-fifths of the work the Internet does to get email to me is completely wasted because I never even see it.

I suppose the good news is that I have spamassassin's Bayesian classifier trained well enough, and the point-rankings it assigns tweaked sufficiently, that it's letting very nearly none of the spam through to my inbox. It even seems to be catching the mail which is just a big block of text from War and Peace in a font color that's the same as the background, and then an IMG tag to a giant GIF of the actual ad telling me how I can get a thicker ŗõD or a longer PêÑ|5.

Kate is slowly training her own Bayesian classifier since I've been pretty vocal about how well it works. It's tedious work to manually sort through all the inbound mail to build up a big enough "corpus" of spam and non-spam, but it was definitely worth it, for me at least.
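Roughly, a Bayesian classifier counts how often each word shows up in mail you've hand-sorted as spam versus non-spam, then uses those counts (via Bayes' rule) to estimate how likely a new message is to be spam. Here's a minimal naive Bayes sketch in Python that shows the idea. It is not how SpamAssassin or bogofilter actually implement theirs (both use more elaborate tokenizing and score-combining), and the tokenizer, class name, and sample messages are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simplifying assumption: lowercase runs of letters, digits, ', $ and !.
    # Real filters use much richer tokenizers (headers, HTML, obfuscated words).
    return re.findall(r"[a-z0-9'$!]+", text.lower())


class NaiveBayesFilter:
    """Hypothetical toy multinomial naive Bayes spam filter."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.msg_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label is "spam" or "ham" (non-spam), as decided by the person
        # manually sorting the corpus. Needs at least one message of each.
        self.word_counts[label].update(tokenize(text))
        self.msg_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_msgs = sum(self.msg_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Prior: fraction of training messages carrying this label.
            score = math.log(self.msg_counts[label] / total_msgs)
            for word in tokenize(text):
                # Add-one smoothing so an unseen word doesn't zero everything out.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[label] = score
        # Turn the two log scores into P(spam | words); clamp to avoid overflow.
        diff = max(min(log_scores["ham"] - log_scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


# Tiny made-up "corpus" for illustration only.
spam_filter = NaiveBayesFilter()
spam_filter.train("cheap pills buy now limited offer", "spam")
spam_filter.train("buy cheap meds online now", "spam")
spam_filter.train("lunch meeting moved to noon tomorrow", "ham")
spam_filter.train("can you review the draft before the meeting", "ham")

print(spam_filter.spam_probability("buy cheap pills now"))       # close to 1
print(spam_filter.spam_probability("meeting tomorrow at noon"))  # well below 0.5
```

The tedious part Kate is doing corresponds to the `train` calls: the filter only knows what the hand-sorted corpus has taught it, which is why a bigger corpus helps.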


Oct. 29th, 2003 01:06 pm (UTC)
Showing my complete ignorance in all things network: do you POP your mail to a client or just read it through unix? Do procmail and spamassassin run on the unix end? Would this be something I could set up on my prairienet.org account before I download the email into Eudora? What is a Bayesian classifier?

Oct. 31st, 2003 12:29 pm (UTC)
I use IMAP to pull mail off a unix server.

Spamassassin and procmail kick in at mail delivery time, so they need to be running on your account on prairienet. I think they have them both installed, so you would just need help with getting the right .procmailrc file in your home directory there. Mail will get sorted into separate folders according to spam/nonspam/whatever, and from there you can manage it with pine or an IMAP-based mail reader, which understands folders on the server side. I'm pretty sure POP won't work because unlike IMAP, all POP knows how to do is suck stuff out of your inbox down to the client.
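On prairienet the real thing would be a recipe in .procmailrc, not Python. The sketch below just mirrors the decision such a recipe makes at delivery time: SpamAssassin stamps each message with X-Spam-Flag and X-Spam-Status headers, and the rule files the message based on them. The folder name "junk", the function name, and the DISCARD_SCORE cutoff are hypothetical.

```python
import re
from email import message_from_string

# Hypothetical cutoff: mail scoring at or above this is dropped outright.
DISCARD_SCORE = 15.0


def choose_folder(raw_message):
    """Return the folder to file a message into, or None to discard it.
    Assumes SpamAssassin has already added its X-Spam-* headers."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    # Newer SpamAssassin writes "score=", older 2.x versions wrote "hits=".
    match = re.search(r"(?:score|hits)=(-?\d+(?:\.\d+)?)", status)
    score = float(match.group(1)) if match else 0.0
    if score >= DISCARD_SCORE:
        return None  # the "throw it away entirely" case
    if (msg.get("X-Spam-Flag") or "").strip().upper() == "YES":
        return "junk"  # hypothetical server-side spam folder
    return "INBOX"


spam_msg = (
    "From: someone@example.com\n"
    "X-Spam-Flag: YES\n"
    "X-Spam-Status: Yes, score=7.2 required=5.0\n"
    "Subject: hi\n\nbody\n"
)
ham_msg = (
    "From: friend@example.com\n"
    "X-Spam-Status: No, score=0.4 required=5.0\n"
    "Subject: lunch?\n\nbody\n"
)
print(choose_folder(spam_msg))  # junk
print(choose_folder(ham_msg))   # INBOX
```

Because the sorting happens on the server, the "junk" folder only stays visible to a client that understands server-side folders, which is the IMAP-versus-POP point above.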

Oct. 29th, 2003 02:20 pm (UTC)
Training a Bayesian filter
I use bogofilter and initially trained it against the SpamAssassin public corpus. It didn't require too much more training before bogofilter was correctly identifying spam and self-reinforcing.
