my best idea for spam yet

I haven't seen any false positives AT ALL for mail which SpamAssassin's Bayesian classifier gives a probability of 0.7 or better, so I just wrote a procmail rule which takes any such mail and simply throws it away. Anything below that which spamassassin still catches due to other rules, I feed back into the classifier as spam to reinforce it. My inbox also feeds back into the classifier as "ham" to reinforce its decisions in the other direction.

But the vast majority of my spam gets that magic 0.7 or above (most of it is 0.9+ and I'm even starting to see some 0.99+ probabilities now), so it's suddenly like I'm getting very little spam.
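
A minimal sketch of the routing described above, written in Python rather than procmail just to make the three outcomes explicit. The function name `route`, its parameters, and the returned labels are hypothetical; only the 0.7 threshold comes from the post.

```python
def route(bayes_prob, flagged_as_spam, threshold=0.7):
    """Decide what to do with one message.

    bayes_prob      -- Bayesian spam probability (0.0 to 1.0)
    flagged_as_spam -- True if the overall spamassassin verdict was "spam"
    threshold       -- probability at or above which mail is discarded
    """
    if bayes_prob >= threshold:
        # High Bayes probability: throw it away without training on it.
        return "discard"
    if flagged_as_spam:
        # Caught by other rules only: feed back to the classifier as spam.
        return "learn_spam"
    # Everything else lands in the inbox and is fed back as ham.
    return "learn_ham"
```

For example, `route(0.95, True)` gives `"discard"`, while `route(0.4, True)` gives `"learn_spam"`.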



Nov. 5th, 2003 02:10 pm (UTC)
Yeah ironic isn't it?

And yes, I'm just brute-force matching on the X-Spam-Status header. I'm not doing it "right" because I was too lazy to look up again how to do conditionals in procmail... right now it snags anything with the right BAYES_NN flag even if spamassassin in general decided that the mail was NOT spam. I'll fix that some evening... it's not really making much difference.
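
A rough sketch of the "right" version of that check: require both the overall verdict and a high enough Bayes bucket. It is in Python, not procmail, and assumes the header value looks like `Yes, hits=12.3 required=5.0 tests=BAYES_90,...`; the exact layout may differ between spamassassin versions. The names `parse_status` and `should_discard` are made up for illustration.

```python
import re

# Matches test names such as BAYES_70 or BAYES_99 in the tests= list.
BAYES_RE = re.compile(r"\bBAYES_(\d\d)\b")

def parse_status(header_value):
    """Return (is_spam, bayes_bucket) from an X-Spam-Status header value."""
    # The header may be folded across lines, so collapse whitespace first.
    value = " ".join(header_value.split())
    is_spam = value.lower().startswith("yes")
    match = BAYES_RE.search(value)
    bucket = int(match.group(1)) if match else None
    return is_spam, bucket

def should_discard(header_value, min_bucket=70):
    """True only if the mail was judged spam AND has a high Bayes bucket."""
    is_spam, bucket = parse_status(header_value)
    return is_spam and bucket is not None and bucket >= min_bucket
```

With this, a message whose header starts with `No,` is kept even if it carries a `BAYES_90` flag, which is the case the brute-force match gets wrong.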
Nov. 6th, 2003 07:18 am (UTC)
I hope by `throw away,' you mean that you're sending it to a spam folder. It appears they turned off autolearn semi-recently, so you really should tell spamassassin that your spam is indeed spam.

I recently was having an annoying amount of spam getting through spamassassin, so I upgraded from 2.53 to 2.60, blew away my bayes databases, and retrained it from scratch, and it's working a *lot* better now.

Nov. 6th, 2003 09:17 am (UTC)
No, actually for stuff with a high enough Bayes score I just throw it out. I read that it works to "maintenance-train" it with things it gets wrong, so I've been feeding it my inbox as ham, and the spam that spamassassin catches but that has a Bayes probability below 0.7 as spam.

So far so good.

Is it true, though, that I can feed to sa-learn the entire spam message, with the spamassassin envelope around it, and it's smart enough to pick out the attachment with the original message? That would certainly save a lot of time feeding messages to it.
Nov. 7th, 2003 12:14 pm (UTC)
Yes, spamassassin knows to ignore its own chatter when analysing a message you tell it is ham or spam.
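
For what it's worth, the feeding step can be scripted. The sketch below only builds and runs an `sa-learn --spam` or `sa-learn --ham` command for one message file. It assumes `sa-learn` is on the PATH and that each file holds a single message; the helper names are hypothetical.

```python
import subprocess

def sa_learn_command(path, is_spam):
    """Build the sa-learn argument list for one message file."""
    flag = "--spam" if is_spam else "--ham"
    return ["sa-learn", flag, path]

def train(path, is_spam):
    """Run sa-learn on the file; raises CalledProcessError on failure."""
    # Needs SpamAssassin installed; nothing is run until train() is called.
    return subprocess.run(sa_learn_command(path, is_spam), check=True)
```

Calling `train("caught-spam.eml", True)` would then teach the classifier that the message is spam, envelope and all.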
