Cornell University

Physics Educational Computing Facility (PECF)

Gnu meets Tux...

Spam handling

Default spam handling
All email to @physics.cornell.edu passes through dspam, which only tags messages and never quarantines them; it is up to each user to filter on the tags. When dspam classifies a message as spam, it
  • Prepends [SPAM] to the subject line
  • Adds the header X-DSPAM-Result: Spam to the message. It is advisable to filter on this header rather than on the subject line.
If you use maildrop, the following rule in ~/.mailfilter delivers tagged messages to a folder called .IN-Spam:

    # Deliver anything dspam tagged as spam to the .IN-Spam folder
    if (/^X-DSPAM-Result: Spam/)
    {
        to "$HOME/Mail/IN-box/.IN-Spam/"
    }
	      
You must create the folder first: maildirmake ~/Mail/IN-box/.IN-Spam
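
If you filter with something other than maildrop, the same header test can be sketched in Python. This is only an illustration: the function name is_tagged_spam is made up for this example, and it assumes you have the raw message text available as a string.

```python
from email import message_from_string


def is_tagged_spam(raw_message):
    """Return True if the X-DSPAM-Result header marks the message as spam.

    Hypothetical helper for illustration; it looks only at the header,
    not at the [SPAM] subject prefix.
    """
    message = message_from_string(raw_message)
    result = message.get("X-DSPAM-Result", "")
    return result.strip().lower() == "spam"
```

Checking the header rather than the subject means a message whose subject merely contains the text [SPAM] is not misfiled.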

Training dspam
Sometimes dspam will tag spam as innocent, or legitimate email as spam. This is normal; it simply means that dspam needs to learn what you consider spam so that it can filter correctly in the future. You train it by bouncing or forwarding messages to the appropriate address. Because dspam went into effect on 1 May 2007, the address to use depends on whether you received the message before or after that date.
  • If the message arrived before 1 May 2007, is spam, and you want dspam to recognize similar messages as spam, bounce or forward it to old_spam_at_physics.cornell.edu
  • If the message arrived after 1 May 2007, is spam, and was not tagged by dspam as spam, bounce or forward it to spam_at_physics.cornell.edu
  • If the message arrived before 1 May 2007, is not spam, and you want dspam to recognize similar messages as "innocent", bounce or forward it to old_ham_at_physics.cornell.edu
  • If the message arrived after 1 May 2007, is not spam, and was incorrectly tagged by dspam as [SPAM], bounce or forward it to ham_at_physics.cornell.edu
Try to keep the number of spam samples and ham samples you send to dspam roughly equal. After a few hundred samples, dspam should start to become accurate enough to make a difference.
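
The four cases above amount to a small decision table, sketched here in Python. The function name training_address is hypothetical, the addresses are written in the same obfuscated form as in the text, and treating a message received on 1 May 2007 itself as "after" is an assumption, since the text does not say.

```python
from datetime import date

# Date dspam went into effect, as stated in the text.
CUTOVER = date(2007, 5, 1)


def training_address(received, is_spam, tagged_spam=False):
    """Return the training address for a message, or None if none applies.

    received    -- date the message arrived
    is_spam     -- your own judgement of the message
    tagged_spam -- whether dspam tagged it as spam (ignored before CUTOVER)
    """
    if received < CUTOVER:
        # Mail from before dspam was in use goes to the old_* addresses.
        if is_spam:
            return "old_spam_at_physics.cornell.edu"
        return "old_ham_at_physics.cornell.edu"
    # Assumption: mail received on the cutover date itself counts as "after".
    if is_spam and not tagged_spam:
        return "spam_at_physics.cornell.edu"   # missed spam
    if not is_spam and tagged_spam:
        return "ham_at_physics.cornell.edu"    # ham wrongly tagged
    return None  # dspam got it right; nothing to correct
```

For example, training_address(date(2007, 6, 3), is_spam=True) gives the address for spam that dspam missed, while a correctly tagged message gives None.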

Why is training not kicking in?
Training dspam is a complex process. In particular, if you feed it a lot of spam (false negatives) without enough ham (false positives or true negatives), tagging gets worse: dspam lets more spam through in order to avoid false positives, because it has no good picture of what genuine ham looks like. So always feed it a batch of ham along with the spam. Also, dspam applies something called statistical sedation until it has seen 2,500 hams; once this training threshold is passed, sedation is switched off and tagging becomes more aggressive. Features like Bayesian noise reduction only kick in after the training period.
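
If you keep your own tally of the samples you have sent, a rough check of the two rules of thumb (balanced spam and ham, and the 2,500-ham threshold) might look like the sketch below. The function name training_status and the 25% tolerance for "roughly the same" are assumptions made for this example; dspam does not define such a tolerance.

```python
# Number of hams after which the training period ends, per the text.
TRAINING_THRESHOLD = 2500


def training_status(spam_count, ham_count, tolerance=0.25):
    """Summarize how balanced the training samples are.

    tolerance is an assumed reading of "roughly the same": the two counts
    may differ by at most this fraction of the larger count.
    """
    larger = max(spam_count, ham_count)
    balanced = abs(spam_count - ham_count) <= tolerance * larger
    return {
        "balanced": balanced,
        "hams_until_threshold": max(0, TRAINING_THRESHOLD - ham_count),
    }
```

A result with balanced set to False and far more spam than ham is the situation described above, where tagging tends to deteriorate.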

I already have a huge collection of spam and non-spam from SpamAssassin. How do I use that to train dspam?
Put the spam and the non-spam in two separate maildirs (you probably already have them that way), then send the details to help_at_physics.cornell.edu
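
If your SpamAssassin mail is still mixed together in one maildir, a sketch like the following could separate it. It assumes that SpamAssassin marked spam with its usual X-Spam-Flag: YES header; the function name split_by_flag and the directory arguments are hypothetical. Messages are copied, not moved, so the original maildir is left untouched.

```python
import mailbox


def split_by_flag(source_dir, spam_dir, ham_dir):
    """Copy messages from source_dir into spam_dir or ham_dir.

    Assumes SpamAssassin tagged spam with "X-Spam-Flag: YES".
    Returns a tuple (number_of_spam, number_of_ham).
    """
    source = mailbox.Maildir(source_dir, create=False)
    spam = mailbox.Maildir(spam_dir, create=True)
    ham = mailbox.Maildir(ham_dir, create=True)
    n_spam = n_ham = 0
    for message in source:
        flag = (message.get("X-Spam-Flag") or "").strip().upper()
        if flag == "YES":
            spam.add(message)
            n_spam += 1
        else:
            ham.add(message)
            n_ham += 1
    return n_spam, n_ham
```

The returned counts also give a quick indication of whether the two collections are roughly the same size.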