SUBJECT: Re: [&NAME] spam

I've been using SpamNet for some time now, so I thought I would give you an update on its performance:

&NAME stopped: &NUM spam messages
I've blocked: &NUM spam messages
&NAME checked: &NUM,&NUM messages
&NAME: &NUM (&NUM false positives)
Recall: &NUM (&NUM false negatives)

&NAME uses community-based filtering: &WEBSITE
The community now numbers more than half a million people.

There are &NUM things I particularly like about &NAME:
(&NUM) It does not use filter rules. You just install it and it works; no tweaking rules.
(&NUM) Its precision is perfect. It has never blocked a message that wasn't spam.

- &NAME

I posted a plea for help with spam a while back, so I thought I'd let you know what solution I ended up with. My big constraint was that I get most of my mail through &NAME &NAME, which requires an &NAME connection to download mail over the open &NAME. At first that seemed to rule out a lot of spam solutions.

The first one I tried was &NAME Inspector, which creates and periodically updates a set of native &ORG filtering rules. This would be an elegant solution, except that its effectiveness was quite low. I also found at least &NUM bug: it corrupted some of my existing filtering rules while producing whitelist rules from my address book.

I then decided to bite the bullet and deal with the &NAME problem directly by installing stunnel (&WEBSITE), a proxy that runs on a range of client machines. It makes the &NAME connection to the external servers and provides a local server that offers an unencrypted connection. That opened up a range of spam filtering possibilities.

I ended up choosing POPFile (&WEBSITE). It has a learning capability, the ability to write (regrettably simplistic) hard rules for handling mailing lists, and relatively simple installation. My impression was that &NAME Assassin was harder to install, but I also seriously considered that one.
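The precision and recall figures in the SpamNet statistics quoted at the top follow the standard definitions. The sketch below is a minimal illustration, assuming that messages SpamNet blocked correctly are true positives, legitimate messages it blocked are false positives, and spam the user had to block by hand are false negatives. The function name and the counts in the usage example are hypothetical, since the real numbers are elided.

```python
def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> tuple[float, float]:
    """Return (precision, recall) for a spam filter.

    true_pos:  spam messages the filter blocked
    false_pos: legitimate messages the filter wrongly blocked
    false_neg: spam messages that got through (e.g. blocked by hand)
    """
    blocked = true_pos + false_pos        # everything the filter flagged
    actual_spam = true_pos + false_neg    # all spam that arrived
    # Convention (an assumption): report 1.0 when a denominator is zero.
    precision = true_pos / blocked if blocked else 1.0
    recall = true_pos / actual_spam if actual_spam else 1.0
    return precision, recall


# Hypothetical counts: 90 spam caught, no false positives, 10 spam missed.
print(precision_recall(90, 0, 10))
```

With zero false positives, precision is exactly 1.0 regardless of how much spam slips through, which is why "perfect precision" says nothing by itself about recall.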
I've been using POPFile for about a day and have been quite impressed. It uses naive &NAME classification on words and on some aspects of the header, and has a nice interface for labeling training data. It operates as a proxy and can insert tags in the header and/or subject line, so it is compatible with any email program. There are lots of ways it could be improved on both the usability and machine learning fronts, but it's not bad at all and is already making my life much easier.

It trains only on the examples you tell it to, and if I recall correctly the documentation encourages you to train only on examples it misclassified ('conservative training' in &NAME's sense). &NAME actually had a paper, with somewhat murky results, on conservative training of naive &NAME classifiers:

@inproceedings{&NAME,
  author    = "&NAME &NAME and &NAME &NAME",
  title     = "An Apobayesian Relative of &NAME",
  booktitle = "Advances in Neural Information Processing Systems &NUM",
  year      = &NUM,
  pages     = "&NUM",
  editor    = "&NAME &NAME and &NAME &NAME and &NAME &NAME",
  publisher = "&NAME &NAME",
  address   = "&NAME &NAME"
}

It would be interesting to know more about how well naive &NAME does with conservative training.

&NAME

p.s. Thanks to everyone who wrote to me, either on the list or directly, with spam suggestions. People who wrote directly included &NAME &NAME, &NAME &NAME (with helpful comments on &NAME Assassin), and &NAME &NAME (recommending SpamNet).

&NAME (&NAME Mail Avoidance Postscript) &NUM: &NAME &NAME has a postdoc position available on hierarchical categorization, adaptive filtering, etc. for managing email (not spam filtering). Interested applicants should email a resume and/or any questions to &EMAIL.

&NAME &NUM: The deadline for talk proposals for Operational Text Classification &NUM is June &NUM, &NUM. &NAME &NUM will be held 27-Aug-2003 in &NAME, &NAME, at &NAME &NUM.
Details at &WEBSITE

&NAME &NUM: There will be a &NAME tutorial workshop (June &NUM, &NUM, at &NAME) on statistical methods for detecting infectious disease outbreaks. &NAME &NAME and I will be discussing text mining approaches. See &WEBSITE
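To make the conservative training question raised earlier concrete, here is a minimal Python sketch of a naive &NAME word classifier in its usual multinomial form (log class prior plus summed log word likelihoods, with add-one smoothing) that updates its counts only when it misclassifies a training example. It is not POPFile's implementation: the class name `WordClassifier`, the lowercase whitespace tokenizer, the smoothing scheme, and the `ham`/`spam` labels are illustrative assumptions, and header features are ignored.

```python
import math
from collections import Counter, defaultdict


class WordClassifier:
    """Hypothetical multinomial word classifier with add-one smoothing
    and conservative (mistake-driven) training. Not POPFile's code."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.doc_counts = Counter()              # training messages per label
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.total_words = Counter()             # label -> total word tokens
        self.vocab = set()                       # every word seen in training

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase whitespace split; real filters do much more.
        return text.lower().split()

    def _update(self, words, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(words)
        self.total_words[label] += len(words)
        self.vocab.update(words)

    def score(self, words, label):
        """Unnormalized log posterior: log P(label) + sum of log P(word | label)."""
        n_docs = sum(self.doc_counts.values())
        prior = (self.doc_counts[label] + 1) / (n_docs + len(self.labels))
        # The extra +1 reserves probability mass for words never seen in training.
        denom = self.total_words[label] + len(self.vocab) + 1
        s = math.log(prior)
        for w in words:
            s += math.log((self.word_counts[label][w] + 1) / denom)
        return s

    def classify(self, text):
        words = self.tokenize(text)
        # max() returns the first label in self.labels on ties.
        return max(self.labels, key=lambda lab: self.score(words, lab))

    def train_conservative(self, text, label):
        """Update counts only if the current prediction is wrong.
        Returns True if an update was made."""
        if self.classify(text) == label:
            return False
        self._update(self.tokenize(text), label)
        return True


clf = WordClassifier(["ham", "spam"])
clf.train_conservative("cheap pills buy now", "spam")        # misclassified, so it trains
clf.train_conservative("meeting agenda for monday", "ham")   # already correct, so it is skipped
print(clf.classify("buy cheap pills"))
```

The second training call illustrates the trade-off: because the message is already classified correctly, its words never enter the counts, so the model learns less about legitimate mail than full training would give it. Whether that helps or hurts accuracy is exactly the open question about conservative training.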