SUBJECT: Re: Non-junk email

Thanks &NAME, that's absolutely fantastic.

At some point in the future we are hoping to make publicly available an anonymised corpus for standardised testing of email filtering. Would you be happy for your (anonymised) data to be included in this corpus?

Thanks again - your contribution will make a big difference to my project and, I hope, to the advancement of this field.

Regards,
&NAME

On &NAME, &NUM May &NUM, &NAME &NAME wrote:

I'm not familiar with &NAME - which email program are you using?

Hmm, looking at a folder on PINE, it appears to be &NAME, which is the standard &NAME mail folder format, so I'll use that.

My e-mail client is &NAME (&WEBSITE) but there are scripts to convert from almost any e-mail folder format to any other.

In terms of numbers, between &NUM would be ideal. Genuine mailing list emails are also useful because it's mostly text analysis that we're interested in.

I created a folder containing &NUM e-mails from a variety of sources. It was 47 MB, so I compressed it with &NAME and uploaded it to &WEBSITE. If you want something smaller, I'm sure you could run a script over it to strip out all attachments.

&NAME this is useful!

&NAME