SUBJECT: Re : [ &NAME ] Corpus Sanitation

Hi all ,

&NUM absolutely second &NAME 's post . In fact , I have issues in principle with anonymization , as the anonymization itself will obviously affect phonological aspects of the corpus . Likewise , it will tend to skew proper nouns , as these are generally the &NUM anonymized , and these are issues which interest me particularly . I know that people have addressed the general issue , and the ethical questions are real , but there must be some way around this problem .

Jim &NAME Wed , &NUM &NAME &NUM , &NAME , &NAME wrote :

Dear All ,

I was interested to read in the recent posting to the list by &NAME &NAME &CHAR ( see below ) that he was uncertain as to whether he should make his corpus publicly available because it contained some ' uncensored words ' ( &NAME 's point &NUM ) . I guess that this means ' bad language ' ( I assume it does not relate to anonymization issues , as they are covered in &NAME 's point &NUM ) . If this is about ' bleeping out ' words in corpora , should n't we encourage &NAME not to do this ? Surely we want corpora to contain uncensored speech ? The point , for me , of using corpora is to describe / account for language as it is , rather than language as we wish it to be .

&NAME great and cognitive scientist &NAME &NAME on the mind / brain : ' If ever I gotta bust your brains out , baby , &NAME , It 'll make you lose your mind . '

&NAME &NAME e-mail : &EMAIL &NAME en &NAME &NAME tel . : + ( &NUM ) &NUM x5705 &NAME &NAME &NAME &NAME &CHAR fax : + ( &NUM ) &NUM &NAME &NAME &NAME &NAME