SUBJECT: [ &NAME ] Gross language detection

Dear all,

As part of a classified ads posting system, a group of natural language processing students whom I supervise has to develop a gross language detection system for Spanish. I do not know whether there is any existing work in this area (except maybe [ &NUM ]). Do you have any ideas on how to do this?

It seems rather heuristic, but my basic idea is:

&NUM To build a dictionary of forbidden words ( &CHAR**&CHAR, etc.).
&NUM To develop a set of regular expressions that detect variations of the forbidden words (e.g. if 'xyzt' is a forbidden word, then we also have to detect ' &NAME ', ' &NAME ', or small letter changes used in slang - &CHAR ' &CHAR ' instead &CHAR ' &CHAR ', etc.).

Thank you for your help,

&NAME &NAME &NAME &NAME &NAME &NAME &NAME &NAME &NAME &NAME &NAME &NAME &NAME &NAME
&NUM - &NAME &NAME &NAME - &NAME ( &NUM ) &NUM
&EMAIL

Spanish law guarantees privacy in electronic communications. This electronic transmission is strictly confidential and intended solely for the addressee. If you are not the intended addressee, you are kindly requested not to disclose or copy this transmission and to notify us as soon as possible.
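
The two-step idea in the message (a dictionary of forbidden words, plus regular expressions that catch their variations) could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested filter: the word list contains nothing but the 'xyzt' placeholder from the message, and the substitution table, the one-character separator rule, and all function names are hypothetical choices made for the example.

```python
import re

# Placeholder dictionary: only the 'xyzt' stand-in from the message.
# A real system would load the actual forbidden Spanish words here.
FORBIDDEN_WORDS = ["xyzt"]

# Assumed slang substitutions: each base letter maps to the set of
# characters that may stand in for it. Purely illustrative.
SUBSTITUTIONS = {
    "a": "a4@",
    "c": "ck",
    "e": "e3",
    "i": "i1!",
    "o": "o0",
    "s": "s5$z",
    "t": "t7",
}

# Assumption: at most one non-letter character (dot, space, dash, ...)
# may be inserted between two letters of a forbidden word.
SEPARATOR = r"[\W_]?"


def word_to_pattern(word):
    """Turn one forbidden word into a regex fragment that tolerates
    repeated letters, character substitutions, and single separators."""
    parts = []
    for ch in word.lower():
        alternatives = SUBSTITUTIONS.get(ch, ch)
        parts.append("[" + re.escape(alternatives) + "]+")
    return SEPARATOR.join(parts)


def build_detector(words):
    """Compile a single case-insensitive regex covering every word.
    The lookarounds require a non-word character (or the text edge) on
    both sides, so a forbidden word embedded in a longer word is ignored."""
    alternatives = "|".join(word_to_pattern(w) for w in words)
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def find_gross_language(text, detector):
    """Return every substring of `text` flagged by the detector."""
    return [m.group(0) for m in detector.finditer(text)]


detector = build_detector(FORBIDDEN_WORDS)
print(find_gross_language("this ad contains X.Y.Z.T and xyyyz7", detector))
```

Requiring word boundaries trades recall for precision: it avoids flagging harmless words that merely contain a forbidden string, but it also misses forbidden words glued onto other text, so the rule would need tuning against real ads.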