SUBJECT: Re: Multilingual text categorisation

&EMAIL wrote:

If by 'Multilingual &NAME &NAME' you mean the problem of identifying the language of the document, then try: &NAME and &NAME, 'Using compression-based language models for text categorization', Language &NAME and

In any case, using e.g. gzip for text categorization is easy and fun. I have a Bourne shell script here (for my students) that does pretty well. Anyone who wants it is welcome to send me an email.

It gets more complicated when you try *image* categorization using the same techniques. I am putting a group of students through their paces with this task, but I very much hope that they don't come up with embarrassing questions this afternoon about problems I am wrestling with right now. If somebody happens to have a pointer to relevant literature, I'd like to hear about it.

&NAME
Dr &NAME &NAME &NAME &NAME &NAME
&WEBSITE
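
The gzip idea mentioned in the message can be sketched in a few lines of Python. This is not the Bourne script referred to above, only a minimal illustration under stated assumptions: it uses zlib (the same DEFLATE compression that gzip uses), and the names compressed_size, classify and SAMPLES, as well as the tiny sample texts, are hypothetical. The document is assigned to the category whose training text makes it cheapest to compress.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Number of bytes after zlib (DEFLATE) compression at the highest level."""
    return len(zlib.compress(data, 9))


def classify(document: str, corpora: Mapping[str, str]) -> str:
    """Return the label whose corpus makes the document cheapest to compress."""
    doc = document.encode("utf-8")
    costs = {}
    for label, text in corpora.items():
        base = text.encode("utf-8")
        # Extra bytes needed for the document once the corpus has been seen.
        costs[label] = compressed_size(base + b"\n" + doc) - compressed_size(base)
    return min(costs, key=costs.get)


# Hypothetical toy training texts; real use needs far more text per category.
SAMPLES = {
    "english": (
        "the quick brown fox jumps over the lazy dog. "
        "it was the best of times, it was the worst of times. "
    ),
    "german": (
        "der schnelle braune fuchs springt ueber den faulen hund. "
        "es war die beste aller zeiten, es war die schlechteste aller zeiten. "
    ),
}

if __name__ == "__main__":
    print(classify("it was the best of times", SAMPLES))
```

With training texts this small the result is only reliable for documents that closely resemble the samples; larger corpora per category should give more stable decisions.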