SUBJECT: Re: [&NAME] Statistical tests for corpus studies

&NAME, &NAME wrote:

But there is a problem with the &NAME test of too many &NUM in the slices, as your &NAME paper points out, &NAME. For example, in the &NAME and &NAME comparison, only words with a frequency of &NUM or more (in the joint corpus) had few enough &NUM for the test to be applicable. This means that one of the word types in the joint corpus were omitted from the comparison.

But if there isn't enough data, we shouldn't be drawing any inferences, so that seems right. A name or technical term that gets used lots of times, but in only &NUM or &NUM documents, is not a good basis for any inference. (Some thought has to be given to slice size and to how the corpus is sliced up, since both interact with the number of &NUM values you'll get for the test.)

A couple of people asked for an e-version of the 'Comparing Corpora' paper; see &WEBSITE

&NAME &NAME &NAME &NAME, University of &NAME &NAME: (&NUM) &NUM &NUM &NAME Road, &NAME &NAME &NAME, &NAME fax: (&NUM) &NUM &NUM &EMAIL
and &NAME &NAME Ltd. &NUM &NAME Road, &NAME &NAME &NAME, &NAME &NAME: (&NUM) &NUM &NUM &EMAIL