SUBJECT: Re: rasp subcat lexicon

OK -- we also have the 100M word American &NAME Corpus now -- speak to &NAME if you can't find it, so it would probably make sense to pull egs from this too.

&NAME got back to me just now, saying that we haven't got &NAME yet. The &NAME website (&WEBSITE by autumn &NUM. We have got the North American News Text corpus (/usr/groups/corpora-cds/&NAME) -- but I don't know if this is balanced enough (?).

&NAME 7k sentence / verb if we are talking 5k verbs or so is not a problem but gathering / finding them maybe!

There are enough occurrences in &NAME (>300 parsed egs) for &NUM out of &NUM Comlex verbs (/usr/groups/dict/subcat/rasp_lexicon/in-comlex). I selected a number of the remaining, underrepresented verbs (from a few freq. ranges) and checked manually with &NAME how many occurrences can be found in pdf / ps documents. I came up with the following statistics (obviously not fully accurate, but they give a rough idea):

&NAME &NAME # #
&NUM duplicate - &NUM,&NUM
&NUM reunite - &NUM,&NUM
&NUM rejoice - &NUM,&NUM
&NUM lighten - &NUM,&NUM
&NUM misjudge - &NUM,&NUM
&NUM absolve - &NUM,&NUM
&NUM computerize - &NUM,&NUM
&NUM sweeten - &NUM,&NUM
&NUM pacify - &NUM,&NUM
&NUM authenticate - &NUM,&NUM
&NUM amputate - &NUM,&NUM
&NUM resettle - &NUM,&NUM
&NUM overstep - &NUM,&NUM
&NUM embezzle - &NUM,&NUM
&NUM smarten - &NUM,&NUM
&NUM disinherit - &NUM,&NUM
&NUM overreach - &NUM,&NUM
&NUM reread - &NUM,&NUM
&NUM humanize - &NUM,&NUM
&NUM misquote - &NUM,&NUM
&NUM inundate - &NUM,&NUM
&NUM mispronounce - &NUM,&NUM
&NUM abominate - &NUM
&NUM reinsure - &NUM,&NUM
&NUM miscall - &NUM
&NUM piddle - &NUM
&NUM contemn - &NUM
&NUM declassify - &NUM
&NUM misconduct - &NUM
&NUM underspend - &NUM
&NUM dissatisfy - &NUM
&NUM calumniate - &NUM
&NUM disinfest - &NUM
&NUM asseveriate - &NUM

(some spellings / verbs could be &NAME here and explain higher freqs.
) I think it would be good either to gather data from the internet for all the &NUM underrepresented Comlex verbs in &NAME (i.e. all verbs marked with frequency less than &NUM in /usr/groups/dict/subcat/rasp_lexicon/in-comlex), or to set the &NAME freq. threshold very low (<&NUM occurrences). It might make sense to also set an upper bound for the highest freq. verbs -- perhaps somewhere around &NUM,&NUM - &NUM,&NUM raw occurrences. I don't think we need more than that, and parsing will be faster when the data is not so huge.

...

Finally, about the trip to &NAME. I asked &NAME about the budget. She gave the following figures:

&NAME has a total of &NAME &NUM (after &NAME's trip)
&NAME/I4 has a total of &NAME &NUM

Can we use &NAME/I4 for travel? If so, my going to &NAME would exhaust &NAME, but leave some &NUM in &NAME/I4. Perhaps still enough for you to attend &NUM or one of the conferences next year?

&NAME