SUBJECT: Re : [ &NAME ] Is the &NAME a waste of time ?

Whilst trying to observe and not react for a while so as to cut my email writing time , I cannot but reply to this string with an unequivocal no : &NAME is definitely not a waste of time , but a cornerstone of corpus linguistics . It is obvious that for the small corpus designer , &NUM million or fewer tokens , markup is a considerable investment in time . However , if one holds , as I do , that a corpus is not a simple mass of data , but a carefully compiled selection of texts , then we need a means to treat them as texts , to store both their general features and their particularities . This the &NAME does . In my own work in the field of English for Academic Purposes , I tend not to use the corpus header but a standard individual header so as to store all the bibliographic information and sociolinguistic parameters associated with the text . The depth of markup depends on my needs , and time , for an individual text . In this way I can move with ease from a fully annotated single text to a more lightly marked up corpus . This is possible because of the encoding possibilities of the &NAME .

Education is very much part of the answer . Easy access to vast amounts of downloadable data has meant that a number of ' corpus &NAME ' neither know nor care about the niceties of corpus creation , and the whys and wherefores of selecting and marking up data . Ease of access has become the main criterion , potentially to the detriment of the discipline itself . Easy solutions do not necessarily answer the most pertinent questions . It is true that all this takes time , but if we throw out all that is time-consuming drudgery from corpus linguistics , we may find that we have thrown out our text baby with the corpus bathwater and are only left with ready-made corpora for ready-made answers .

Back to some time-consuming markup .

&NAME Dr.
&NAME &NAME &NAME , Département Langues Etrangères Appliquées &NAME &NAME et Sciences &NAME &NUM , rue &NAME &NAME &NAME &NUM &NUM LORIENT &NAME &NAME &NAME : &NUM ( &NUM ) &NUM &NUM &NUM &NUM &NUM fax : &NUM ( &NUM ) &NUM &NUM &NUM &NUM &NUM email : &EMAIL &WEBSITE

Dear &NAME ,

Yes , I have some sympathy with the point you make . The thing that has attracted me to the &NAME in the past , though , is that once the effort is made to get to grips with it ( and it is daunting ) there is usually a well-thought-through solution contained in it for almost any problem situation you come across in encoding a corpus ! With that said , it is a clear theme of the posts so far that there is , at the very least , an advocacy issue related to the &NAME in corpus linguistics , which is interesting .

Best , &CHAR

Interesting question ... There are &NUM issues here :

&NUM Ignorance and confusion . Most people have only a vague idea what &NAME is or does or what it is good for . There would need to be an effort to ( re- ) educate the potential users of &NAME . Does &NAME do something different from &NAME ? An absurd question , I know , but that is the kind of confusion which I suspect exists .

&NUM Complexity . When it was introduced many people reacted against it as too complex . Now they have all adopted XML , RDF , etc. , which are much more complicated to use . So potential users ' perception would now be ripe for a re-presentation of &NAME .

Related to both of these issues is that of the documentation available to educate people & help potential users understand what &NAME is , does , & is good for . A research assistant & I have recently been poring over a couple of chapters of the &NAME guidelines , looking for guidelines & relevant examples to add some markup to our already ( mostly ) TEI-conformant corpus markup scheme .
Although the documentation is extensive , it is inadequate in many ways : it is missing examples , and it is not very good at giving a larger picture to people who are n't sure if they need / want the &NAME at all or who just need some pointers to a few relevant sections . If the only people who can read the documentation and make use of it are information / library science people who are specifically trained in that area , then it 's no wonder &NAME & others who are in the business of building corpora are not using it or promoting it .

&NAME &NAME Project Director , &NAME Corpus of &NAME &NAME &NAME ( &NAME ) English Language Institute University of &NAME