Allied to corpus similarity is corpus homogeneity. An understanding of homogeneity is a prerequisite for measuring similarity. It makes little sense to compare a corpus sampled across many genres, such as the Brown Corpus, with a corpus of weather forecasts without first accounting for the fact that the one is broad and the other narrow.
Given the centrality of corpora to contemporary language engineering, it is remarkable how little research there has been to date on the question. Biber's work, coming from sociolinguistics, has made a considerable impact, with various researchers in computational linguistics taking the model forward (Biber 1989, 1995). Studies in text classification, genre and sublanguage are also relevant, but it is rarely evident how well the technologies developed in these fields suit the task of measuring corpus similarity or homogeneity.
The workshop welcomes contributions from any field that are concerned with measuring and comparing corpora using quantitative methods.
Electronic submissions should be sent to compcorp@itri.brighton.ac.uk, and hard copies are to be mailed to:
COMPCORP submission
ITRI
University of Brighton
Lewes Road
Brighton BN2 4GJ
United Kingdom

| Date | Event |
|---|---|
| July 8, 2000 | Submission of full-length paper |
| August 17, 2000 | Notification of acceptance |
| September 1, 2000 | Camera-ready paper due |
| October 7 or 8, 2000 | Workshop date |

| Name | Affiliation |
|---|---|
| Douglas Biber | Northern Arizona University |
| Jeremy Clear | University of Birmingham |
| Ted Dunning | MusicMatch Software, Inc. |
| Tomaz Erjavec | Jozef Stefan Institute, Slovenia |
| Pascale Fung | University of Science and Technology, Hong Kong |
| Greg Grefenstette | Xerox Research Centre Europe |
| Benoit Habert | LIMSI, France |
| Przemek Kaszubski | Adam Mickiewicz University, Poland |
| Adam Kilgarriff | University of Brighton |
| David Lee | University of Lancaster |
| Oliver Mason | University of Birmingham |
| Doug Oard | University of Maryland |
| Tony Rose | Canon Research |
| Tony Berber Sardinha | Catholic University of Sao Paulo, Brazil |
| George Tambouratzis | ILSP, Athens |
| Christopher Tribble | King's College, London University |