Information about the syntax and semantics of terms in context is essential for reliable inference in a variety of document annotation and retrieval tasks. We hypothesize that we can derive most of the relevant information from explicit and implicit relationships between terms in the masses of Web content and user interactions with that content. We have developed graph-based algorithms that efficiently combine many small pieces of textual evidence to bootstrap broad-coverage syntactic and semantic classifiers from small sets of manually annotated examples.
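The abstract does not say which graph-based algorithms are used, so the following is only a minimal sketch of one generic technique in this family: label propagation, where a few hand-labeled terms spread their labels to unlabeled terms along weighted edges. The function name `propagate_labels`, the toy terms, the edge weights (imagined here as co-occurrence counts), and the label names are all assumptions made for illustration, not the speaker's method or data.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=20):
    """Spread seed labels over a weighted undirected graph (illustrative sketch)."""
    # Build a symmetric adjacency map: node -> {neighbor: weight}.
    neighbors = defaultdict(dict)
    for u, v, w in edges:
        neighbors[u][v] = w
        neighbors[v][u] = w

    labels = sorted(set(seeds.values()))
    # Every node starts with zero mass on every label.
    scores = {node: {l: 0.0 for l in labels} for node in neighbors}
    # Seed nodes put all of their mass on the manually assigned label.
    for node, label in seeds.items():
        scores.setdefault(node, {l: 0.0 for l in labels})
        scores[node][label] = 1.0

    for _ in range(iterations):
        updated = {}
        for node in scores:
            if node in seeds:
                updated[node] = scores[node]  # seeds stay clamped
                continue
            total = sum(neighbors[node].values())
            if total == 0:
                updated[node] = scores[node]
                continue
            # An unlabeled node takes the weighted average of its neighbors.
            updated[node] = {
                l: sum(w * scores[nb][l] for nb, w in neighbors[node].items()) / total
                for l in labels
            }
        scores = updated
    return scores


# Hypothetical toy graph: weights stand in for counts of shared contexts.
toy_edges = [("Paris", "Lyon", 3.0), ("Lyon", "Haydn", 1.0), ("Haydn", "Mozart", 3.0)]
toy_seeds = {"Paris": "LOCATION", "Mozart": "PERSON"}
toy_scores = propagate_labels(toy_edges, toy_seeds)
best = {term: max(dist, key=dist.get) for term, dist in toy_scores.items()}
```

In this toy graph the two unlabeled terms each end up closest to the seed they are most strongly connected to, which is the sense in which many weak pieces of evidence can be combined starting from a small annotated set.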
Fernando Pereira is research director at Google. His previous positions include chair of the Computer and Information Science department at the University of Pennsylvania, head of the Machine Learning and Information Retrieval department at AT&T Labs, and research and management positions at SRI International. He received a Ph.D. in Artificial Intelligence from the University of Edinburgh in 1982, and he has over 120 research publications on natural language processing, machine learning, speech recognition, bioinformatics, databases, and logic programming as well as several patents. He was elected Fellow of the American Association for Artificial Intelligence in 1991 for his contributions to computational linguistics and logic programming, and he was president of the Association for Computational Linguistics in 1993.