Final Examination:   Thursday, May 11, 2-5PM, our classroom

The exam is closed book and closed notes. However, you may bring
a single double-sided 8.5x11-inch sheet of paper with anything
written on it that you wish. This may include notes and formulas
of any kind, and preparing the sheet is itself a helpful part of
the review process.

Topics for Exam:

  PAT trees, suffix arrays
  Inverted files (creation and use), indexing
  Signature files

  Goals and methods of compressing document representations in IR

  Boolean IR models (including extensions to basic Boolean models)

  Vector-based IR models in detail
     including term weighting, similarity measures, ...

  Bayesian IR models (Inquery system, Naive Bayes, hierarchical Bayes)

  Evaluation metrics
     precision, recall, F-measure, normalized recall, accuracy
     methods for computation, P_25, P_50, interpolation issues,
     understanding of issues and challenges in IR evaluation

  Query expansion vs. term clustering

  Clustering algorithms, including Salton's greedy method and
       hierarchical agglomerative clustering (including
       algorithm details such as minimal/maximal/average
       linkage variants, dendrograms, etc.)

  SVD (singular value decomposition)/LSI (Latent semantic indexing)

  Relevance feedback (and sources for obtaining it)
      Rocchio algorithm and its variants

  User and group modelling,
      other features for relevance classification besides term overlap

  Document routing/filtering/topic-classification

  Information extraction: named entity recognition/tagging,
         person/place classification,
         sense tagging (including algorithm comparison
         and understanding of its relation to IR algorithms)

  Expectation Maximization (EM) algorithm (e.g. for person/place classification)

  Information visualization - Dotplot and Hearst's TileBars
       (and uses for text segmentation, detection of version differences
             and repetition)

  HTTP protocols in *detail* (including HTTP/1.0 and HTTP/1.1)
  SOIF headers, their motivation and potential uses

  Web robot libraries and techniques, robot exclusion protocols,
     queuing strategies (know HW4 in detail)

  Harvest architecture in detail
     (including the Gatherer, the Broker, and the caching and replication subsystems)

  Hierarchy of web agents (from blind web crawlers
   through intelligent shopping bots)

  Collection fusion, search-engine merger (e.g. MetaCrawler),
    including detailed analysis of the issues, such as scale normalization

  Collaborative filtering

  PageRank algorithm and link analysis approaches
  Hubs & Authorities model, HITS

  Future directions and visions
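As a review aid for the "Vector-based IR models" topic, here is a minimal Python sketch of one common weighting scheme (raw term frequency times log(N/df)) combined with cosine similarity. The scheme and the helper names build_idf, weigh, cosine, and rank are illustrative choices, not the course's reference implementation; the course covers several other term-weighting variants.

```python
import math
from collections import Counter


def build_idf(docs):
    """Inverse document frequency log(N / df) for every term in a tokenized collection."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def weigh(tokens, idf):
    """Sparse tf-idf vector {term: tf * idf}; terms unseen in the collection get weight 0."""
    tf = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_tokens, docs):
    """Return document indices ordered by decreasing cosine similarity, plus the raw scores."""
    idf = build_idf(docs)
    doc_vectors = [weigh(doc, idf) for doc in docs]
    query_vector = weigh(query_tokens, idf)
    scores = [cosine(query_vector, d) for d in doc_vectors]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

For example, with the toy collection [["cat", "sat", "mat"], ["dog", "sat", "log"], ["cat", "dog"]], the query ["cat", "mat"] ranks the documents in the order [0, 2, 1]. Document 1 shares no query terms and scores 0. Note that under this scheme a term occurring in every document gets idf 0 and so contributes nothing.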
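For the "Evaluation metrics" topic, the sketch below computes set-based precision, recall, and F-measure, plus interpolated precision at a given recall level. It assumes that P_25 and P_50 denote interpolated precision at 25% and 50% recall. If they are instead meant as precision after the first 25 and 50 retrieved documents, that is simply precision_recall_f applied to ranking[:25] and ranking[:50]. The function names are hypothetical.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Set-based precision, recall, and F-measure (beta=1 gives the harmonic mean of P and R)."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    p = hits / len(retrieved_set) if retrieved_set else 0.0
    r = hits / len(relevant_set) if relevant_set else 0.0
    if p == 0 and r == 0:
        return p, r, 0.0
    b2 = beta * beta
    return p, r, (1 + b2) * p * r / (b2 * p + r)


def interpolated_precision(ranking, relevant, recall_level):
    """Interpolated precision: the best precision at any rank whose recall >= recall_level."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits, best = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant_set:
            hits += 1
        # Only ranks that have reached the requested recall level are eligible
        if hits / len(relevant_set) >= recall_level:
            best = max(best, hits / k)
    return best
```

Consider a ranking d1..d8 with relevant set {d1, d3, d6, d9}. Here the top four documents give P = R = F = 0.5. Interpolated precision is 1.0 at 25% recall, 2/3 at 50%, and 0.5 at 75%. The relevant document d9 is never retrieved, so 100% recall is never reached and this sketch reports 0.0 there. How to score such unreached recall levels is one of the interpolation issues the topic refers to.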
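For the relevance feedback topic, here is a minimal sketch of Rocchio-style query modification on sparse {term: weight} vectors. The new query is alpha * q + beta * (centroid of relevant docs) - gamma * (centroid of non-relevant docs). The default values alpha = 1.0, beta = 0.75, and gamma = 0.15 are illustrative assumptions, not values taken from the course. Clipping negative weights to zero is a common convention rather than part of every variant.

```python
from collections import defaultdict


def rocchio(query, relevant_docs, nonrelevant_docs, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio-style query modification on sparse {term: weight} vectors.

    q' = alpha * q + beta * centroid(relevant) - gamma * centroid(nonrelevant)
    """
    new_query = defaultdict(float)
    for term, w in query.items():
        new_query[term] += alpha * w
    # Move toward the centroid of the documents judged relevant
    for doc in relevant_docs:
        for term, w in doc.items():
            new_query[term] += beta * w / len(relevant_docs)
    # Move away from the centroid of the documents judged non-relevant
    for doc in nonrelevant_docs:
        for term, w in doc.items():
            new_query[term] -= gamma * w / len(nonrelevant_docs)
    # Drop terms whose weight is no longer positive (a common, not universal, convention)
    return {term: w for term, w in new_query.items() if w > 0}
```

As an example, suppose the query {"jaguar": 1.0} receives one relevant judgment, {"jaguar": 1.0, "cat": 1.0}, and one non-relevant judgment, {"jaguar": 1.0, "car": 1.0}. Rocchio then adds "cat" with weight 0.75, raises "jaguar" to about 1.6, and drops "car", whose weight goes negative.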
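For the "PageRank algorithm" topic, the sketch below uses power iteration on a link graph given as {page: [pages it links to]}. It makes three assumptions, all common conventions rather than anything taken from the course. The damping factor is d = 0.85. It uses the (1 - d)/N form so that scores sum to 1. The rank held by pages with no out-links is spread uniformly over all pages.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """PageRank by power iteration; links maps each page to the pages it links to."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Random-jump ("teleport") share that every page receives
        new = {v: (1.0 - damping) / n for v in nodes}
        # Pages with no out-links spread their rank evenly over all pages
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        for v in nodes:
            new[v] += damping * dangling / n
        # Each page splits its rank evenly among the pages it links to
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

On a three-page cycle A -> B -> C -> A every page ends up with 1/3. If A also links to C, then C collects rank from both A and B and comes out highest, followed by A and then B.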
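For the "Hubs & Authorities model, HITS" topic, here is a minimal sketch of the mutually reinforcing updates. A page's authority score is the sum of the hub scores of the pages linking to it. Its hub score is the sum of the authority scores of the pages it links to. Both are L2-normalized each round. This sketch runs on whatever link graph it is given and uses a fixed 50 iterations, an arbitrary choice. In HITS proper, the graph would be a query-specific base set grown from a root set of search results.

```python
import math


def hits(links, iterations=50):
    """Iterative HITS hub/authority scores; links maps each page to the pages it links to."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of pages that link in
        auth = {v: 0.0 for v in nodes}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub update: sum of authority scores of pages linked to
        hub = {v: sum(auth[dst] for dst in links.get(v, ())) for v in nodes}
        # L2-normalize both score vectors so they do not grow without bound
        for scores in (auth, hub):
            norm = math.sqrt(sum(s * s for s in scores.values()))
            if norm > 0:
                for v in scores:
                    scores[v] /= norm
    return hub, auth
```

As an example, suppose hubs h1 and h2 both point to a1 and a2, and h3 points only to a1. Then a1 ends up the stronger authority, and h1 and h2 tie as the stronger hubs.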
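For the hierarchical agglomerative clustering topic, the sketch below is deliberately naive. It recomputes every cross-cluster distance at each step and supports minimal (single), maximal (complete), and average linkage. It returns the merge history with merge heights, which is the information a dendrogram plots. Salton's greedy method is not shown. The function name hac and its signature are hypothetical.

```python
def hac(points, distance, linkage="single"):
    """Naive hierarchical agglomerative clustering.

    Returns the merge history [(cluster_a, cluster_b, height), ...], which is the
    information a dendrogram draws. Clusters are lists of point indices.
    """
    def cluster_distance(a, b):
        ds = [distance(points[i], points[j]) for i in a for j in b]
        if linkage == "single":      # minimal (nearest-neighbor) link
            return min(ds)
        if linkage == "complete":    # maximal (furthest-neighbor) link
            return max(ds)
        if linkage == "average":     # mean over all cross-cluster pairs
            return sum(ds) / len(ds)
        raise ValueError(f"unknown linkage: {linkage}")

    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of current clusters under the chosen linkage
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((clusters[x], clusters[y], d))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges
```

On the one-dimensional points [0, 1, 5, 6, 20] with distance |a - b|, all three linkages first merge {0, 1} and then {5, 6} at height 1. The later merge heights differ: 4 and 14 for single linkage, 6 and 20 for complete linkage, and 5 and 17 for average linkage. This illustrates how the linkage choice reshapes the dendrogram.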