Joshua
open source statistical hierarchical phrase-based machine translation system
joshua.util.ExtractTopCand Class Reference


Static Public Member Functions

static void main (String[] args)

Static Protected Member Functions

static void extractOneBest (IndexedReader< String > nbestReader, BufferedWriter onebestWriter) throws IOException

Detailed Description

This program extracts the 1-best output translations from the n-best output translations generated by joshua.decoder.JoshuaDecoder.

Author:
wren ng thornton wren@users.sourceforge.net
Version:
LastChangedDate: 2009-03-26 15:06:57 -0400 (Thu, 26 Mar 2009)

Member Function Documentation

static void joshua.util.ExtractTopCand.extractOneBest (IndexedReader< String > nbestReader, BufferedWriter onebestWriter) throws IOException [static, protected]

Prints the one-best translation for each segment ID from the reader as a line on the writer, and closes both the reader and the writer before returning. Translations are printed in the order in which their segment IDs first occur in the input. No information about a segment other than its translation (including the segment ID) is printed to the writer.
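The behavior described above can be sketched in plain Java. This is a minimal sketch, not Joshua's implementation: it reads from a java.io.BufferedReader instead of IndexedReader, and it assumes n-best lines of the form `segID ||| translation ||| features ||| score`. Both that field layout and the class name OneBestSketch are assumptions made for illustration. Like the original, it relies on the two assumptions listed under Developer Notes, so it streams the input and keeps only the previous segment ID in memory.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.util.regex.Pattern;

// Hypothetical sketch; not the actual joshua.util.ExtractTopCand code.
class OneBestSketch {
    // Assumed n-best layout: "segID ||| translation ||| features ||| score"
    private static final Pattern SEP = Pattern.compile("\\s*\\|\\|\\|\\s*");

    /**
     * Writes the first translation seen for each segment ID, one per line,
     * then closes both streams. Assumes all candidates of a segment are
     * contiguous and that the first candidate is the 1-best.
     */
    static void extractOneBest(BufferedReader nbest, BufferedWriter onebest) throws IOException {
        // try-with-resources closes the writer, then the reader, on exit.
        try (BufferedReader in = nbest; BufferedWriter out = onebest) {
            String prevId = null;
            String line;
            while ((line = in.readLine()) != null) {
                // Split into at most 3 parts: ID, translation, rest of the line.
                String[] fields = SEP.split(line, 3);
                if (fields.length < 2) {
                    throw new IOException("Malformed n-best line: " + line);
                }
                String segId = fields[0].trim();
                if (!segId.equals(prevId)) {
                    // First line of a new segment: treated as its 1-best.
                    out.write(fields[1]);
                    out.newLine();
                    prevId = segId;
                }
            }
        }
    }
}
```

For the input lines `0 ||| a b ||| ...`, `0 ||| a c ||| ...`, `1 ||| x ||| ...`, this writes `a b` followed by `x`.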

Developer Notes

This implementation assumes:

  1. all translations for a segment are contiguous
  2. the 1-best translation is the first one encountered.

We will need to alter the implementation if these assumptions no longer hold for the output of JoshuaDecoder (or any sensible n-best format passed to this method).
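If assumption 1 stopped holding, one possible alteration is to buffer the first translation seen for each segment ID in a java.util.LinkedHashMap, which preserves the order in which keys are first inserted and therefore the first-occurrence order of segment IDs. This is a hypothetical sketch (the class name OneBestUnordered and the `segID ||| translation ||| ...` line layout are assumptions); it still relies on assumption 2.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical variant that tolerates interleaved segments.
class OneBestUnordered {
    // Assumed n-best layout: "segID ||| translation ||| ..."
    private static final Pattern SEP = Pattern.compile("\\s*\\|\\|\\|\\s*");

    /**
     * Keeps the first candidate seen per segment ID (assumption 2 still holds),
     * even when candidates of different segments are interleaved, and writes
     * them in first-occurrence order. Closes both streams.
     */
    static void extractOneBest(BufferedReader nbest, BufferedWriter onebest) throws IOException {
        // LinkedHashMap iterates in insertion order, i.e. first occurrence of each ID.
        Map<String, String> best = new LinkedHashMap<>();
        try (BufferedReader in = nbest; BufferedWriter out = onebest) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = SEP.split(line, 3);
                if (fields.length < 2) {
                    throw new IOException("Malformed n-best line: " + line);
                }
                // putIfAbsent ignores later candidates for an ID already seen.
                best.putIfAbsent(fields[0].trim(), fields[1]);
            }
            for (String translation : best.values()) {
                out.write(translation);
                out.newLine();
            }
        }
    }
}
```

The trade-off is memory: this variant holds one translation per segment until the input is exhausted, whereas the contiguous version can write each line as soon as a new segment ID appears.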

We should switch to using an n-best joshua.decoder.segment_file.SegmentFileParser so that this class stays compatible once the decoder's output format becomes configurable. The MERT code needs such a SegmentFileParser anyway, so the switch would also reduce code duplication between the MERT code and this class.



static void joshua.util.ExtractTopCand.main ( String[]  args) [static]

Usage: java ExtractTopCand nbestInputFile 1bestOutputFile.

If the input file name is "-" then input is read from System.in. If the output file name is "-" then output is directed to System.out. If a file already exists with the output file name, it is truncated before writing. The bulk of this program is implemented by extractOneBest(IndexedReader,BufferedWriter).
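The "-" convention and the truncation behavior can be sketched with standard java.io streams. The helper names openInput and openOutput, the class name StreamArgs, and the choice of UTF-8 are assumptions for illustration, not part of Joshua's API; main would pass the resulting reader and writer to an extractOneBest-style routine.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers; UTF-8 is an assumed encoding.
class StreamArgs {
    /** Opens the named file for reading, or System.in when the name is "-". */
    static BufferedReader openInput(String name) throws IOException {
        return new BufferedReader(new InputStreamReader(
            "-".equals(name) ? System.in : new FileInputStream(name),
            StandardCharsets.UTF_8));
    }

    /**
     * Opens the named file for writing, or System.out when the name is "-".
     * append=false truncates an existing file before anything is written.
     */
    static BufferedWriter openOutput(String name) throws IOException {
        return new BufferedWriter(new OutputStreamWriter(
            "-".equals(name) ? System.out : new FileOutputStream(name, false),
            StandardCharsets.UTF_8));
    }
}
```

Because extractOneBest closes the writer it is given, passing a writer that wraps System.out also closes standard output; for a one-shot command-line tool that is harmless.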
