Joshua
open source statistical hierarchical phrase-based machine translation system
joshua.subsample.BiCorpus Class Reference

Public Member Functions

 BiCorpus (String foreignFileName, String nativeFileName) throws IOException
 BiCorpus (String foreignFileName, String nativeFileName, String alignmentFileName) throws IOException, IllegalArgumentException, IndexOutOfBoundsException
Iterator< PhrasePair > iterator ()

Protected Attributes

final String foreignFileName
final String nativeFileName
final String alignmentFileName

Static Private Member Functions

static void fileLengthMismatchException () throws RuntimeException

Detailed Description

Class for representing a sentence-aligned bi-corpus (with optional word-alignments).

To avoid running out of memory, this class no longer extends ArrayList, which tried to cache the entire file in memory at once. As a result, each file is re-read (1 + Subsampler#MAX_SENTENCE_LENGTH / binsize) times, where binsize is determined by the subsample(String, float, PhraseWriter, BiCorpusFactory) method.
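As a rough illustration of the re-read cost, the sketch below evaluates the formula from the description for two made-up settings. The class and method names are hypothetical, the input values are not taken from Joshua, and integer division is an assumption about how the expression is evaluated.

```java
public class PassCount {
    // Passes over each file, following the formula in the description above.
    // Assumes Java integer division; the source does not confirm this.
    static int passes(int maxSentenceLength, int binsize) {
        return 1 + maxSentenceLength / binsize;
    }

    public static void main(String[] args) {
        // Hypothetical values chosen only for illustration.
        System.out.println(passes(100, 10)); // prints 11
        System.out.println(passes(100, 30)); // prints 4
    }
}
```

Under these assumptions, a larger binsize means fewer passes over the files, at the price of holding more sentences per bin.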

Author:
UMD (Jimmy Lin, Chris Dyer, et al.)
wren ng thornton <wren@users.sourceforge.net>
Version:
$LastChangedDate$

Constructor & Destructor Documentation

joshua.subsample.BiCorpus.BiCorpus ( String  foreignFileName,
String  nativeFileName 
) throws IOException

Constructor for unaligned BiCorpus.

joshua.subsample.BiCorpus.BiCorpus ( String  foreignFileName,
String  nativeFileName,
String  alignmentFileName 
) throws IOException, IllegalArgumentException, IndexOutOfBoundsException

Constructor for word-aligned BiCorpus.


Member Function Documentation

static void joshua.subsample.BiCorpus.fileLengthMismatchException ( ) throws RuntimeException [static, private]

Iterator< PhrasePair > joshua.subsample.BiCorpus.iterator ( )

Iterate through the files represented by this BiCorpus, returning a PhrasePair for each pair (or triple) of lines.
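The lockstep iteration described above might look roughly like the following sketch. It is not Joshua's implementation: the class LockstepLines and its pairs helper are hypothetical, it yields plain String arrays instead of PhrasePair objects, it reads from in-memory readers rather than files, and it omits the optional alignment file. It only shows the idea of reading one line from each side at a time and failing when the two sides differ in length.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch of a streaming pair iterator; not part of Joshua. */
public class LockstepLines implements Iterator<String[]> {
    private final BufferedReader foreignSide;
    private final BufferedReader nativeSide;
    private String[] lookahead; // next pair to return, or null when both sides are exhausted

    public LockstepLines(Reader foreignSide, Reader nativeSide) {
        this.foreignSide = new BufferedReader(foreignSide);
        this.nativeSide = new BufferedReader(nativeSide);
        advance();
    }

    // Read one line from each side; only one pair is held in memory at a time.
    private void advance() {
        try {
            String f = foreignSide.readLine();
            String n = nativeSide.readLine();
            if (f == null && n == null) {
                lookahead = null;
            } else if (f == null || n == null) {
                // One side ended early: the inputs are not sentence-aligned.
                throw new RuntimeException("Files have different numbers of lines");
            } else {
                lookahead = new String[] { f, n };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return lookahead != null;
    }

    @Override
    public String[] next() {
        if (lookahead == null) {
            throw new NoSuchElementException();
        }
        String[] current = lookahead;
        advance();
        return current;
    }

    // Demo helper: collect all pairs from two in-memory texts.
    static List<String> pairs(String foreignText, String nativeText) {
        List<String> out = new ArrayList<>();
        Iterator<String[]> it =
            new LockstepLines(new StringReader(foreignText), new StringReader(nativeText));
        while (it.hasNext()) {
            String[] p = it.next();
            out.add(p[0] + " ||| " + p[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : pairs("la casa\nel gato\n", "the house\nthe cat\n")) {
            System.out.println(s);
        }
    }
}
```

Because the sketch keeps only a single look-ahead pair, memory use stays constant regardless of corpus size, which is the trade-off the class description makes against re-reading the files.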


Member Data Documentation

final String joshua.subsample.BiCorpus.nativeFileName [protected]