Joshua
open source statistical hierarchical phrase-based machine translation system
Public Member Functions
| | BiCorpus (String foreignFileName, String nativeFileName) throws IOException |
| | BiCorpus (String foreignFileName, String nativeFileName, String alignmentFileName) throws IOException, IllegalArgumentException, IndexOutOfBoundsException |
| Iterator<PhrasePair> | iterator () |
Protected Attributes
| final String | foreignFileName |
| final String | nativeFileName |
| final String | alignmentFileName |
Static Private Member Functions
| static void | fileLengthMismatchException () throws RuntimeException |
Class for representing a sentence-aligned bi-corpus (with optional word-alignments).
To avoid running out of memory, this class no longer extends ArrayList, which would hold the entire file in memory at once. As a result, each file is re-read (1 + Subsampler#MAX_SENTENCE_LENGTH / binsize) times, where binsize is determined by the subsample(String, float, PhraseWriter, BiCorpusFactory) method.
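The streaming trade-off can be illustrated with a minimal sketch. This is not Joshua's code: the class ReReadableLines and its passes helper are hypothetical. Each call to iterator() opens a fresh reader, so every pass re-reads the file from disk instead of caching it. The passes helper simply evaluates the formula above, assuming integer division as Java would perform it on int operands; the argument values in the usage below are illustrative, not Joshua's actual MAX_SENTENCE_LENGTH or binsize.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical helper: an Iterable over a file's lines that re-reads the file on every pass. */
class ReReadableLines implements Iterable<String> {
    private final Path path;

    ReReadableLines(Path path) {
        this.path = path;
    }

    /** Passes over the data for length-binned subsampling, assuming integer division. */
    static int passes(int maxSentenceLength, int binSize) {
        return 1 + maxSentenceLength / binSize;
    }

    @Override
    public Iterator<String> iterator() {
        final BufferedReader reader;
        try {
            // A fresh reader per call: nothing is cached between passes.
            reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Iterator<String>() {
            private String next = advance();

            // Reads one line; closes the reader once end-of-file is reached.
            private String advance() {
                try {
                    String line = reader.readLine();
                    if (line == null) {
                        reader.close();
                    }
                    return line;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                return next != null;
            }

            @Override
            public String next() {
                if (next == null) {
                    throw new NoSuchElementException();
                }
                String current = next;
                next = advance();
                return current;
            }
        };
    }
}
```

Memory use stays proportional to one line rather than the whole file; the cost is repeated disk I/O on every pass. In this sketch, a pass abandoned before end-of-file leaves its reader open.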
joshua.subsample.BiCorpus.BiCorpus (String foreignFileName, String nativeFileName) throws IOException
Constructor for unaligned BiCorpus.
joshua.subsample.BiCorpus.BiCorpus (String foreignFileName, String nativeFileName, String alignmentFileName) throws IOException, IllegalArgumentException, IndexOutOfBoundsException
Constructor for word-aligned BiCorpus.
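This reference documents neither the alignment file format nor when the aligned constructor throws IllegalArgumentException or IndexOutOfBoundsException. The following hypothetical sketch assumes the widely used `i-j` format (a zero-based foreign index, a hyphen, a zero-based native index, with points separated by whitespace). It shows how one such line could be validated: IllegalArgumentException for a malformed point and IndexOutOfBoundsException for an index outside the sentence. The class AlignmentLine is illustrative, not part of Joshua.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for one line of an alignment file in "i-j" format. */
final class AlignmentLine {
    private AlignmentLine() {}

    /**
     * Parses points like "0-0 1-2 2-1" into {foreignIndex, nativeIndex} pairs,
     * checking each index against the token counts of the sentence pair.
     */
    static List<int[]> parse(String line, int foreignLength, int nativeLength) {
        List<int[]> points = new ArrayList<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return points; // a sentence pair with no alignment points
        }
        for (String token : trimmed.split("\\s+")) {
            String[] parts = token.split("-");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Malformed alignment point: " + token);
            }
            // NumberFormatException (a subclass of IllegalArgumentException) signals non-numeric indexes.
            int f = Integer.parseInt(parts[0]);
            int n = Integer.parseInt(parts[1]);
            if (f < 0 || f >= foreignLength || n < 0 || n >= nativeLength) {
                throw new IndexOutOfBoundsException("Alignment point out of range: " + token);
            }
            points.add(new int[] {f, n});
        }
        return points;
    }
}
```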
static void joshua.subsample.BiCorpus.fileLengthMismatchException () throws RuntimeException [static, private]
Iterator<PhrasePair> joshua.subsample.BiCorpus.iterator ()
Iterate through the files represented by this BiCorpus, returning a PhrasePair for each pair (or triple) of lines.
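Returning one item per "pair (or triple) of lines" amounts to reading the two or three files in lockstep. A minimal, hypothetical sketch follows. LineTriple stands in for PhrasePair, whose constructor is not documented here, and LockstepIterator is illustrative rather than Joshua's implementation. When one file ends before the others, the iterator throws a RuntimeException. That is presumably the condition the private fileLengthMismatchException() reports, although its behavior is not documented here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for PhrasePair: one line from each file (alignment may be null). */
final class LineTriple {
    final String foreign;
    final String nativeLine;
    final String alignment;

    LineTriple(String foreign, String nativeLine, String alignment) {
        this.foreign = foreign;
        this.nativeLine = nativeLine;
        this.alignment = alignment;
    }
}

/** Reads two or three inputs in lockstep; fails if they have different line counts. */
final class LockstepIterator implements Iterator<LineTriple> {
    private final BufferedReader foreign;
    private final BufferedReader nativeReader;
    private final BufferedReader alignment; // null for an unaligned corpus
    private LineTriple next;

    // Closing the readers is left to the caller in this sketch.
    LockstepIterator(Reader foreign, Reader nativeReader, Reader alignment) {
        this.foreign = new BufferedReader(foreign);
        this.nativeReader = new BufferedReader(nativeReader);
        this.alignment = alignment == null ? null : new BufferedReader(alignment);
        this.next = advance();
    }

    // Reads one line from every input; returns null only if all inputs ended together.
    private LineTriple advance() {
        try {
            String f = foreign.readLine();
            String n = nativeReader.readLine();
            String a = alignment == null ? null : alignment.readLine();
            boolean alignmentDone = alignment == null || a == null;
            if (f == null && n == null && alignmentDone) {
                return null;
            }
            if (f == null || n == null || (alignment != null && a == null)) {
                throw new RuntimeException("Input files have different numbers of lines");
            }
            return new LineTriple(f, n, a);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public LineTriple next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        LineTriple current = next;
        next = advance();
        return current;
    }
}
```

Because lines are read lazily, a length mismatch surfaces only when the shorter input runs out, not when the iterator is created.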
final String joshua.subsample.BiCorpus.alignmentFileName [protected]
final String joshua.subsample.BiCorpus.foreignFileName [protected]
final String joshua.subsample.BiCorpus.nativeFileName [protected]