Joshua
open source statistical hierarchical phrase-based machine translation system
Public Member Functions | |
| synchronized boolean | hasNext () |
| Sentence | next () |
| void | remove () |
| void | register (Translation translation) |
| String | oracleSentence (int id) |
Package Functions | |
| InputHandler (String corpusFile, String oracleFile) | |
Package Attributes | |
| String | corpusFile = null |
| int | sentenceNo = -1 |
| Sentence | nextSentence = null |
| BufferedReader | lineReader = null |
| String | nextOracleSentence = null |
| BufferedReader | oracleReader = null |
| List< Sentence > | issued |
| List< Translation > | completed |
| List< String > | oracles |
| int | lastCompletedId = -1 |
Private Member Functions | |
| void | prepareNextLine () |
Static Private Attributes | |
| static final Logger | logger = Logger.getLogger(InputHandler.class.getName()) |
| static final Charset | FILE_ENCODING = Charset.forName("UTF-8") |
This class represents input to the decoder. It currently supports three kinds of input: (1) plain sentences, (2) sentences wrapped in a <seg> tag (via the Sentence class), and (3) lattices (in Python Lattice Format, via the Lattice class). Format (2) is used to specify the sentence number of each sentence.
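As a rough illustration of format (2), the sketch below pulls a sentence number out of a <seg>-wrapped line and falls back to a caller-supplied number for plain lines. The `id` attribute name, the exact tag layout, and the class and method names (`SegLineSketch`, `sentenceId`, `sentenceText`) are assumptions made for this example; the actual parsing is done by the Sentence class and may differ.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Joshua: reads lines shaped like <seg id="7">text</seg>. */
class SegLineSketch {
    // Assumed layout: a single <seg> element whose numeric "id" attribute comes first.
    private static final Pattern SEG =
        Pattern.compile("^\\s*<seg\\s+id=\"(\\d+)\"[^>]*>(.*?)</seg>\\s*$");

    /** Returns the id from the tag, or fallbackId when the line is a plain sentence. */
    static int sentenceId(String line, int fallbackId) {
        Matcher m = SEG.matcher(line);
        return m.matches() ? Integer.parseInt(m.group(1)) : fallbackId;
    }

    /** Returns the sentence text with the wrapping tag (if any) removed. */
    static String sentenceText(String line) {
        Matcher m = SEG.matcher(line);
        return m.matches() ? m.group(2).trim() : line.trim();
    }

    public static void main(String[] args) {
        String tagged = "<seg id=\"7\">das ist gut</seg>";
        System.out.println(sentenceId(tagged, 0) + "\t" + sentenceText(tagged));
        System.out.println(sentenceId("das ist gut", 3) + "\t" + sentenceText("das ist gut"));
    }
}
```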
The input handler provides thread-safe sequential access to the input sentences. It also receives decoded sentences and assembles them in order (via calls to register()).
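The thread-safe sequential access described here can be sketched as a reader with a one-line lookahead, where hasNext() and next() are synchronized and a private helper refills the lookahead. This is a minimal sketch under stated assumptions, not Joshua's implementation: plain strings stand in for Sentence objects, and `SequentialInputSketch`, `Numbered`, and `fromString` are hypothetical names introduced for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for the reading side of an input handler. */
class SequentialInputSketch {
    /** A line paired with the sentence number it was issued under. */
    static final class Numbered {
        final int id;
        final String text;
        Numbered(int id, String text) { this.id = id; this.text = text; }
    }

    private final BufferedReader lineReader;
    private String nextLine = null; // one-line lookahead, null once input is exhausted
    private int sentenceNo = -1;    // number of the most recently issued sentence

    SequentialInputSketch(BufferedReader lineReader) {
        this.lineReader = lineReader;
        prepareNextLine(); // no lock needed: the object is not shared yet
    }

    /** Convenience factory for small in-memory examples. */
    static SequentialInputSketch fromString(String corpus) {
        return new SequentialInputSketch(new BufferedReader(new StringReader(corpus)));
    }

    synchronized boolean hasNext() {
        return nextLine != null;
    }

    synchronized Numbered next() {
        if (nextLine == null) throw new NoSuchElementException("input exhausted");
        Numbered issued = new Numbered(++sentenceNo, nextLine);
        prepareNextLine(); // runs while next() holds the lock
        return issued;
    }

    // Called only from the constructor and from next(), so it is not synchronized itself.
    private void prepareNextLine() {
        try {
            nextLine = lineReader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SequentialInputSketch input = fromString("first sentence\nsecond sentence\n");
        while (input.hasNext()) {
            Numbered s = input.next();
            System.out.println(s.id + "\t" + s.text);
        }
    }
}
```

With several worker threads, a hasNext() check followed by next() is not atomic as a pair, so in this sketch a worker should be prepared to catch NoSuchElementException when another thread takes the last sentence first.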
Ideally, InputHandler objects could represent complicated constraints and restrictions on the object being decoded. This would require the actual chart-parsing code to be aware of the restrictions, which could be provided through this object, whose job it would be to parse those constraints from the input.
joshua.decoder.InputHandler.InputHandler (String corpusFile, String oracleFile) [package]
synchronized boolean joshua.decoder.InputHandler.hasNext ()
String joshua.decoder.InputHandler.oracleSentence (int id)
When the ability to handle oracle sentences is added back in, this function should return the parallel oracle sentence.
void joshua.decoder.InputHandler.prepareNextLine () [private]
This is called only from (a) the constructor and (b) the next() function. Since the constructor is called only once, and next() calls prepareNextLine() while holding a lock, this function does not require its own synchronization.
void joshua.decoder.InputHandler.register (Translation translation)
Receives a sentence from a thread that has finished translating it.
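One way to assemble results in order when worker threads finish out of order is to buffer each arrival and release the longest contiguous run that follows the last released id. The sketch below illustrates that idea only; how register() actually uses the completed list and lastCompletedId is not described here, and `InOrderCollectorSketch`, its `pending` map, and the string-valued translations are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical collector, not Joshua's code: releases translations in sentence order. */
class InOrderCollectorSketch {
    private final Map<Integer, String> pending = new HashMap<>(); // arrived, not yet released
    private final List<String> emitted = new ArrayList<>();       // released, in id order
    private int lastCompletedId = -1;                              // last id released so far

    /** Called by a worker thread once it has finished translating sentence `id`. */
    synchronized void register(int id, String translation) {
        pending.put(id, translation);
        // Release every translation whose predecessors have all been released.
        while (pending.containsKey(lastCompletedId + 1)) {
            lastCompletedId++;
            emitted.add(pending.remove(lastCompletedId));
        }
    }

    /** Returns a snapshot of the translations released so far. */
    synchronized List<String> emitted() {
        return new ArrayList<>(emitted);
    }

    public static void main(String[] args) {
        InOrderCollectorSketch collector = new InOrderCollectorSketch();
        collector.register(1, "second");          // held back: sentence 0 has not arrived
        System.out.println(collector.emitted());
        collector.register(0, "first");           // releases sentence 0, then sentence 1
        System.out.println(collector.emitted());
    }
}
```

Ids are assumed to start at 0 and to be contiguous, matching the initial value of -1 for lastCompletedId.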
void joshua.decoder.InputHandler.remove ()
List<Translation> joshua.decoder.InputHandler.completed [package]
String joshua.decoder.InputHandler.corpusFile = null [package]
final Charset joshua.decoder.InputHandler.FILE_ENCODING = Charset.forName("UTF-8") [static, private]
List<Sentence> joshua.decoder.InputHandler.issued [package]
int joshua.decoder.InputHandler.lastCompletedId = -1 [package]
BufferedReader joshua.decoder.InputHandler.lineReader = null [package]
final Logger joshua.decoder.InputHandler.logger = Logger.getLogger(InputHandler.class.getName()) [static, private]
String joshua.decoder.InputHandler.nextOracleSentence = null [package]
Sentence joshua.decoder.InputHandler.nextSentence = null [package]
BufferedReader joshua.decoder.InputHandler.oracleReader = null [package]
List<String> joshua.decoder.InputHandler.oracles [package]
int joshua.decoder.InputHandler.sentenceNo = -1 [package]