Joshua
open source statistical hierarchical phrase-based machine translation system
Public Member Functions
int getWordID(int position)
int getSentenceIndex(int position)
int[] getSentenceIndices(int[] positions)
int getSentencePosition(int sentenceID)
int getSentenceEndPosition(int sentenceID)
Phrase getSentence(int sentenceIndex)
int size()
int getNumSentences()
int comparePhrase(int corpusStart, Phrase phrase, int phraseStart, int phraseEnd)
int comparePhrase(int corpusStart, Phrase phrase)
int compareSuffixes(int position1, int position2, int maxComparisonLength)
ContiguousPhrase getPhrase(int startPosition, int endPosition)
Iterable<Integer> corpusPositions()
Corpus is an interface that contains methods for accessing the information within a monolingual corpus.
int joshua.corpus.Corpus.comparePhrase(int corpusStart, Phrase phrase, int phraseStart, int phraseEnd)
Compares the phrase that starts at corpus position corpusStart with the subphrase of phrase delimited by phraseStart (inclusive) and phraseEnd (exclusive).
corpusStart | the point in the corpus where the comparison begins |
phrase | the superphrase that the comparison phrase is drawn from |
phraseStart | the point in the phrase where the comparison begins (inclusive) |
phraseEnd | the point in the phrase where the comparison ends (exclusive) |
int joshua.corpus.Corpus.comparePhrase(int corpusStart, Phrase phrase)
Compares the phrase that starts at corpus position corpusStart with the entire phrase passed in.
corpusStart | the point in the corpus where the comparison begins |
phrase | the phrase to compare against |
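The two comparePhrase overloads can be illustrated with a minimal sketch over plain int arrays of word IDs. This is not Joshua's implementation: the class name PhraseCompareSketch, the int[] stand-ins for the corpus and for Phrase, the Comparator-style return convention (negative, zero, positive), and the rule that a corpus ending before the phrase sorts first are all assumptions made for illustration.

```java
// Hypothetical sketch; int[] arrays stand in for the corpus and for Phrase.
final class PhraseCompareSketch {

    /**
     * Compares the corpus words starting at corpusStart with the subphrase
     * phrase[phraseStart, phraseEnd). Returns a negative number, zero, or a
     * positive number (an assumed, Comparator-style convention).
     */
    static int comparePhrase(int[] corpus, int corpusStart,
                             int[] phrase, int phraseStart, int phraseEnd) {
        int length = phraseEnd - phraseStart;
        for (int i = 0; i < length; i++) {
            int corpusPos = corpusStart + i;
            // Assumption: a corpus that ends before the subphrase sorts first.
            if (corpusPos >= corpus.length) {
                return -1;
            }
            int diff = Integer.compare(corpus[corpusPos], phrase[phraseStart + i]);
            if (diff != 0) {
                return diff; // first differing word decides the order
            }
        }
        return 0; // every word of the subphrase matched
    }

    /** Whole-phrase overload: delegates with the full range of the phrase. */
    static int comparePhrase(int[] corpus, int corpusStart, int[] phrase) {
        return comparePhrase(corpus, corpusStart, phrase, 0, phrase.length);
    }
}
```

Writing the shorter overload as a delegation keeps the half-open range convention (phraseStart inclusive, phraseEnd exclusive) in one place.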
int joshua.corpus.Corpus.compareSuffixes(int position1, int position2, int maxComparisonLength)
Compares the suffixes starting at positions position1 and position2.
position1 | the position in the corpus where the first suffix begins |
position2 | the position in the corpus where the second suffix begins |
maxComparisonLength | a cutoff point to stop the comparison |
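A bounded suffix comparison of this kind might look like the sketch below. It is an illustration only: SuffixCompareSketch and the int[] corpus are hypothetical, and both the Comparator-style return value and the rule that a suffix reaching the end of the corpus sorts first are assumptions that the documentation above does not state.

```java
// Hypothetical sketch; an int[] of word IDs stands in for the corpus.
final class SuffixCompareSketch {

    /**
     * Compares the suffixes of the corpus that start at position1 and
     * position2, looking at no more than maxComparisonLength words.
     */
    static int compareSuffixes(int[] corpus, int position1, int position2,
                               int maxComparisonLength) {
        for (int i = 0; i < maxComparisonLength; i++) {
            int p1 = position1 + i;
            int p2 = position2 + i;
            boolean end1 = p1 >= corpus.length;
            boolean end2 = p2 >= corpus.length;
            if (end1 || end2) {
                // Assumption: the suffix that runs out first sorts first.
                if (end1 && end2) {
                    return 0;
                }
                return end1 ? -1 : 1;
            }
            int diff = Integer.compare(corpus[p1], corpus[p2]);
            if (diff != 0) {
                return diff; // first differing word decides the order
            }
        }
        return 0; // equal up to the cutoff
    }
}
```

The cutoff bounds the cost of each comparison, which matters when many suffixes share long common prefixes.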
Iterable<Integer> joshua.corpus.Corpus.corpusPositions()
Gets an object capable of iterating over all positions in the corpus, in order.
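One way such an iterable could be provided is sketched below, assuming that corpus positions are simply the integers from 0 up to (but not including) the corpus size. The class CorpusPositionsSketch and its size parameter are hypothetical and not part of Joshua.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch; assumes positions are 0, 1, ..., size - 1.
final class CorpusPositionsSketch {

    /** Returns an Iterable that yields every corpus position in order. */
    static Iterable<Integer> corpusPositions(final int size) {
        // Each call to iterator() creates a fresh cursor starting at 0.
        return () -> new Iterator<Integer>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < size;
            }

            @Override
            public Integer next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return next++;
            }
        };
    }
}
```

Because the result is an Iterable, a caller can walk the corpus with an enhanced for loop, for example `for (int position : corpusPositions(size)) { ... }`.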
int joshua.corpus.Corpus.getNumSentences()
Gets the number of sentences in the corpus.
ContiguousPhrase joshua.corpus.Corpus.getPhrase(int startPosition, int endPosition)
startPosition | |
endPosition |
Phrase joshua.corpus.Corpus.getSentence(int sentenceIndex)
Gets the specified sentence as a phrase.
sentenceIndex | Zero-based sentence index |
int joshua.corpus.Corpus.getSentenceEndPosition(int sentenceID)
Gets the exclusive end position of a sentence in the corpus.
int joshua.corpus.Corpus.getSentenceIndex(int position)
Gets the sentence index associated with the specified position in the corpus.
position | Index into the corpus |
int[] joshua.corpus.Corpus.getSentenceIndices(int[] positions)
Gets the sentence index of each specified position.
positions | Indices into the corpus
int joshua.corpus.Corpus.getSentencePosition(int sentenceID)
Gets the position in the corpus of the first word of the specified sentence. If the sentenceID is outside of the bounds of the sentences, then it returns the last position in the corpus + 1.
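The sentence accessors can be pictured as lookups into an array of sentence start positions. The sketch below is an assumption about one possible layout, not Joshua's implementation: SentenceIndexSketch, its fields, the binary-search lookup, and the treatment of negative sentence IDs are all hypothetical. Only the "last position + 1" result for an out-of-bounds sentenceID comes from the description above.

```java
import java.util.Arrays;

// Hypothetical sketch; the class name and fields are not part of Joshua.
final class SentenceIndexSketch {
    // Corpus position of the first word of each sentence, strictly ascending.
    private final int[] sentenceStarts;
    // Total number of words in the corpus.
    private final int corpusSize;

    SentenceIndexSketch(int[] sentenceStarts, int corpusSize) {
        this.sentenceStarts = sentenceStarts.clone();
        this.corpusSize = corpusSize;
    }

    int getNumSentences() {
        return sentenceStarts.length;
    }

    int getSentencePosition(int sentenceID) {
        // Documented behavior: an out-of-bounds ID yields last position + 1.
        // Assumption: negative IDs are treated the same way.
        if (sentenceID < 0 || sentenceID >= sentenceStarts.length) {
            return corpusSize;
        }
        return sentenceStarts[sentenceID];
    }

    int getSentenceEndPosition(int sentenceID) {
        // For a valid ID, the exclusive end is the start of the next
        // sentence, or the corpus size for the last sentence.
        return getSentencePosition(sentenceID + 1);
    }

    int getSentenceIndex(int position) {
        int found = Arrays.binarySearch(sentenceStarts, position);
        // A miss returns -(insertionPoint) - 1; the containing sentence is
        // the one just before the insertion point.
        return found >= 0 ? found : -found - 2;
    }

    int[] getSentenceIndices(int[] positions) {
        int[] result = new int[positions.length];
        for (int i = 0; i < positions.length; i++) {
            result[i] = getSentenceIndex(positions[i]);
        }
        return result;
    }
}
```

With starts {0, 3, 7} and a corpus of 10 words, sentence 1 spans positions 3 through 6, so its start position is 3 and its exclusive end position is 7.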
int joshua.corpus.Corpus.getWordID(int position)
int joshua.corpus.Corpus.size()
Gets the number of words in the corpus.