Public Member Functions
int	getWordID (int position)
int	getSentenceIndex (int position)
int[]	getSentenceIndices (int[] positions)
int	getSentencePosition (int sentenceID)
int	getSentenceEndPosition (int sentenceID)
Phrase	getSentence (int sentenceIndex)
int	size ()
int	getNumSentences ()
int	comparePhrase (int corpusStart, Phrase phrase, int phraseStart, int phraseEnd)
int	comparePhrase (int corpusStart, Phrase phrase)
int	compareSuffixes (int position1, int position2, int maxComparisonLength)
ContiguousPhrase	getPhrase (int startPosition, int endPosition)
Iterable< Integer >	corpusPositions ()

Detailed Description

Corpus is an interface that contains methods for accessing the information within a monolingual corpus.

Author:: Chris Callison-Burch

Since:: 7 February 2005

Version:

LastChangedDate:: 2008-07-30 17:15:52 -0400 (Wed, 30 Jul 2008)

Member Function Documentation

int joshua.corpus.Corpus.comparePhrase	(	int	corpusStart,
		Phrase	phrase,
		int	phraseStart,
		int	phraseEnd
	)

Compares the phrase that starts at position corpusStart in the corpus with the subphrase of phrase delimited by phraseStart and phraseEnd.

Parameters:

corpusStart	the point in the corpus where the comparison begins
phrase	the superphrase that the comparison phrase is drawn from
phraseStart	the point in the phrase where the comparison begins (inclusive)
phraseEnd	the point in the phrase where the comparison ends (exclusive)

Returns:: an int that follows the conventions of java.util.Comparator.compare() (negative, zero, or positive)
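The half-open range [phraseStart, phraseEnd) and the sign convention of the result can be illustrated with a minimal sketch. It is not Joshua's implementation: the class PhraseCompareSketch is hypothetical, the corpus and the phrase are modeled as plain int arrays of word IDs, and the handling of a corpus that ends before the subphrase does is an assumption.

```java
// Hypothetical stand-in: corpus and phrase are modeled as int[] word IDs.
final class PhraseCompareSketch {

    /**
     * Compares the corpus words starting at corpusStart with
     * phrase[phraseStart, phraseEnd). Returns a negative, zero,
     * or positive int.
     */
    static int comparePhrase(int[] corpus, int corpusStart,
                             int[] phrase, int phraseStart, int phraseEnd) {
        for (int i = 0; i < phraseEnd - phraseStart; i++) {
            int corpusPos = corpusStart + i;
            // Assumption: a corpus that ends early sorts before the phrase.
            if (corpusPos >= corpus.length) {
                return -1;
            }
            int diff = Integer.compare(corpus[corpusPos], phrase[phraseStart + i]);
            if (diff != 0) {
                return diff; // first differing word decides the result
            }
        }
        return 0; // every word of the subphrase matched
    }

    public static void main(String[] args) {
        int[] corpus = {5, 7, 9, 7, 9, 2};
        int[] phrase = {1, 7, 9, 4};
        // Corpus positions 3..4 (7, 9) against phrase words 1..2 (7, 9).
        System.out.println(comparePhrase(corpus, 3, phrase, 1, 3)); // prints 0
        // Extending the range to phrase word 3 compares 2 with 4.
        System.out.println(comparePhrase(corpus, 3, phrase, 1, 4) < 0); // prints true
    }
}
```

Because phraseEnd is exclusive, the number of words compared is phraseEnd - phraseStart.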

int joshua.corpus.Corpus.comparePhrase	(	int	corpusStart,
		Phrase	phrase
	)

Compares the phrase that starts at position corpusStart in the corpus with the entire phrase passed in.

Parameters:

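corpusStart	the point in the corpus where the comparison begins
phrase	the phrase to compare against, in its entirety

Returns:: an int that follows the conventions of java.util.Comparator.compare() (negative, zero, or positive)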

int joshua.corpus.Corpus.compareSuffixes	(	int	position1,
		int	position2,
		int	maxComparisonLength
	)

Compares the suffixes starting at positions position1 and position2.

Parameters:

position1	the position in the corpus where the first suffix begins
position2	the position in the corpus where the second suffix begins
maxComparisonLength	the maximum number of words to compare before stopping

Returns:: an int that follows the conventions of java.util.Comparator.compare() (negative, zero, or positive)
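The effect of the cutoff can be shown with a minimal sketch, assuming the corpus is a plain int array of word IDs. The class SuffixCompareSketch is hypothetical, and the rule that a suffix reaching the end of the corpus sorts first is an assumption rather than documented behavior.

```java
// Hypothetical stand-in: the corpus is modeled as an int[] of word IDs.
final class SuffixCompareSketch {

    /** Compares two corpus suffixes, looking at no more than maxComparisonLength words. */
    static int compareSuffixes(int[] corpus, int position1, int position2,
                               int maxComparisonLength) {
        for (int i = 0; i < maxComparisonLength; i++) {
            int p1 = position1 + i;
            int p2 = position2 + i;
            boolean ended1 = p1 >= corpus.length;
            boolean ended2 = p2 >= corpus.length;
            if (ended1 || ended2) {
                // Assumption: the suffix that runs out of words first sorts first.
                if (ended1 && ended2) {
                    return 0;
                }
                return ended1 ? -1 : 1;
            }
            int diff = Integer.compare(corpus[p1], corpus[p2]);
            if (diff != 0) {
                return diff;
            }
        }
        return 0; // equal within the cutoff
    }

    public static void main(String[] args) {
        int[] corpus = {3, 1, 4, 3, 1, 5};
        // Within two words the suffixes at 0 and 3 look identical (3, 1).
        System.out.println(compareSuffixes(corpus, 0, 3, 2)); // prints 0
        // A third word reveals the difference (4 versus 5).
        System.out.println(compareSuffixes(corpus, 0, 3, 3) < 0); // prints true
    }
}
```

Two suffixes that differ only beyond the cutoff compare as equal, which is what makes the cutoff useful for bounded-length phrase lookups.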

Iterable<Integer> joshua.corpus.Corpus.corpusPositions ( )

Gets an object capable of iterating over all positions in the corpus, in order.

Returns:: An object capable of iterating over all positions in the corpus, in order.
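One way such an Iterable could be built is sketched below. The class CorpusPositionsSketch and its size parameter are hypothetical, and the sketch assumes positions run from 0 up to, but not including, the corpus size.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: positions are assumed to be 0 .. size-1.
final class CorpusPositionsSketch {

    /** Returns an Iterable that yields 0, 1, ..., size - 1 in order. */
    static Iterable<Integer> corpusPositions(final int size) {
        return () -> new Iterator<Integer>() {
            private int nextPosition = 0;

            @Override
            public boolean hasNext() {
                return nextPosition < size;
            }

            @Override
            public Integer next() {
                if (nextPosition >= size) {
                    throw new NoSuchElementException();
                }
                return nextPosition++;
            }
        };
    }

    public static void main(String[] args) {
        // The Iterable plugs directly into an enhanced for loop.
        for (int position : corpusPositions(4)) {
            System.out.println(position); // prints 0, 1, 2, 3 on separate lines
        }
    }
}
```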

int joshua.corpus.Corpus.getNumSentences ( )

Gets the number of sentences in the corpus.

Returns:: the number of sentences in the corpus.

ContiguousPhrase joshua.corpus.Corpus.getPhrase	(	int	startPosition,
		int	endPosition
	)

Parameters:

startPosition
endPosition

Returns:

Phrase joshua.corpus.Corpus.getSentence ( int sentenceIndex )

Gets the specified sentence as a phrase.

Parameters:

sentenceIndex Zero-based sentence index

Returns:: the sentence, or null if the specified sentence number doesn't exist

int joshua.corpus.Corpus.getSentenceEndPosition ( int sentenceID )

Gets the exclusive end position of a sentence in the corpus.

Returns:: the position in the corpus one past the last word of the specified sentence. If the sentenceID is outside of the bounds of the sentences, then it returns one past the last position in the corpus.

int joshua.corpus.Corpus.getSentenceIndex ( int position )

Gets the sentence index associated with the specified position in the corpus.

Parameters:

position Index into the corpus

Returns:: the sentence index associated with the specified position in the corpus.
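A position can be mapped to its sentence with a binary search, as in the following minimal sketch. It assumes the start position of every sentence is held in a sorted int array; the interface does not say how implementations store sentence boundaries, so the class SentenceIndexSketch and the sentenceStarts array are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch: sentenceStarts[i] is the corpus position of the
// first word of sentence i, in ascending order.
final class SentenceIndexSketch {

    /** Returns the index of the sentence containing position (assumed valid). */
    static int getSentenceIndex(int[] sentenceStarts, int position) {
        int found = Arrays.binarySearch(sentenceStarts, position);
        // Exact hit: position is the first word of sentence `found`.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the sentence
        // containing position is the one just before the insertion point.
        return found >= 0 ? found : -found - 2;
    }

    public static void main(String[] args) {
        int[] sentenceStarts = {0, 3, 7}; // three sentences
        System.out.println(getSentenceIndex(sentenceStarts, 3)); // prints 1
        System.out.println(getSentenceIndex(sentenceStarts, 4)); // prints 1
        System.out.println(getSentenceIndex(sentenceStarts, 9)); // prints 2
    }
}
```

The lookup costs O(log n) in the number of sentences, which matters when it is called once per corpus position.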

int [] joshua.corpus.Corpus.getSentenceIndices ( int[] positions )

Gets the sentence index of each specified position.

Parameters:

positions Indices into the corpus

Returns:: array of the sentence indices associated with the specified positions in the corpus.

int joshua.corpus.Corpus.getSentencePosition ( int sentenceID )

Gets the position in the corpus of the first word of the specified sentence. If the sentenceID is outside of the bounds of the sentences, then it returns the last position in the corpus + 1.

Returns:: the position in the corpus of the first word of the specified sentence. If the sentenceID is outside of the bounds of the sentences, then it returns the last position in the corpus + 1.
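The out-of-bounds behavior of getSentencePosition and getSentenceEndPosition can be sketched as follows. This is an illustration under stated assumptions, not the Joshua implementation: the class SentenceBoundsSketch, its sentenceStarts array, and its corpusSize field are hypothetical, and "last position in the corpus + 1" is taken to equal the number of words in the corpus.

```java
// Hypothetical sketch: sentence boundaries are stored as start positions.
final class SentenceBoundsSketch {
    private final int[] sentenceStarts; // first-word position of each sentence
    private final int corpusSize;       // number of words in the corpus

    SentenceBoundsSketch(int[] sentenceStarts, int corpusSize) {
        this.sentenceStarts = sentenceStarts;
        this.corpusSize = corpusSize;
    }

    /** Position of the first word, or corpusSize if sentenceID is out of bounds. */
    int getSentencePosition(int sentenceID) {
        if (sentenceID < 0 || sentenceID >= sentenceStarts.length) {
            return corpusSize;
        }
        return sentenceStarts[sentenceID];
    }

    /** Exclusive end position, or corpusSize if sentenceID is out of bounds. */
    int getSentenceEndPosition(int sentenceID) {
        // The last sentence ends where the corpus ends.
        if (sentenceID < 0 || sentenceID >= sentenceStarts.length - 1) {
            return corpusSize;
        }
        // Any other sentence ends where the next one starts.
        return sentenceStarts[sentenceID + 1];
    }

    public static void main(String[] args) {
        SentenceBoundsSketch corpus = new SentenceBoundsSketch(new int[] {0, 3, 7}, 10);
        System.out.println(corpus.getSentencePosition(1));    // prints 3
        System.out.println(corpus.getSentenceEndPosition(1)); // prints 7
        System.out.println(corpus.getSentenceEndPosition(2)); // prints 10
        System.out.println(corpus.getSentencePosition(5));    // prints 10
    }
}
```

With this convention, a loop from getSentencePosition(id) up to, but not including, getSentenceEndPosition(id) visits no positions for an invalid sentence ID.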

int joshua.corpus.Corpus.getWordID ( int position )

Returns:: the integer representation of the Word at the specified position in the corpus.

int joshua.corpus.Corpus.size ( )

Gets the number of words in the corpus.

Returns:: the number of words in the corpus.
