Joshua
open source statistical hierarchical phrase-based machine translation system
joshua.decoder.ff.tm.SentenceFilteredGrammar Class Reference
Inheritance diagram for joshua.decoder.ff.tm.SentenceFilteredGrammar:
Collaboration diagram for joshua.decoder.ff.tm.SentenceFilteredGrammar:


Classes

class  SentenceFilteredTrie

Public Member Functions

Trie getTrieRoot ()
boolean hasRuleForSpan (int startIndex, int endIndex, int pathLength)
int getNumRules ()
int getNumRules (Trie node)
Rule constructManualRule (int lhs, int[] sourceWords, int[] targetWords, float[] scores, int arity)
boolean isRegexpGrammar ()

Package Functions

 SentenceFilteredGrammar (AbstractGrammar baseGrammar, Sentence sentence)

Private Member Functions

SentenceFilteredTrie filter (Trie unfilteredTrieRoot)
void filter (int i, SentenceFilteredTrie trieNode, boolean lastWasNT)
SentenceFilteredTrie filter_regexp (Trie unfilteredTrie)
boolean matchesSentence (Trie childTrie)

Private Attributes

AbstractGrammar baseGrammar
SentenceFilteredTrie filteredTrie
int[] tokens
Sentence sentence

Detailed Description

This class implements dynamic sentence-level filtering. This is accomplished with a parallel trie, a subset of the original trie, that only contains trie paths that are reachable from traversals of the current sentence.

Author:
Matt Post post@cs.jhu.edu

Constructor & Destructor Documentation

Construct a new sentence-filtered grammar. The main work is done in the enclosed trie (obtained from the base grammar, which contains the complete grammar).

Parameters:
baseGrammar
sentence


Member Function Documentation

Rule joshua.decoder.ff.tm.SentenceFilteredGrammar.constructManualRule (int lhs, int[] sourceWords, int[] targetWords, float[] scores, int arity)

This is used to construct a manual rule supplied from outside the grammar; the owner should be the same as the grammar's. The rule ID will be the same as OOVRuleId, and there is no lattice cost.

Reimplemented from joshua.decoder.ff.tm.hash_based.MemoryBasedBatchGrammar.

SentenceFilteredTrie joshua.decoder.ff.tm.SentenceFilteredGrammar.filter (Trie unfilteredTrieRoot) [private]

What is the algorithm?

Take the first word of the sentence, and start at the root of the trie. There are two things to consider: (a) word matches and (b) nonterminal matches.

For a word match, simply follow that arc along the trie. We create a parallel arc in our filtered grammar to represent it. Each arc in the filtered trie knows about its corresponding/underlying node in the unfiltered grammar trie.

A nonterminal is always permitted to match. The question then is how much of the input sentence we imagine it consumed. The answer is that it could have been any amount. So the recursive call has to be a set of calls, one each to the next trie node with different lengths of the sentence remaining.

A problem occurs when we have multiple sequential nonterminals. For scope-3 grammars, there can be four sequential nonterminals (in the case when they are grounded by terminals on both ends of the nonterminal chain). We'd like to avoid looking at all possible ways to split up the subsequence, because with respect to filtering rules, they are all the same.

We accomplish this with the following restriction: for purposes of grammar filtering, only the first in a sequence of nonterminal traversals can consume more than one word. Each of the subsequent ones would have to consume just one word. We then just have to record in the recursive call whether the last traversal was a nonterminal or not.

Returns:
the root of the filtered trie
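The restriction above can be sketched with a small, self-contained toy. Everything here is hypothetical and illustrative rather than Joshua's actual code: `ToyTrie`, `FilterSketch`, and the integer token encoding are assumptions. The key point is that only the first nonterminal in a chain may consume more than one word, controlled by the `lastWasNT` flag.

```java
import java.util.HashMap;
import java.util.Map;

public class FilterSketch {
    // Hypothetical trie node: word arcs keyed by integer token, plus an
    // optional nonterminal arc. Not Joshua's actual Trie class.
    static class ToyTrie {
        Map<Integer, ToyTrie> wordArcs = new HashMap<>();
        ToyTrie ntArc;     // nonterminal extension, if any
        boolean reachable; // set when some traversal of the sentence reaches this node
    }

    static int[] tokens; // the input sentence as integer tokens

    // i: next sentence position; lastWasNT: whether the arc that brought us
    // here was a nonterminal match.
    static void filter(int i, ToyTrie node, boolean lastWasNT) {
        node.reachable = true;
        if (i >= tokens.length) return;
        // (a) word match: follow the arc labeled with the token at position i
        ToyTrie wordChild = node.wordArcs.get(tokens[i]);
        if (wordChild != null) filter(i + 1, wordChild, false);
        // (b) nonterminal match: only the first NT in a chain may consume
        // more than one word; each subsequent NT consumes exactly one
        if (node.ntArc != null) {
            if (lastWasNT) {
                filter(i + 1, node.ntArc, true);
            } else {
                for (int j = i + 1; j <= tokens.length; j++)
                    filter(j, node.ntArc, true);
            }
        }
    }

    public static void main(String[] args) {
        // grammar path "the [X]": the word "the" (token 1) followed by a nonterminal
        ToyTrie root = new ToyTrie();
        ToyTrie afterThe = new ToyTrie();
        root.wordArcs.put(1, afterThe);
        afterThe.ntArc = new ToyTrie();

        tokens = new int[] {1, 2, 3}; // sentence: "the" plus two more words
        filter(0, root, false);
        System.out.println(afterThe.reachable + " " + afterThe.ntArc.reachable);
    }
}
```

Because the chain-splitting choices are all equivalent for filtering purposes, this restriction keeps the recursion linear in the remaining sentence length per NT chain instead of exploring every split.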


void joshua.decoder.ff.tm.SentenceFilteredGrammar.filter (int i, SentenceFilteredTrie trieNode, boolean lastWasNT) [private]

Matches rules against the sentence. Intelligently handles chains of sequential nonterminals. Marks arcs that are traversable for this sentence.

Parameters:
i  the position in the sentence to start matching
trieNode  the trie node to match against
lastWasNT  true if the match that brought us here was against a nonterminal


SentenceFilteredTrie joshua.decoder.ff.tm.SentenceFilteredGrammar.filter_regexp (Trie unfilteredTrie) [private]

Alternate filter that uses regular expressions, walking the grammar trie and matching the source side of each rule collection against the input sentence. Failed matches are discarded, and trie nodes extending from that position need not be explored.

Returns:
the root of the filtered trie if any rules were retained, otherwise null
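A minimal sketch of the idea, under stated assumptions: the rule's source side is treated as a word-level pattern, and the nonterminal marker `[X]` (an assumption, not necessarily Joshua's notation) is rewritten to match one or more whole words. The class and helper names are illustrative, not Joshua's API.

```java
import java.util.regex.Pattern;

public class RegexpFilterSketch {
    // Hypothetical: treat a rule's source side as a word-level regular
    // expression, with the assumed nonterminal marker "[X]" rewritten
    // to match one or more whole words.
    static boolean matchesSentence(String sourceSide, String sentence) {
        // String.replace is a literal substitution, so the terminal words
        // pass through unchanged and only the gap becomes a regex
        String pattern = sourceSide.replace("[X]", "\\S+( \\S+)*");
        return Pattern.compile(pattern).matcher(sentence).find();
    }

    public static void main(String[] args) {
        String sentence = "vi el coche verde ayer";
        System.out.println(matchesSentence("el [X] verde", sentence)); // gap filled by "coche"
        System.out.println(matchesSentence("el [X] rojo", sentence));  // "rojo" never appears
    }
}
```

A failed match prunes the whole subtrie below that node, since no longer rule sharing that prefix can match either.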


int joshua.decoder.ff.tm.SentenceFilteredGrammar.getNumRules ()

Gets the number of rules stored in the grammar.

Returns:
the number of rules stored in the grammar

Reimplemented from joshua.decoder.ff.tm.hash_based.MemoryBasedBatchGrammar.


int joshua.decoder.ff.tm.SentenceFilteredGrammar.getNumRules (Trie node)

A convenience function that counts the number of rules stored at and below a node of a grammar's trie.

Parameters:
node  the trie node at which counting starts

Returns:
the number of rules stored at and below the given node
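Such a count is naturally a depth-first sum over the trie. The sketch below uses a hypothetical `Node` class with a per-node rule count rather than Joshua's actual `Trie` API:

```java
import java.util.ArrayList;
import java.util.List;

public class CountRules {
    // Hypothetical trie node holding a local rule count; not Joshua's Trie.
    static class Node {
        int numRules;
        List<Node> children = new ArrayList<>();
    }

    // Depth-first sum of the rules stored at each node of the subtree.
    static int getNumRules(Node node) {
        int count = node.numRules;
        for (Node child : node.children)
            count += getNumRules(child);
        return count;
    }

    public static void main(String[] args) {
        Node root = new Node();
        Node a = new Node();
        Node b = new Node();
        a.numRules = 2;
        b.numRules = 3;
        root.children.add(a);
        a.children.add(b);
        System.out.println(getNumRules(root));
    }
}
```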

Trie joshua.decoder.ff.tm.SentenceFilteredGrammar.getTrieRoot ()

Gets the root of the Trie backing this grammar.

Note: this method should run in small, constant time.

Returns:
the root of the Trie backing this grammar

Reimplemented from joshua.decoder.ff.tm.hash_based.MemoryBasedBatchGrammar.


boolean joshua.decoder.ff.tm.SentenceFilteredGrammar.hasRuleForSpan (int startIndex, int endIndex, int pathLength)

This function is poorly named: it does not report whether a rule exists in the grammar for the current span, but whether the grammar is permitted to apply rules to that span at all (a grammar-level parameter). As such, we can simply chain to the underlying grammar.

Reimplemented from joshua.decoder.ff.tm.hash_based.MemoryBasedBatchGrammar.
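The chaining can be sketched with a hypothetical pair of grammar classes; the interface and the illustrative maximum-span rule are assumptions, not Joshua's actual types:

```java
public class SpanDelegation {
    // Hypothetical grammar interface; only the span query matters here.
    interface Grammar {
        boolean hasRuleForSpan(int startIndex, int endIndex, int pathLength);
    }

    // The base grammar owns the span policy (an illustrative maximum width).
    static class BaseGrammar implements Grammar {
        final int maxSpan;
        BaseGrammar(int maxSpan) { this.maxSpan = maxSpan; }
        public boolean hasRuleForSpan(int startIndex, int endIndex, int pathLength) {
            return endIndex - startIndex <= maxSpan;
        }
    }

    // The filtered grammar simply chains to the grammar it wraps.
    static class FilteredGrammar implements Grammar {
        final Grammar base;
        FilteredGrammar(Grammar base) { this.base = base; }
        public boolean hasRuleForSpan(int startIndex, int endIndex, int pathLength) {
            return base.hasRuleForSpan(startIndex, endIndex, pathLength);
        }
    }

    public static void main(String[] args) {
        Grammar g = new FilteredGrammar(new BaseGrammar(10));
        System.out.println(g.hasRuleForSpan(0, 5, 0) + " " + g.hasRuleForSpan(0, 15, 0));
    }
}
```

Filtering changes which rules survive, not where rules may apply, so the wrapper adds no logic of its own here.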


boolean joshua.decoder.ff.tm.SentenceFilteredGrammar.isRegexpGrammar ()

This returns true if the grammar contains rules that are regular expressions, possibly matching many different inputs.

Returns:
true if the grammar's rules may contain regular expressions.

Reimplemented from joshua.decoder.ff.tm.hash_based.MemoryBasedBatchGrammar.



Member Data Documentation