Joshua
open source statistical hierarchical phrase-based machine translation system
Classes | |
class | DotCell |
class | DotNode |
Public Member Functions | |
DotCell | getDotCell (int i, int j) |
DotChart (Lattice< Token > input, Grammar grammar, Chart chart, NonterminalMatcher nonTerminalMatcher, boolean regExpMatching) | |
Package Functions | |
void | seed () |
void | expandDotCell (int i, int j) |
void | startDotItems (int i, int j) |
Private Member Functions | |
void | extendDotItemsWithProvedItems (int i, int k, int j, boolean skipUnary) |
ArrayList< Trie > | matchAll (DotNode dotNode, int wordID) |
void | addDotItem (Trie tnode, int i, int j, ArrayList< SuperNode > antSuperNodesIn, SuperNode curSuperNode, SourcePath srcPath) |
Private Attributes | |
ChartSpan< DotCell > | dotcells |
Chart | dotChart |
Grammar | pGrammar |
final int | sentLen |
final Lattice< Token > | input |
final boolean | regexpMatching |
final NonterminalMatcher | nonTerminalMatcher |
Static Private Attributes | |
static final Logger | logger = Logger.getLogger(DotChart.class.getName()) |
The DotChart handles Earley-style implicit binarization of translation rules.
The DotNode object represents the (possibly partial) application of a synchronous rule. The implicit binarization is maintained with a pointer to the Trie node in the grammar, for easy retrieval of the next symbol to be matched. At every span (i,j) of the input sentence, every incomplete DotNode is examined to see whether it (a) needs a terminal and matches against the final terminal of the span or (b) needs a nonterminal and matches against a completed nonterminal in the main chart at some split point (k,j).
Once a rule is completed, it is entered into the DotChart. DotCell objects are used to group completed DotNodes over a span.
There is a separate DotChart for every grammar.
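The idea of keeping the "dot" as a pointer into the grammar trie can be sketched as follows. This is a minimal illustration under stated assumptions, not the Joshua implementation: `DotSketch`, `TrieNode`, `DotItem`, `addRule`, and `advance` are hypothetical stand-ins for the real `Trie` and `DotNode` classes, and symbols are plain strings rather than vocabulary IDs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-ins; these are not Joshua's Trie or DotNode classes. */
class DotSketch {

    /** A grammar trie node: children are keyed by the next source-side symbol. */
    static final class TrieNode {
        final Map<String, TrieNode> children = new HashMap<>();
        boolean completesRule = false;

        /** Returns the child for symbol, creating it if needed (used while building the trie). */
        TrieNode child(String symbol) {
            return children.computeIfAbsent(symbol, s -> new TrieNode());
        }

        /** Returns the node reached by matching symbol, or null if no rule continues that way. */
        TrieNode match(String symbol) {
            return children.get(symbol);
        }
    }

    /** A (possibly partial) rule application over span (i,j); the "dot" is a trie position. */
    static final class DotItem {
        final int i;
        final int j;
        final TrieNode dot;

        DotItem(int i, int j, TrieNode dot) {
            this.i = i;
            this.j = j;
            this.dot = dot;
        }

        /** Moves the dot over one symbol that ends at newJ, or returns null if nothing matches. */
        DotItem advance(String symbol, int newJ) {
            TrieNode next = dot.match(symbol);
            return next == null ? null : new DotItem(i, newJ, next);
        }
    }

    /** Inserts a rule's source side into the trie, one symbol per level. */
    static void addRule(TrieNode root, String... sourceSide) {
        TrieNode node = root;
        for (String symbol : sourceSide) {
            node = node.child(symbol);
        }
        node.completesRule = true;
    }

    public static void main(String[] args) {
        TrieNode root = new TrieNode();
        addRule(root, "the", "[X]", "of");      // source side: the [X] of

        DotItem item = new DotItem(0, 0, root); // dot at the trie root, empty span
        item = item.advance("the", 1);          // terminal matched at the end of the span
        item = item.advance("[X]", 3);          // nonterminal assumed proved over (1,3)
        item = item.advance("of", 4);           // terminal matched at the end of the span
        System.out.println("span (" + item.i + "," + item.j + ") complete=" + item.dot.completesRule);
    }
}
```

Because each item only stores a trie position, rules that share a source-side prefix share the same partial items, which is what makes the binarization implicit.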
joshua.decoder.chart_parser.DotChart.DotChart | ( | Lattice< Token > | input, |
Grammar | grammar, | ||
Chart | chart, | ||
NonterminalMatcher | nonTerminalMatcher, | ||
boolean | regExpMatching | ||
) |
Constructs a new dot chart from a specified input lattice, a translation grammar, and a parse chart.
input | A lattice which represents an input sentence. |
grammar | A translation grammar. |
chart | A CKY+ style chart in which completed span entries are stored. |
void joshua.decoder.chart_parser.DotChart.addDotItem | ( | Trie | tnode, |
int | i, | ||
int | j, | ||
ArrayList< SuperNode > | antSuperNodesIn, | ||
SuperNode | curSuperNode, | ||
SourcePath | srcPath | ||
) | [private] |
Creates a DotNode and adds it into the DotChart at the correct place. These are (possibly incomplete) rule applications.
tnode | the trie node pointing to the location ("dot") in the grammar trie |
i | start index of the span covered by the dot item |
j | end index of the span covered by the dot item |
antSuperNodesIn | the supernodes representing the rule's tail nodes |
curSuperNode | the lefthand side of the rule being created |
srcPath | the path taken through the input lattice |
void joshua.decoder.chart_parser.DotChart.expandDotCell | ( | int | i, |
int | j | ||
) | [package] |
This function computes all possible expansions of all rules over the provided span (i,j). By expansions, we mean the moving of the dot forward (from left to right) over a nonterminal or terminal symbol on the rule's source side.
There are two kinds of expansions:
1. Expansion over a nonterminal symbol. For this kind of expansion, a rule has a dot immediately prior to a source-side nonterminal. The main Chart is consulted to see whether there exists a completed nonterminal with the same label. If so, the dot is advanced.
Discovering nonterminal expansions is a matter of enumerating all split points k such that i < k and k < j. The nonterminal symbol must exist in the main Chart over (k,j).
2. Expansion over a terminal symbol. For this kind of expansion, a rule has a dot immediately prior to a source-side terminal. The dot is advanced if that terminal matches the final terminal of the span.
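The split-point enumeration for the nonterminal case can be sketched as follows. This is a minimal illustration, assuming a hypothetical `ProvedSpans` lookup in place of the main Chart and string labels in place of nonterminal IDs; `nonterminalSplitPoints` is an illustrative name, not a Joshua method.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SplitPointSketch {

    /** Hypothetical stand-in for the main chart: records which labels are proved over which spans. */
    static final class ProvedSpans {
        private final Set<String> entries = new HashSet<>();

        void add(int start, int end, String label) {
            entries.add(start + ":" + end + ":" + label);
        }

        boolean has(int start, int end, String label) {
            return entries.contains(start + ":" + end + ":" + label);
        }
    }

    /**
     * Returns every split point k with i < k < j such that the given nonterminal label
     * has a completed entry over (k,j) in the chart.
     */
    static List<Integer> nonterminalSplitPoints(ProvedSpans chart, int i, int j, String label) {
        List<Integer> splits = new ArrayList<>();
        for (int k = i + 1; k < j; k++) {
            if (chart.has(k, j, label)) {
                splits.add(k);
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        ProvedSpans chart = new ProvedSpans();
        chart.add(2, 5, "[X]");
        chart.add(3, 5, "[X]");
        chart.add(1, 4, "[X]"); // ends at 4, so it is irrelevant for the span (0,5)
        System.out.println(nonterminalSplitPoints(chart, 0, 5, "[X]"));
    }
}
```

Each split point found this way identifies a pair of cells: partial items over (i,k) in the dot chart and completed items over (k,j) in the main chart.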
void joshua.decoder.chart_parser.DotChart.extendDotItemsWithProvedItems | ( | int | i, |
int | k, | ||
int | j, | ||
boolean | skipUnary | ||
) | [private] |
Attempt to combine an item in the dot chart with an item in the main chart to create a new item in the dot chart. The DotChart item is a DotNode begun at position i with the dot currently at position k, that is, a partially-applied rule.
In other words, this method looks for (proved) theorems or axioms in the completed chart that may apply and extend the dot position.
i | Start index of a dot chart item |
k | End index of a dot chart item; start index of a completed chart item |
j | End index of a completed chart item |
skipUnary | if true, don't extend unary rules |
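The combination step can be sketched as follows. This is a simplified illustration rather than the actual method body: `ExtendSketch`, `TrieNode`, `DotItem`, and `extend` are hypothetical names, the proved items over (k,j) are reduced to a list of left-hand-side labels, and the `skipUnary` filter is deliberately left out because its exact condition is not described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ExtendSketch {

    /** Hypothetical grammar trie node keyed by the next source-side symbol. */
    static final class TrieNode {
        final Map<String, TrieNode> children = new HashMap<>();
    }

    /** Hypothetical dot item: a partially applied rule over (start,end). */
    static final class DotItem {
        final int start;
        final int end;
        final TrieNode dot;
        final List<String> matched; // symbols consumed so far, kept for illustration

        DotItem(int start, int end, TrieNode dot, List<String> matched) {
            this.start = start;
            this.end = end;
            this.dot = dot;
            this.matched = matched;
        }
    }

    /**
     * Combines every dot item over (i,k) with every proved label over (k,j).
     * Whenever the trie has a continuation for that label, a new dot item over (i,j) is produced.
     */
    static List<DotItem> extend(List<DotItem> dotItemsOverIK, List<String> provedLabelsOverKJ, int j) {
        List<DotItem> extended = new ArrayList<>();
        for (DotItem item : dotItemsOverIK) {
            for (String label : provedLabelsOverKJ) {
                TrieNode next = item.dot.children.get(label);
                if (next != null) {
                    List<String> matched = new ArrayList<>(item.matched);
                    matched.add(label);
                    extended.add(new DotItem(item.start, j, next, matched));
                }
            }
        }
        return extended;
    }

    public static void main(String[] args) {
        // Trie for a single source side: saw [NP]
        TrieNode root = new TrieNode();
        TrieNode afterSaw = new TrieNode();
        root.children.put("saw", afterSaw);
        afterSaw.children.put("[NP]", new TrieNode());

        // One partial item over (0,1); [NP] and [VP] are assumed proved over (1,3).
        List<DotItem> partial =
            Collections.singletonList(new DotItem(0, 1, afterSaw, Arrays.asList("saw")));
        for (DotItem item : extend(partial, Arrays.asList("[NP]", "[VP]"), 3)) {
            System.out.println("(" + item.start + "," + item.end + ") " + item.matched);
        }
    }
}
```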
DotCell joshua.decoder.chart_parser.DotChart.getDotCell | ( | int | i, |
int | j | ||
) |
ArrayList<Trie> joshua.decoder.chart_parser.DotChart.matchAll | ( | DotNode | dotNode, |
int | wordID | ||
) | [private] |
void joshua.decoder.chart_parser.DotChart.seed | ( | ) | [package] |
Adds the initial dot items: dot items pointing to the root of the grammar trie.
void joshua.decoder.chart_parser.DotChart.startDotItems | ( | int | i, |
int | j | ||
) | [package] |
Adds dot items that start with the complete super-items in cell (i,j). Note: the span (i,j) here is covered by a nonterminal; it cannot be a cn-side terminal, since those have already been handled in case 2 of dotchart.expand_cell.
ChartSpan<DotCell> joshua.decoder.chart_parser.DotChart.dotcells [private] |
Two-dimensional chart of cells. Some cells might be null. This could definitely be represented more efficiently, since only the upper half of this triangle is ever used.
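One way the more compact representation hinted at here might look is a flat array indexed only over spans with i <= j. This is a hypothetical sketch, not Joshua's `ChartSpan` class; `TriangularChart` and its methods are illustrative names, and n is assumed to be the largest valid position index.

```java
/** Hypothetical upper-triangular span storage; not Joshua's ChartSpan. */
class TriangularChart<T> {
    private final int n;          // largest valid position index
    private final Object[] cells; // one slot per span (i,j) with 0 <= i <= j <= n

    TriangularChart(int n) {
        this.n = n;
        this.cells = new Object[(n + 1) * (n + 2) / 2];
    }

    /** Maps a span (i,j) to its slot: rows are laid out one after another, row i holding j = i..n. */
    int index(int i, int j) {
        if (i < 0 || j < i || j > n) {
            throw new IndexOutOfBoundsException("invalid span (" + i + "," + j + ")");
        }
        // Rows 0..i-1 hold (n+1) + n + ... + (n+2-i) slots in total; i*(i-1) is always even.
        return i * (n + 1) - i * (i - 1) / 2 + (j - i);
    }

    void set(int i, int j, T value) {
        cells[index(i, j)] = value;
    }

    @SuppressWarnings("unchecked")
    T get(int i, int j) {
        return (T) cells[index(i, j)];
    }

    int capacity() {
        return cells.length;
    }

    public static void main(String[] args) {
        TriangularChart<String> chart = new TriangularChart<>(4);
        chart.set(1, 3, "cell(1,3)");
        System.out.println(chart.get(1, 3) + ", slots=" + chart.capacity());
        System.out.println("unset cell: " + chart.get(0, 4)); // cells may still be null
    }
}
```

For positions 0 through 4 this allocates 15 slots where a full square array would allocate 25.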
Chart joshua.decoder.chart_parser.DotChart.dotChart [private] |
CKY+ style parse chart in which completed span entries are stored.
final Lattice<Token> joshua.decoder.chart_parser.DotChart.input [private] |
final Logger joshua.decoder.chart_parser.DotChart.logger = Logger.getLogger(DotChart.class.getName()) [static, private] |
Grammar joshua.decoder.chart_parser.DotChart.pGrammar [private] |
Translation grammar which contains the translation rules.
final boolean joshua.decoder.chart_parser.DotChart.regexpMatching [private] |
final int joshua.decoder.chart_parser.DotChart.sentLen [private] |