Joshua
open source statistical hierarchical phrase-based machine translation system
joshua.decoder.chart_parser.DotChart Class Reference

Classes

class  DotCell
class  DotNode

Public Member Functions

DotCell getDotCell (int i, int j)
 DotChart (Lattice< Token > input, Grammar grammar, Chart chart, NonterminalMatcher nonTerminalMatcher, boolean regExpMatching)

Package Functions

void seed ()
void expandDotCell (int i, int j)
void startDotItems (int i, int j)

Private Member Functions

void extendDotItemsWithProvedItems (int i, int k, int j, boolean skipUnary)
ArrayList< Trie > matchAll (DotNode dotNode, int wordID)
void addDotItem (Trie tnode, int i, int j, ArrayList< SuperNode > antSuperNodesIn, SuperNode curSuperNode, SourcePath srcPath)

Private Attributes

ChartSpan< DotCell > dotcells
Chart dotChart
Grammar pGrammar
final int sentLen
final Lattice< Token > input
final boolean regexpMatching
final NonterminalMatcher nonTerminalMatcher

Static Private Attributes

static final Logger logger = Logger.getLogger(DotChart.class.getName())

Detailed Description

The DotChart handles Earley-style implicit binarization of translation rules.

The DotNode object represents the (possibly partial) application of a synchronous rule. The implicit binarization is maintained with a pointer to the Trie node in the grammar, for easy retrieval of the next symbol to be matched. At every span (i,j) of the input sentence, every incomplete DotNode is examined to see whether it (a) needs a terminal and matches against the final terminal of the span or (b) needs a nonterminal and matches against a completed nonterminal in the main chart at some split point (k,j).

Once a rule is completed, it is entered into the DotChart. DotCell objects are used to group completed DotNodes over a span.

There is a separate DotChart for every grammar.
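
The idea of keeping the "dot" as a pointer into the grammar trie can be illustrated with a minimal sketch. The classes below (`DotSketch`, `TrieNode`, `DotItem`) are hypothetical, simplified stand-ins written for this illustration; they are not the Joshua `Trie` or `DotNode` classes, and symbols are assumed to be plain integer IDs.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-ins for a grammar trie and a dot item. */
class DotSketch {

    /** A trie node whose children are keyed by source-side symbol ID. */
    static class TrieNode {
        final Map<Integer, TrieNode> children = new HashMap<>();
        boolean hasRules = false; // true if some rule's source side ends here

        TrieNode child(int symbol) {
            return children.get(symbol);
        }

        /** Inserts a source side and marks its last node as completing a rule. */
        void insert(int[] sourceSide) {
            TrieNode node = this;
            for (int symbol : sourceSide) {
                node = node.children.computeIfAbsent(symbol, s -> new TrieNode());
            }
            node.hasRules = true;
        }
    }

    /** A partial rule application over span (i, j); the trie node is the "dot". */
    static class DotItem {
        final int i;
        final int j;
        final TrieNode dot;

        DotItem(int i, int j, TrieNode dot) {
            this.i = i;
            this.j = j;
            this.dot = dot;
        }

        /** Moves the dot over one symbol ending at newJ; returns null if nothing matches. */
        DotItem advance(int symbol, int newJ) {
            TrieNode next = dot.child(symbol);
            return next == null ? null : new DotItem(i, newJ, next);
        }

        boolean isComplete() {
            return dot.hasRules;
        }
    }
}
```

Because the next symbol to match is a child lookup on the current trie node, all rules that share a source-side prefix are advanced together, which is the effect the implicit binarization is after.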

Author:
Zhifei Li, zhifei.work@gmail.com
Matt Post, post@cs.jhu.edu
Kristy Hollingshead Seitz

Constructor & Destructor Documentation

joshua.decoder.chart_parser.DotChart.DotChart ( Lattice< Token >  input,
Grammar  grammar,
Chart  chart,
NonterminalMatcher  nonTerminalMatcher,
boolean  regExpMatching 
)

Constructs a new dot chart from a specified input lattice, a translation grammar, and a parse chart.

Parameters:
input    A lattice which represents an input sentence.
grammar  A translation grammar.
chart    A CKY+ style chart in which completed span entries are stored.

Member Function Documentation

void joshua.decoder.chart_parser.DotChart.addDotItem ( Trie  tnode,
int  i,
int  j,
ArrayList< SuperNode >  antSuperNodesIn,
SuperNode  curSuperNode,
SourcePath  srcPath 
) [private]

Creates a DotNode and adds it into the DotChart at the correct place. These are (possibly incomplete) rule applications.

Parameters:
tnode            the trie node pointing to the location ("dot") in the grammar trie
i
j
antSuperNodesIn  the supernodes representing the rule's tail nodes
curSuperNode     the lefthand side of the rule being created
srcPath          the path taken through the input lattice

void joshua.decoder.chart_parser.DotChart.expandDotCell ( int  i,
int  j 
) [package]

This function computes all possible expansions of all rules over the provided span (i,j). By expansions, we mean the moving of the dot forward (from left to right) over a nonterminal or terminal symbol on the rule's source side.

There are two kinds of expansions:

  1. Expansion over a nonterminal symbol. For this kind of expansion, a rule has a dot immediately prior to a source-side nonterminal. The main Chart is consulted to see whether there exists a completed nonterminal with the same label. If so, the dot is advanced.

    Discovering nonterminal expansions is a matter of enumerating all split points k such that i < k and k < j. The nonterminal symbol must exist in the main Chart over (k,j).

  2. Expansion over a terminal symbol. In this case, expansion is a simple matter of determining whether the input symbol at position j (the end of the span) matches the next symbol in the rule. This is equivalent to choosing a split point k = j - 1 and looking for terminal symbols over (k,j). Note that phrases in the input rule are handled one-by-one as we consider longer spans.
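
The split points described in the two cases above can be sketched as follows. `SplitPointSketch` and its method names are hypothetical and only enumerate indices; they do not consult a chart or a grammar.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the split points for a span (i, j). */
class SplitPointSketch {

    /** Split points k with i < k < j, as considered for nonterminal expansion. */
    static List<Integer> nonterminalSplits(int i, int j) {
        List<Integer> splits = new ArrayList<>();
        for (int k = i + 1; k < j; k++) {
            splits.add(k);
        }
        return splits;
    }

    /** The single split point implied by terminal expansion: k = j - 1. */
    static int terminalSplit(int j) {
        return j - 1;
    }
}
```

For a span of width one, such as (2,3), the nonterminal case has no split points at all, so only the terminal case over (j - 1, j) can advance a dot there.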

void joshua.decoder.chart_parser.DotChart.extendDotItemsWithProvedItems ( int  i,
int  k,
int  j,
boolean  skipUnary 
) [private]

Attempt to combine an item in the dot chart with an item in the main chart to create a new item in the dot chart. The DotChart item is a DotNode begun at position i with the dot currently at position k, that is, a partially-applied rule.

In other words, this method looks for (proved) theorems or axioms in the completed chart that may apply and extend the dot position.

Parameters:
i          Start index of a dot chart item
k          End index of a dot chart item; start index of a completed chart item
j          End index of a completed chart item
skipUnary  if true, don't extend unary rules
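
As a rough illustration of the combination step, the sketch below extends each dot item over (i,k) with every label proved over (k,j). All names here (`ExtendSketch`, `Node`, `Item`, `extend`) are hypothetical, nonterminal labels are assumed to be plain strings, and the `skipUnary` behavior is deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of combining dot items with proved chart items. */
class ExtendSketch {

    /** A trie node whose outgoing edges are keyed by nonterminal label. */
    static class Node {
        final Map<String, Node> next = new HashMap<>();
    }

    /** A partially applied rule covering (start, end) with its dot at a trie node. */
    static class Item {
        final int start;
        final int end;
        final Node dot;

        Item(int start, int end, Node dot) {
            this.start = start;
            this.end = end;
            this.dot = dot;
        }
    }

    /**
     * For each dot item over (i, k) and each label proved over (k, j),
     * creates a new dot item over (i, j) when the trie has an edge for that label.
     */
    static List<Item> extend(List<Item> dotItemsIK, Set<String> provedLabelsKJ, int j) {
        List<Item> extended = new ArrayList<>();
        for (Item item : dotItemsIK) {
            for (String label : provedLabelsKJ) {
                Node child = item.dot.next.get(label);
                if (child != null) {
                    extended.add(new Item(item.start, j, child));
                }
            }
        }
        return extended;
    }
}
```

The new item keeps the start index of the dot item and takes the end index of the proved item, which matches the roles of i, k, and j in the parameter list above.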

DotCell joshua.decoder.chart_parser.DotChart.getDotCell ( int  i,
int  j 
)

ArrayList<Trie> joshua.decoder.chart_parser.DotChart.matchAll ( DotNode  dotNode,
int  wordID 
) [private]

void joshua.decoder.chart_parser.DotChart.seed ( ) [package]

Adds the initial dot items: dot items that point to the root of the grammar trie.

void joshua.decoder.chart_parser.DotChart.startDotItems ( int  i,
int  j 
) [package]

Adds dot items that start with the completed super-items in cell (i,j).

Note: (i,j) covers a nonterminal. It cannot be a source-side ("cn-side") terminal, since those have already been handled in case 2 of dotchart.expand_cell.

Member Data Documentation

ChartSpan<DotCell> joshua.decoder.chart_parser.DotChart.dotcells [private]

Two-dimensional chart of cells. Some cells might be null. This could definitely be represented more efficiently, since only the upper half of this triangle is ever used.
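
One way the more compact representation hinted at above could look is a flat array that stores only cells with i <= j. This is a sketch under assumptions, not how `ChartSpan` is implemented: the class name `TriangularChartSketch` is hypothetical, and spans are assumed to satisfy 0 <= i <= j <= n for a sentence of length n.

```java
/** Hypothetical flat storage for chart cells (i, j) with 0 <= i <= j <= n. */
class TriangularChartSketch<T> {
    private final Object[] cells;
    private final int n;

    TriangularChartSketch(int sentenceLength) {
        this.n = sentenceLength;
        // Row i holds (n + 1 - i) cells, so the total is (n + 1)(n + 2) / 2.
        this.cells = new Object[(n + 1) * (n + 2) / 2];
    }

    /** Maps (i, j) to a flat index: offset of row i plus the column offset j - i. */
    int index(int i, int j) {
        if (i < 0 || j < i || j > n) {
            throw new IndexOutOfBoundsException("invalid span (" + i + "," + j + ")");
        }
        return i * (n + 1) - i * (i - 1) / 2 + (j - i);
    }

    void set(int i, int j, T value) {
        cells[index(i, j)] = value;
    }

    @SuppressWarnings("unchecked")
    T get(int i, int j) {
        return (T) cells[index(i, j)]; // null if the cell was never set
    }
}
```

Compared with a full (n + 1) by (n + 1) array, this stores roughly half as many references, at the cost of a small index computation on each access.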

Chart joshua.decoder.chart_parser.DotChart.dotChart [private]

CKY+ style parse chart in which completed span entries are stored.

final Logger joshua.decoder.chart_parser.DotChart.logger = Logger.getLogger(DotChart.class.getName()) [static, private]

Grammar joshua.decoder.chart_parser.DotChart.pGrammar [private]

Translation grammar which contains the translation rules.