Static Public Member Functions
static String	processSingleLine (String normalized)
static String	capitalizeLineFirstLetter (String line)
static String	joinPunctuationMarks (String line)
static String	joinHyphen (String line)
static String	joinContractions (String line)
static String	capitalizeNameTitleAbbrvs (String line)
static String	capitalizeI (String line)
static String	replaceBracketTokens (String line)

Detailed Description

Denormalizes a string (assumed to be English) in the ways listed below.

- Capitalize the first character in the string
- Detokenize
  - Delete whitespace in front of periods and commas
  - Join contractions
  - Capitalize name titles (Mr Ms Miss Dr etc.)
  - TODO: Handle surrounding characters ([{<"''">}])
  - TODO: Join multi-period abbreviations (e.g. M.Phil. i.e.)
  - TODO: Handle ambiguities like "st.", which can be an abbreviation for both "Saint" and "street"
  - TODO: Capitalize both the title and the name of a person, e.g. Mr. Morton (named entities should be demarcated)

N.B. These methods all assume that every translation result to be denormalized has the following format:

- There is only one space between every pair of tokens
- There is no whitespace before the first token
- There is no whitespace after the final token
- Standard spaces are the only type of whitespace
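
The four format assumptions above can be checked before a line is denormalized. The following is a minimal sketch, not part of the documented class; the class name InputFormatCheck and the method isWellFormed are hypothetical names introduced only for illustration.

```java
// Hypothetical helper, not part of joshua.decoder.io.DeNormalize.
final class InputFormatCheck {

    /** Returns true if the line satisfies the four format assumptions. */
    static boolean isWellFormed(String line) {
        // No whitespace before the first token or after the final token.
        if (line.startsWith(" ") || line.endsWith(" ")) {
            return false;
        }
        // Only one space between every pair of tokens.
        if (line.contains("  ")) {
            return false;
        }
        // Standard spaces (U+0020) are the only whitespace allowed.
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c != ' ' && (Character.isWhitespace(c) || Character.isSpaceChar(c))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("wo n't you come , mr smith")); // true
        System.out.println(isWellFormed(" leading space"));             // false
    }
}
```

Rejecting malformed input up front is one possible design; a caller could instead collapse and trim whitespace before denormalizing.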

Member Function Documentation

static String joshua.decoder.io.DeNormalize.capitalizeI ( String line ) [static]

static String joshua.decoder.io.DeNormalize.capitalizeLineFirstLetter ( String line ) [static]

Capitalize the first letter of a line. This should be the last denormalization step applied to a line.

Parameters:

line	The single-line input string

Returns: The input string modified as described above
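
As an illustration of this step, here is a minimal sketch that upper-cases the first character of the line. It assumes "first letter" means the character at index 0; the actual method may instead skip leading non-letter characters. The names CapitalizeFirstSketch and capitalizeFirst are hypothetical.

```java
// Hypothetical sketch, not the actual DeNormalize implementation.
final class CapitalizeFirstSketch {

    /** Upper-cases the character at index 0 (assumption: no leading non-letters). */
    static String capitalizeFirst(String line) {
        if (line == null || line.isEmpty()) {
            return line;
        }
        return Character.toUpperCase(line.charAt(0)) + line.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(capitalizeFirst("the cat sat.")); // The cat sat.
    }
}
```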

static String joshua.decoder.io.DeNormalize.capitalizeNameTitleAbbrvs ( String line ) [static]

Capitalize the first character of the titles of names: Mr Mrs Ms Miss Dr Prof

Parameters:

line	The single-line input string

Returns: The input string modified as described above
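
One way to sketch this behavior is a regular expression over the six titles listed above, matched as whole lower-case tokens. This is an assumption about the matching rule, not the documented implementation; NameTitleSketch and capitalizeTitles are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the actual DeNormalize implementation.
final class NameTitleSketch {

    // Whole-word, lower-case title tokens (assumed input form).
    private static final Pattern TITLE =
            Pattern.compile("\\b(mr|mrs|ms|miss|dr|prof)\\b");

    /** Upper-cases the first character of each matched title token. */
    static String capitalizeTitles(String line) {
        Matcher m = TITLE.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String title = m.group(1);
            m.appendReplacement(out,
                    Character.toUpperCase(title.charAt(0)) + title.substring(1));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeTitles("mr smith met dr jones")); // Mr smith met Dr jones
    }
}
```

The word boundaries keep tokens such as "alms" or "drive" from being touched.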

static String joshua.decoder.io.DeNormalize.joinContractions ( String line ) [static]

Scans the line from left to right and deletes the space that precedes each contraction suffix, joining the prefix to the suffix.

E.g.

wo n't

becomes

won't

Parameters:

line	The single-line input string

Returns: The input string modified as described above
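
The joining rule can be sketched with a single regular expression. The list of suffixes below (n't, 's, 're, 've, 'll, 'd, 'm) is an assumption, since the documentation does not enumerate them; JoinContractionsSketch and join are hypothetical names.

```java
// Hypothetical sketch, not the actual DeNormalize implementation.
final class JoinContractionsSketch {

    // A space, then an assumed contraction suffix, then a space or end of line.
    private static final String SUFFIX =
            " (n't|'(?:s|re|ve|ll|d|m))(?= |$)";

    /** Deletes the space in front of each contraction suffix. */
    static String join(String line) {
        return line.replaceAll(SUFFIX, "$1");
    }

    public static void main(String[] args) {
        System.out.println(join("wo n't"));               // won't
        System.out.println(join("i 'm sure she 's here")); // i'm sure she's here
    }
}
```

The lookahead requires the suffix to end a token, so a word that merely begins with an apostrophe and a suffix letter is left alone.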

static String joshua.decoder.io.DeNormalize.joinHyphen ( String line ) [static]

Scans the line from left to right and replaces each hyphen that has a space on both sides with the bare hyphen, joining the neighboring tokens.

Parameters:

line	The single-line input string

Returns: The input string modified as described above
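
Under the input format assumed by this class (single spaces between tokens), the rule above can be sketched as a literal replacement. JoinHyphenSketch and join are hypothetical names, and the real method may handle edge cases differently.

```java
// Hypothetical sketch, not the actual DeNormalize implementation.
final class JoinHyphenSketch {

    /** Replaces every " - " with "-", scanning left to right. */
    static String join(String line) {
        return line.replace(" - ", "-");
    }

    public static void main(String[] args) {
        System.out.println(join("state - of - the - art")); // state-of-the-art
    }
}
```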

static String joshua.decoder.io.DeNormalize.joinPunctuationMarks ( String line ) [static]

Scans the line from left to right and deletes the space that precedes each comma or period, attaching the mark to the preceding token.

Parameters:

line	The single-line input string

Returns: The input string modified as described above
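
A minimal sketch of this rule, again using literal replacements and assuming the single-space input format described earlier. JoinPunctuationSketch and join are hypothetical names.

```java
// Hypothetical sketch, not the actual DeNormalize implementation.
final class JoinPunctuationSketch {

    /** Deletes the space in front of each comma and each period. */
    static String join(String line) {
        return line.replace(" ,", ",").replace(" .", ".");
    }

    public static void main(String[] args) {
        System.out.println(join("hello , world .")); // hello, world.
    }
}
```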

static String joshua.decoder.io.DeNormalize.processSingleLine ( String normalized ) [static]

Apply all the denormalization methods to the normalized input line.

Parameters:

normalized	The normalized single-line input string

Returns: The denormalized string
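
The sketch below shows how such a pipeline might chain individual steps. It uses simplified stand-ins for only three of the steps, and their order is an assumption, except that capitalizing the first letter comes last, as the documentation for capitalizeLineFirstLetter requires. PipelineSketch, denormalize, and the helper names are hypothetical.

```java
// Hypothetical sketch, not the actual DeNormalize implementation.
final class PipelineSketch {

    // Simplified stand-in: delete the space before commas and periods.
    static String joinPunctuation(String s) {
        return s.replace(" ,", ",").replace(" .", ".");
    }

    // Simplified stand-in: handles only the n't suffix.
    static String joinContractions(String s) {
        return s.replace(" n't", "n't");
    }

    // Simplified stand-in: upper-case the character at index 0.
    static String capitalizeFirst(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    /** Applies each step in turn; first-letter capitalization runs last. */
    static String denormalize(String normalized) {
        String s = normalized;
        s = joinPunctuation(s);
        s = joinContractions(s);
        s = capitalizeFirst(s);
        return s;
    }

    public static void main(String[] args) {
        System.out.println(denormalize("i wo n't go , he said .")); // I won't go, he said.
    }
}
```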

static String joshua.decoder.io.DeNormalize.replaceBracketTokens ( String line ) [static]

Replaces, case-insensitively, every token sequence that represents a bracket character with the bracket itself.

Bracket token sequences: -lrb- -rrb- -lsb- -rsb- -lcb- -rcb-

See http://www.cis.upenn.edu/~treebank/tokenization.html

Parameters:

line	The single-line input string

Returns: The input string modified as described above
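
A minimal sketch of the replacement, using the Penn Treebank convention in which -lrb-/-rrb- stand for round brackets, -lsb-/-rsb- for square brackets, and -lcb-/-rcb- for curly brackets. Whether the real method also adjusts the surrounding spaces is not documented, so this sketch leaves them untouched. BracketTokenSketch and replaceBrackets are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the actual DeNormalize implementation.
final class BracketTokenSketch {

    // Token sequence -> bracket character (Penn Treebank convention).
    private static final Map<String, String> BRACKETS = new LinkedHashMap<>();
    static {
        BRACKETS.put("-lrb-", "(");
        BRACKETS.put("-rrb-", ")");
        BRACKETS.put("-lsb-", "[");
        BRACKETS.put("-rsb-", "]");
        BRACKETS.put("-lcb-", "{");
        BRACKETS.put("-rcb-", "}");
    }

    /** Replaces each bracket token, ignoring case, with its bracket character. */
    static String replaceBrackets(String line) {
        String out = line;
        for (Map.Entry<String, String> e : BRACKETS.entrySet()) {
            out = Pattern.compile(Pattern.quote(e.getKey()), Pattern.CASE_INSENSITIVE)
                    .matcher(out)
                    .replaceAll(Matcher.quoteReplacement(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(replaceBrackets("-LRB- see above -rrb-")); // ( see above )
    }
}
```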
