Joshua
open source statistical hierarchical phrase-based machine translation system
Static Public Member Functions
static String processSingleLine (String normalized)
static String capitalizeLineFirstLetter (String line)
static String joinPunctuationMarks (String line)
static String joinHyphen (String line)
static String joinContractions (String line)
static String capitalizeNameTitleAbbrvs (String line)
static String capitalizeI (String line)
static String replaceBracketTokens (String line)
Denormalize a(n English) string in a collection of ways listed below.
N.B. These methods all assume that every translation result that will be denormalized has the following format:
static String joshua.decoder.io.DeNormalize.capitalizeI (String line) [static]
static String joshua.decoder.io.DeNormalize.capitalizeLineFirstLetter (String line) [static]
Capitalize the first letter of a line. This should be the last denormalization step applied to a line.
line | The single-line input string
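A minimal sketch of this step, assuming that "first letter" means the first character of the line. The class name and method body are illustrative and are not taken from the Joshua source.

```java
// Hypothetical sketch; not the actual joshua.decoder.io.DeNormalize source.
final class CapitalizeFirstLetterSketch {

    // Upper-cases the first character of the line. Assumes the line starts
    // with the letter to capitalize (no leading whitespace or punctuation).
    static String capitalizeLineFirstLetter(String line) {
        if (line == null || line.isEmpty()) {
            return line; // nothing to capitalize
        }
        return Character.toUpperCase(line.charAt(0)) + line.substring(1);
    }

    public static void main(String[] args) {
        // prints: The cat sat on the mat.
        System.out.println(capitalizeLineFirstLetter("the cat sat on the mat."));
    }
}
```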
static String joshua.decoder.io.DeNormalize.capitalizeNameTitleAbbrvs (String line) [static]
Capitalize the first character of the titles of names: Mr Mrs Ms Miss Dr Prof
line | The single-line input string
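One way to sketch this behaviour is a word-boundary regular expression over the six listed titles. The sketch assumes the titles arrive as lower-case, whitespace-separated tokens; the class name and the regex are illustrative and are not Joshua's actual implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the actual joshua.decoder.io.DeNormalize source.
final class TitleCapitalizationSketch {

    // Lower-case title tokens from the description above, matched as whole words.
    private static final Pattern TITLE =
            Pattern.compile("\\b(mr|mrs|ms|miss|dr|prof)\\b");

    static String capitalizeNameTitleAbbrvs(String line) {
        Matcher m = TITLE.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String title = m.group(1);
            // Upper-case only the first character of the matched title.
            m.appendReplacement(out,
                    Character.toUpperCase(title.charAt(0)) + title.substring(1));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: Mr smith drove Dr jones home
        System.out.println(capitalizeNameTitleAbbrvs("mr smith drove dr jones home"));
    }
}
```

The word boundaries keep words such as "drove" untouched, because "dr" is followed by another letter there.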
static String joshua.decoder.io.DeNormalize.joinContractions (String line) [static]
Scanning the line from left to right, a contraction suffix preceded by a space becomes just the contraction suffix.
I.e., the preceding space is deleted, joining the prefix to the suffix.
E.g., "wo n't" becomes "won't".
line | The single-line input string
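The join described above can be sketched with a single regular-expression replacement. The set of suffixes below (n't, 's, 're, 've, 'll, 'd, 'm) is an assumption made for illustration; the source does not list which suffixes Joshua recognises, and the class name is hypothetical.

```java
// Hypothetical sketch; not the actual joshua.decoder.io.DeNormalize source.
final class JoinContractionsSketch {

    // Assumed suffix list: n't, 's, 're, 've, 'll, 'd, 'm.
    // The leading space is matched but not captured, so "$1" drops it.
    private static final String SUFFIX_AFTER_SPACE =
            " (n't|'(?:s|re|ve|ll|d|m))\\b";

    static String joinContractions(String line) {
        return line.replaceAll(SUFFIX_AFTER_SPACE, "$1");
    }

    public static void main(String[] args) {
        // prints: won't
        System.out.println(joinContractions("wo n't"));
        // prints: they're sure i'll go
        System.out.println(joinContractions("they 're sure i 'll go"));
    }
}
```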
static String joshua.decoder.io.DeNormalize.joinHyphen (String line) [static]
Scanning from left to right, a hyphen with a space both before and after it becomes just the hyphen.
line | The single-line input string
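A minimal sketch of the hyphen join, assuming a plain literal replacement of " - " is sufficient. The class name is illustrative, and Joshua's actual implementation may treat edge cases differently.

```java
// Hypothetical sketch; not the actual joshua.decoder.io.DeNormalize source.
final class JoinHyphenSketch {

    // Replaces each space-hyphen-space sequence with a bare hyphen,
    // scanning left to right (String.replace is literal, not regex-based).
    static String joinHyphen(String line) {
        return line.replace(" - ", "-");
    }

    public static void main(String[] args) {
        // prints: a state-of-the-art system
        System.out.println(joinHyphen("a state - of - the - art system"));
    }
}
```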
static String joshua.decoder.io.DeNormalize.joinPunctuationMarks (String line) [static]
Scanning from left to right, a comma or period preceded by a space becomes just the comma or period.
line | The single-line input string
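The punctuation join can be sketched in the same way. This is a hypothetical illustration that handles only the comma and period named above; the class name and regex are not taken from the Joshua source.

```java
// Hypothetical sketch; not the actual joshua.decoder.io.DeNormalize source.
final class JoinPunctuationSketch {

    // Drops the single space that precedes a comma or a period.
    static String joinPunctuationMarks(String line) {
        return line.replaceAll(" ([,.])", "$1");
    }

    public static void main(String[] args) {
        // prints: hello, world.
        System.out.println(joinPunctuationMarks("hello , world ."));
    }
}
```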
static String joshua.decoder.io.DeNormalize.processSingleLine (String normalized) [static]
Apply all the denormalization methods to the normalized input line.
normalized | The normalized input line
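Chaining the steps might look like the sketch below. The order shown is an assumption, except that first-letter capitalization runs last, as the description of capitalizeLineFirstLetter requires. Only four simplified steps are included, and the class name, step bodies and contraction suffix list are illustrative, not Joshua's actual implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch; not the actual joshua.decoder.io.DeNormalize source.
final class DenormalizePipelineSketch {

    // Simplified stand-ins for the individual steps, applied in list order.
    private static final List<UnaryOperator<String>> STEPS = Arrays.asList(
            s -> s.replaceAll(" ([,.])", "$1"),                        // join punctuation
            s -> s.replace(" - ", "-"),                                // join hyphens
            s -> s.replaceAll(" (n't|'(?:s|re|ve|ll|d|m))\\b", "$1"),  // join contractions (assumed suffixes)
            s -> s.isEmpty() ? s                                       // capitalize first letter, last
                    : Character.toUpperCase(s.charAt(0)) + s.substring(1));

    static String processSingleLine(String normalized) {
        String out = normalized;
        for (UnaryOperator<String> step : STEPS) {
            out = step.apply(out);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints: He won't come, i think.
        System.out.println(processSingleLine("he wo n't come , i think ."));
    }
}
```

Keeping the steps in a list makes the ordering explicit and easy to change, which matters if one step's output affects what a later step can match.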
static String joshua.decoder.io.DeNormalize.replaceBracketTokens (String line) [static]
Case-insensitively replace all of the character sequences that represent a bracket character.
Keys are token representations of abbreviations of titles for names that capitalize more than just the first letter.
Bracket token sequences: -lrb- -rrb- -lsb- -rsb- -lcb- -rcb-
See http://www.cis.upenn.edu/~treebank/tokenization.html
line | The single-line input string
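A minimal sketch of case-insensitive bracket-token replacement, using the Penn Treebank mapping of -lrb-/-rrb- to round, -lsb-/-rsb- to square, and -lcb-/-rcb- to curly brackets. The class name and lookup table are illustrative and not taken from the Joshua source. The sketch also leaves the spaces around the brackets untouched, which is an assumption.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the actual joshua.decoder.io.DeNormalize source.
final class BracketTokenSketch {

    // Lower-case token -> bracket character (Penn Treebank convention).
    private static final Map<String, String> BRACKETS = new HashMap<>();
    static {
        BRACKETS.put("-lrb-", "(");
        BRACKETS.put("-rrb-", ")");
        BRACKETS.put("-lsb-", "[");
        BRACKETS.put("-rsb-", "]");
        BRACKETS.put("-lcb-", "{");
        BRACKETS.put("-rcb-", "}");
    }

    // Matches any of the six tokens regardless of letter case.
    private static final Pattern TOKEN =
            Pattern.compile("-[lr][rsc]b-", Pattern.CASE_INSENSITIVE);

    static String replaceBracketTokens(String line) {
        Matcher m = TOKEN.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Normalize case before the lookup so -LRB- and -lrb- behave alike.
            String bracket = BRACKETS.get(m.group().toLowerCase(Locale.ROOT));
            m.appendReplacement(out, Matcher.quoteReplacement(bracket));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: ( see above )
        System.out.println(replaceBracketTokens("-LRB- see above -rrb-"));
    }
}
```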