Joshua
open source statistical hierarchical phrase-based machine translation system
Classes | |
class | OOVItem |
Public Member Functions | |
void | reset () |
void | processCommandLineOptions (String[] options) |
void | readConfigFile (String configFile) throws IOException |
void | sanityCheck () |
Static Public Member Functions | |
static String | normalize_key (String text) |
Public Attributes | |
ArrayList< String > | tms = new ArrayList<String>() |
String | weights_file = "" |
String | default_non_terminal = FormatUtils.markup("X") |
String | goal_symbol = FormatUtils.markup("GOAL") |
ArrayList< OOVItem > | oovList = null |
boolean | segment_oovs = false |
boolean | lattice_decoding = false |
boolean | amortized_sorting = true |
boolean | constrain_parse = false |
boolean | use_pos_labels = false |
boolean | true_oovs_only = false |
boolean | filter_grammar = false |
int | pop_limit = 100 |
int | maxlen = 200 |
boolean | use_unique_nbest = true |
boolean | include_align_index = false |
int | topN = 1 |
String | outputFormat = "%i ||| %s ||| %f ||| %c" |
int | num_parallel_decoders = 1 |
String | hypergraphFilePattern = "" |
boolean | mark_oovs = false |
boolean | parse = false |
ArrayList< String > | features = new ArrayList<String>() |
ArrayList< String > | weights = new ArrayList<String>() |
int | server_port = 0 |
boolean | rescoreForest = false |
float | rescoreForestWeight = 10.0f |
String | fragmentMapFile = null |
boolean | fuzzy_matching = false |
String | search_algorithm = "cky" |
int | reordering_limit = 8 |
int | num_translation_options = 20 |
boolean | use_dot_chart = true |
boolean | moses = false |
boolean | show_weights_and_quit = false |
String | input_file = null |
String | n_best_file = null |
boolean | source_annotations = false |
String | weight_overwrite = "" |
Static Public Attributes | |
static final String | SOFT_SYNTACTIC_CONSTRAINT_DECODING_PROPERTY_NAME = "fuzzy_matching" |
Private Attributes | |
final Logger | logger = Logger.getLogger(JoshuaConfiguration.class.getName()) |
Configuration file for Joshua decoder.
When adding new features to Joshua, any new configurable parameters should be added to this class.
static String joshua.decoder.JoshuaConfiguration.normalize_key | ( | String | text | ) | [static] |
Normalizes parameter names by removing underscores and hyphens and lowercasing. This defines equivalence classes on external use of parameter names, permitting arbitrary_under_scores and camelCasing in parameter names without forcing the user to memorize them all. Here are some examples of equivalent ways to refer to parameter names:
{pop-limit, poplimit, PopLimit, popLimit, pop_lim_it} {lmfile, lm-file, LM-FILE, lm_file}
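The normalization rule described above can be sketched in a few lines of Java. This is a minimal illustration written from the description, not the Joshua source; the class name NormalizeKeySketch and the method name normalizeKey are hypothetical.

```java
import java.util.Locale;

/** Hypothetical stand-in for the rule described above; not the Joshua implementation. */
final class NormalizeKeySketch {

    /** Removes underscores and hyphens, then lowercases the result. */
    static String normalizeKey(String text) {
        return text.replaceAll("[-_]", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // All spellings from the first equivalence class listed above.
        String[] spellings = {"pop-limit", "poplimit", "PopLimit", "popLimit", "pop_lim_it"};
        for (String s : spellings) {
            System.out.println(s + " -> " + normalizeKey(s)); // each prints "... -> poplimit"
        }
    }
}
```

Under this rule every spelling in the first group maps to poplimit and every spelling in the second group maps to lmfile, which is what makes them interchangeable.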
void joshua.decoder.JoshuaConfiguration.processCommandLineOptions | ( | String[] | options | ) |
To process command-line options, we write them to a file that looks like the config file, and then call readConfigFile() on it. It would be more general to define a class that sits on a stream and knows how to chop it up, but this was quicker to implement.
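The approach described above (serialize the options into a config-style file, then reuse the file reader) might look roughly like the following sketch. Several details are assumptions made for illustration only: that options arrive as "-key value" tokens, that the config file uses "key = value" lines, and that a flag with no value means "true". The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; the option and config syntax here are assumptions. */
final class CommandLineToConfigSketch {

    /** Converts "-key value ..." tokens into "key = value" lines. */
    static List<String> toConfigLines(String[] options) {
        List<String> lines = new ArrayList<>();
        int i = 0;
        while (i < options.length) {
            String key = options[i].replaceFirst("^-+", "");
            i++;
            // Collect every token up to the next "-flag" as this key's value.
            List<String> values = new ArrayList<>();
            while (i < options.length && !options[i].startsWith("-")) {
                values.add(options[i]);
                i++;
            }
            // Assumption: a flag without a value is treated as boolean true.
            String value = values.isEmpty() ? "true" : String.join(" ", values);
            lines.add(key + " = " + value);
        }
        return lines;
    }

    /** Writes the lines to a temporary file that a config-file reader could then load. */
    static Path writeTempConfig(String[] options) throws IOException {
        Path tmp = Files.createTempFile("sketch-config", ".tmp");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, toConfigLines(options), StandardCharsets.UTF_8);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        Path config = writeTempConfig(new String[] {"-pop-limit", "50", "-mark-oovs"});
        for (String line : Files.readAllLines(config, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
    }
}
```

One limitation of this simple tokenization is that a value beginning with a hyphen, such as a negative number, would be mistaken for a new key. A stream-based parser of the kind mentioned above would avoid the temporary file altogether.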
void joshua.decoder.JoshuaConfiguration.readConfigFile | ( | String | configFile | ) | throws IOException |
void joshua.decoder.JoshuaConfiguration.reset | ( | ) |
Resets JoshuaConfiguration to its state immediately after initialization. This is useful, for example, when making several calls to the decoder within the same Java program; without a reset, loading the configuration multiple times can leave inconsistent state and cause errors.
This suggests that it may be better still to refactor the code so that JoshuaConfiguration is an object that is created and passed as an argument, rather than a shared static object. This is only a suggestion for a next step.
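The refactoring suggested above, a configuration object passed as an argument instead of shared state, can be illustrated with a small self-contained sketch. SketchConfig and SketchDecoder are hypothetical classes invented for this example; they are not part of Joshua, and only the default value 100 is taken from pop_limit above.

```java
/** Hypothetical illustration of passing configuration explicitly; not Joshua code. */
final class ConfigAsObjectSketch {

    /** Hypothetical per-run configuration; the default mirrors pop_limit = 100. */
    static final class SketchConfig {
        int popLimit = 100;
    }

    /** Hypothetical decoder that receives its configuration through the constructor. */
    static final class SketchDecoder {
        private final SketchConfig config;

        SketchDecoder(SketchConfig config) {
            this.config = config;
        }

        int popLimit() {
            return config.popLimit;
        }
    }

    public static void main(String[] args) {
        SketchConfig first = new SketchConfig();
        first.popLimit = 50;                      // customized for the first run
        SketchConfig second = new SketchConfig(); // untouched defaults

        System.out.println(new SketchDecoder(first).popLimit());  // prints 50
        System.out.println(new SketchDecoder(second).popLimit()); // prints 100
    }
}
```

Because each decoder holds its own configuration instance, settings from one run cannot leak into the next, so no explicit reset step is needed between runs.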
void joshua.decoder.JoshuaConfiguration.sanityCheck | ( | ) |
Checks for invalid variable configurations.
boolean joshua.decoder.JoshuaConfiguration.amortized_sorting = true |
boolean joshua.decoder.JoshuaConfiguration.constrain_parse = false |
ArrayList<String> joshua.decoder.JoshuaConfiguration.features = new ArrayList<String>() |
boolean joshua.decoder.JoshuaConfiguration.filter_grammar = false |
String joshua.decoder.JoshuaConfiguration.fragmentMapFile = null |
boolean joshua.decoder.JoshuaConfiguration.fuzzy_matching = false |
String joshua.decoder.JoshuaConfiguration.goal_symbol = FormatUtils.markup("GOAL") |
boolean joshua.decoder.JoshuaConfiguration.include_align_index = false |
String joshua.decoder.JoshuaConfiguration.input_file = null |
boolean joshua.decoder.JoshuaConfiguration.lattice_decoding = false |
final Logger joshua.decoder.JoshuaConfiguration.logger = Logger.getLogger(JoshuaConfiguration.class.getName()) [private] |
boolean joshua.decoder.JoshuaConfiguration.mark_oovs = false |
boolean joshua.decoder.JoshuaConfiguration.moses = false |
String joshua.decoder.JoshuaConfiguration.n_best_file = null |
ArrayList<OOVItem> joshua.decoder.JoshuaConfiguration.oovList = null |
String joshua.decoder.JoshuaConfiguration.outputFormat = "%i ||| %s ||| %f ||| %c" |
This string describes the format of each line of output from the decoder (i.e., the translations). The string can include arbitrary text and also variables. The following variables are available:
boolean joshua.decoder.JoshuaConfiguration.parse = false |
boolean joshua.decoder.JoshuaConfiguration.rescoreForest = false |
float joshua.decoder.JoshuaConfiguration.rescoreForestWeight = 10.0f |
String joshua.decoder.JoshuaConfiguration.search_algorithm = "cky" |
boolean joshua.decoder.JoshuaConfiguration.segment_oovs = false |
boolean joshua.decoder.JoshuaConfiguration.show_weights_and_quit = false |
final String joshua.decoder.JoshuaConfiguration.SOFT_SYNTACTIC_CONSTRAINT_DECODING_PROPERTY_NAME = "fuzzy_matching" [static] |
boolean joshua.decoder.JoshuaConfiguration.source_annotations = false |
ArrayList<String> joshua.decoder.JoshuaConfiguration.tms = new ArrayList<String>() |
boolean joshua.decoder.JoshuaConfiguration.true_oovs_only = false |
boolean joshua.decoder.JoshuaConfiguration.use_dot_chart = true |
boolean joshua.decoder.JoshuaConfiguration.use_pos_labels = false |
boolean joshua.decoder.JoshuaConfiguration.use_unique_nbest = true |
ArrayList<String> joshua.decoder.JoshuaConfiguration.weights = new ArrayList<String>() |
String joshua.decoder.JoshuaConfiguration.weights_file = "" |