Joshua
open source statistical hierarchical phrase-based machine translation system
joshua.decoder.JoshuaConfiguration Class Reference

Classes

class  OOVItem

Public Member Functions

void reset ()
void processCommandLineOptions (String[] options)
void readConfigFile (String configFile) throws IOException
void sanityCheck ()

Static Public Member Functions

static String normalize_key (String text)

Public Attributes

ArrayList< String > tms = new ArrayList<String>()
String weights_file = ""
String default_non_terminal = FormatUtils.markup("X")
String goal_symbol = FormatUtils.markup("GOAL")
ArrayList< OOVItem > oovList = null
boolean segment_oovs = false
boolean lattice_decoding = false
boolean amortized_sorting = true
boolean constrain_parse = false
boolean use_pos_labels = false
boolean true_oovs_only = false
boolean filter_grammar = false
int pop_limit = 100
int maxlen = 200
boolean use_unique_nbest = true
boolean include_align_index = false
int topN = 1
String outputFormat = "%i ||| %s ||| %f ||| %c"
int num_parallel_decoders = 1
String hypergraphFilePattern = ""
boolean mark_oovs = false
boolean parse = false
ArrayList< String > features = new ArrayList<String>()
ArrayList< String > weights = new ArrayList<String>()
int server_port = 0
boolean rescoreForest = false
float rescoreForestWeight = 10.0f
String fragmentMapFile = null
boolean fuzzy_matching = false
String search_algorithm = "cky"
int reordering_limit = 8
int num_translation_options = 20
boolean use_dot_chart = true
boolean moses = false
boolean show_weights_and_quit = false
String input_file = null
String n_best_file = null
boolean source_annotations = false
String weight_overwrite = ""

Static Public Attributes

static final String SOFT_SYNTACTIC_CONSTRAINT_DECODING_PROPERTY_NAME = "fuzzy_matching"

Private Attributes

final Logger logger = Logger.getLogger(JoshuaConfiguration.class.getName())

Detailed Description

Configuration file for Joshua decoder.

When adding new features to Joshua, any new configurable parameters should be added to this class.

Author:
Zhifei Li, zhifei.work@gmail.com
Matt Post, post@cs.jhu.edu

Member Function Documentation

static String joshua.decoder.JoshuaConfiguration.normalize_key ( String  text) [static]

Normalizes parameter names by removing underscores and hyphens and converting to lowercase. This defines equivalence classes over the parameter names that users supply, permitting arbitrary_under_scores and camelCasing in parameter names without forcing the user to memorize one exact spelling. Here are some examples of equivalent ways to refer to parameter names:

{pop-limit, poplimit, PopLimit, popLimit, pop_lim_it}
{lmfile, lm-file, LM-FILE, lm_file}
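
The normalization described above can be sketched in a few lines of Java. This is a minimal illustration rather than the actual Joshua source: the class name KeyNormalizerSketch and the method name normalizeKey are hypothetical, and the real normalize_key may differ in detail (for example in how it handles locale).

```java
import java.util.Locale;

// Hypothetical stand-in for JoshuaConfiguration.normalize_key; not the real implementation.
final class KeyNormalizerSketch {

    // Remove every hyphen and underscore, then lowercase the result.
    static String normalizeKey(String text) {
        return text.replaceAll("[-_]", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String[] spellings = {"pop-limit", "poplimit", "PopLimit", "popLimit", "pop_lim_it"};
        for (String spelling : spellings) {
            System.out.println(spelling + " -> " + normalizeKey(spelling));
        }
    }
}
```

Under this sketch, all five spellings in the first set map to poplimit, and all four in the second set map to lmfile, so a lookup keyed on the normalized name treats them as the same parameter.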

void joshua.decoder.JoshuaConfiguration.processCommandLineOptions ( String[]  options)

To process command-line options, we write them to a file that looks like the config file and then call readConfigFile() on it. It would be more general to define a class that sits on a stream and knows how to split it up, but this was quicker to implement.
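
The approach described above (turn the options into config-file lines, then reuse the file reader) might look roughly like the following. This is a sketch under stated assumptions, not Joshua's code: the source does not specify the flag syntax or the config-file syntax, so the "-key value" and "key = value" shapes used here are assumptions, as are the class and method names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and file syntax are assumptions.
final class CommandLineToConfigSketch {

    // Convert "-key value ..." arguments into "key = value" lines (assumed syntax).
    // Any token starting with '-' is taken as a new key, so negative numeric
    // values are not handled by this sketch.
    static List<String> toConfigLines(String[] options) {
        List<String> lines = new ArrayList<>();
        int i = 0;
        while (i < options.length) {
            String key = options[i].replaceFirst("^-+", "");
            i++;
            List<String> values = new ArrayList<>();
            while (i < options.length && !options[i].startsWith("-")) {
                values.add(options[i]);
                i++;
            }
            // Assumption: a flag with no value means boolean true.
            String value = values.isEmpty() ? "true" : String.join(" ", values);
            lines.add(key + " = " + value);
        }
        return lines;
    }

    // Write the generated lines to a temporary file that a config reader could load.
    static Path writeTempConfig(String[] options) throws IOException {
        Path file = Files.createTempFile("joshua-options", ".config");
        Files.write(file, toConfigLines(options));
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = writeTempConfig(new String[] {"-pop-limit", "50", "-mark-oovs"});
        System.out.println("Wrote " + file);
        for (String line : Files.readAllLines(file)) {
            System.out.println(line);
        }
    }
}
```

The appeal of this design is that only one parser has to exist: whatever readConfigFile() accepts, the command line accepts too. The cost is a temporary file on every invocation.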


void joshua.decoder.JoshuaConfiguration.readConfigFile ( String  configFile) throws IOException

PHRASE-BASED PARAMETERS

void joshua.decoder.JoshuaConfiguration.reset ( )

Resets this JoshuaConfiguration to the state it had immediately after initialization. This is useful, for example, when making several calls to the decoder within the same Java program; without a reset, loading the configuration more than once can leave it in an inconsistent state and cause errors.

This suggests that it may be better still to refactor the code so that JoshuaConfiguration is an object that is created and passed as an argument, rather than a shared static object. This is only a suggestion for a next step.
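
To make the suggested refactoring concrete, here is a minimal sketch in which the configuration is an ordinary object handed to its consumer. The classes ConfigSketch and DecoderSketch are hypothetical and carry only two fields (with the defaults listed above for pop_limit and topN); they are not the real Joshua classes.

```java
// Hypothetical, reduced configuration object; not the real JoshuaConfiguration.
final class ConfigSketch {
    int popLimit = 100;
    int topN = 1;

    // Restore every field to its initial value.
    void reset() {
        popLimit = 100;
        topN = 1;
    }
}

// Hypothetical consumer that receives its configuration as a constructor argument.
final class DecoderSketch {
    private final ConfigSketch config;

    DecoderSketch(ConfigSketch config) {
        this.config = config;
    }

    String describe() {
        return "pop_limit=" + config.popLimit + " top_n=" + config.topN;
    }
}

final class ConfigInjectionDemo {
    public static void main(String[] args) {
        ConfigSketch first = new ConfigSketch();
        ConfigSketch second = new ConfigSketch();
        second.popLimit = 500; // affects only the second instance

        System.out.println(new DecoderSketch(first).describe());
        System.out.println(new DecoderSketch(second).describe());
    }
}
```

Because each decoder holds its own instance, two decoders in one program cannot disturb each other's settings, and reset() matters only when a single instance is deliberately reused.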

void joshua.decoder.JoshuaConfiguration.sanityCheck ( )

Checks for invalid variable configurations.



Member Data Documentation

ArrayList<String> joshua.decoder.JoshuaConfiguration.features = new ArrayList<String>()
final Logger joshua.decoder.JoshuaConfiguration.logger = Logger.getLogger(JoshuaConfiguration.class.getName()) [private]
String joshua.decoder.JoshuaConfiguration.outputFormat = "%i ||| %s ||| %f ||| %c"

This string describes the format of each line of output from the decoder (i.e., the translations). The string can include arbitrary text and also variables. The following variables are available:

  • i the 0-indexed sentence number
  • e the source string
  • s the translated sentence
  • S the translated sentence with some basic capitalization and denormalization
  • t the synchronous derivation
  • f the list of feature values (as name=value pairs)
  • c the model cost
  • w the weight vector
  • a the alignments between source and target words (currently unimplemented)
  • d a verbose, many-line version of the derivation
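
As an illustration of how such a format string could be expanded, the sketch below substitutes each %x variable from a lookup table. It is an assumption-laden example, not the decoder's own formatting code: the class OutputFormatSketch, its render method, and the sample values (including the feature names) are all made up for demonstration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical formatter; the decoder's real substitution logic may differ.
final class OutputFormatSketch {

    // Replace each "%x" whose letter x has a value in the map; copy everything else verbatim.
    static String render(String format, Map<Character, String> values) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < format.length(); i++) {
            char ch = format.charAt(i);
            boolean isVariable = ch == '%'
                    && i + 1 < format.length()
                    && values.containsKey(format.charAt(i + 1));
            if (isVariable) {
                out.append(values.get(format.charAt(i + 1)));
                i++; // skip the variable letter
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample values invented for this example.
        Map<Character, String> values = new HashMap<>();
        values.put('i', "0");
        values.put('s', "the house is small");
        values.put('f', "lm_0=-12.500 tm_pt_0=-3.200");
        values.put('c', "-15.700");

        System.out.println(render("%i ||| %s ||| %f ||| %c", values));
    }
}
```

With the default format and these sample values, the rendered line is `0 ||| the house is small ||| lm_0=-12.500 tm_pt_0=-3.200 ||| -15.700`. A shorter format such as "%i %s" would emit only the sentence number and the translation.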
ArrayList<String> joshua.decoder.JoshuaConfiguration.tms = new ArrayList<String>()
ArrayList<String> joshua.decoder.JoshuaConfiguration.weights = new ArrayList<String>()