Joshua
open source statistical hierarchical phrase-based machine translation system
Welcome to the developer documentation for the Joshua hierarchical statistical machine translation system. Joshua can be used in two ways: by downloading packaged releases from the main web site, or by installing directly from the source code hosted on GitHub. The former is intended for people who just wish to use Joshua, and the latter for people who wish to contribute to the codebase. This page is for developers; if you only wish to use Joshua (including, for example, its prebuilt language packs), then you probably want the end-user documentation.
The Joshua source code repository is located at http://github.com/joshua-decoder/joshua.
You'll need to set your $JAVA_HOME to point to your Java 7+ JDK. Typical values are:

    # OS X
    export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_71.jdk/Contents/Home

    # Redhat / CentOS
    export JAVA_HOME=/usr/java/default
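A quick sanity check can catch a missing or wrong $JAVA_HOME before the build starts. The following is a minimal sketch, not part of Joshua: check_java_home is a hypothetical helper name, and it only tests that a javac compiler is present (which distinguishes a JDK from a plain JRE). It does not verify the Java version.

```shell
# Hypothetical helper (not part of Joshua): verify that JAVA_HOME is set
# and points at a JDK rather than a plain JRE.
check_java_home() {
    if [ -z "${JAVA_HOME:-}" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    # A JDK ships the javac compiler; a JRE does not.
    if [ ! -x "$JAVA_HOME/bin/javac" ]; then
        echo "no javac found under $JAVA_HOME/bin"
        return 1
    fi
    echo "JAVA_HOME looks usable: $JAVA_HOME"
}

# Usage: check_java_home || exit 1
```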
You can download Joshua and compile it with the following commands:

    git clone https://github.com/joshua-decoder/joshua.git
    cd joshua
    export JOSHUA=$(pwd)
    ant
The main ant build target downloads a number of dependencies using Ivy, along with the Thrax submodule, and then compiles everything, including the bundled support tools KenLM and GIZA++.
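The clone-and-build commands above need git and ant on the PATH. A minimal sketch of a check for that, assuming a POSIX shell; missing_tools is a hypothetical helper name, not something Joshua provides:

```shell
# Hypothetical helper (not part of Joshua): print the name of every
# command given as an argument that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage: report anything missing before cloning and building.
#   missing_tools git ant
```

An empty result means every listed tool was found.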
Moses is used (1) for tuning with kbmira and (2) as an option for extracting GHKM grammars. If you wish to do either of these, you will need to download and install Moses 3+. A simplified approach
We recommend the use of Eclipse for Joshua development. When setting up Eclipse, please import the Eclipse settings under $JOSHUA/.settings.
If you run into trouble, you can post to the Joshua Developers' Forum. You might also be interested in the support forum for Joshua users.
If you find bugs in Joshua, the best thing is to fix them yourself and submit a pull request. Alternatively, you can file bug reports, feature requests, and other issues on the Joshua issues page.
Here's an example of building a release versioned "2012-07-18". It will be placed at release/joshua-2012-07-18.tgz.

    export JOSHUA_VERSION=2012-07-18
    export HADOOP=/path/to/hadoop
    export HADOOP_CONF_DIR=/path/to/hadoop/config
    export HADOOP_VERSION="0.20.203.0"
    export AWS_SDK=/path/to/aws
    export AWS_VERSION="1.1.3"

    git clone https://github.com/joshua-decoder/joshua.git
    cd joshua
    export JOSHUA=`pwd`
    git submodule update --init

    cd thrax
    ant

    cd $JOSHUA
    ant release
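Since the release example relies on several environment variables, it can help to confirm they are set before running ant release. The sketch below assumes that all six variables exported in the example are needed; missing_release_vars and release_archive are hypothetical helper names, not part of the Joshua build.

```shell
# Hypothetical helper (not part of Joshua): print the name of each
# release variable from the example that is unset or empty.
missing_release_vars() {
    for name in JOSHUA_VERSION HADOOP HADOOP_CONF_DIR HADOOP_VERSION AWS_SDK AWS_VERSION; do
        # POSIX sh has no indirect expansion, so eval reads the variable
        # whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name"
        fi
    done
}

# Path at which the text above says the release archive is placed.
release_archive() {
    echo "release/joshua-${JOSHUA_VERSION}.tgz"
}

# Usage:
#   [ -z "$(missing_release_vars)" ] && ant release && ls -l "$(release_archive)"
```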
Dependencies such as JAR archives are automatically downloaded by the Apache Ivy dependency management tool, which is designed to interact with the ant build tool.
To add a new dependency to the list of automatically downloaded archive libraries, follow these steps:
Add a line for the new library inside <dependencies> in ivy.xml. E.g. for asm-3.1.jar, the line that would be added is:

    <dependency org="asm" name="asm" rev="3.1"/>
$JOSHUA/ivysettings.xml file.
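Before adding a line to ivy.xml, it may be worth checking whether the library is already declared. A minimal sketch, assuming entries are written with the org and name attributes in the same order as the asm example above; has_dependency is a hypothetical helper name, not part of Joshua or Ivy:

```shell
# Hypothetical helper (not part of Joshua or Ivy): succeed if the given
# ivy.xml already contains a <dependency> entry for this org and name.
# Assumes the attributes appear as org="..." name="..." in that order.
has_dependency() {
    file=$1
    org=$2
    name=$3
    grep -q "<dependency org=\"$org\" name=\"$name\"" "$file"
}

# Usage:
#   has_dependency ivy.xml asm asm || echo "asm is not declared yet"
```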