Moses is a powerful open-source toolkit for Machine Translation, and it is widely used in the academic community for research. However, adding to or modifying it can seem like a daunting task at first. Fortunately, it is a well-engineered project with many active contributors, which makes modifying it easier than you would expect.

Here, I'm going to talk about how to add a feature to Moses. Features in Moses are just like features in any other statistical or machine learning method: each feature is given a weight during a tuning process, and the weighted feature scores are used during decoding. Features in Machine Translation (MT) are often related to natural language, such as a Language Model probability or a part-of-speech tag. However, you can add any feature you want to Moses, and this post should show you how. At a high level, a feature simply returns scores, so all we are doing is writing code that computes a score for our feature. This code is executed in various ''Evaluate...'' functions that I will describe later. That is all there is to it: we are only adding a little code to score a feature, and the rest of the code you need is already provided.

To get started, clone the project from GitHub (mosesdecoder). It is a C++ project, but it also makes heavy use of Perl and basic scripting. There is a developers mailing list you can join for questions. On the Moses website, you can find the basic instructions for adding a feature function, but this is my take and what I found to be useful. If you are a more visual learner, Hieu Hoang has a nice video tutorial on this exact topic as well. Once you have cloned the project, navigate to the directory for Feature Functions (mosesdecoder/moses/FF). Here you can see many of the features that have been added over time. To add a new feature, start from the skeleton files, which contain the basic scaffolding you will need (SkeletonStatefulFF.cpp/h and SkeletonStatelessFF.cpp/h). We first need to decide whether we are making a stateful or stateless feature function.

Stateful feature functions make use of information from earlier in the sentence you are currently translating. Stateless feature functions use only the information from the current word or phrase being scored. The easiest example of a stateful feature function is a language model, as it needs to know the previous n words. A trivial stateless feature function would be whether or not the current word is capitalized, which relies only on that current word. After deciding whether your feature is stateful or stateless, copy the corresponding skeleton header and body files to new files named after your feature.
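
To make the distinction concrete, here is a minimal sketch in plain C++ that does not use any Moses classes. The names ''CapitalizedScore'', ''PrevWordState'' and ''RepeatScore'' are made up for illustration. The first function is stateless because it only looks at the current word. The second is stateful because it needs something carried over from earlier in the sentence (here, just the previous word).

```cpp
#include <cctype>
#include <string>

// Stateless: the score depends only on the current word.
double CapitalizedScore(const std::string& word) {
  if (word.empty()) return 0.0;
  return std::isupper(static_cast<unsigned char>(word[0])) ? 1.0 : 0.0;
}

// Stateful: the score depends on the current word and on earlier context.
// The "state" in this toy example is simply the previous word.
struct PrevWordState {
  std::string prev;
};

// Scores 1.0 when the current word repeats the previous one, and writes the
// state that the next word should see into nextState.
double RepeatScore(const std::string& word,
                   const PrevWordState& prevState,
                   PrevWordState* nextState) {
  const double score =
      (!prevState.prev.empty() && prevState.prev == word) ? 1.0 : 0.0;
  nextState->prev = word;
  return score;
}
```

Calling ''RepeatScore'' word by word and passing each returned state into the next call mirrors, in a very simplified way, how a stateful feature threads its context through decoding.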

First, we modify the header file. If it is a stateful feature function, we begin by creating a class that represents the state of our feature. This inherits from FFState. In this class, we define the properties of the state our feature is in. It is here that we store information that can be used by the next state in our decoder. Next, for both stateless and stateful features, we create a class for the Feature Function itself. This inherits from either StatelessFeatureFunction or StatefulFeatureFunction, and it defines the behavior when the feature is applied. The main difference between the two is that the stateful feature function has a pointer to the previous state (hence it has context, i.e. state). When I'm modifying these files, I generally take the shortcut of a find-and-replace of the word ''Skeleton'' with the name of my new feature. However, this is risky, as you are not actually thinking too deeply about the behavior of the program.
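
As a rough picture of what the header ends up containing for a stateful feature, here is a self-contained sketch. Everything in the ''sketch'' namespace is a simplified stand-in that I wrote for illustration: the real FFState and StatefulFeatureFunction come from the Moses headers and have larger interfaces, and the real ''EvaluateWhenApplied'' takes Moses types rather than a plain string. ''MyFeature'' and ''MyFeatureState'' are hypothetical names.

```cpp
#include <memory>
#include <string>

namespace sketch {

// Stand-in for the Moses base class of the same name (assumption: simplified).
class FFState {
public:
  virtual ~FFState() {}
};

// The state of our hypothetical feature: the score carried so far.
class MyFeatureState : public FFState {
public:
  explicit MyFeatureState(float prevScore) : m_prevScore(prevScore) {}
  float GetPrevScore() const { return m_prevScore; }
  void SetPrevScore(float score) { m_prevScore = score; }
private:
  float m_prevScore;
};

// Stand-in for the Moses base class, with a simplified signature.
class StatefulFeatureFunction {
public:
  virtual ~StatefulFeatureFunction() {}
  virtual std::unique_ptr<FFState> EvaluateWhenApplied(
      const std::string& word, const FFState* prevState,
      float* score) const = 0;
};

// The feature itself: reads the previous state and returns the new state.
class MyFeature : public StatefulFeatureFunction {
public:
  std::unique_ptr<FFState> EvaluateWhenApplied(
      const std::string& word, const FFState* prevState,
      float* score) const override {
    const MyFeatureState* prev =
        static_cast<const MyFeatureState*>(prevState);
    const float carried = prev ? prev->GetPrevScore() : 0.0f;
    // Toy scoring rule: a running total of word lengths.
    *score = carried + static_cast<float>(word.size());
    return std::unique_ptr<FFState>(new MyFeatureState(*score));
  }
};

}  // namespace sketch
```

The point is the shape, not the scoring rule: one class holds what the next step needs to know, and the other class consumes the previous state and produces the next one.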

Next, we modify the cpp file. If you are using a stateless feature function, you are most likely just going to modify the function ''EvaluateInIsolation''. If you use some of the information from the source language, you will also need to modify the ''EvaluateWithSourceContext'' function. Either way, the main thing you are doing with this modification is defining how to score your feature. For instance, if you care whether a word is capitalized, you could give it a score of 1.0 or 0.0. Tuning will decide how important this feature is and give it a weight. You can define either sparse or dense scores. Generally, for dense scores, you create a vector of the scores. You make the scores available to the decoder by calling the function ''scoreBreakdown.PlusEquals(this, [newScores])'', where ''[newScores]'' are the scores your feature has assigned.
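
Here is a small sketch of the dense-score pattern described above. The ''ScoreBreakdown'' and ''FeatureFunction'' classes in the ''sketch'' namespace are simplified stand-ins I made up so the example compiles on its own. In Moses the score container and the arguments of ''EvaluateInIsolation'' are Moses types, so treat the signatures here as assumptions. ''CapitalizedFF'' is a hypothetical feature that counts capitalized words in a target phrase.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

namespace sketch {

class FeatureFunction;  // forward declaration

// Stand-in for the decoder's score container (assumption: simplified).
class ScoreBreakdown {
public:
  // Adds the given dense scores to the running totals for one feature.
  void PlusEquals(const FeatureFunction* ff, const std::vector<float>& scores) {
    std::vector<float>& total = m_scores[ff];
    if (total.size() < scores.size()) total.resize(scores.size(), 0.0f);
    for (std::size_t i = 0; i < scores.size(); ++i) total[i] += scores[i];
  }
  std::vector<float> GetScores(const FeatureFunction* ff) const {
    auto it = m_scores.find(ff);
    return it == m_scores.end() ? std::vector<float>() : it->second;
  }
private:
  std::map<const FeatureFunction*, std::vector<float> > m_scores;
};

// Stand-in base class with a simplified EvaluateInIsolation signature.
class FeatureFunction {
public:
  virtual ~FeatureFunction() {}
  virtual void EvaluateInIsolation(const std::vector<std::string>& targetPhrase,
                                   ScoreBreakdown& scoreBreakdown) const = 0;
};

// Hypothetical stateless feature: number of capitalized target words.
class CapitalizedFF : public FeatureFunction {
public:
  void EvaluateInIsolation(const std::vector<std::string>& targetPhrase,
                           ScoreBreakdown& scoreBreakdown) const override {
    float count = 0.0f;
    for (const std::string& word : targetPhrase) {
      if (!word.empty() && std::isupper(static_cast<unsigned char>(word[0]))) {
        count += 1.0f;
      }
    }
    std::vector<float> newScores(1, count);  // one dense score
    scoreBreakdown.PlusEquals(this, newScores);
  }
};

}  // namespace sketch
```

Note that the feature only reports a raw score. How much that score matters is decided later by the weight that tuning assigns.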

For the stateful feature function, we can still have ''EvaluateInIsolation'' and ''EvaluateWithSourceContext'', but they are often not needed and can be deleted. Here, the main functions we are interested in are the ''EvaluateWhenApplied'' overloads: one for phrase-based decoding and one for hierarchical (chart) decoding. It is best practice to implement both. However, if you do not, you should have some basic code to abort and return null so that you can compile and won't segfault. The main difference with the stateful feature function is that you have a pointer to the previous state. You can save information about the state in the state class that you defined in the header file. For instance, I often have functions for getting and setting the previous scores. With the pointer provided in the stateful feature function, I can get these scores. The other big difference with a stateful feature is that you need to define a ''Compare'' function. This is used by the decoder for recombining hypotheses. If two states are the same, they can be combined for more efficient decoding. ''Compare'' just tells the decoder whether, for your given feature, two states can be considered the same (e.g. in a language model, if the previous n words are identical).
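
To illustrate what ''Compare'' is for, here is a self-contained sketch with a stand-in FFState base class (an assumption: the real base class is in the Moses headers). ''NGramState'' and ''CanRecombine'' are hypothetical names. The state holds the previous words, and two states compare as equal (return 0) exactly when those words are identical, which is the condition under which two hypotheses could be recombined as far as this feature is concerned.

```cpp
#include <string>
#include <vector>

namespace sketch {

// Stand-in for the Moses base class of the same name (assumption: simplified).
class FFState {
public:
  virtual ~FFState() {}
  // Returns 0 when the two states are equivalent, non-zero otherwise.
  virtual int Compare(const FFState& other) const = 0;
};

// Hypothetical language-model-like state: the previous target words.
class NGramState : public FFState {
public:
  explicit NGramState(const std::vector<std::string>& context)
      : m_context(context) {}

  int Compare(const FFState& other) const override {
    // Assumes both states belong to the same feature, so the cast is safe.
    const NGramState& o = static_cast<const NGramState&>(other);
    if (m_context < o.m_context) return -1;  // lexicographic comparison
    if (o.m_context < m_context) return 1;
    return 0;
  }

private:
  std::vector<std::string> m_context;
};

// Two hypotheses are interchangeable for this feature if their states match.
inline bool CanRecombine(const FFState& a, const FFState& b) {
  return a.Compare(b) == 0;
}

}  // namespace sketch
```

Keeping the state as small as possible matters here: the less you store, the more often two states compare as equal and the more recombination the decoder can do.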

Finally, both stateless and stateful feature functions have a function for defining parameters that your feature may take. These are the parameters given in your moses.ini file for your feature. This is relatively easy to modify as well, since it is a simple key-value store. To modify it, change the string ''arg'' to whatever your parameter name is. In the parameter function, save that value to a variable you have defined in your class. For instance, a language model may keep track of its order (n), which is specified by the user.
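
The key-value handling can be sketched like this. ''NGramFF'', the parameter name ''order'' and the default of 3 are all hypothetical, and the class deliberately does not inherit from anything so that it compiles on its own. In the skeleton files the unknown-key branch hands the key to the parent class instead of throwing, so treat the error handling below as my simplification.

```cpp
#include <stdexcept>
#include <string>

namespace sketch {

// Hypothetical feature that takes one user-supplied parameter, "order".
class NGramFF {
public:
  NGramFF() : m_order(3) {}  // assumed default for illustration

  // Called once per key=value pair given for this feature in the config.
  void SetParameter(const std::string& key, const std::string& value) {
    if (key == "order") {
      m_order = std::stoi(value);  // values arrive as strings
    } else {
      // Simplification: a real feature would pass unknown keys to its parent.
      throw std::invalid_argument("Unknown parameter: " + key);
    }
  }

  int GetOrder() const { return m_order; }

private:
  int m_order;
};

}  // namespace sketch
```

Because every value arrives as a string, the conversion to the type you actually want (an int here) is your responsibility inside the parameter function.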

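After all of this, if you have been working in the same directory, boost and bjam should automatically recognize your new files. You can compile with bjam as usual and your feature should be built along with everything else. Depending on your Moses version, you may also need to register the new class in moses/FF/Factory.cpp, next to the skeleton features, so that the decoder can look it up by name. You can then add the name of your feature to a moses.ini file and it should run, ideally improving our knowledge of machine translation.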

Hopefully, this was helpful. There are a few other resources out there describing this in detail, but this was just my attempt to jot down what I've learned recently while modifying Moses. To recap, it is relatively simple to add a feature to Moses. You decide whether it is stateful or stateless and copy some barebones skeleton code. You need to define the behavior when your feature is applied, but the majority of the code is already there. This is done through the various ''Evaluate...'' functions, which should give a score based on your feature. This is the meat of your code and the reason your feature exists. Lastly, for a stateful feature, you should define a ''Compare'' operation so that the decoder can decide whether or not to recombine hypotheses.