Let’s talk briefly about the concept of “robustness” in neural machine translation (NMT) systems. Robustness should be emphasized when building any NMT system; even engines for high-resource languages with plenty of data can face linguistic challenges.
What does NMT “Robustness” mean?
It simply means that an NMT engine can handle a specific linguistic feature found in the input to be translated, even when this feature does not occur naturally in the training data. Examples of linguistic features we want our NMT model to be robust to include: domain terminology, proper names, number formats, text case, misspellings, code-switching (between two languages), and untranslatables such as tags, email addresses, etc.
How do we identify robustness issues?
The first step towards machine translation robustness is defining the issues that your model frequently encounters when translating a certain type of text. This step is often underestimated, and in my opinion it is a sign of maturity in production-level operations.
This goes beyond numerical human evaluation and moves a step further towards defining specific types of issues. In simple words, human evaluators are asked to state a clear reason why they think a translation should be ranked as, say, 3 out of 5. At the beginning, they might be provided with lists of common issues, but they should also have the option to add new issues, which can be integrated into the list later. Such explanations should not be vague; they should be precise enough to allow MT engineers to fix the issues. Problematic words should be marked; sometimes the track-changes feature is used. The main question is: what is the most critical issue in this translation? A sketch of how such feedback could be recorded follows.
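As an illustration only, here is a minimal sketch of how such structured feedback might be stored; the EvaluationRecord class, its field names, and the issue labels are hypothetical, loosely inspired by error typologies such as MQM (Lommel et al., 2013):

```python
from dataclasses import dataclass, field

# Hypothetical issue categories; in practice, start from an established
# typology such as MQM and let evaluators extend it over time.
ISSUE_CATEGORIES = {
    "terminology",      # domain term translated inconsistently
    "named_entity",     # proper name mistranslated or dropped
    "number_format",    # digits, dates, or separators corrupted
    "casing",           # wrong text case in the output
    "untranslatable",   # tag, email address, or URL altered
}

@dataclass
class EvaluationRecord:
    """One evaluator judgment: a score plus the most critical issue."""
    source: str
    translation: str
    score: int                      # e.g. 1 (worst) to 5 (best)
    critical_issue: str             # one of ISSUE_CATEGORIES, or a new label
    marked_spans: list[str] = field(default_factory=list)  # problematic words
    comment: str = ""               # precise explanation for MT engineers

record = EvaluationRecord(
    source="Reboot the router, then check the LEDs.",
    translation="Redémarrez le routeur, puis vérifiez les DEL.",
    score=3,
    critical_issue="terminology",
    marked_spans=["DEL"],
    comment="Client glossary requires 'voyants' instead of 'DEL'.",
)
```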
How can we improve NMT “Robustness”?
In the findings of the WMT2020 Robustness Shared Task, under the “Common Trends” section, Specia et al. (2020) stated: “Participating systems were trained following a standard recipe, i) using big-transformer models, ii) boosting performance with tagged back-translation, iii) continued training with filtered data and in-domain data (where available), iv) ensembling different models to obtain further improvements.”
In this sense, data augmentation techniques can be helpful; the new data can then be integrated into the NMT system, either by combining it with the original training data or through fine-tuning. A sketch of one such technique, tagged back-translation, follows.
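As a rough illustration of the tagged back-translation step mentioned in the recipe above (Caswell et al., 2019), here is a minimal sketch; the translate_to_source function is a hypothetical stand-in for whatever reverse (target-to-source) model you have available:

```python
# Tagged back-translation: translate monolingual target-side text back
# into the source language with a reverse model, then prepend a reserved
# tag so the forward model can tell synthetic source sentences apart
# from natural ones during training.

BT_TAG = "<BT>"  # must also be registered as a reserved token/symbol

def translate_to_source(target_sentence: str) -> str:
    """Hypothetical stand-in for a target-to-source NMT model."""
    raise NotImplementedError

def build_tagged_bt_pairs(monolingual_target: list[str]) -> list[tuple[str, str]]:
    pairs = []
    for tgt in monolingual_target:
        synthetic_src = translate_to_source(tgt)
        # The tag marks the source side as synthetic; the target side
        # stays natural, which is what the model learns to produce.
        pairs.append((f"{BT_TAG} {synthetic_src}", tgt))
    return pairs
```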
As training a new system frequently might not be feasible, it is common in some companies to temporarily apply on-the-fly find-and-replace operations on translations until the next training is possible. Some researchers also suggest making such on-the-fly handling easier by injecting the training data with certain placeholders that can be replaced later. To apply this, in a portion of the training data, natural tags (HTML, XML, long numbers, etc.) are replaced with pseudo-tags (e.g. <t0>, <t1>, <t2>, …). These pseudo-tags should also be added as user_defined_symbols to the SentencePiece model (cf. SPM options). At inference time, it is then easy to replace untranslatables with these pseudo-tags during pre-processing and restore them during post-processing; a sketch of both steps follows below. On a related note, activating the SentencePiece option split_digits helps with copying longer numbers without intervention, while the option byte_fallback sometimes helps with irregular characters in the training data.
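Here is a minimal sketch of the pseudo-tag workflow, assuming the sentencepiece Python package; the regular expression, the number of reserved tags, the helper names, and the file paths are illustrative choices, not a fixed recipe:

```python
import re
import sentencepiece as spm

# Reserve a small set of pseudo-tags; these exact symbols must be passed
# to SentencePiece at training time so they are never split into pieces.
PSEUDO_TAGS = [f"<t{i}>" for i in range(10)]

# Illustrative pattern for untranslatables: HTML/XML tags and email
# addresses; extend it to whatever your content actually contains.
UNTRANSLATABLE_RE = re.compile(r"</?[a-zA-Z][^<>]*>|[\w.+-]+@[\w-]+\.[\w.]+")

def mask(text: str) -> tuple[str, dict[str, str]]:
    """Pre-processing: replace untranslatables with pseudo-tags.

    For simplicity, this sketch assumes at most len(PSEUDO_TAGS)
    untranslatables per sentence.
    """
    mapping: dict[str, str] = {}
    def _sub(match: re.Match) -> str:
        tag = PSEUDO_TAGS[len(mapping)]
        mapping[tag] = match.group(0)
        return tag
    return UNTRANSLATABLE_RE.sub(_sub, text), mapping

def unmask(translation: str, mapping: dict[str, str]) -> str:
    """Post-processing: restore the original untranslatables."""
    for tag, original in mapping.items():
        translation = translation.replace(tag, original)
    return translation

# Example:
# masked, mapping = mask('Click <a href="x">here</a> or email foo@bar.com')
# masked -> 'Click <t0>here<t1> or email <t2>'
# unmask(translated_masked, mapping) restores the tags and the address.

# Training-time side: register the pseudo-tags and activate the two
# options discussed above (input path and vocab size are placeholders).
spm.SentencePieceTrainer.train(
    input="train.txt",
    model_prefix="spm_model",
    vocab_size=32000,
    user_defined_symbols=PSEUDO_TAGS,
    split_digits=True,    # digits split one by one: helps copy long numbers
    byte_fallback=True,   # unknown characters fall back to byte pieces
)
```

Because the pseudo-tags are reserved symbols, the model sees them intact during training and learns to copy them through to the output, where post-processing can safely swap the original content back in.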
References
- MQM: Multidimensional Quality Metrics (Lommel et al., 2013)
- Training Neural Machine Translation to Apply Terminology Constraints (Dinu et al., 2019)
- Improving Robustness in Real-World Neural Machine Translation Engines (Gupta et al., 2019)
- How Should Markup Tags Be Translated? (Hanneman and Dinu, 2020)
- Evaluating Robustness to Input Perturbations for Neural Machine Translation (Niu et al., 2020)
- Findings of the WMT 2020 Shared Task on Machine Translation Robustness (Specia et al., 2020)
- Business Critical Errors: A Framework for Adaptive Quality Feedback (Stewart et al., 2022)
- Improve MT for Search with Selected Translation Memory using Search Signals (Zhang, 2022)