As using Bash commands is inevitable when you work on NLP and MT tasks, I thought it would be useful to list the commands I have learnt to use on a daily basis, thanks to practice, searching, and the helpful colleagues I have met over the years. Obviously, this is not an exhaustive list; however, I hope it includes most of the one-line Bash commands you would need. Please note that the majority of these commands have been tested mainly on Linux. Continue reading “Bash Commands for NLP Engineers”
Word Error Rate (WER) computes the minimum edit distance between the human-generated sentence and the machine-predicted sentence. In other tutorials, I explained how to use Python to compute BLEU and Edit Distance; in this tutorial, I am going to explain how to calculate the WER score. Continue reading “WER Score for Machine Translation”
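The full tutorial walks through the calculation in detail; as a rough illustration (not necessarily the exact code from the article), WER can be sketched as a word-level edit distance divided by the reference length:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for the Levenshtein distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # cost of deleting all reference words up to i
    for j in range(len(hyp) + 1):
        d[0][j] = j  # cost of inserting all hypothesis words up to j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat mat")` gives `2/6`, since two reference words were dropped. In practice, a library such as `jiwer` offers a tested implementation.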
So I have published the latest versions of my in-domain OpenNMT-py models. These are in-domain Neural Machine Translation (NMT) models, which means they are trained and tested only on specialised data, and they can perform better than generic models for the specified “domain”. In other words, in-domain models can adhere to terminology and generate translations that are much more in line with the specialised context. Continue reading “Pre-trained Neural Machine Translation (NMT) Models”
In this tutorial, I am going to explain how I compute the BLEU score for the Machine Translation output using Python.
BLEU is simply a measure for evaluating the quality of your Machine Translation system. It does not really matter whether your MT output comes from a high-level framework like OpenNMT or Marian, or from a lower-level one like TensorFlow or PyTorch. Nor does it matter whether it is a Neural Machine Translation system or a Statistical Machine Translation tool like Moses.
So let’s see the steps I follow to calculate the BLEU score. Continue reading “Computing BLEU Score for Machine Translation”
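The steps themselves are in the full article; as a rough sketch (not the article’s exact code), sentence-level BLEU combines modified n-gram precisions with a brevity penalty:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of the given order in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(reference: str, hypothesis: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with a single reference."""
    ref, hyp = reference.split(), hypothesis.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ng = ngrams(hyp, n)
        ref_ng = ngrams(ref, n)
        # Clipped counts: each hypothesis n-gram counts at most as often
        # as it appears in the reference
        overlap = sum(min(count, ref_ng[g]) for g, count in hyp_ng.items())
        total = max(sum(hyp_ng.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0  # unsmoothed BLEU is zero if any precision is zero
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish hypotheses shorter than the reference
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * geo_mean
```

A perfect match scores 1.0. For real evaluations, a standard implementation such as NLTK’s `sentence_bleu` or the sacreBLEU tool is preferable, since smoothing and tokenisation choices change the score.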
So what is Domain Adaptation? Let’s imagine this scenario: you have a new Machine Translation project, and you feel excited; however, you realise that your training corpus is too small. If you use such a limited corpus, your machine translation model will be very poor, with many out-of-vocabulary words and perhaps unidiomatic translations.
So, what is the solution? Should you just give up? Fortunately, Domain Adaptation can be a good solution to this issue.
Do you have another corpus that is big enough? Does this big corpus share some characteristics with the small corpus, like language pair and/or the major subject?
In this case, you can use one of the Domain Adaptation techniques to make use of both the big generic corpus and the small specialized corpus. While the big generic corpus will help you avoid out-of-vocabulary words and unidiomatic translations, the small specialized corpus will help enforce the terminology and vocabulary required for your current Machine Translation project. Continue reading “Domain Adaptation Techniques for Low-Resource Domains, Institutions and Languages”
Domain Adaptation is useful for specializing existing generic Machine Translation models, especially when the specialized corpus is too limited to train a separate model. Furthermore, Domain Adaptation techniques can be handy for low-resource languages that share vocabulary and structure with high-resource languages from the same family.
As part of my Machine Translation research, I managed to achieve successful results in retraining Neural Machine Translation models for the purpose of Domain Adaptation using OpenNMT-py (the PyTorch version of OpenNMT). In this article, I am elaborating on the path I took and the achieved outcomes; hopefully, this will be useful for others. Continue reading “Domain Adaptation in Neural Machine Translation”
In this article, I am exploring several GPU options I have either used myself or considered for training my Neural Machine Translation models. As GPU machines are known for being expensive, the main factor I am concentrating on here is “cost”, which is determined not only by machine rates, but also by other considerations such as technical specifications and long-term vs. short-term commitments. Continue reading “GPU Options for Neural Machine Translation”
The question was: if I want to have a stand-alone version of OpenNMT that does not require any manual preparations or installations on the target machine, and does not connect to the Internet for Machine Translation, what are my options to achieve this?
Note that my current implementation depends on an OpenNMT localhost REST API, which is perfect for most cases, but not for the case when a client wants to be able to move the whole thing as one package without any prior (manual) preparation or installation of dependencies.
After some research, I finally managed to achieve progress using Python Tkinter, PyInstaller, NSIS and the PyTorch version of OpenNMT. Continue reading “Stand-alone Executable Translator for OpenNMT”
You might want to build a GUI for your Machine Translation model to show to a client, or to be able to translate sentences online. I have created a simple web interface for OpenNMT that depends on the Python Flask and Flask-PageDown libraries. Continue reading “Machine Translation Web Interface for OpenNMT”
I have implemented the translation option -phrase_table in OpenNMT-py, and today it has been merged into the repository. The -phrase_table option was already documented in the Lua version but was not implemented in the PyTorch version.