Domain Adaptation Techniques for Low-Resource Domains, Institutions and Languages

Domain Adaptation

So what is Domain Adaptation? Let’s imagine this scenario: you have an exciting new Machine Translation project, but you realize your training corpus is too small. If you train on such a limited corpus, your machine translation model will perform poorly, producing many out-of-vocabulary words and possibly unidiomatic translations.

So, what is the solution? Should you just give up? Fortunately, Domain Adaptation can be a good solution to this issue.

Do you have another corpus that is big enough? Does this big corpus share some characteristics with the small corpus, such as the language pair and/or the general subject matter?

In this case, you can use a Domain Adaptation technique to make use of both the big generic corpus and the small specialized corpus. While the big generic corpus helps avoid out-of-vocabulary words and unidiomatic translations, the small specialized corpus helps enforce the terminology and vocabulary required for your current Machine Translation project.
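One simple and widely used way to combine the two corpora is mixed fine-tuning: oversample the small in-domain corpus and concatenate it with the generic corpus before training, so the in-domain data is not drowned out. Here is a minimal Python sketch of that data-preparation step; the file names and the oversampling factor are illustrative assumptions, not details from the original posts.

```python
# Minimal sketch: oversample a small in-domain parallel corpus and mix it
# with a large generic one (one segment per line, source/target in sync).
# File names and the oversampling factor are illustrative assumptions.

OVERSAMPLE = 10  # repeat the in-domain data so it carries more weight

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return f.readlines()

for side in ("src", "tgt"):
    generic = read_lines(f"generic.train.{side}")
    in_domain = read_lines(f"indomain.train.{side}")

    # Concatenate the generic corpus with N copies of the in-domain corpus.
    # Source and target are processed in the same order, so sentence pairs
    # stay aligned; most NMT toolkits shuffle examples at training time.
    mixed = generic + in_domain * OVERSAMPLE

    with open(f"mixed.train.{side}", "w", encoding="utf-8") as f:
        f.writelines(mixed)
```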

Continue reading “Domain Adaptation Techniques for Low-Resource Domains, Institutions and Languages”

Domain Adaptation in Neural Machine Translation

Domain Adaptation is useful for specializing generic Machine Translation models, especially when the specialized corpus is too small to train a separate model from scratch. Furthermore, Domain Adaptation techniques can be handy for low-resource languages that share vocabulary and structure with high-resource languages from the same family.

As part of my Machine Translation research, I achieved successful results retraining Neural Machine Translation models for Domain Adaptation using OpenNMT-py (the PyTorch version of OpenNMT). In this article, I describe the path I took and the outcomes I achieved; hopefully, this will be useful to others.
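The core of such retraining is to continue training an existing generic model on the in-domain data, typically with a reduced learning rate. In OpenNMT-py this is done by pointing training at an existing checkpoint (the `-train_from` option). The sketch below shows the same idea in plain PyTorch; `build_model`, `in_domain_batches`, and the checkpoint path are hypothetical placeholders, not the exact pipeline from the article.

```python
import torch

# Minimal sketch of continued training ("retraining") for domain adaptation.
# `build_model` and `in_domain_batches` are hypothetical placeholders; in
# practice OpenNMT-py handles all of this when you pass -train_from.

def fine_tune(build_model, in_domain_batches, checkpoint_path, epochs=3):
    # Load the weights of the generic, pre-trained NMT model
    # (assumes the checkpoint stores a plain state dict).
    model = build_model()
    model.load_state_dict(torch.load(checkpoint_path))
    model.train()

    # Use a smaller learning rate than in the original training, so the
    # model adapts to the in-domain data without forgetting the generic data.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = torch.nn.NLLLoss(ignore_index=0)  # 0 = padding index

    for _ in range(epochs):
        for src, tgt in in_domain_batches():
            optimizer.zero_grad()
            # Teacher forcing: predict tgt[1:] given src and tgt[:-1].
            log_probs = model(src, tgt[:-1])
            loss = criterion(log_probs.view(-1, log_probs.size(-1)),
                             tgt[1:].reshape(-1))
            loss.backward()
            optimizer.step()
    return model
```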

Continue reading “Domain Adaptation in Neural Machine Translation”