Yasmin Moslem

Machine Translation Researcher.

Web Interface for Machine Translation

25 Jul 2021 » nmt

Today, we will create a very simple Machine Translation (MT) Web Interface for OpenNMT-py, OpenNMT-tf and FairSeq models using CTranslate2 and Streamlit.

Previously, there were other tutorials on how to use a simple server and web interface with Flask. However, today’s tutorial is for those who want to create an ultra simple, quick demo.

We also aim to highlight that CTranslate2 is now the way to go for serving OpenNMT models due to its exceptional performance. It is completely up to you to use it in a simple way, as we do here, or to integrate it into a REST API for advanced uses.

So let’s start…


Objective: Simple Machine Translation Web Interface

Our objective is to develop a simple web interface for Machine Translation like this one.

streamlit-translate-gui

Install Requirements

Optional: Create and Activate a Virtual Environment

  • Install virtualenv:
    pip3 install virtualenv
    
  • Create a virtual environment, e.g. myvenv:
    virtualenv myvenv --python=python3
    
  • Activate the virtual environment:
    source myvenv/bin/activate
    

Install Required Libraries

pip3 install ctranslate2 sentencepiece streamlit watchdog nltk
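
Before moving on, it can help to confirm that these libraries are importable from the active environment. The following is a minimal sketch that uses only the Python standard library; the helper name missing_packages is an illustration, not part of any of the libraries above.

```python
from importlib.util import find_spec

# Import names of the libraries installed above.
REQUIRED = ["ctranslate2", "sentencepiece", "streamlit", "watchdog", "nltk"]


def missing_packages(names):
    """Return the import names from `names` that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If a package is reported as missing, check that the virtual environment is activated before running pip3 install again.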

Convert Model to CTranslate2

CTranslate2 supports both OpenNMT-py and OpenNMT-tf models. As of version 2.0, it also supports FairSeq models. However, you need to convert your model to the CTranslate2 format before using it.

The following commands are simply copied from the CTranslate2 repository, and tested to make sure they are up-to-date. This example uses pre-trained Transformer English-German models. If you trained your own model, run the same commands on it instead.

For an OpenNMT-py model:

pip3 install OpenNMT-py

wget https://s3.amazonaws.com/opennmt-models/transformer-ende-wmt-pyOnmt.tar.gz
tar xf transformer-ende-wmt-pyOnmt.tar.gz

ct2-opennmt-py-converter --model_path averaged-10-epoch.pt --output_dir ende_ctranslate2

For an OpenNMT-tf model:

pip3 install OpenNMT-tf

wget https://s3.amazonaws.com/opennmt-models/averaged-ende-ckpt500k-v2.tar.gz
tar xf averaged-ende-ckpt500k-v2.tar.gz

ct2-opennmt-tf-converter --model_path averaged-ende-ckpt500k-v2 --output_dir ende_ctranslate2 \
    --src_vocab averaged-ende-ckpt500k-v2/wmtende.vocab \
    --tgt_vocab averaged-ende-ckpt500k-v2/wmtende.vocab \
    --model_type TransformerBase

For a FairSeq model:

ct2-fairseq-converter --model_path <model.pt> --data_dir <model_dir> --output_dir <output_dir>
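
If you convert models often, the commands above can be wrapped in a small Python script. This is only a sketch under stated assumptions: the helper names build_convert_command and convert are hypothetical, and it simply calls the same command-line converters with the same flags shown above, so the converters must already be installed.

```python
import subprocess

# Converter executables used in the commands above.
CONVERTERS = {
    "opennmt-py": "ct2-opennmt-py-converter",
    "opennmt-tf": "ct2-opennmt-tf-converter",
    "fairseq": "ct2-fairseq-converter",
}


def build_convert_command(framework, model_path, output_dir, **options):
    """Build the converter command line as a list of arguments.

    Extra keyword options are passed through as "--name value" flags,
    e.g. data_dir="..." for FairSeq or model_type="TransformerBase"
    for OpenNMT-tf.
    """
    command = [
        CONVERTERS[framework],
        "--model_path", model_path,
        "--output_dir", output_dir,
    ]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command


def convert(framework, model_path, output_dir, **options):
    """Run the converter; raises CalledProcessError if it fails."""
    command = build_convert_command(framework, model_path, output_dir, **options)
    subprocess.run(command, check=True)
```

For example, convert("opennmt-py", "averaged-10-epoch.pt", "ende_ctranslate2") would run the same command as the OpenNMT-py example above.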

Python sample:

Let’s make sure that CTranslate2 works properly in our setup by running this Python code:

import ctranslate2
translator = ctranslate2.Translator("ende_ctranslate2/")
translator.translate_batch([["▁H", "ello", "▁world", "!"]])

Note: translate_batch() takes a list of tokenized sentences and translates them in batches, which is much more efficient than translating them one at a time. Here we pass only one sentence for demonstration purposes.
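
To illustrate batch translation, here is a minimal sketch that sends several tokenized sentences in one call and collects the best hypothesis for each. The helper names are hypothetical. It also assumes that the result format depends on the CTranslate2 version: the code in this tutorial reads translation[0]["tokens"], while newer releases return result objects with a hypotheses attribute, so the helper tries both.

```python
def best_hypothesis(result):
    """Return the tokens of the top hypothesis for one translated sentence."""
    hypotheses = getattr(result, "hypotheses", None)
    if hypotheses is not None:
        # Newer CTranslate2 releases: result object with .hypotheses
        return hypotheses[0]
    # Format used in this tutorial: list of dicts with a "tokens" key
    return result[0]["tokens"]


def translate_tokenized(translator, tokenized_sentences):
    """Translate a list of token lists in a single translate_batch() call."""
    results = translator.translate_batch(tokenized_sentences)
    return [best_hypothesis(result) for result in results]


# Usage with the model converted above (not run here):
#   import ctranslate2
#   translator = ctranslate2.Translator("ende_ctranslate2/")
#   batch = [["▁H", "ello", "▁world", "!"]] * 2
#   print(translate_tokenized(translator, batch))
```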

Create Your App

Test App

Let’s first create a small app to see how Streamlit works.

Create a file called test.py for example and add the following lines to it.

import streamlit as st

st.title("Upper My Text")

user_input = st.text_input(
    "Write something and press Enter to convert it to the UPPER case."
)

if user_input:
    output = user_input.upper()
    st.info(output)

Launch your test app by opening the Terminal and running the following command.

streamlit run test.py

If everything works as expected, you should see something like this in your browser at the URL http://localhost:8501. Once you type some text and press Enter, it will be printed in upper case.

streamlit-test


Translation App

Let’s now develop our translation web interface. Create a file called translate.py for example, and add the following to it.

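import streamlit as st
import sentencepiece as spm
import ctranslate2
from nltk import sent_tokenize

# Note: sent_tokenize() needs NLTK's Punkt data. If you get a LookupError,
# download it once with: python3 -c "import nltk; nltk.download('punkt')"
# (recent NLTK releases may ask for 'punkt_tab' instead).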


def tokenize(text, sp_source_model):
    """Use SentencePiece model to tokenize a list of sentences

    Args:
        text (list(str)): The sentences to tokenize
        sp_source_model (str): The path to the SentencePiece source model

    Returns:
        List of token lists, one per sentence.
    """

    sp = spm.SentencePieceProcessor(sp_source_model)
    tokens = sp.encode(text, out_type=str)
    return tokens


def detokenize(text, sp_target_model):
    """Use SentencePiece model to detokenize the token lists of sentences

    Args:
        text (list(list(str))): One list of tokens per sentence to detokenize
        sp_target_model (str): The path to the SentencePiece target model

    Returns:
        List of detokenized sentences (str).
    """

    sp = spm.SentencePieceProcessor(sp_target_model)
    translation = sp.decode(text)
    return translation


def translate(source, ct_model, sp_source_model, sp_target_model, device="cpu"):
    """Use CTranslate2 model to translate a text

    Args:
        source (str): A source text to translate; it is split into sentences
        ct_model (str): The path to the CTranslate2 model
        sp_source_model (str): The path to the SentencePiece source model
        sp_target_model (str): The path to the SentencePiece target model
        device (str): "cpu" (default) or "cuda"

    Returns:
        Translation of the source text.
    """

    translator = ctranslate2.Translator(ct_model, device)
    source_sentences = sent_tokenize(source)
    source_tokenized = tokenize(source_sentences, sp_source_model)
    translations = translator.translate_batch(source_tokenized)
    translations = [translation[0]["tokens"] for translation in translations]
    translations_detokenized = detokenize(translations, sp_target_model)
    translation = " ".join(translations_detokenized)
    return translation


# File paths to the CTranslate2 model
# and the SentencePiece source and target models.
ct_model = "/path/to/the/ctranslate/model/directory"
sp_source_model = "/path/to/the/sentencepiece/source/model/file"
sp_target_model = "/path/to/the/sentencepiece/target/model/file"

# Title for the page and nice icon
st.set_page_config(page_title="NMT", page_icon="🤖")
# Header
st.title("Translate")

# Form to add your items
with st.form("my_form"):
    # Textarea to type the source text.
    user_input = st.text_area("Source Text", max_chars=200)
    # Create a button
    submitted = st.form_submit_button("Translate")
    # If the button is pressed, translate with the CTranslate2 model
    # and print the translation.
    # Here, we use "st.info", but you can try "st.write", "st.code", or "st.success".
    if submitted:
        translation = translate(user_input, ct_model, sp_source_model, sp_target_model)
        st.write("Translation")
        st.info(translation)

Note: Make sure you update the variables ct_model, sp_source_model, and sp_target_model with your own paths to the CTranslate2 model, and the SentencePiece source and target models.
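
A wrong path is an easy mistake to make at this step, so a quick check before launching can save some time. The following is a minimal sketch using only the standard library; the helper name check_model_paths is hypothetical, and it relies on the fact that a converted CTranslate2 model is a directory containing a model.bin file.

```python
from pathlib import Path


def check_model_paths(ct_model, sp_source_model, sp_target_model):
    """Return a list of problems found; the list is empty if all paths look fine."""
    problems = []
    # A converted CTranslate2 model is a directory that contains model.bin.
    if not (Path(ct_model) / "model.bin").is_file():
        problems.append(f"No model.bin found in CTranslate2 directory: {ct_model}")
    # The SentencePiece models are single files.
    for label, path in (("source", sp_source_model), ("target", sp_target_model)):
        if not Path(path).is_file():
            problems.append(f"SentencePiece {label} model not found: {path}")
    return problems
```

In translate.py, you could call this helper right after defining the three variables and show each problem with st.error before stopping the script with st.stop.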

Let’s launch our translator. Run the following command in the Terminal.

streamlit run translate.py

If everything works fine, you should see an output like this at the URL http://localhost:8501/.

Try typing a sentence (in the source language of your model) and press the “Translate” button. The translation should be printed as you see here!

streamlit-translate

Add Language Pairs

To give your visitor the option to select between multiple language pairs, you can add a dropdown menu like this one.

streamlit-dropdown

First, you can move the paths into a function:

def load_models(option):
    if option == "English-to-Japanese":
        ct_model = "path/to/your/ct_model"
        sp_source_model = "path/to/your/sp_source_model"
        sp_target_model = "path/to/your/sp_target_model"
    elif option == "Japanese-to-English":
        ct_model = "path/to/your/ct_model"
        sp_source_model = "path/to/your/sp_source_model"
        sp_target_model = "path/to/your/sp_target_model"

    return ct_model, sp_source_model, sp_target_model
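
As the number of language pairs grows, a dictionary may be easier to maintain than a chain of if/elif branches. This is only a sketch of that alternative; the MODELS name is hypothetical and the paths are the same placeholders as above. It returns the three paths that the form expects.

```python
# Placeholder paths, as in the function above; replace them with your own.
MODELS = {
    "English-to-Japanese": (
        "path/to/your/ct_model",
        "path/to/your/sp_source_model",
        "path/to/your/sp_target_model",
    ),
    "Japanese-to-English": (
        "path/to/your/ct_model",
        "path/to/your/sp_source_model",
        "path/to/your/sp_target_model",
    ),
}


def load_models(option):
    """Return (ct_model, sp_source_model, sp_target_model) for a language pair."""
    try:
        return MODELS[option]
    except KeyError:
        raise ValueError(f"Unsupported language pair: {option}") from None
```

With this layout, tuple(MODELS) gives the list of language pairs, which can be passed as the options of the dropdown menu so that the two stay in sync.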

Then, change the form to:

with st.form("my_form"):

    # Dropdown menu to select a language pair
    option = st.selectbox(
        "Select Language Pair",
        ("English-to-Japanese", "Japanese-to-English"),
    )
    # st.write("You selected:", option)

    # Textarea to type the source text.
    user_input = st.text_area("Source Text", max_chars=200)

    # Load models
    ct_model, sp_source_model, sp_target_model = load_models(option)

    # Create a button
    submitted = st.form_submit_button("Translate")
    # If the button is pressed, translate with the CTranslate2 model
    # and print the translation.
    # Here, we use "st.info", but you can try "st.write", "st.code", or "st.success".
    if submitted:
        translation = translate(user_input, ct_model, sp_source_model, sp_target_model)
        st.write("Translation")
        st.info(translation)

I hope this helps. I will be updating this repository with Python samples.

Next steps

Streamlit Components

Streamlit comes with more components. One of the most interesting NLP components you might want to check is spacy-streamlit.

Deployment

You can deploy your app on any service of your choice. However, if you are looking for a free and easy option, consider using Heroku. For better performance, test your app with and without Streamlit’s caching option and see if it helps.
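
The caching option mentioned above can be sketched as follows. Streamlit reruns the whole script on every interaction, so translate() as written reloads the CTranslate2 model and the SentencePiece models each time. This sketch assumes a recent Streamlit release that provides st.cache_resource (older releases used st.cache instead). The loader names are hypothetical, and the lru_cache fallback exists only so that the snippet can also run outside Streamlit.

```python
from functools import lru_cache

try:
    import streamlit as st
    cache_resource = st.cache_resource
except (ImportError, AttributeError):
    # Fallback for running this sketch without (a recent) Streamlit.
    cache_resource = lru_cache(maxsize=None)


@cache_resource
def load_translator(ct_model, device="cpu"):
    """Load the CTranslate2 model once and reuse it across reruns."""
    import ctranslate2  # imported lazily, only when a model is requested
    return ctranslate2.Translator(ct_model, device=device)


@cache_resource
def load_sentencepiece(sp_model):
    """Load a SentencePiece model once and reuse it across reruns."""
    import sentencepiece as spm  # imported lazily, as above
    return spm.SentencePieceProcessor(sp_model)
```

Inside translate(), tokenize() and detokenize(), the direct calls to ctranslate2.Translator and spm.SentencePieceProcessor would then be replaced by these loaders. Whether this helps in practice is worth measuring on your own models.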

Thanks for reading! If you have questions or suggestions, feel free to contact me.