I have implemented the translation option -phrase_table in the OpenNMT-py version, and today it has been merged into the repository. The -phrase_table option was already documented for the Lua version but was not implemented in the PyTorch version.
Description
If the -phrase_table translation argument is provided (with -replace_unk), the translator looks up the identified source token in the phrase table and outputs the corresponding target token. If it is not provided (or the identified source token does not exist in the table), the source token is copied instead. Tested with both translate.py and server.py (with conf.json).
The default behaviour of the -replace_unk option is to replace each unknown word with the source word that has the highest attention weight. When the -phrase_table option is added as well, the phrase table file is searched for a possible translation of that source word instead. Only if no valid replacement is found is the source token copied.
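To make the order of the fallbacks concrete, here is a minimal sketch of the replacement logic in plain Python. It is not the code that was merged into OpenNMT-py: the function name replace_unknowns, the "<unk>" marker, and the list-of-lists attention layout are assumptions chosen for illustration.

```python
def replace_unknowns(pred_tokens, src_tokens, attention, phrase_table=None, unk="<unk>"):
    """Replace unknown tokens in a prediction (illustrative sketch only).

    attention[i][j] is assumed to be the attention weight that predicted
    token i puts on source token j.
    """
    phrase_table = phrase_table or {}
    output = []
    for i, token in enumerate(pred_tokens):
        if token != unk:
            output.append(token)
            continue
        # Pick the source token with the highest attention weight.
        weights = attention[i]
        best_index = max(range(len(src_tokens)), key=lambda j: weights[j])
        src_token = src_tokens[best_index]
        # Use the phrase table entry if present, otherwise copy the source token.
        output.append(phrase_table.get(src_token, src_token))
    return output


if __name__ == "__main__":
    src = ["das", "Haus", "ist", "gross"]
    pred = ["the", "<unk>", "is", "<unk>"]
    attn = [
        [0.70, 0.10, 0.10, 0.10],
        [0.10, 0.80, 0.05, 0.05],
        [0.10, 0.10, 0.70, 0.10],
        [0.05, 0.05, 0.10, 0.80],
    ]
    print(replace_unknowns(pred, src, attn, {"Haus": "house"}))
```

In this toy input, "Haus" has a phrase table entry and is translated, while "gross" has none and is copied from the source.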
The phrase table file should include a single translated word (token) per line in the format:
source|||target
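As an illustration of the file format, the following sketch reads such a file into a dictionary. The helper names parse_phrase_table and load_phrase_table are hypothetical, and the handling of blank lines, malformed lines, and duplicate sources (the last entry wins here) is an assumption, not necessarily what OpenNMT-py does internally.

```python
def parse_phrase_table(lines, delimiter="|||"):
    """Build a source -> target dictionary from 'source|||target' lines."""
    table = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and lines without the delimiter (assumed behaviour).
        if not line or delimiter not in line:
            continue
        source, target = line.split(delimiter, 1)
        table[source.strip()] = target.strip()
    return table


def load_phrase_table(path):
    """Read a phrase table file from disk (UTF-8 is assumed)."""
    with open(path, encoding="utf-8") as handle:
        return parse_phrase_table(handle)


if __name__ == "__main__":
    sample = ["Haus|||house", "Katze|||cat"]
    print(parse_phrase_table(sample))
```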
Example with translate.py
python3 OpenNMT-py/translate.py -model available_models/my.model_step_100000.pt -src source.txt -output prep.txt -replace_unk -phrase_table phrase-table.txt
Example with server.py
python3 OpenNMT-py/server.py --ip "0.0.0.0" --port 5000 --url_root "/translator" --config available_models/conf.json
curl -i -X POST -H "Content-Type: application/json" -d '[{"src": "this is a test for model 100", "id": 100}]' http://127.0.0.1:5000/translator/translate
… where conf.json is:
{
    "models_root": "/home/available_models",
    "models": [
        {
            "id": 100,
            "model": "my.model_step_100000.pt",
            "timeout": 600,
            "on_timeout": "to_cpu",
            "load": true,
            "opt": {
                "beam_size": 1,
                "replace_unk": true,
                "phrase_table": "/home/available_models/phrase-table.txt"
            }
        }
    ]
}
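If you prefer to call the server from Python rather than curl, the same request can be built with the standard library. This is only a sketch: the function names are hypothetical, the URL and payload are copied from the curl example above, and the shape of the JSON response is not described in this post, so it is returned as-is.

```python
import json
import urllib.request


def build_translate_request(text, model_id, base_url="http://127.0.0.1:5000/translator"):
    """Build the same POST request as the curl example (nothing is sent here)."""
    payload = json.dumps([{"src": text, "id": model_id}]).encode("utf-8")
    return urllib.request.Request(
        base_url + "/translate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(text, model_id):
    """Send the request to a running server and return the parsed JSON reply."""
    request = build_translate_request(text, model_id)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling translate("this is a test for model 100", 100) requires the server from the example above to be running on port 5000.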
You can find more details about the options at:
http://opennmt.net/OpenNMT-py/options/translate.html?highlight=phrase%20table
You can also refer to this page – from the Lua version, but the concept is the same:
http://opennmt.net/OpenNMT/translation/unknowns/
If you have tried the new OpenNMT-py -phrase_table option and have feedback, please let me know.
This is a really useful option when using a model that gives unknown tokens at some locations in the translated target. While working on a PoC for German-English using OpenNMT, I have faced umpteen issues. This one was long pending, and it is now resolved.
Going forward, if more articles on NLP with OpenNMT are published, it would be very helpful for beginners like me. In fact, it would be more enlightening.
GOD BLESS YOU.
Hi Kishor! Thanks for your comment! Glad it was helpful.