GPU Options for Neural Machine Translation

In this article, I explore several GPU options that I have either used or considered for training my Neural Machine Translation models. As GPU machines are known to be expensive, the main factor I concentrate on here is “cost”, which is determined not only by machine rates but also by other considerations such as technical specifications and long-term vs. short-term commitments.

Free GPU Options

Google Colab

Colab, which is based on Jupyter Notebooks, is now popular among Deep Learning researchers, mainly for its free GPU resources.

Google Colab is free, offers 1-GPU instances, and (theoretically) supports up to 12 hours of continuous training. Sessions are sometimes interrupted earlier, but this is more than enough for learning purposes.

By default, the runtime type (hardware accelerator) of Colab is CPU; to use a GPU, open Colab’s “Runtime” menu, choose “Change Runtime Type”, and select “GPU”.
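After switching the runtime, it is worth confirming that a GPU is actually attached before starting a long training run. The following is a minimal sketch, assuming PyTorch is available (it is usually preinstalled on Colab); the helper name is hypothetical, and it falls back to looking for the nvidia-smi tool if PyTorch is missing.

```python
import shutil


def detect_accelerator() -> str:
    """Return "cuda" if a GPU appears to be usable, otherwise "cpu".

    Hypothetical helper: it prefers PyTorch's own check and, if PyTorch is
    not installed, falls back to looking for the nvidia-smi tool on the PATH.
    """
    try:
        import torch  # usually preinstalled on Colab
    except ImportError:
        # Without PyTorch, nvidia-smi on the PATH is only a rough hint of a GPU driver.
        return "cuda" if shutil.which("nvidia-smi") else "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = detect_accelerator()
print(f"Accelerator: {device}")
if device == "cpu":
    print("No GPU detected: check Runtime > Change Runtime Type > GPU.")
```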

For more details on how to use Google Colab, you can check this tutorial “Google Colab: Using GPU for Deep Learning” by Mr. Rakshith Ponnappa at GoTrained’s blog.

Amazon AWS Activate

This is the option we currently use. Amazon AWS Activate is a grant for startups, offering free AWS credit for one year (which can be applied towards EC2 GPU instances as well). Qualifying for AWS Activate is only possible through an “AWS Partner”; we were able to apply via WeWork, which offers $5000 in credit for one year.

As far as GPU instances are concerned, there are no technical limitations on how you use your AWS Activate credit. You can use the same AWS EC2 GPU Instances or Spot Instances, both of which I discuss later in this article, and any GPU configuration, including 1-GPU, 4-GPU, 8-GPU, or even 16-GPU instances. The only difference is that the charges are deducted from your AWS Activate credit.

Nonprofits, Researchers and Educators

Some companies offer “grants” and free credits for nonprofits, researchers and educators including:

Paid GPU Options

Amazon AWS EC2

As I have been using Amazon AWS services for several years for diverse purposes, including Statistical Machine Translation, I also used an EC2 GPU machine for my Neural Machine Translation work and research.

Specifically, I used a p2.xlarge instance, which is a 1-GPU machine type with 61 GiB RAM and 12 GiB GPU memory. For training an OpenNMT PyTorch Neural Machine Translation model on one GPU, the p2.xlarge machine worked well and I did not face any issues. It might be worth mentioning that I used it with the AMI called Deep Learning Base AMI (Ubuntu) Version 15.2; the AMI determines the software installed on the machine and does not affect charges.

The announced pay-as-you-go cost of a p2.xlarge instance is $0.90 per running hour; in addition, while the instance is in the “stopped” state, it still costs about $0.20 per day (for the attached storage). As you may already know, “stopped” is like shutting down: it stops the pay-as-you-go charges while keeping your files on the machine. The “terminated” state, on the other hand, ends your rental period, deletes your files from the machine, and stops all charges, including the daily ones.
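If you manage instances from a script rather than the AWS console, the choice between “stopped” and “terminated” maps to two calls of the boto3 EC2 client, stop_instances and terminate_instances. The sketch below assumes boto3 with configured AWS credentials; the helper names and the instance ID in the usage comment are hypothetical. It separates building the request from sending it, so the decision logic can be checked without touching AWS.

```python
from typing import Dict, List, Tuple


def release_request(instance_id: str, keep_files: bool = True) -> Tuple[str, Dict[str, List[str]]]:
    """Return the boto3 EC2 client method name and arguments to release an instance.

    keep_files=True  -> "stopped": pay-as-you-go charges stop, files are kept,
                        and the small daily charge for the storage continues.
    keep_files=False -> "terminated": the rental ends, files are deleted,
                        and all charges stop.
    """
    args = {"InstanceIds": [instance_id]}
    if keep_files:
        return "stop_instances", args
    return "terminate_instances", args


def release_gpu_instance(ec2_client, instance_id: str, keep_files: bool = True):
    """Call stop_instances or terminate_instances on a boto3 EC2 client."""
    method, args = release_request(instance_id, keep_files)
    return getattr(ec2_client, method)(**args)


# Usage (requires boto3 and configured AWS credentials; the instance ID is hypothetical):
#   import boto3
#   ec2 = boto3.client("ec2")
#   release_gpu_instance(ec2, "i-0123456789abcdef0", keep_files=True)
```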

To give you an idea of training time, and hence of the expected charges, here are two models. One model was trained on approx. 500k short segments with the default options of OpenNMT PyTorch, and it took 18186 seconds (approx. 5 hours). Another was trained on approx. 140k long segments with the same default options, and it took 13366 seconds (approx. 3.7 hours).

A p2.xlarge instance comes with only 50 GB of disk space, which is not enough if you have a large model, but adding more space is relatively affordable: depending on the AWS “region” of the data center, extra storage can cost about $0.10 per GB per month.
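The figures above are enough for a rough budget. The sketch below multiplies training time by the quoted on-demand rate and adds the stopped-state and extra-storage charges. It assumes that charges scale linearly with running time and that the rates quoted in this article still apply, which may no longer be the case; the function names are illustrative.

```python
# Rates quoted in this article; they may have changed since.
P2_XLARGE_HOURLY_USD = 0.90        # on-demand, per running hour
STOPPED_DAILY_USD = 0.20           # approximate charge per day while "stopped"
EXTRA_STORAGE_GB_MONTH_USD = 0.10  # varies by AWS region


def training_cost(seconds: float, hourly_rate: float = P2_XLARGE_HOURLY_USD) -> float:
    """Compute cost of one run, assuming charges scale linearly with running time."""
    return seconds / 3600 * hourly_rate


def monthly_overhead(stopped_days: int, extra_storage_gb: float) -> float:
    """Charges that accrue while not training: stopped days plus extra storage."""
    return stopped_days * STOPPED_DAILY_USD + extra_storage_gb * EXTRA_STORAGE_GB_MONTH_USD


for label, seconds in [("500k short segments", 18186), ("140k long segments", 13366)]:
    print(f"{label}: {seconds / 3600:.2f} h, ${training_cost(seconds):.2f}")
```

For the two models above, this gives about $4.55 and $3.34 of compute respectively. As a hypothetical example, keeping the machine stopped for a whole 30-day month with 100 GB of extra storage would add monthly_overhead(30, 100), i.e. about $16.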

Other important AWS EC2 GPU options include G3 instances; although they are designed for graphics applications, some researchers have used G3 machines for deep learning and reported good (even better) performance.

AWS Reserved Instances are meant for long-term commitments: you commit to a one- or three-year term, optionally paying part or all of it in advance, in exchange for a reduced hourly rate that is charged for every hour of the term, whether or not you use the machine. This option can be useful for companies with a long-term, continuous need for GPU machines.
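Whether a reservation pays off can be framed as a break-even calculation. In this illustrative sketch (the symbols are mine, not AWS terminology), $U$ is the upfront payment, $r_R$ the reduced hourly rate, $T$ the number of hours in the term, $r_O$ the on-demand hourly rate, and $h$ the hours you would actually use during the term. It assumes that the reduced rate is billed for every hour of the term, as AWS does for reservations, and that $r_R < r_O$:

```latex
\begin{aligned}
C_{\mathrm{OD}}(h) &= r_O\,h, \qquad C_{\mathrm{RI}} = U + r_R\,T,\\
C_{\mathrm{RI}} < C_{\mathrm{OD}}(h) &\iff h > h^{*} = \frac{U + r_R\,T}{r_O}.
\end{aligned}
```

Dividing $h^{*}$ by $T$ gives the minimum fraction of the term during which the machine must be busy for the reservation to pay off; below it, pay-as-you-go remains cheaper.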

Google Cloud and Microsoft Azure

Google Cloud and Microsoft Azure offer similar GPU options. I mention them here because they are popular options you might want to consider, but I have not used either of them for serious work (maybe just a few tests in the past).

Local GPU Machines

Having your own GPU machine is an option you need to seriously consider if you train models on a daily basis. The main thing to keep in mind is that this option is really expensive, while new GPU capabilities appear frequently and your needs can grow as you get more data or want to use more memory-intensive options. In other words, replacing the machine would not be an easy decision, so make sure you choose a suitable configuration from the beginning.

Economic GPU Options

AWS Spot Instances

AWS Spot Instances can cost roughly a quarter to a third of a regular instance of the same type; for one GPU, a p2.xlarge Spot Instance can cost 27 cents per hour instead of $0.90. The main difference is that AWS can interrupt a Spot Instance at any time, so Spot Instances suit training that can be interrupted and resumed. For example, OpenNMT supports resuming a stopped training because it splits the training process into milestones (checkpoints), each saved as a file. If the training stops for any reason, you can use the -train_from option, specifying the file of the last complete checkpoint.
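Resuming can be automated by picking the checkpoint with the highest step number. The sketch below assumes OpenNMT-py's checkpoint naming, "<save_model>_step_<N>.pt", and the onmt_train entry point with a -config file (older versions used python train.py instead); the helper names and paths are hypothetical. Note that a checkpoint being written at the moment of interruption may be incomplete, in which case it should be deleted before resuming.

```python
import re
from pathlib import Path
from typing import List, Optional

# OpenNMT-py saves checkpoints as "<save_model>_step_<N>.pt".
STEP_PATTERN = re.compile(r"_step_(\d+)\.pt$")


def latest_checkpoint(directory: str, prefix: str) -> Optional[Path]:
    """Return the checkpoint with the highest step number, or None if there is none."""
    best_path, best_step = None, -1
    for path in Path(directory).glob(f"{prefix}_step_*.pt"):
        match = STEP_PATTERN.search(path.name)
        # Compare step numbers numerically: as strings, "step_5000" > "step_20000".
        if match and int(match.group(1)) > best_step:
            best_path, best_step = path, int(match.group(1))
    return best_path


def resume_command(config: str, directory: str, prefix: str) -> List[str]:
    """Build an onmt_train command that resumes from the latest checkpoint, if any."""
    command = ["onmt_train", "-config", config]
    checkpoint = latest_checkpoint(directory, prefix)
    if checkpoint is not None:
        command += ["-train_from", str(checkpoint)]
    return command
```

On a fresh Spot Instance, the resulting list can be passed to subprocess.run(..., check=True); with no checkpoint found, it simply starts training from scratch.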

Spot Instances can be managed through a “Spot Fleet”: when one Spot Instance is interrupted or not available, the fleet launches another one, so your work can continue on a different Spot Instance.

It is recommended to use Docker while using a Spot Fleet to make it easier to move the whole configuration to another machine.

RenderRapidly and Toplahm

I have also explored another GPU option offered by RenderRapidly. Compared to some other options, this option was relatively affordable.

According to Mr. Burak Basaran, Founder of RenderRapidly and Co-founder of Toplahm, the effective hourly rates of their GPU machines are many times lower than those of AWS or other clouds. “For example, at $167/month, our single GTX1080 server’s effective hourly price will be 1/8 the price of AWS K80 p2.xlarge for the same performance,” said Mr. Basaran. “GTX1080 deep learning performance is usually more than twice that of K80; however, this calculation assumes that it is 2x faster.”
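The 1/8 figure can be reproduced under two assumptions that are mine rather than Mr. Basaran’s: the server is used around the clock for a 30-day month (720 hours), and the GTX1080 is exactly 2x faster than the K80, as his calculation assumes. A minimal sketch with an illustrative helper name:

```python
P2_XLARGE_HOURLY_USD = 0.90  # AWS K80 (p2.xlarge) on-demand rate quoted earlier


def effective_hourly(monthly_usd: float, hours_used: float, speedup: float = 1.0) -> float:
    """Monthly fee per hour of K80-equivalent work (speedup = GTX1080 vs K80)."""
    return monthly_usd / hours_used / speedup


# Assumptions: 24/7 use over a 30-day month and a 2x speedup, as in the quote.
gtx1080_rate = effective_hourly(167, hours_used=30 * 24, speedup=2.0)
print(f"GTX1080: ${gtx1080_rate:.3f} per K80-equivalent hour")
print(f"p2.xlarge costs {P2_XLARGE_HOURLY_USD / gtx1080_rate:.1f}x as much")
```

Under these assumptions the claim holds (about 1/7.8). If the server were used, say, only 180 hours a month, the effective price would be four times higher and the advantage would drop to about 2x.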

At the time of writing this article, both RenderRapidly and Toplahm offer weekly, monthly, and longer-commitment options; since there is no pay-as-you-go option, whether their service is cheaper depends mainly on how many hours per month you need to train your models. As for performance, I did notice the difference while training my NMT models, compared with the training time on an AWS EC2 p2.xlarge machine, but as the machine specifications differ, I cannot quantify it precisely. As for disk space, RenderRapidly and Toplahm clearly provide plenty of it, compared to only 50 GB on an AWS p2.xlarge machine.
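To make the “how many hours per month” question concrete, here is a small sketch comparing the $167/month GTX1080 plan quoted above with the $0.90/hour p2.xlarge rate. Workloads are measured in p2.xlarge hours; it ignores the AWS stopped-state and storage charges, which would lower the break-even point slightly. The function names are illustrative.

```python
def break_even_hours(monthly_fee: float, hourly_rate: float) -> float:
    """Monthly on-demand hours above which a flat monthly plan is cheaper."""
    return monthly_fee / hourly_rate


def cheaper_option(on_demand_hours: float, monthly_fee: float = 167.0,
                   hourly_rate: float = 0.90) -> str:
    """Compare a flat monthly plan with pay-as-you-go for a given monthly workload."""
    if monthly_fee < on_demand_hours * hourly_rate:
        return "monthly plan"
    return "pay-as-you-go"


hours = break_even_hours(167.0, 0.90)
print(f"Break-even: {hours:.1f} p2.xlarge hours per month ({hours / 30:.1f} per day)")
```

With these rates, the monthly plan pays off once you would otherwise run a p2.xlarge for more than about 186 hours a month, roughly 6 hours a day.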

Update, 23 September, 2019: Toplahm informed us that they now offer 4-GPU and 8-GPU servers for rent, and that they have introduced an hourly option for 8-GPU servers, subject to a minimum number of hours.

Here are more technical details about this experience. My machine was a 2-GPU system with 32 GB RAM, a 480 GB SSD, and 8 GB of memory per GPU.

When training a Neural Machine Translation model on 13 million segments (MultiUN corpus) with OpenNMT PyTorch using the default options, everything went fine. It took 30136 seconds (approx. 8.37 hours) to finish training.

When training a Transformer model on the same MultiUN corpus with OpenNMT PyTorch and the recommended Transformer options, the 8 GB of GPU memory was almost fully used. To avoid the “RuntimeError: CUDA out of memory” error, I had to reduce -valid_batch_size, the number of segments per batch in the validation step. The default -valid_batch_size is 32 segments; I tried 25, which did not work, and then 5, which allowed the training to complete. The -valid_batch_size option does not affect the model’s performance at all. This screenshot shows the output of the command nvidia-smi.
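Besides its full table, nvidia-smi can print just the memory figures in CSV form through its --query-gpu and --format options, which makes it easy to log how close a run is to the limit. A minimal sketch, assuming an NVIDIA driver with nvidia-smi on the PATH; the helper names are illustrative, and the values in the test are made up.

```python
import subprocess
from typing import List, Tuple

# Ask nvidia-smi for per-GPU memory in MiB, one CSV line per GPU, no header.
QUERY = ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"]


def parse_gpu_memory(csv_text: str) -> List[Tuple[int, int, int]]:
    """Parse lines like '0, 7950, 8119' into (index, used_MiB, total_MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def gpu_memory_report() -> List[Tuple[int, int, int]]:
    """Run nvidia-smi and return the parsed memory figures for every GPU."""
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(output)
```

Calling gpu_memory_report() periodically from a separate process during training would show how much headroom is left on each GPU.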

Regarding RAM and disk space, we realised that smaller capacities would have been enough: 16 GB RAM (instead of 32 GB) and a 120 GB SSD (instead of 480 GB).

While running the Transformer model with the recommended options, I used the command free -m to get an idea of the RAM usage; here was the output:

              total        used        free      shared  buff/cache   available
Mem:          32035       11239         795          25       20000       20310
Swap:          2047           0        2047
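The output above supports the observation that 16 GB of RAM would have been enough: the “used” column, memory actually held by processes, is about 11 GiB, while “buff/cache” is mostly file cache that the kernel can reclaim. The sketch below parses that output; comparing only the “used” column with 16 GB is a rough heuristic of mine, and the helper name is illustrative.

```python
from typing import Dict

# The `free -m` output shown above (values in MiB).
SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:          32035       11239         795          25       20000       20310
Swap:          2047           0        2047
"""


def parse_free_mem(output: str) -> Dict[str, int]:
    """Map the column names of `free -m` to the values of its "Mem:" row."""
    lines = output.strip().splitlines()
    header = lines[0].split()
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "Mem:":
            return dict(zip(header, (int(v) for v in fields[1:])))
    raise ValueError("no 'Mem:' row found")


mem = parse_free_mem(SAMPLE)
print(f"used: {mem['used'] / 1024:.1f} GiB of {mem['total'] / 1024:.1f} GiB")
print("fits in 16 GB:", mem["used"] < 16 * 1024)
```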

Each checkpoint file of the model takes 1.5 GB of disk space, so 20 checkpoints are expected to take 30 GB. The command df -h shows used and available disk space.
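The same check can be done from Python before a long run, using the standard library’s shutil.disk_usage. A minimal sketch: the 1.5 GB per checkpoint comes from the figure above, the helper name is illustrative, and decimal gigabytes are used as a rough estimate (df -h reports powers of 1024).

```python
import shutil

GB = 10 ** 9  # decimal gigabytes; a rough estimate is enough here


def checkpoints_fit(path: str, n_checkpoints: int, checkpoint_gb: float = 1.5) -> bool:
    """True if the file system holding `path` has room for n more checkpoints."""
    needed_bytes = n_checkpoints * checkpoint_gb * GB
    return shutil.disk_usage(path).free >= needed_bytes


print(f"20 checkpoints need about {20 * 1.5:.0f} GB")
print("Enough free space:", checkpoints_fit(".", 20))
```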

The training of the above-mentioned Transformer model took 224633 seconds (approx. 62.4 hours, or 2.6 days).

Overall, my experience with this GPU service was good.

Other GPU Options

You can find other GPU options in the answers to the Quora question “Which cloud hosting provides GPU servers at the lowest cost?”. Among them, I tried RenderRapidly (as detailed above). I wanted to try Paperspace, but their payment options were too limited (for me), so I did not get the chance.

Another popular option is Lambda. I also came across GPUServersRental, but did not try it. My only observation is that most of their options tend to have multiple GPUs but little memory per GPU (2, 3, 4, or 8 GB); as I mentioned earlier, the Transformer model with the recommended options barely worked with 8 GB of memory per GPU.

Non-GPU Options

Note that you mainly need a GPU machine for training your Neural Machine Translation model; otherwise, it would take forever to train a big model. However, when it comes to a machine for deployment, that is, translation, you can still use a CPU machine with enough RAM, and the speed difference will hardly be noticed. What counts as “enough” depends on how big your model is; if the RAM is not enough, you will find out quickly anyway.

Personally, I use DigitalOcean for REST API deployment of OpenNMT-py, but any local or cloud option can be used.
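For reference, a client for such a deployment can be very small. The sketch below assumes the OpenNMT-py REST server is running with its default URL root /translator on port 5000, that it accepts a JSON list of {"src": ..., "id": ...} objects, and that the model is registered with id 100 in its configuration; the URL, the id, and the helper names are assumptions, so check them against your own setup and OpenNMT-py version.

```python
import json
import urllib.request
from typing import Any, List


def build_request(url: str, sentences: List[str], model_id: int) -> urllib.request.Request:
    """Build a POST request with one {"src", "id"} object per sentence."""
    payload = [{"src": sentence, "id": model_id} for sentence in sentences]
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate(url: str, sentences: List[str], model_id: int = 100, timeout: float = 30.0) -> Any:
    """Send the sentences to the server and return its decoded JSON response."""
    request = build_request(url, sentences, model_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (hypothetical server address):
#   translate("http://localhost:5000/translator/translate", ["Hello world"])
```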

Disclaimer: I do NOT have any affiliation with any of the mentioned options. The only reason I concentrate on some options more than others is that I used them and wanted to write down my experience; hopefully, it is useful for you, or even for me in the future!

If you have other GPU experiences you would like to share, they will be highly appreciated.
