
The Best Services For Running Machine Learning Models On AWS


Machine learning is a huge industry, and it’s got a lot of support on AWS. We’ll discuss the best services for building, training, and running both custom and preconfigured machine learning models on the AWS platform.


SageMaker

SageMaker is AWS’s fully managed machine learning suite, designed to replace all the manual work involved in configuring servers for training and inference. From SageMaker, you can create and train models using datasets you provide, with all of your work saved in a Jupyter-style notebook. This is the most complete experience you’ll find on AWS for running machine learning models.

SageMaker handles the creation of preconfigured training instances automatically, which saves the time (and money) otherwise spent setting up an expensive instance before training can even begin. SageMaker also has a marketplace for algorithms, similar to Amazon Machine Images, that you can run on the platform. Some are free, while others charge an hourly fee to run.
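Under the hood, a training run like this boils down to a single API call. As a rough sketch, this is the kind of request you would pass to boto3's `create_training_job` call; every name, ARN, image URI, and S3 path below is an illustrative placeholder, not a real resource:

```python
# Hypothetical request payload for boto3's SageMaker client:
#   boto3.client("sagemaker").create_training_job(**request)
# All names, ARNs, and S3 paths are illustrative placeholders.
request = {
    "TrainingJobName": "example-training-job",
    "AlgorithmSpecification": {
        # Container image holding the training code (e.g. a built-in
        # algorithm or one purchased from the Marketplace).
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/example:latest",
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    "InputDataConfig": [{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://example-bucket/train/",
        }},
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://example-bucket/output/"},
    # Note the "ml." prefix: training runs on SageMaker-specific instances.
    "ResourceConfig": {
        "InstanceType": "ml.p3.2xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    "StoppingCondition": {
        "MaxRuntimeInSeconds": 3600,
        "MaxWaitTimeInSeconds": 7200,  # required when using Spot training
    },
    # Managed Spot training can cut the cost of the run considerably.
    "EnableManagedSpotTraining": True,
}
```

SageMaker provisions the instance, pulls your data from S3, runs the container, writes the model artifacts back to S3, and tears everything down when the job ends.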

Once you’ve got a model, deployment is fairly easy. All you have to do is add your model to your endpoint configuration and choose the instance type (and optional Elastic Inference accelerators) you’d like to use.

You can also use the model in Batch Transform jobs to run inference on an entire dataset and store the results in S3.
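That endpoint configuration maps onto boto3's `create_endpoint_config` call. A hedged sketch of the payload, with placeholder model and config names, showing where the instance type and optional accelerator go:

```python
# Hypothetical payload for boto3's SageMaker client:
#   boto3.client("sagemaker").create_endpoint_config(**config)
# Model and config names are illustrative placeholders.
config = {
    "EndpointConfigName": "example-endpoint-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "example-model",      # the trained model to serve
        "InitialInstanceCount": 1,
        "InstanceType": "ml.m5.xlarge",    # SageMaker-only "ml." instance
        # Optional Elastic Inference accelerator attached to each instance.
        "AcceleratorType": "ml.eia2.medium",
    }],
}
```

Creating an endpoint from this config then gives you an HTTPS URL you can send inference requests to.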

However, while SageMaker itself carries no separate charge, it’s not exactly free to use. SageMaker only allows you to deploy to special instances, denoted by the “ml.” prefix. These are effectively the same as regular EC2 instances, but with one key difference: they cost roughly 40% more, across the board. If you’re using SageMaker, this is the premium you’ll have to pay. SageMaker does support Spot Instances, which can help bring costs down, but it will still be more expensive than EC2.
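A quick back-of-the-envelope calculation makes the markup concrete. The $0.192 figure below is the us-east-1 on-demand rate for an m5.xlarge; treat both numbers as illustrative, since AWS pricing changes over time:

```python
# Illustrative: an EC2 rate vs. its SageMaker "ml." counterpart.
ec2_hourly = 0.192            # m5.xlarge, us-east-1 on-demand ($/hr)
sagemaker_markup = 1.40       # the ~40% across-the-board premium
ml_hourly = ec2_hourly * sagemaker_markup

print(f"ml.m5.xlarge estimate: ${ml_hourly:.3f}/hr")
```

Over a month of continuous serving, that 40% compounds into a meaningful bill, which is why the EC2 route below is worth considering.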

While SageMaker does allow the use of Elastic Inference accelerator add-ons, they are also subject to the same 40% price increase.

Elastic Inference + EC2

If you’d rather set things up yourself, or want to save some money over the marked-up SageMaker instances, there’s always regular ol’ EC2. This gives you the freedom to configure your servers however you like, with full control over the software installed on them.

Training models will generally be a lot more intensive than running inference. This is where SageMaker can have an advantage: being a fully managed service, you only pay for the time spent actually training, not for startup, loading your data onto the server, or cleanup after the job finishes. Even so, EC2 instances can be stopped and started at will, and Spot Instances are perfect for this task, so in practice it’s not much of an issue.

However, running inference in production often doesn’t require the full power of an entire GPU, which is expensive to run. To address this, AWS provides a service called Elastic Inference that lets you rent GPU accelerator add-ons for existing EC2 instances. These can be attached to instances of any type and are charged by the hour based on the power of the accelerator, essentially creating a whole new tier of GPU capacity below the powerful (and expensive) p3 lineup.
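Attaching an accelerator happens at launch time via EC2's `RunInstances` API, which accepts an `ElasticInferenceAccelerators` parameter. A sketch of the boto3 request, with a placeholder AMI ID:

```python
# Hypothetical payload for boto3's EC2 client:
#   boto3.client("ec2").run_instances(**request)
# The AMI ID is an illustrative placeholder.
request = {
    "ImageId": "ami-0123456789abcdef0",  # e.g. a Deep Learning AMI
    "InstanceType": "m5.xlarge",         # any instance type works
    "MinCount": 1,
    "MaxCount": 1,
    # Attach a GPU accelerator sized for inference rather than training.
    "ElasticInferenceAccelerators": [{"Type": "eia2.medium", "Count": 1}],
}
```

The instance then sees the accelerator through the framework libraries (e.g. the EI-enabled TensorFlow build) rather than as a local GPU device.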

While Elastic Inference isn’t a compute platform on its own, it lets you greatly speed up inference with GPU acceleration for a fraction of the cost. The cheapest full-GPU p3 instance, the p3.2xlarge, costs $3.06 per hour to run, and comes with 8 cores, 61 GB of RAM, and 16 TFLOPS of GPU performance. For a fair comparison against the GPU-only EI accelerators, we’ll subtract out the vCPU and RAM costs. The similarly spec’d m5.4xlarge costs $0.768 per hour, so AWS is effectively selling a single Tesla V100 GPU for about $2.292 per hour, give or take, which works out to about $0.143 per TFLOP. The cheapest EI accelerator, providing a single TFLOP of performance, costs $0.120, a 16% decrease from the EC2 price. The 4 TFLOP option is even better: a 40% decrease.
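The arithmetic behind that comparison, reproduced step by step (rates are the us-east-1 on-demand prices quoted above):

```python
# Reproducing the per-TFLOP comparison from the text.
p3_2xlarge = 3.06    # $/hr: 8 vCPU, 61 GB RAM, ~16 TFLOPS (Tesla V100)
m5_4xlarge = 0.768   # $/hr: comparable vCPU/RAM, no GPU

gpu_only = p3_2xlarge - m5_4xlarge   # ~$2.292/hr for the V100 itself
per_tflop = gpu_only / 16            # ~$0.143 per TFLOP-hour on EC2

ei_1tflop = 0.120                    # cheapest EI accelerator, 1 TFLOP
savings = 1 - ei_1tflop / per_tflop  # ~16% cheaper than the EC2 rate
print(f"EC2: ${per_tflop:.3f}/TFLOP-hr, EI: ${ei_1tflop:.3f} ({savings:.0%} less)")
```

The same method on the 4 TFLOP accelerator gives the roughly 40% saving quoted above.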

AWS also provides pre-configured environments for running machine learning with Deep Learning AMIs. These come preinstalled with ML frameworks and interfaces like TensorFlow, PyTorch, Apache MXNet, and many others. They’re entirely free to use, and available on both Ubuntu and Amazon Linux.

Elastic Inference accelerators also support Auto Scaling, so you’ll be able to configure them to scale up to meet increasing demand, and scale down at night when they’re not used as much.

AWS’s Own Machine Learning Services

While these services don’t allow you to run your own custom models, they do provide many useful features for applications that make use of machine learning underneath. In a sense, these services are a frontend for a machine learning model that AWS has already trained and programmed.

AWS Personalize is a general purpose recommendation engine. You give it a list of products, services, or items, and feed in user activity. It spits out recommendations for new things to suggest to that user. This is based on the same technology that powers recommendations on Amazon.com.

AWS Lex is a fully managed chatbot service that can be configured with custom commands and routines, powered by the same tech behind Alexa. The chatbots can be text only, or can be fully interactive voice bots using AWS Transcribe for speech-to-text and AWS Polly for text-to-speech, both of which are also standalone services.
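The Polly half of that pipeline, for example, is a single API call. A hedged sketch of the request payload (the text and voice below are arbitrary examples):

```python
# Hypothetical request for boto3's Polly client:
#   boto3.client("polly").synthesize_speech(**request)
request = {
    "Text": "Your order has shipped.",  # arbitrary example text
    "OutputFormat": "mp3",              # also supports ogg_vorbis, pcm
    "VoiceId": "Joanna",                # one of Polly's built-in voices
}
```

The response streams back the synthesized audio, ready to play to the caller.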

AWS Rekognition performs object and scene recognition in images and video, a common machine learning task. It can recognize most common objects, generate keywords from images, and can even be configured with custom labels to extend its detection capabilities further.
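Generating those keywords comes down to Rekognition's `DetectLabels` API. A sketch of the boto3 request against an image in S3, with placeholder bucket and key names:

```python
# Hypothetical request for boto3's Rekognition client:
#   boto3.client("rekognition").detect_labels(**request)
# Bucket and object names are illustrative placeholders.
request = {
    "Image": {"S3Object": {"Bucket": "example-bucket", "Name": "photo.jpg"}},
    "MaxLabels": 10,        # cap the number of returned keywords
    "MinConfidence": 80.0,  # drop low-confidence labels
}
```

The response is a list of labels (e.g. "Dog", "Outdoors") each with a confidence score you can filter or rank on.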

Anthony Heddings
Anthony Heddings is the resident cloud engineer for LifeSavvy Media, a technical writer, programmer, and an expert at Amazon's AWS platform. He's written hundreds of articles for How-To Geek and CloudSavvy IT that have been read millions of times.
