Database performance is heavily affected by the performance of the underlying storage. For reads, having a lot of RAM can speed things up, but for write-heavy operations, the bottleneck is the SSD or hard disk the database runs on. AWS has plenty of options for storage, so which one is best for you?
Database-Focused EC2 Instances
Beyond the underlying storage, plenty of other factors affect database performance. AWS has many different classes of instances, each with individual tiers within it.
The most database-oriented instances are the R5 series. These are optimized for memory performance, both in RAM size and speed and in EBS performance. They offer a high ratio of memory to core count, all the way up to 768 GB of RAM on the r5.24xlarge.
There is also the r5d series, a subclass of R5 that comes with local disks physically attached to the host rather than EBS volumes. The largest tier has four 900 GB NVMe SSDs. That is smaller than EBS’s maximum capacity, but it offers stellar performance and very low latency.
There is also the D3 series, which offers the largest amount of local storage possible for an EC2 instance: up to 336 TB of hard drive storage on the largest D3en size. If you’re looking to run a particularly massive instance storing a lot of data, D3 may work best for you.
EBS Volume Types
EBS has a few different tiers. The most common is gp3, a general purpose SSD-backed volume that offers solid performance at a higher price than hard drive-backed volumes.
gp3 is the latest generation, replacing gp2 and offering up to 4x the maximum throughput (1,000 MB/s versus 250 MB/s per volume).
The older gp2 uses a burst-bucket performance model. Depending on the size of the volume, it earns “IO credits” every second that are spent automatically on IOPS, or input/output operations per second. This allows quick bursts of performance when needed, but if you need steady, solid performance, relying on bursts is not a great idea. gp3 does away with the bucket: every volume gets a flat baseline of 3,000 IOPS and 125 MB/s regardless of size, and you can provision more on top of that. There is also a maximum number of IOPS; for both gp2 and gp3, that’s 16,000.
gp2 volumes earn IO credits at a rate of 3 per GB per second, with bursts reaching 3,000 IOPS. That means that if you have a gp2 volume of 1 TB or more, the baseline already matches the burst rate, your bucket will always be full, and you won’t have to worry about burst performance. Anything smaller than that, and once the bucket is drained you’re limited to the baseline rate at which you earn credits.
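The burst-bucket arithmetic is easy to sketch. The snippet below is a rough model, not an official calculator: the constants are assumptions based on AWS's published gp2 figures (3 IOPS per GiB, a 100 IOPS floor, a 3,000 IOPS burst rate, and a 5.4 million credit bucket), and the function names are made up for illustration.

```python
# Rough model of a gp2-style burst bucket.
# All constants are assumptions taken from published gp2 figures;
# check the current EBS documentation before relying on them.
BASELINE_IOPS_PER_GIB = 3
MIN_BASELINE_IOPS = 100
MAX_IOPS = 16_000
BURST_IOPS = 3_000
BUCKET_CREDITS = 5_400_000


def baseline_iops(size_gib: int) -> int:
    """Steady-state IOPS the volume earns from its size alone."""
    earned = BASELINE_IOPS_PER_GIB * size_gib
    return max(MIN_BASELINE_IOPS, min(MAX_IOPS, earned))


def burst_seconds(size_gib: int) -> float:
    """Seconds a full bucket lasts at the burst rate (inf if burst never applies)."""
    base = baseline_iops(size_gib)
    if base >= BURST_IOPS:
        return float("inf")
    # The bucket drains at the burst rate while refilling at the baseline rate.
    return BUCKET_CREDITS / (BURST_IOPS - base)


if __name__ == "__main__":
    for size in (100, 500, 1000):
        print(size, baseline_iops(size), burst_seconds(size))
```

Under these assumptions, a 100 GB volume can sustain a full burst for about 33 minutes before dropping back to 300 IOPS, while a 1,000 GB volume never runs dry.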
What this means in practice is that if you need extra performance, you’ll want to use the second SSD-based volume type, io2, also known as Provisioned IOPS SSD. These let you buy disk performance directly, provisioned to your EBS volume. The best tier, io2 Block Express, offers up to 4,000 MB/s per volume and 7,500 MB/s per instance.
That’s up to four times the throughput of gp3, but only if you can pay for it: performance is expensive, and you’ll pay for every bit of it. A top-of-the-line io2 volume can easily cost thousands of dollars per month, more than the EC2 instance it is attached to. That’s on top of a per-GB storage price roughly 56% higher than gp3’s (about $0.125 versus $0.08 per GB-month at us-east-1 list prices, which vary by region and over time).
io2 is an option for customers who need every ounce of performance they can get. Unless you’re maxing out your drive, the general purpose gp3 volumes will be great for many people.
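To get a feel for how quickly provisioned IOPS add up, here is a small cost sketch. The prices are hypothetical placeholders modeled on tiered per-IOPS pricing; real prices differ by region and change over time, so substitute the numbers from the current EBS pricing page. The function name io2_monthly_cost is made up for this example.

```python
# Back-of-the-envelope monthly cost for a Provisioned IOPS volume.
# ASSUMPTION: every price below is an illustrative placeholder, not a quote.
STORAGE_PRICE_PER_GB = 0.125  # USD per GB-month (assumed)

# (tier width in IOPS, USD per provisioned IOPS-month), cheapest tiers last (assumed)
IOPS_TIERS = [
    (32_000, 0.065),
    (32_000, 0.0455),
    (float("inf"), 0.032),
]


def io2_monthly_cost(size_gb: int, iops: int) -> float:
    """Storage cost plus tiered cost for provisioned IOPS, in USD per month."""
    cost = size_gb * STORAGE_PRICE_PER_GB
    remaining = iops
    for width, price in IOPS_TIERS:
        used = min(remaining, width)  # IOPS billed in this tier
        cost += used * price
        remaining -= used
        if remaining <= 0:
            break
    return cost


if __name__ == "__main__":
    print(round(io2_monthly_cost(1_000, 16_000), 2))
    print(round(io2_monthly_cost(1_000, 64_000), 2))
```

With these placeholder prices, a 1 TB volume provisioned at 64,000 IOPS comes out at well over $3,000 per month, almost all of it for the IOPS rather than the storage.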
Hard Drive Volumes
There are two main hard drive EBS volume types: Throughput Optimized HDD (st1) and Cold HDD (sc1). The names are fairly self-explanatory. st1 is optimized for decent sequential throughput (though with terrible random performance, as with all hard drives). sc1 is the cheapest option, meant for non-critical applications that store large amounts of infrequently accessed data.
Both types of volumes also use the burst bucket model, but their buckets are measured in throughput (MB/s) rather than IOPS, and both the baseline and burst rates scale with volume size up to a fixed cap.
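How throughput scales with size can be sketched as follows. The per-TiB rates and caps are assumptions based on AWS's published st1 and sc1 figures, and the helper name hdd_throughput is made up for illustration; verify the numbers against the current documentation.

```python
# Baseline and burst throughput for HDD-backed EBS volumes, in MB/s.
# ASSUMPTION: rates and caps below reflect published st1/sc1 figures.
HDD_PROFILES = {
    "st1": {"base_per_tib": 40, "burst_per_tib": 250, "cap": 500},
    "sc1": {"base_per_tib": 12, "burst_per_tib": 80, "cap": 250},
}


def hdd_throughput(volume_type: str, size_tib: float) -> tuple[float, float]:
    """Return (baseline, burst) throughput in MB/s for a volume of the given size."""
    profile = HDD_PROFILES[volume_type]
    baseline = min(profile["cap"], profile["base_per_tib"] * size_tib)
    burst = min(profile["cap"], profile["burst_per_tib"] * size_tib)
    return baseline, burst


if __name__ == "__main__":
    for vtype, size in (("st1", 1), ("st1", 4), ("sc1", 2), ("sc1", 16)):
        print(vtype, size, hdd_throughput(vtype, size))
```

Under these assumptions, a 1 TB st1 volume sustains only 40 MB/s once its bucket is empty, so small HDD volumes are a poor fit for anything that needs steady throughput.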
For databases, though, random read and write performance can matter a lot, as does latency. It’s 2020, and your users shouldn’t have to wait for a magnetic read head to seek across a spinning platter just to fetch some basic data. Complex SQL queries with lots of random access could grind a hard disk to a halt.
For anything user-facing, performance matters, and you should use an SSD. The only case where a hard drive makes sense is a read-heavy application where the database is small enough to be held mostly in memory, but even then, it would be small enough that the slight premium of even a basic gp3 volume would be worth it.
However, for big data, analytics, and other internal databases, the database can be so large that running it on SSDs costs too much. If you’re looking to run a high-capacity data lake or multi-server cluster, you may not care so much about slightly worse disk speed, especially if it’s saving you money in the process.