EBS benchmark testing

Background

A customer wants to deploy Solr Cloud on AWS EC2 and has asked AWS for benchmark data on EBS volumes. The target is more than 30,000 IOPS, and the Solr engine requires SSD volumes.

Benchmark EBS Volumes official guide

Launch an EBS-optimized instance.
Create new EBS volumes.
Attach the volumes to your EBS-optimized instance.
Configure and mount the block device.
Install a tool to benchmark I/O performance.
Benchmark the I/O performance of your volumes.
Watch the CloudWatch metrics.
Delete your volumes and terminate your instance.
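
The benchmark step above can be sketched as a small helper that builds an fio command line. This is a minimal sketch, not the exact command used for the results in this document: the job name, device path `/dev/md0`, queue depth, job count, and runtime are all assumptions chosen for illustration. The helper only prints the commands; it does not run them.

```python
import shlex


def fio_command(device, rw="randread", block_size="16k",
                runtime_s=60, iodepth=32, numjobs=4):
    """Build an fio argv list for a random I/O test against a block device.

    All default values are illustrative assumptions, not tuned settings.
    WARNING: a randwrite test against a raw device destroys the data on it.
    """
    return [
        "fio",
        "--name=ebs-bench",        # hypothetical job name
        f"--filename={device}",
        f"--rw={rw}",              # randread or randwrite
        f"--bs={block_size}",      # 16k and 4k are the sizes used in the results table
        "--direct=1",              # bypass the page cache
        "--ioengine=libaio",
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",
    ]


if __name__ == "__main__":
    # /dev/md0 is an assumed device name for the RAID array under test.
    for rw in ("randread", "randwrite"):
        print(shlex.join(fio_command("/dev/md0", rw=rw)))
```

Running the same command with `block_size="4k"` gives the 4 KB variant of each test.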

EBS RAID configuration

For example, two 500 GiB Amazon EBS io1 volumes with 4,000 provisioned IOPS each will create a 1000 GiB RAID 0 array with an available bandwidth of 8,000 IOPS and 1,000 MB/s of throughput.
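
The arithmetic in this example can be written out as a short sketch. It assumes identical member volumes and a per-volume throughput of 500 MB/s, which is inferred from the 1,000 MB/s total quoted above rather than stated directly. The `Volume` class and `raid0_totals` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    size_gib: int
    iops: int
    throughput_mbs: int


def raid0_totals(volumes):
    """Sum capacity, IOPS, and throughput across identical RAID 0 members.

    Assumes every member has the same size and performance; with unequal
    members the stripe is held back by the slowest or smallest one.
    """
    return Volume(
        size_gib=sum(v.size_gib for v in volumes),
        iops=sum(v.iops for v in volumes),
        throughput_mbs=sum(v.throughput_mbs for v in volumes),
    )


# Two 500 GiB io1 volumes with 4,000 provisioned IOPS each.
# 500 MB/s per volume is an assumption derived from the quoted total.
array = raid0_totals([Volume(500, 4000, 500), Volume(500, 4000, 500)])
print(array)
```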

Creating a RAID 0 array allows you to achieve a higher level of performance for a file system than you can provision on a single Amazon EBS volume. A RAID 1 array offers a "mirror" of your data for extra redundancy.

RAID 5 and RAID 6 are not recommended for Amazon EBS because the parity write operations of these RAID modes consume some of the IOPS available to your volumes.

You can create a RAID array with a larger number of volumes (we recommend 6 volumes), which offers a high level of performance. For this customer test case, however, a 4-volume RAID array is used.
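
As a rough illustration of the 4-volume setup, the sketch below builds the `mdadm` command that would create a RAID 0 array. The device names are hypothetical; on a real instance, check the attached devices (for example with `lsblk`) before creating anything. The code only prints the command and does not execute it.

```python
def mdadm_raid0_command(array_device, member_devices):
    """Build the mdadm argv list that creates a RAID 0 array from the members."""
    if len(member_devices) < 2:
        raise ValueError("RAID 0 needs at least two member devices")
    return [
        "mdadm", "--create", "--verbose", array_device,
        "--level=0",
        f"--raid-devices={len(member_devices)}",
        *member_devices,
    ]


# Hypothetical device names for four attached EBS volumes.
members = ["/dev/nvme1n1", "/dev/nvme2n1", "/dev/nvme3n1", "/dev/nvme4n1"]
print(" ".join(mdadm_raid0_command("/dev/md0", members)))
```

The resulting command needs root privileges, and the new array still has to be formatted and mounted before benchmarking a file system on it.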

Notes

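RAID 0 and instance store both have limited data durability, so data must be backed up to S3 regularly. Given the distributed nature of Solr Cloud and its IOPS requirements, both are acceptable options here.
Each instance type caps IOPS and throughput. If you need more than that cap allows, launch a larger instance type, which also provides better overall performance.
To raise total IOPS without changing per-volume IOPS, you need RAID. The trade-offs are: "Performance of the stripe is limited to the worst performing volume in the set" and "Loss of a single volume results in a complete data loss for the array". For example, to reach 40,000 IOPS you can build a RAID array from four 10,000 IOPS volumes, or skip RAID and use one or more 40,000 IOPS volumes directly.
gp2 is a burstable volume type, so keep an eye on the I/O credit balance.
Choosing between RAID 0 and RAID 1: The following table compares the common RAID 0 and RAID 1 options.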

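Configuration	Use	Advantages	Disadvantages
RAID 0	When I/O performance is more important than fault tolerance; for example, when data replication is already set up separately.	I/O is distributed across the volumes in a stripe. If you add a volume, you get the straight addition of throughput and IOPS.	Performance of the stripe is limited to the worst performing volume in the set. Loss of a single volume results in a complete data loss for the array.
RAID 1	When fault tolerance is more important than I/O performance; for example, as in a critical application.	Safer from the standpoint of data durability.	Does not provide a write performance improvement; requires more Amazon EC2 to Amazon EBS bandwidth than non-RAID configurations because the data is written to multiple volumes simultaneously.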
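
The 40,000 IOPS sizing example from the notes, together with the instance-level cap, can be sketched as follows. The function names are hypothetical, and the 30,000 IOPS instance limit in the second call is a made-up figure for illustration; look up the real EBS limit for the instance type you plan to use.

```python
import math


def volumes_needed(target_iops, per_volume_iops):
    """Number of equal volumes to stripe in RAID 0 to reach the target IOPS."""
    return math.ceil(target_iops / per_volume_iops)


def effective_iops(volume_count, per_volume_iops, instance_iops_limit):
    """IOPS the array can deliver once the instance-level cap is applied."""
    return min(volume_count * per_volume_iops, instance_iops_limit)


# 40,000 IOPS target built from 10,000 IOPS volumes.
count = volumes_needed(40000, 10000)
print(count)

# With a hypothetical 30,000 IOPS instance limit, the extra volume is wasted.
print(effective_iops(count, 10000, instance_iops_limit=30000))
```

The second call shows why adding volumes stops helping once the instance limit is reached: the array is provisioned for 40,000 IOPS, but the instance can only drive 30,000.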

Test case

RAID 0 on instance store for different instance types

Compare a single high-IOPS io1 volume vs. gp2 RAID 0 vs. io1 RAID 0

Note: If you need both high IOPS and high throughput, use a larger instance type to avoid instance-level limits.
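
When sizing the gp2 members for the RAID 0 comparison, the burst behaviour matters because a volume running on credits will look faster than its sustained rate. The sketch below estimates gp2 baseline IOPS and burst duration. The constants (3 IOPS per GiB, a floor of 100, a ceiling of 16,000, a 3,000 IOPS burst rate, and a 5.4 million credit bucket) are taken from the published gp2 model as understood at the time of writing and should be treated as assumptions to verify against current AWS documentation.

```python
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16000
GP2_BURST_IOPS = 3000
GP2_CREDIT_BUCKET = 5_400_000  # I/O credits in a full bucket


def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume of the given size."""
    return min(max(GP2_MIN_IOPS, GP2_IOPS_PER_GIB * size_gib), GP2_MAX_IOPS)


def gp2_burst_seconds(size_gib):
    """Seconds a full credit bucket sustains the burst rate, or None if the
    baseline already meets or exceeds the burst rate."""
    baseline = gp2_baseline_iops(size_gib)
    if baseline >= GP2_BURST_IOPS:
        return None
    return GP2_CREDIT_BUCKET / (GP2_BURST_IOPS - baseline)


for size in (100, 500, 1000, 6000):
    print(size, gp2_baseline_iops(size), gp2_burst_seconds(size))
```

For a benchmark meant to show sustained performance, either run long enough to drain the credits or use volumes whose baseline is at or above the burst rate.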

Testing snapshot

c4.xlarge EBS Snapshot testing

fio test results

Instance type and storage	randread 16 KB IOPS	randread 16 KB throughput (MBps)	randwrite 16 KB IOPS	randwrite 16 KB throughput (MBps)	randread 4 KB IOPS	randread 4 KB throughput (MBps)	randwrite 4 KB IOPS	randwrite 4 KB throughput (MBps)
m5.8xlarge	30327	485	30139	482	N/A	N/A	N/A	N/A
m5.12xlarge io1	40308	644.9	40216	634.4	N/A	N/A	N/A	N/A
m5.12xlarge gp2 RAID 0	36299	580.8	36155	578.5	N/A	N/A	N/A	N/A
m5.12xlarge io1 RAID 0	40295	644.7	40219	643.5	N/A	N/A	N/A	N/A
m5d.8xlarge single NVMe SSD	205133	3205.3	32233	515.7	437578	1709.3	117463	469.8
m5d.8xlarge NVMe SSD RAID 0	135220	2112.9	64470	1007.4	470710	1838.8	234654	938.6
i3.4xlarge single NVMe SSD	216421	3381.7	48958	783.3	581387	2271.5	180559	722.2
i3.4xlarge NVMe SSD RAID 0	243184	3799.8	97678	1526.3	824841	3222.4	357232	1395.5
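
A quick way to sanity-check a row is the relationship throughput ≈ IOPS × block size. The sketch below assumes the block size is in KiB and the result in MiB/s; some rows in the table may use a different unit convention or rounding, so small differences are expected. The function name is hypothetical.

```python
def throughput_mib_s(iops, block_kib):
    """Approximate throughput in MiB/s from IOPS and block size in KiB."""
    return iops * block_kib / 1024


# m5d.8xlarge single NVMe SSD, randread at 4 KB: 437578 IOPS in the table.
print(round(throughput_mib_s(437578, 4), 1))
```

If a computed value is far from the reported throughput, the test was probably limited by something other than the block size, or the figures were recorded from different runs.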