sysbench CPU benchmark of different AWS EC2 instances

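The AMD-powered M5a/R5a and T3a instances on AWS EC2 have been available for roughly a year and half a year respectively. Since I rarely use the US regions, and the Asian regions tend to receive new instance types later, I only started using these families a few months ago (as of this writing, T3a still cannot be launched in every AZ in Tokyo). Apart from the officially advertised price difference of about 10% (compared with M5/R5/T3), AWS has not published much information about performance. Intuition and information from other sources mostly suggest performance roughly equal to, or at most 10% lower than, the Intel-based instances of the same family, but there does not seem to be much actual discussion or comparison. So here is a very simple comparison, using sysbench, of the CPU performance of several commonly used AWS EC2 instance families. For convenience, everything is run with the default settings, on the .large size only, and only the single-thread results are considered.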

Test environment:

  • Ubuntu: 18.04.3 LTS
  • Linux kernel: 4.15.0-1052-aws
  • sysbench: 1.0.18

The following eleven instance types are compared:

  • General purpose
    • T2
    • T3
    • T3a
    • M4
    • M5
    • M5a
  • Compute optimized
    • C4
    • C5
  • Memory optimized
    • R4
    • R5
    • R5a

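Because the goal is to test CPU performance, variants derived for disk or network purposes such as C5d / C5n / M5dn / R5dn are left out of the test. Older generations such as T1 / C3 / M3 are also excluded: nobody should be launching machines that old nowadays, since there is no reason to hurt both your wallet and your performance at the same time, unless some ancient AMI cannot boot on newer instances and has to run on old ones …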

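The T2 / T3 / T3a families rely on CPU Credits to reach their best performance, so they were all tested with Unlimited Mode enabled (in practice sysbench only runs for about ten seconds by default, which does not even use up 1 CPU Credit). Comparing performance that is not being throttled is what makes the comparison meaningful.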
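As a rough check of the "less than 1 CPU Credit" remark, the arithmetic can be sketched as below. It relies on the AWS definition that one CPU credit equals one vCPU running at 100% utilization for one minute. The function name `credits_used` is made up for illustration, and the sketch ignores anything else running on the instance.

```shell
# credits_used SECONDS VCPUS UTILIZATION_PERCENT   (hypothetical helper)
# One CPU credit = one vCPU at 100% utilization for one minute.
credits_used() {
    awk -v s="$1" -v n="$2" -v u="$3" \
        'BEGIN { printf "%.3f\n", s / 60 * n * u / 100 }'
}

# A default single-threaded sysbench run: about 10 seconds on 1 vCPU at 100%.
credits_used 10 1 100   # prints 0.167
```

So a single default run spends roughly a sixth of a credit, and even three runs in a row stay below one credit.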

For a comparison of the specs and prices of each instance type, I recommend EC2Instances.info; I find the site quite handy:

EC2Instances.info Easy Amazon EC2 Instance Comparison
https://ec2instances.info/?filter=.large&region=ap-northeast-1&cost_duration=monthly&reserved_term=yrTerm1Standard.allUpfront

Now, straight to the test results:

# t2.large
$ sysbench cpu run
sysbench 1.0.18 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   909.23

General statistics:
    total time:                          10.0008s
    total number of events:              9095

Latency (ms):
         min:                                    1.09
         avg:                                    1.10
         max:                                    1.17
         95th percentile:                        1.10
         sum:                                 9983.83

Threads fairness:
    events (avg/stddev):           9095.0000/0.00
    execution time (avg/stddev):   9.9838/0.00
# t3.large
$ sysbench cpu run
sysbench 1.0.18 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  1082.23

General statistics:
    total time:                          10.0008s
    total number of events:              10825

Latency (ms):
         min:                                    0.88
         avg:                                    0.92
         max:                                    9.74
         95th percentile:                        0.95
         sum:                                 9995.95

Threads fairness:
    events (avg/stddev):           10825.0000/0.00
    execution time (avg/stddev):   9.9960/0.00
# t3a.large
$ sysbench cpu run
sysbench 1.0.18 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  1234.78

General statistics:
    total time:                          10.0001s
    total number of events:              12350

Latency (ms):
         min:                                    0.78
         avg:                                    0.81
         max:                                    1.28
         95th percentile:                        0.83
         sum:                                 9995.83

Threads fairness:
    events (avg/stddev):           12350.0000/0.00
    execution time (avg/stddev):   9.9958/0.00
# m4.large
$ sysbench cpu run
sysbench 1.0.18 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   912.43

General statistics:
    total time:                          10.0009s
    total number of events:              9127

Latency (ms):
         min:                                    1.09
         avg:                                    1.09
         max:                                    4.21
         95th percentile:                        1.10
         sum:                                 9986.15

Threads fairness:
    events (avg/stddev):           9127.0000/0.00
    execution time (avg/stddev):   9.9861/0.00
# m5.large
$ sysbench cpu run
sysbench 1.0.18 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  1096.93

General statistics:
    total time:                          10.0009s
    total number of events:              10972

Latency (ms):
         min:                                    0.91
         avg:                                    0.91
         max:                                    1.03
         95th percentile:                        0.92
         sum:                                 9998.14

Threads fairness:
    events (avg/stddev):           10972.0000/0.00
    execution time (avg/stddev):   9.9981/0.00
# m5a.large
$ sysbench cpu run
sysbench 1.0.18 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  1433.26

General statistics:
    total time:                          10.0007s
    total number of events:              14336

Latency (ms):
         min:                                    0.66
         avg:                                    0.70
         max:                                    1.57
         95th percentile:                        0.80
         sum:                                 9997.21

Threads fairness:
    events (avg/stddev):           14336.0000/0.00
    execution time (avg/stddev):   9.9972/0.00
# r4.large
$ sysbench cpu run
sysbench 1.0.18 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:   912.11

General statistics:
    total time:                          10.0012s
    total number of events:              9124

Latency (ms):
         min:                                    1.09
         avg:                                    1.09
         max:                                    1.35
         95th percentile:                        1.10
         sum:                                 9986.20

Threads fairness:
    events (avg/stddev):           9124.0000/0.00
    execution time (avg/stddev):   9.9862/0.00
# r5.large
$ sysbench cpu run
sysbench 1.0.18 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  1095.93

General statistics:
    total time:                          10.0008s
    total number of events:              10962

Latency (ms):
         min:                                    0.91
         avg:                                    0.91
         max:                                    1.27
         95th percentile:                        0.92
         sum:                                 9997.97

Threads fairness:
    events (avg/stddev):           10962.0000/0.00
    execution time (avg/stddev):   9.9980/0.00
# r5a.large
$ sysbench cpu run
sysbench 1.0.18 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  1329.11

General statistics:
    total time:                          10.0007s
    total number of events:              13294

Latency (ms):
         min:                                    0.67
         avg:                                    0.75
         max:                                    0.88
         95th percentile:                        0.83
         sum:                                 9997.24

Threads fairness:
    events (avg/stddev):           13294.0000/0.00
    execution time (avg/stddev):   9.9972/0.00
# c4.large
$ sysbench cpu run
sysbench 1.0.18 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  1032.31

General statistics:
    total time:                          10.0010s
    total number of events:              10326

Latency (ms):
         min:                                    0.96
         avg:                                    0.97
         max:                                    1.06
         95th percentile:                        0.97
         sum:                                 9988.51

Threads fairness:
    events (avg/stddev):           10326.0000/0.00
    execution time (avg/stddev):   9.9885/0.00
# c5.large
$ sysbench cpu run
sysbench 1.0.18 (using bundled LuaJIT 2.1.0-beta2)

Running the test with following options:
Number of threads: 1
Initializing random number generator from current time


Prime numbers limit: 10000

Initializing worker threads...

Threads started!

CPU speed:
    events per second:  1189.51

General statistics:
    total time:                          10.0001s
    total number of events:              11897

Latency (ms):
         min:                                    0.83
         avg:                                    0.84
         max:                                    1.03
         95th percentile:                        0.86
         sum:                                 9997.20

Threads fairness:
    events (avg/stddev):           11897.0000/0.00
    execution time (avg/stddev):   9.9972/0.00

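Each test was actually run three times to guard against large errors, but the variance turned out to be very small, and in some cases two runs even produced identical results. So for each instance type I simply posted the run whose result fell in the middle of the three.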
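The "run three times, keep the middle one" procedure could be scripted roughly as follows. This is only a sketch: the helper names `extract_eps`, `median3` and `bench_median` are invented here, and the explicit flags simply spell out the defaults visible in the output above (1 thread, prime limit 10000, about 10 seconds).

```shell
# Print the "events per second" value from sysbench output read on stdin.
extract_eps() {
    awk '/events per second:/ { print $NF }'
}

# Print the middle value of three numbers.
median3() {
    printf '%s\n' "$1" "$2" "$3" | LC_ALL=C sort -n | sed -n '2p'
}

# Run the CPU test three times and print the median events per second.
# Requires sysbench to be installed; nothing runs until this is called.
bench_median() {
    r1=$(sysbench cpu --threads=1 --cpu-max-prime=10000 --time=10 run | extract_eps)
    r2=$(sysbench cpu --threads=1 --cpu-max-prime=10000 --time=10 run | extract_eps)
    r3=$(sysbench cpu --threads=1 --cpu-max-prime=10000 --time=10 run | extract_eps)
    median3 "$r1" "$r2" "$r3"
}
```

Calling `bench_median` on an instance prints a single number, which is the kind of figure used in the comparison.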

Finally, the CPU speed: events per second values are turned into a simple bar chart for an easier visual comparison.
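A plain-text version of such a chart can be produced from the events per second figures in the results above. This is a minimal sketch: the helper name `bar_chart` and the scale of one `#` per 50 events per second are arbitrary choices for illustration.

```shell
# Read "name value" pairs on stdin and print one text bar per line.
bar_chart() {
    awk '{
        n = int($2 / 50 + 0.5)      # one "#" per 50 events/s, rounded
        bar = ""
        for (i = 0; i < n; i++) bar = bar "#"
        printf "%-10s %8.2f %s\n", $1, $2, bar
    }'
}

# Figures copied from the sysbench results above.
bar_chart <<'EOF'
t2.large 909.23
t3.large 1082.23
t3a.large 1234.78
m4.large 912.43
m5.large 1096.93
m5a.large 1433.26
c4.large 1032.31
c5.large 1189.51
r4.large 912.11
r5.large 1095.93
r5a.large 1329.11
EOF
```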

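Interestingly, T3a / R5a / M5a all outperform their Intel counterparts T3 / R5 / M5 in this test. For M5a and R5a (about 31% and 21% ahead of M5 and R5) the gap is even larger than the roughly 20% gain M5 and R5 show over the previous generation (M4 / R4), while T3a leads T3 by about 14%, somewhat less than the roughly 19% that T3 gains over T2. Could this be the result of all the patches for the security vulnerabilities that keep being discovered in Intel CPUs? XD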
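The relative gaps can be checked directly from the events per second figures in the results above. The helper name `pct_gain` is made up for this sketch.

```shell
# pct_gain NEW OLD: percentage by which NEW exceeds OLD (hypothetical helper).
pct_gain() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a / b - 1) * 100 }'
}

# AMD variant vs. Intel variant of the same generation
pct_gain 1234.78 1082.23   # t3a vs t3 -> 14.1
pct_gain 1433.26 1096.93   # m5a vs m5 -> 30.7
pct_gain 1329.11 1095.93   # r5a vs r5 -> 21.3

# Current Intel generation vs. previous generation
pct_gain 1082.23 909.23    # t3 vs t2 -> 19.0
pct_gain 1096.93 912.43    # m5 vs m4 -> 20.2
pct_gain 1095.93 912.11    # r5 vs r4 -> 20.2
pct_gain 1189.51 1032.31   # c5 vs c4 -> 15.2
```

These are single-thread numbers from one short test on .large instances, so they describe this prime-number workload only.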
