Benchmarking wolfSSL and wolfCrypt

Many users are curious about how the wolfSSL embedded SSL/TLS library will perform on a specific hardware device or in a specific environment. Because of the wide variety of different platforms and compilers used today in embedded, enterprise, and cloud-based environments, it is hard to give generic performance calculations.

To help users and customers determine the performance of wolfSSL and wolfCrypt, a benchmark application is bundled with wolfSSL. Because the underlying cryptography is a performance-critical part of SSL/TLS, the benchmark application runs performance tests on wolfCrypt’s algorithms.

The benchmark utility is located in the “./wolfcrypt/benchmark” directory of the wolfSSL package. After building wolfSSL and the associated examples and apps, the benchmark application can be run by issuing the following command from the package directory root:

./wolfcrypt/benchmark/benchmark

Typical output will look similar to the output below (showing throughput in MB/s as well as cycles per byte):

wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
RNG                100 MB took 1.047 seconds,   95.466 MB/s Cycles per byte =  22.92
AES-128-CBC-enc    200 MB took 1.020 seconds,  196.027 MB/s Cycles per byte =  11.16
AES-128-CBC-dec    215 MB took 1.008 seconds,  213.318 MB/s Cycles per byte =  10.26
AES-192-CBC-enc    175 MB took 1.016 seconds,  172.265 MB/s Cycles per byte =  12.70
AES-192-CBC-dec    180 MB took 1.009 seconds,  178.405 MB/s Cycles per byte =  12.27
AES-256-CBC-enc    150 MB took 1.007 seconds,  148.932 MB/s Cycles per byte =  14.69
AES-256-CBC-dec    160 MB took 1.026 seconds,  155.994 MB/s Cycles per byte =  14.03
AES-128-GCM-enc     60 MB took 1.010 seconds,   59.427 MB/s Cycles per byte =  36.82
AES-128-GCM-dec     65 MB took 1.070 seconds,   60.750 MB/s Cycles per byte =  36.02
AES-192-GCM-enc     60 MB took 1.050 seconds,   57.138 MB/s Cycles per byte =  38.30
AES-192-GCM-dec     60 MB took 1.024 seconds,   58.590 MB/s Cycles per byte =  37.35
AES-256-GCM-enc     55 MB took 1.029 seconds,   53.438 MB/s Cycles per byte =  40.95
AES-256-GCM-dec     60 MB took 1.090 seconds,   55.069 MB/s Cycles per byte =  39.74
CHACHA             360 MB took 1.001 seconds,  359.628 MB/s Cycles per byte =   6.09
CHA-POLY           285 MB took 1.014 seconds,  280.943 MB/s Cycles per byte =   7.79
MD5                450 MB took 1.010 seconds,  445.573 MB/s Cycles per byte =   4.91
POLY1305          1265 MB took 1.000 seconds, 1264.402 MB/s Cycles per byte =   1.73
SHA                475 MB took 1.000 seconds,  474.914 MB/s Cycles per byte =   4.61
SHA-224            210 MB took 1.018 seconds,  206.308 MB/s Cycles per byte =  10.61
SHA-256            210 MB took 1.018 seconds,  206.200 MB/s Cycles per byte =  10.61
SHA-384            280 MB took 1.016 seconds,  275.520 MB/s Cycles per byte =   7.94
SHA-512            275 MB took 1.000 seconds,  274.868 MB/s Cycles per byte =   7.96
SHA3-224           240 MB took 1.006 seconds,  238.506 MB/s Cycles per byte =   9.18
SHA3-256           225 MB took 1.007 seconds,  223.454 MB/s Cycles per byte =   9.79
SHA3-384           175 MB took 1.002 seconds,  174.610 MB/s Cycles per byte =  12.53
SHA3-512           125 MB took 1.031 seconds,  121.254 MB/s Cycles per byte =  18.05
HMAC-MD5           445 MB took 1.001 seconds,  444.651 MB/s Cycles per byte =   4.92
HMAC-SHA           470 MB took 1.009 seconds,  465.749 MB/s Cycles per byte =   4.70
HMAC-SHA224        200 MB took 1.001 seconds,  199.874 MB/s Cycles per byte =  10.95
HMAC-SHA256        205 MB took 1.004 seconds,  204.228 MB/s Cycles per byte =  10.72
HMAC-SHA384        290 MB took 1.009 seconds,  287.401 MB/s Cycles per byte =   7.61
HMAC-SHA512        290 MB took 1.013 seconds,  286.214 MB/s Cycles per byte =   7.65
RSA   2048 public       2800 ops took 1.014 sec, avg 0.362 ms, 2761.995 ops/sec
RSA   2048 private       300 ops took 1.308 sec, avg 4.359 ms, 229.402 ops/sec
DH    2048 key gen       735 ops took 1.001 sec, avg 1.361 ms, 734.608 ops/sec
DH    2048 key agree     800 ops took 1.123 sec, avg 1.404 ms, 712.131 ops/sec
ECC    256 key gen      1108 ops took 1.001 sec, avg 0.903 ms, 1107.306 ops/sec
ECDHE  256 agree        1200 ops took 1.043 sec, avg 0.869 ms, 1150.329 ops/sec
ECDSA  256 sign         1200 ops took 1.078 sec, avg 0.898 ms, 1113.279 ops/sec
ECDSA  256 verify       1700 ops took 1.045 sec, avg 0.615 ms, 1627.064 ops/sec
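When comparing runs, it can help to pull the throughput figures out of the text output. The following is a minimal sketch, assuming the symmetric and hash lines follow the layout of the sample above and are reported in MB (units scale by default unless fixed units are configured). The names `LINE_RE`, `parse_throughput`, and `SAMPLE` are hypothetical helpers, not part of wolfSSL.

```python
import re

# Matches the symmetric/hash lines of the sample output, e.g.
# "SHA-256  210 MB took 1.018 seconds,  206.200 MB/s Cycles per byte =  10.61"
# Assumption: values are reported in MB, as in the sample output.
LINE_RE = re.compile(
    r"^(?P<name>\S+)\s+(?P<size>\d+) MB took (?P<secs>[\d.]+) seconds,"
    r"\s+(?P<rate>[\d.]+) MB/s Cycles per byte =\s+(?P<cpb>[\d.]+)\s*$"
)


def parse_throughput(output):
    """Return (name, MB/s, cycles per byte) for each throughput line found."""
    rows = []
    for line in output.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # public-key lines ("ops/sec") do not match and are skipped
            rows.append((m["name"], float(m["rate"]), float(m["cpb"])))
    return rows


# Three lines copied from the sample output above.
SAMPLE = """\
AES-256-CBC-enc    150 MB took 1.007 seconds,  148.932 MB/s Cycles per byte =  14.69
CHACHA             360 MB took 1.001 seconds,  359.628 MB/s Cycles per byte =   6.09
RSA   2048 public       2800 ops took 1.014 sec, avg 0.362 ms, 2761.995 ops/sec
"""

# Sort by throughput, fastest first.
fastest_first = sorted(parse_throughput(SAMPLE), key=lambda r: r[1], reverse=True)
for name, rate, cpb in fastest_first:
    print(f"{name:<18}{rate:>10.3f} MB/s {cpb:>7.2f} cycles/byte")
```

The public key lines use a different layout (ops, avg ms, ops/sec) and would need their own pattern.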

This application is especially useful for comparing public key speed before and after changing the math library. You can test the results using the normal math library (./configure), the fastmath library (./configure --enable-fastmath), the fasthugemath library (./configure --enable-fasthugemath), and the sp-math-all library (./configure --enable-sp-math-all).

Note: By default the reported units scale based on the value of each benchmark. To force consistent, fixed units for all reported values, build the application with WOLFSSL_BENCHMARK_FIXED_UNITS_XX defined, where XX is GB, MB, KB, or B (bytes). For example, ./configure CFLAGS="-DWOLFSSL_BENCHMARK_FIXED_UNITS_MB" displays all values in MB.

Footprint sizes (compiled binary size) for wolfSSL range from 20 to 100 kB depending on the build options and the compiler being used. On an embedded system with an optimizing embedded compiler, build sizes are typically around 60 kB, which includes a full-featured TLS 1.2 client and server. For details on build options and ways to further customize wolfSSL, please see Chapter 2 of the wolfSSL Manual, or the wolfSSL Tuning Guide.

Regarding runtime memory usage, wolfSSL will generally consume between 1 and 36 kB per SSL/TLS session. The RAM usage per connection varies depending on the size of the input/output buffers being used, the public key algorithm, and the key size. The I/O buffers in wolfSSL default to 128 bytes and are controlled by the RECORD_SIZE define in ./wolfssl/internal.h. The maximum size is 16 kB per buffer (as specified by the SSL/TLS RFC). As an example, with standard 16 kB buffers, the total runtime memory usage of wolfSSL with a single connection would be 3 kB (the library) + 16 kB (input buffer) + 16 kB (output buffer) = around 35 kB.
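The arithmetic in the example above can be sketched as a small helper. This is a rough model only, assuming the 3 kB library figure quoted above and counting nothing but the two I/O buffers; it does not account for key material, certificates, or the session cache. The function name `estimate_session_kb` is hypothetical and not part of wolfSSL.

```python
def estimate_session_kb(input_buf_kb, output_buf_kb, library_kb=3.0):
    """Rough per-connection RAM estimate in kB: library state plus both I/O buffers."""
    return library_kb + input_buf_kb + output_buf_kb


# Maximum-size 16 kB buffers, as in the example above.
max_case = estimate_session_kb(16, 16)

# Default 128-byte buffers (128 / 1024 kB each).
default_case = estimate_session_kb(128 / 1024, 128 / 1024)

print(f"16 kB buffers:    {max_case:.2f} kB")
print(f"128-byte buffers: {default_case:.2f} kB")
```

Both results fall inside the 1 to 36 kB range quoted for a session, with the buffer size accounting for most of the difference.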

The TLS context (WOLFSSL_CTX) is shared between all TLS connections of either a client or server. Its runtime memory usage varies depending on how many certificates are loaded and how large the certificate files are. It also varies depending on the session cache and on whether storing session certificates is turned on (--enable-session-certs). If you are concerned with reducing the session cache size, you can define SMALL_SESSION_CACHE (which reduces the default session cache from 33 sessions to 6 sessions) and save almost 2.5 kB. You can disable the session cache entirely by defining NO_SESSION_CACHE, reducing memory by nearly 3 kB.

As we port wolfSSL to various platforms, we often benchmark it on them. Below you will find a collection of some of those benchmarks for reference. If you have benchmarked wolfSSL on a specific platform, please send us your benchmark numbers (with the specific platform and library configuration) and we’ll add them to the list!

The following benchmarks show the performance improvement when using the wolfSSL Java JSSE Provider versus the default SunJSSE provider.

Client and Configuration	Avg. Connection Time
wolfSSL C only (no Java, software)	9.694 ms
wolfSSL C only (no Java, intelasm + sp + sp-asm)	7.302 ms
wolfJSSE Client (software only)	10.92 ms
wolfJSSE Client (sp + intelasm)	8.42 ms
wolfJSSE Client (TLS 1.3 sp + intelasm)	8.04 ms
SunJSSE Provider client (default on Mac)	13.34 ms

More information on using the wolfSSL JSSE Provider can be found in the User Manual.
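To put the connection times above on a common footing, the relative reduction against the SunJSSE baseline can be computed directly from the table. This is a minimal sketch using only the figures listed above; the helper name `reduction_percent` is hypothetical.

```python
# Average connection times (ms) taken from the table above.
SUNJSSE_MS = 13.34  # SunJSSE Provider client (default on Mac)

WOLFJSSE_MS = {
    "wolfJSSE Client (software only)": 10.92,
    "wolfJSSE Client (sp + intelasm)": 8.42,
    "wolfJSSE Client (TLS 1.3 sp + intelasm)": 8.04,
}


def reduction_percent(baseline_ms, candidate_ms):
    """Percentage by which candidate_ms is lower than baseline_ms."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100.0


reductions = {
    name: reduction_percent(SUNJSSE_MS, ms) for name, ms in WOLFJSSE_MS.items()
}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower average connection time than SunJSSE")
```

By this measure the software-only wolfJSSE client is roughly 18% faster to connect than SunJSSE, and the sp + intelasm configurations are roughly 37% to 40% faster.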

Algorithm	Software Crypto	TSIP Accelerated Crypto
RNG	231.160 KB/s	1.423 MB/s
SHA	1.239 MB/s	22.254 MB/s
SHA-256	515.565 KB/s	25.217 MB/s

Cipher Suite	Software Crypto (sec)	TSIP Accelerated Crypto (sec)
TLS_RSA_WITH_AES_128_CBC_SHA	0.381	0.028
TLS_RSA_WITH_AES_128_CBC_SHA256	0.383	0.028
TLS_RSA_WITH_AES_256_CBC_SHA	0.382	0.030
TLS_RSA_WITH_AES_256_CBC_SHA256	0.385	0.029

Algorithm	Performance
AES-128 CBC Encrypt	912.347 MB/s (36.58X)
AES-128 CBC Decrypt	6,084.83 MB/s (256.15X)
AES-128 GCM Encrypt	1,242.28 MB/s (193.65X)
AES-128 GCM Decrypt	575.83 MB/s (90.26X)
SHA-256	1,717.28 MB/s (56.11X)

Algorithm	Performance
DHE-RSA-AES128-SHA256	CPS 22.5, Read 388 MB/s, Write 106 MB/s
ECDHE-RSA-AES128-GCM-SHA256	CPS 26.2, Read 598 MB/s, Write 125 MB/s
ECDHE-ECDSA-AES128-GCM-SHA256	CPS 83.4, Read 504.8 MB/s, Write 92.2 MB/s
