wolfSSL Performance on Intel x86_64 (Part 1)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made which are being discussed over a six blog post series. In this first blog, we will talk about the performance of AES-GCM.

The assembly code for AES-GCM has been rewritten to take best advantage of the AVX1 and AVX2 instructions. The performance of AES-GCM is now as good or better than OpenSSL.

The two charts below show the relative performance of AES-128-GCM encryption on an Intel AVX1 and AVX2 chipsets. They compare the performance of wolfSSL and OpenSSL with an older version of wolfSSL (before the assembly code changes).

Small block size performance is important when dealing with locally stored data like keys or data in a database. Meanwhile, large block size performance is important for large data transfers in TLS.

The performance of wolfSSL has significantly improved from small up to big block sizes. On AVX1, the smallest block size performance has increased by over 130% and at the top end, there is a 42% improvement. Similarly, on AVX2, the improvement is over 150% for small block sizes to 11% for large block sizes. The new wolfSSL assembly code is also significantly better than OpenSSL for small blocks and is about the same at the largest block size. Similar performance improvements have been achieved for AES-256-GCM as well.

AES-128-GCM Enc - AVX1 AES-128-GCM Enc - AVX2 (with RORX)

If you have questions about using the wolfSSL embedded TLS library on your platform, or about performance optimization of the library, contact us at support@wolfssl.com.

References:

Introduction to Intel® Advanced Vector Extensions
Advanced Vector Extensions (Wikipedia)