wolfSSL Performance on Intel x86_64 (Part 3)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made and are being discussed over six blog posts, of which this is part 3. In this post, we will talk about the performance of SHA-256 and SHA-512.

The most commonly used digest algorithms are SHA-256 and SHA-384. With the introduction of AES-GCM in TLS, they are less commonly used for application data authentication. However, they are still used for handshake message authentication, as a one-way function (as required in a pseudo-random number generator), and in digital signatures.

The assembly code has been rewritten to take best advantage of the AVX1 and AVX2 instructions. The performance of SHA-256 and SHA-512 is now as good as or better than OpenSSL. The four charts below show that wolfSSL performance has improved significantly across block sizes, from small to large. On AVX1, performance has increased by between 19% and 60% for SHA-256 and between 25% and 53% for SHA-512. Similarly, on AVX2, performance has increased by between 22% and 40% for SHA-256 and between 23% and 37% for SHA-512. The new wolfSSL assembly code is also significantly faster than OpenSSL for small blocks and is about the same at the largest block size. SHA-384 uses the same algorithm as SHA-512, so it shares the same underlying implementation and sees the same performance improvements.
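To make the idea of throughput at different block sizes concrete, here is a minimal sketch of how such a measurement could be taken. It uses Python's standard hashlib module rather than wolfSSL, so the numbers it prints say nothing about wolfSSL or OpenSSL. The function names, the block sizes, and the choice to hash each block as a separate message are all assumptions made for illustration, not the method used to produce the charts.

```python
import hashlib
import time


def throughput_mb_per_s(algorithm: str, block_size: int,
                        total_bytes: int = 1 << 22) -> float:
    """Hash roughly total_bytes of zero data, one digest per block, and return MB/s.

    Assumption: each block is hashed as an independent message, so small
    blocks pay the fixed per-message cost (setup, padding) many times.
    """
    block = bytes(block_size)
    count = max(1, total_bytes // block_size)
    start = time.perf_counter()
    for _ in range(count):
        hashlib.new(algorithm, block).digest()
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return (count * block_size) / elapsed / 1e6


def percent_gain(old: float, new: float) -> float:
    """Relative improvement of new over old, as a percentage."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # SHA-384 and SHA-512 share a 128-byte internal block; SHA-256 uses 64 bytes.
    for name in ("sha256", "sha384", "sha512"):
        print(name, "internal block size:", hashlib.new(name).block_size)

    # Hypothetical block sizes chosen only to span small to large messages.
    for name in ("sha256", "sha384", "sha512"):
        for block_size in (16, 64, 256, 1024, 8192):
            rate = throughput_mb_per_s(name, block_size)
            print(f"{name} {block_size:5d} bytes: {rate:8.1f} MB/s")
```

The helper percent_gain shows how a figure such as "60%" is derived from two throughput values: going from 100 MB/s to 160 MB/s is a 60% gain. Because SHA-384 and SHA-512 run the same compression function, their measured rates should track each other closely at every block size.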

Please contact us at support@wolfssl.com with any questions about the performance of the wolfSSL embedded TLS library.

Charts: SHA-256-AVX1, SHA-256-AVX2-BMI2, SHA-512-AVX1, SHA-512-AVX2-BMI2

References:

Introduction to Intel® Advanced Vector Extensions
Advanced Vector Extensions (Wikipedia)