Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. The resulting performance gains are large and are discussed across six blog posts, of which this is part 5. This post covers the performance of RSA and Diffie-Hellman (DH).
RSA is the most commonly used public key algorithm for certificates. When performing a TLS handshake, the server signs a hash of the messages seen so far. The client verifies the signatures on the certificates in the certificate chain, then uses the public key in the server's certificate to verify the signature over the message hash. Signing and verifying are the most time-consuming operations in a handshake.
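To make the sign and verify steps concrete, here is a minimal textbook-RSA sketch in Python. It is illustrative only: the two Mersenne primes, the helper names `digest_as_int`, `sign` and `verify`, and the unpadded hash-then-exponentiate scheme are assumptions chosen for brevity. It is not how TLS or wolfSSL does it; real deployments use keys of 2048 bits or more together with a padding scheme such as PKCS#1 v1.5 or PSS.

```python
import hashlib

# Toy key built from two known Mersenne primes (assumption for illustration;
# far too small and too structured for real use).
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def digest_as_int(message: bytes) -> int:
    """Hash the message and reduce it into the range of the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Private-key operation: a full-size exponentiation with D."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Public-key operation: an exponentiation with the small exponent E."""
    return pow(signature, E, N) == digest_as_int(message)


if __name__ == "__main__":
    transcript = b"handshake messages seen so far"
    sig = sign(transcript)
    print(verify(transcript, sig))         # signature over the same bytes
    print(verify(transcript + b"!", sig))  # signature over altered bytes
```

Signing uses the large private exponent and verifying uses the small public one, which is why RSA signing typically costs far more than verification.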
DH has been the key exchange algorithm of choice in handshakes but is falling out of favor as the Elliptic Curve variants are considerably faster at the same security level. Performing the key exchange is the second most time-consuming operation in a TLS handshake.
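The key exchange itself can be sketched in a few lines of Python. This is a toy under stated assumptions: the 127-bit prime, the base 3, and the function names `generate_keypair` and `shared_secret` are hypothetical choices for illustration. Real TLS uses standardized groups of 2048 bits or more.

```python
import secrets

# Toy group parameters (assumptions): a 127-bit Mersenne prime and base 3.
P = 2**127 - 1
G = 3


def generate_keypair():
    """Pick a random private exponent and compute the public value G^x mod P."""
    private = secrets.randbelow(P - 3) + 2  # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private: int, peer_public: int) -> int:
    """Raise the peer's public value to our own private exponent."""
    return pow(peer_public, own_private, P)


if __name__ == "__main__":
    client_priv, client_pub = generate_keypair()
    server_priv, server_pub = generate_keypair()
    # Both sides arrive at the same value without ever sending a private key.
    print(shared_secret(client_priv, server_pub) ==
          shared_secret(server_priv, client_pub))
```

Each side performs two modular exponentiations, one to generate its public value and one to derive the secret, so DH cost is dominated by the same big-number arithmetic as RSA.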
wolfSSL 3.13 and later have completely new implementations of RSA and DH targeted at specific key sizes: 2048 and 3072 bits. The implementations are constant-time with respect to private key operations. They include variants in C, in assembly code targeted at Intel x86_64, and in assembly code targeted at x86_64 with the BMI2 and ADX instructions. The new code is significantly faster than the old generic code, about the same speed as OpenSSL on older CPUs, and a little faster than OpenSSL on newer CPUs.
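As a rough illustration of what "constant-time with respect to private key operations" means, the sketch below uses a Montgomery ladder with a masked conditional swap. This is one well-known technique, not necessarily the one wolfSSL uses, and the names `cswap` and `ladder_pow` are hypothetical. Python integers are not themselves constant-time, so the code only shows the structure: the same sequence of operations runs whatever the exponent bits are.

```python
def cswap(a: int, b: int, bit: int):
    """Swap a and b when bit == 1, using a mask instead of a branch."""
    mask = -bit  # 0 when bit is 0, all ones (-1) when bit is 1
    t = (a ^ b) & mask
    return a ^ t, b ^ t


def ladder_pow(base: int, exponent: int, modulus: int, bits: int) -> int:
    """Compute base**exponent % modulus with a fixed operation sequence.

    Each of the `bits` iterations performs exactly one multiply and one
    square, regardless of the exponent bit being processed. The exponent
    must fit in `bits` bits.
    """
    r0, r1 = 1 % modulus, base % modulus
    for i in reversed(range(bits)):
        bit = (exponent >> i) & 1
        r0, r1 = cswap(r0, r1, bit)
        r1 = (r0 * r1) % modulus  # multiply
        r0 = (r0 * r0) % modulus  # square
        r0, r1 = cswap(r0, r1, bit)
    return r0


if __name__ == "__main__":
    # Compare against Python's built-in modular exponentiation.
    print(ladder_pow(7, 45, 1000003, 8) == pow(7, 45, 1000003))
```

A simple square-and-multiply loop skips the multiply for zero bits, so its running time leaks information about the private exponent. Fixing the loop length to the key size and doing identical work per bit removes that dependency.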
The two charts below show the performance of the old wolfSSL code, the new wolfSSL assembly code and OpenSSL relative to the new wolfSSL C implementation on Ivy Bridge and Skylake CPUs. Note that the OpenSSL super-app does not measure the speed of DH operations. The new C implementation is considerably faster than the old generic C/ASM code on both CPUs. The x86_64 assembly code is between 23% and 46% faster than the C code, and the variant using BMI2 and ADX instructions is between 92% and 144% faster. The OpenSSL code is about the same speed as the wolfSSL assembly code.