wolfSSL Performance on Intel x86_64 (Part 6)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. The resulting performance gains are large and are covered in a series of six blog posts, of which this is the last. In this post, we look at the performance of Elliptic Curve (EC) operations over the P-256 curve.

Elliptic curve cryptography (ECC) is an alternative to finite field (FF) cryptography, which includes algorithms like RSA, DSA and DH. ECDSA is the elliptic curve variant of DSA and serves as an alternative to RSA signatures, while ECDH is the elliptic curve variant of DH. ECDSA and ECDH can be used anywhere their FF counterparts can be used. ECC operations are performed on a pre-defined curve. The most commonly used curve is P-256, as it offers 128-bit security strength and appears in many standards, including TLS, the IETF standards for certificates, and NIST’s FIPS 186-4. Browsers and web servers prefer ECDH over DH as it is much faster.
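To make the ECDH operation concrete, here is a minimal Python sketch of key generation and shared secret computation over P-256. It is not wolfSSL code: the function names (`is_on_curve`, `point_add`, `scalar_mult`, `make_key`, `shared_secret`) are made up for illustration, the arithmetic is plain affine double-and-add, and nothing here is constant-time, so it should not be used with real keys. It assumes Python 3.8 or later for modular inverses via `pow`.

```python
import secrets

# NIST P-256 domain parameters: y^2 = x^3 + A*x + B over GF(P).
P = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff
A = P - 3
B = 0x5ac635d8aa3a93e7b3ebbd55769886bc651d06b0cc53b0f63bce3c3e27d2604b
G = (0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296,
     0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5)
N = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551


def is_on_curve(pt):
    """True if pt satisfies the curve equation (None is the point at infinity)."""
    if pt is None:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P == 0


def point_add(p1, p2):
    """Add two affine points; handles infinity, inverses and doubling."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # p2 is the inverse of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, point):
    """Compute k * point with double-and-add (NOT constant-time)."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def make_key():
    """ECDH key generation: private scalar d in [1, N-1], public point d*G."""
    d = secrets.randbelow(N - 1) + 1
    return d, scalar_mult(d, G)


def shared_secret(d, peer_public):
    """ECDH agreement: x-coordinate of d * peer_public."""
    if peer_public is None or not is_on_curve(peer_public):
        raise ValueError("peer public key is not a valid curve point")
    return scalar_mult(d, peer_public)[0]


if __name__ == "__main__":
    alice_priv, alice_pub = make_key()
    bob_priv, bob_pub = make_key()
    # Both sides derive the same value without sending a private key.
    print(shared_secret(alice_priv, bob_pub) == shared_secret(bob_priv, alice_pub))
```

Key generation and secret computation are each one scalar multiplication, which is why the speed of that single operation dominates the ECDH numbers discussed in this post.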

wolfSSL 3.13 and later have completely new implementations of the EC algorithms over the P-256 curve. The implementations are constant-time with respect to private key operations. They include variants in C and in assembly code targeted at Intel x86_64 and at x86_64 with BMI2 and ADX. There is also a small code size variant of the assembly code that is about one third the size (it uses smaller pre-computed tables) yet remains very fast.

The two charts below show the performance of the old wolfSSL code, the new small wolfSSL assembly code, the new fast wolfSSL assembly code and OpenSSL, each relative to the new wolfSSL C implementation, on Ivy Bridge and Skylake CPUs. Note that the OpenSSL super-app does not measure the speed of the ECDH key generation operation. The new C implementation is a lot faster than the old generic C/ASM code on both CPUs. The assembly code is many times faster than the C code, mostly due to its larger pre-computed tables of elliptic curve points. OpenSSL is around 10% slower than the new fast wolfSSL assembly code for generic x86_64, and between 5% and 35% slower than the wolfSSL assembly code for x86_64 with BMI2 and ADX instructions.
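The effect of pre-computed tables can be sketched in a few lines of Python. The example below is an illustration of the general fixed-base window technique, not the layout wolfSSL actually uses: the window widths (4 and 8 bits) and the names `build_table`, `fixed_base_mult` and `double_and_add` are assumptions chosen for the demonstration, and the code is not constant-time. It assumes Python 3.8 or later.

```python
# NIST P-256 domain parameters (curve coefficient B is not needed for addition).
P = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff
A = P - 3
G = (0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296,
     0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5)


def add(p1, p2):
    """Affine point addition; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def double_and_add(k, point):
    """Baseline without tables; returns (k*point, number of point operations)."""
    result, ops = None, 0
    while k:
        if k & 1:
            result = add(result, point)
            ops += 1
        point = add(point, point)
        ops += 1
        k >>= 1
    return result, ops


def build_table(base, w, bits=256):
    """table[i][j] = j * 2^(w*i) * base for j in 0 .. 2^w - 1."""
    table = []
    window_base = base
    for _ in range((bits + w - 1) // w):
        row = [None]
        for _ in range((1 << w) - 1):
            row.append(add(row[-1], window_base))
        table.append(row)
        # (2^w - 1)*window_base + window_base = 2^w * window_base
        window_base = add(row[-1], window_base)
    return table


def fixed_base_mult(k, table, w):
    """k*base using only table lookups and additions, with no doublings.

    Skipping zero digits leaks information about k; a hardened version
    would not branch on secret data.
    """
    result, adds = None, 0
    mask = (1 << w) - 1
    for row in table:
        digit = k & mask
        k >>= w
        if digit:
            result = add(result, row[digit])
            adds += 1
    return result, adds


if __name__ == "__main__":
    k = 0x1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
    print("double-and-add point operations:", double_and_add(k, G)[1])
    for w in (4, 8):
        table = build_table(G, w)
        points = sum(len(row) - 1 for row in table)
        print("w =", w, "table points:", points,
              "additions:", fixed_base_mult(k, table, w)[1])
```

With a 4-bit window the table holds 64 x 15 = 960 points and a multiplication needs at most 64 additions; with an 8-bit window it holds 32 x 255 = 8160 points and needs at most 32. That is the same trade-off between table size and speed that separates a small code size variant from a fast one.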

Contact us at support@wolfssl.com with questions about the performance of the wolfSSL embedded TLS library.

[Chart: P-256, x86_64]
[Chart: P-256, x86_64 with BMI2 and ADX]

References:

ECDSA (Elliptic Curve Digital Signature Algorithm)
ECDH (Elliptic-curve Diffie–Hellman)