Thumb2 Gets Assembly Code for AES and SHA-2 Algorithms in wolfSSL 5.6.4

In an effort to improve our Thumb2 support for Cortex-M4 and the like, wolfSSL 5.6.4 includes assembly code for the AES-ECB/CBC/CTR/GCM, SHA-256 and SHA-512 algorithms.

Of particular interest are the AES-CBC and AES-GCM performance improvements you will see when moving from the C code implementations in wolfSSL 5.6.3. Take, for example, running wolfSSL on a Cortex-M4 at 80 MHz. With wolfSSL 5.6.3, the performance numbers for the AES-CBC and AES-GCM algorithms are:

AES-128-CBC-enc            425 KiB took 1.000 seconds,  425.000 KiB/s
AES-128-CBC-dec            450 KiB took 1.024 seconds,  439.453 KiB/s
AES-192-CBC-enc            375 KiB took 1.039 seconds,  360.924 KiB/s
AES-192-CBC-dec            375 KiB took 1.008 seconds,  372.024 KiB/s
AES-256-CBC-enc            325 KiB took 1.027 seconds,  316.456 KiB/s
AES-256-CBC-dec            325 KiB took 1.000 seconds,  325.000 KiB/s
AES-128-GCM-enc            325 KiB took 1.062 seconds,  306.026 KiB/s
AES-128-GCM-dec            325 KiB took 1.063 seconds,  305.738 KiB/s
AES-192-GCM-enc            275 KiB took 1.012 seconds,  271.739 KiB/s
AES-192-GCM-dec            275 KiB took 1.015 seconds,  270.936 KiB/s
AES-256-GCM-enc            250 KiB took 1.024 seconds,  244.141 KiB/s
AES-256-GCM-dec            250 KiB took 1.023 seconds,  244.379 KiB/s

Add the following defines so the assembly code is compiled in:

#define WOLFSSL_ARMASM
#define WOLFSSL_ARMASM_INLINE
#define WOLFSSL_ARMASM_NO_HW_CRYPTO
#define WOLFSSL_ARMASM_NO_NEON
#define WOLFSSL_ARM_ARCH 7
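
As a rough sketch of where these defines might live: when wolfSSL is built with WOLFSSL_USER_SETTINGS defined, it pulls its configuration from a user-supplied user_settings.h. The first group below is copied from the list above. The second group (which algorithms to enable) and the include guard name are assumptions about a typical configuration, not something this post prescribes, so adjust them to your project.

```c
/* user_settings.h: illustrative sketch only.
 * Assumes the library is compiled with WOLFSSL_USER_SETTINGS defined. */
#ifndef USER_SETTINGS_H   /* guard name is hypothetical */
#define USER_SETTINGS_H

/* From the post: compile in the Thumb2 assembly implementations. */
#define WOLFSSL_ARMASM
#define WOLFSSL_ARMASM_INLINE
#define WOLFSSL_ARMASM_NO_HW_CRYPTO
#define WOLFSSL_ARMASM_NO_NEON
#define WOLFSSL_ARM_ARCH 7

/* Assumption: enable the algorithms the assembly code accelerates.
 * Your existing configuration may already set some of these. */
#define HAVE_AESGCM          /* AES-GCM */
#define HAVE_AES_ECB         /* AES-ECB */
#define WOLFSSL_AES_COUNTER  /* AES-CTR */
#define WOLFSSL_SHA512       /* SHA-512 */

#endif /* USER_SETTINGS_H */
```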

And now, with wolfSSL 5.6.4, the performance is:

AES-128-CBC-enc           1000 KiB took 1.008 seconds,  992.063 KiB/s
AES-128-CBC-dec            850 KiB took 1.007 seconds,  844.091 KiB/s
AES-192-CBC-enc            850 KiB took 1.020 seconds,  833.333 KiB/s
AES-192-CBC-dec            825 KiB took 1.023 seconds,  806.452 KiB/s
AES-256-CBC-enc            725 KiB took 1.008 seconds,  719.246 KiB/s
AES-256-CBC-dec            700 KiB took 1.000 seconds,  700.000 KiB/s
AES-128-GCM-enc            425 KiB took 1.000 seconds,  425.000 KiB/s
AES-128-GCM-dec            425 KiB took 1.004 seconds,  423.307 KiB/s
AES-192-GCM-enc            400 KiB took 1.020 seconds,  392.157 KiB/s
AES-192-GCM-dec            400 KiB took 1.019 seconds,  392.542 KiB/s
AES-256-GCM-enc            375 KiB took 1.032 seconds,  363.372 KiB/s
AES-256-GCM-dec            375 KiB took 1.027 seconds,  365.141 KiB/s

AES-CBC encryption runs at more than double the C code's throughput (about 2.3x), while decryption is 92% to 117% faster! AES-GCM gets an impressive boost of roughly 38% to 49%.

SHA-256 and SHA-512 see more modest improvements, but these are still worthwhile for getting the best out of wolfSSL on your embedded device.

Let us know if there are other cryptographic algorithms on Thumb2 for which you would like to see better performance.

If you have questions about any of the above, please contact us at facts@wolfSSL.com or call us at +1 425 245 8247.
