Aarch64 Gets a Performance Boost in wolfCrypt

We at wolfSSL are continuously improving the performance of the wolfCrypt code. Recently we took a look at our AES-GCM implementation on Aarch64 and thought: we can do better.

By using the cryptographic instructions built into Aarch64 chips, we had already gotten a significant boost over straight C, but we saw that we could do more. By unrolling loops, interleaving the GCM calculation with the AES encryption, and using NEON and the base instructions at the same time, we achieved a significant improvement!
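To make the loop shape concrete, here is a minimal portable C sketch of unrolling by four and placing the cipher and hash work in the same loop. It is not the wolfCrypt assembly: `toy_keystream` and `toy_hash_step` are hypothetical stand-ins for AES and for the GCM hash, chosen only so the example is short and self-contained, and the unroll factor of four is an assumption. The point is that both versions compute the same result, while the second gives the CPU independent work to overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy keystream generator standing in for AES in counter mode. NOT AES. */
static uint64_t toy_keystream(uint64_t key, uint64_t counter)
{
    uint64_t x = key ^ (counter * 0x9E3779B97F4A7C15ULL);
    x ^= x >> 29;
    x *= 0xBF58476D1CE4E5B9ULL;
    x ^= x >> 32;
    return x;
}

/* Toy running hash standing in for the GCM authentication step. NOT GHASH. */
static uint64_t toy_hash_step(uint64_t acc, uint64_t h, uint64_t block)
{
    return (acc ^ block) * h;
}

/* Baseline: encrypt every block, then hash the ciphertext in a second pass. */
uint64_t toy_two_pass(uint64_t key, uint64_t h, const uint64_t* in,
                      uint64_t* out, size_t n)
{
    uint64_t acc = 0;
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = in[i] ^ toy_keystream(key, i);
    for (i = 0; i < n; i++)
        acc = toy_hash_step(acc, h, out[i]);
    return acc;
}

/* One pass, unrolled by four, with cipher and hash work in the same loop. */
uint64_t toy_interleaved(uint64_t key, uint64_t h, const uint64_t* in,
                         uint64_t* out, size_t n)
{
    uint64_t acc = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* Four cipher blocks with no data dependency on each other. */
        uint64_t c0 = in[i]     ^ toy_keystream(key, i);
        uint64_t c1 = in[i + 1] ^ toy_keystream(key, i + 1);
        uint64_t c2 = in[i + 2] ^ toy_keystream(key, i + 2);
        uint64_t c3 = in[i + 3] ^ toy_keystream(key, i + 3);

        out[i]     = c0;
        out[i + 1] = c1;
        out[i + 2] = c2;
        out[i + 3] = c3;

        /* The hash chain is serial, but it sits next to the independent
         * cipher work, so both can be scheduled together. */
        acc = toy_hash_step(acc, h, c0);
        acc = toy_hash_step(acc, h, c1);
        acc = toy_hash_step(acc, h, c2);
        acc = toy_hash_step(acc, h, c3);
    }

    /* Remaining blocks, one at a time. */
    for (; i < n; i++) {
        out[i] = in[i] ^ toy_keystream(key, i);
        acc = toy_hash_step(acc, h, out[i]);
    }
    return acc;
}
```

In C the compiler decides the final instruction order; hand-written assembly gives direct control over how the two instruction streams are mixed.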

How significant? Up to 9.5 times faster for decryption and up to about 4.6 times faster for encryption! The wolfSSL 5.6.4 numbers on an Apple M1 were:

------------------------------------------------------------------------------
 wolfSSL version 5.6.4
------------------------------------------------------------------------------
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
AES-128-GCM-enc           1845 MB took 1.000 seconds, 1845.382 MB/s
AES-128-GCM-dec            907 MB took 1.005 seconds,  902.210 MB/s
AES-192-GCM-enc           1845 MB took 1.002 seconds, 1842.527 MB/s
AES-192-GCM-dec            902 MB took 1.002 seconds,  900.038 MB/s
AES-256-GCM-enc           1845 MB took 1.000 seconds, 1844.793 MB/s
AES-256-GCM-dec            897 MB took 1.001 seconds,  895.873 MB/s
Benchmark complete

And now with the new assembly code:

------------------------------------------------------------------------------
 wolfSSL version master
------------------------------------------------------------------------------
wolfCrypt Benchmark (block bytes 1048576, min 1.0 sec each)
AES-128-GCM-enc           8583 MB took 1.000 seconds, 8580.862 MB/s
AES-128-GCM-dec           8583 MB took 1.000 seconds, 8580.389 MB/s
AES-192-GCM-enc           7875 MB took 1.001 seconds, 7870.179 MB/s
AES-192-GCM-dec           7922 MB took 1.000 seconds, 7921.097 MB/s
AES-256-GCM-enc           7067 MB took 1.000 seconds, 7064.394 MB/s
AES-256-GCM-dec           7230 MB took 1.001 seconds, 7225.034 MB/s
Benchmark complete
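
The speedup factors can be checked directly from the two runs above. The following small C sketch does that arithmetic; the struct and function names are made up for illustration, and the throughput values are copied from the benchmark output.

```c
#include <stddef.h>

/* One row per algorithm: throughput in MB/s copied from the two runs above. */
struct bench_row {
    const char* name;
    double      old_mbps;   /* wolfSSL 5.6.4 */
    double      new_mbps;   /* wolfSSL master */
};

const struct bench_row bench_rows[] = {
    { "AES-128-GCM-enc", 1845.382, 8580.862 },
    { "AES-128-GCM-dec",  902.210, 8580.389 },
    { "AES-192-GCM-enc", 1842.527, 7870.179 },
    { "AES-192-GCM-dec",  900.038, 7921.097 },
    { "AES-256-GCM-enc", 1844.793, 7064.394 },
    { "AES-256-GCM-dec",  895.873, 7225.034 },
};
const size_t bench_row_count = sizeof(bench_rows) / sizeof(bench_rows[0]);

/* Speedup factor: new throughput divided by old throughput. */
double speedup(const struct bench_row* row)
{
    return row->new_mbps / row->old_mbps;
}

/* Index of the row with the largest speedup factor. */
size_t best_row(void)
{
    size_t best = 0;
    size_t i;

    for (i = 1; i < bench_row_count; i++) {
        if (speedup(&bench_rows[i]) > speedup(&bench_rows[best]))
            best = i;
    }
    return best;
}
```

Calling `speedup` on each row gives factors between roughly 3.8 (AES-256-GCM-enc) and 9.5 (AES-128-GCM-dec), and `best_row` picks out the AES-128-GCM-dec row.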

Try it out and you will see that the cost of encrypting and decrypting TLS packets becomes almost insignificant.

Are there other algorithms on Aarch64 whose performance you would like to see us improve? Let us know!

If you have questions about any of the above, please contact us at facts@wolfSSL.com or call us at +1 425 245 8247.

Download wolfSSL Now