Updated Post-Quantum Benchmarks for ML-KEM and ML-DSA on STM32

A long, long time ago, we took some benchmarks for Kyber on STM32 NUCLEO-F446ZE. Back then, it was the NIST Submission of Kyber, and we were using the implementation from PQM4 as integration in wolfCrypt. Now, Kyber has evolved into ML-KEM, and we have our implementation! We decided to take some benchmarks on a newer STM32 hardware platform as well. Note that we now also have our implementation of ML-DSA which evolved from Dilithium so we also took benchmarking numbers for it as well.

Here are the numbers (some formatting changes have been made for readability):

RSA    2048  public        112 ops took 1.012 sec, avg   9.036 ms, 110.672 ops/sec
RSA    2048  private         4 ops took 1.298 sec, avg 324.500 ms,   3.082 ops/sec
DH     2048  key gen         7 ops took 1.150 sec, avg 164.286 ms,   6.087 ops/sec
DH     2048  agree           8 ops took 1.310 sec, avg 163.750 ms,   6.107 ops/sec
ML-KEM 512   key gen       248 ops took 1.000 sec, avg   4.032 ms, 248.000 ops/sec
ML-KEM 512   encap         262 ops took 1.000 sec, avg   3.817 ms, 262.000 ops/sec
ML-KEM 512   decap         198 ops took 1.000 sec, avg   5.051 ms, 198.000 ops/sec
ML-KEM 768   key gen       154 ops took 1.004 sec, avg   6.519 ms, 153.386 ops/sec
ML-KEM 768   encap         154 ops took 1.012 sec, avg   6.571 ms, 152.174 ops/sec
ML-KEM 768   decap         120 ops took 1.000 sec, avg   8.333 ms, 120.000 ops/sec
ML-KEM 1024  key gen        94 ops took 1.008 sec, avg  10.723 ms,  93.254 ops/sec
ML-KEM 1024  encap          94 ops took 1.016 sec, avg  10.809 ms,  92.520 ops/sec
ML-KEM 1024  decap          78 ops took 1.024 sec, avg  13.128 ms,  76.172 ops/sec
ECC   [SECP256R1] key gen  180 ops took 1.007 sec, avg   5.594 ms, 178.749 ops/sec
ECDH  [SECP256R1] agree     86 ops took 1.016 sec, avg  11.814 ms,  84.646 ops/sec
ECDSA [SECP256R1] sign     106 ops took 1.000 sec, avg   9.434 ms, 106.000 ops/sec
ECDSA [SECP256R1] verify    60 ops took 1.012 sec, avg  16.867 ms,  59.289 ops/sec
ML-DSA    44  key gen       52 ops took 1.011 sec, avg  19.442 ms,  51.434 ops/sec
ML-DSA    44  sign          18 ops took 1.086 sec, avg  60.333 ms,  16.575 ops/sec
ML-DSA    44  verify        46 ops took 1.008 sec, avg  21.913 ms,  45.635 ops/sec
ML-DSA    65  key gen       30 ops took 1.035 sec, avg  34.500 ms,  28.986 ops/sec
ML-DSA    65  sign          12 ops took 1.008 sec, avg  84.000 ms,  11.905 ops/sec
ML-DSA    65  verify        28 ops took 1.027 sec, avg  36.679 ms,  27.264 ops/sec
ML-DSA    87  key gen       18 ops took 1.047 sec, avg  58.167 ms,  17.192 ops/sec
ML-DSA    87  sign          10 ops took 1.255 sec, avg 125.500 ms,   7.968 ops/sec
ML-DSA    87  verify        16 ops took 1.003 sec, avg  62.687 ms,  15.952 ops/sec

This was done on an STM32 NUCLEO-F439ZI ARM Cortex M4 running at 168 MHz. The wolfSSL library was built with assembly optimizations, but does not use any hardware accelerated cryptography. Note: At the time of this writing the ML-DSA (Dilithium) is not using assembly optimizations, just well constructed C code.

  • ML-DSA beats RSA quite nicely and is within an order of magnitude against ECDSA.
  • ML-KEM beats DH and ECDH by a wide margin (thanks to assembly code for Thumb2).

Here are some special macro flags that were defined:

#define WOLFSSL_SP_ARM_CORTEX_M_ASM
#define WOLFSSL_HAVE_SP_RSA
#define WOLFSSL_HAVE_SP_ECC
#define WOLFSSL_SP_SMALL
#define SP_WORD_SIZE 32
#define GCM_TABLE_4BIT

#define HAVE_DILITHIUM
#define WOLFSSL_WC_DILITHIUM
#define WOLFSSL_DILITHIUM_SMALL

#define WOLFSSL_ARMASM
#define WOLFSSL_ARMASM_INLINE
#define WOLFSSL_ARMASM_NO_HW_CRYPTO
#define WOLFSSL_ARMASM_NO_NEON
#define WOLFSSL_ARMASM_THUMB2
#define WOLFSSL_ARM_ARCH 7

We support assembly optimizations on most algorithms and key sizes with Intel x86/x64, ARM Cortex-A/M/R, RISC-V and PowerPC.

If you are interested in seeing other algorithms benchmarked, or have questions about any of the above, please reach out to us at facts@wolfssl.com or call us at +1 425 245 8247 to let us know which ones!

Download wolfSSL Now