Post-Quantum Kyber Benchmarks (ARM Cortex-M4)

Hot on the heels of our macOS benchmarks, we now have our Kyber benchmarks for the Arm Cortex-M4.

Before getting into the numbers, some information on the conditions under which the benchmarks were taken:

  • The hardware platform was STM NUCLEO-F446ZE
  • The HCLK in the project was set to 168 MHz
  • Only 1 core used
  • wolfSSL Math Configuration set to “Single Precision ASM Cortex-M3+ Math”
  • Optimization flag: -Ofast
  • Conventional algorithms are present for comparison purposes

Here are our results:

RSA    	    2048 	public    82 ops took 1.020 sec, avg 12.439 ms, 80.392 ops/sec
RSA    	    2048 	private   4 ops took 1.827 sec, avg 456.750 ms, 2.189 ops/sec
DH     	    2048 	key gen   5 ops took 1.181 sec, avg 236.200 ms, 4.234 ops/sec
DH     	    2048 	agree     6 ops took 1.419 sec, avg 236.500 ms, 4.228 ops/sec
ECC   SECP256R1 	key gen   118 ops took 1.012 sec, avg 8.576 ms, 116.601 ops/sec
ECDHE SECP256R1 	agree     56 ops took 1.016 sec, avg 18.143 ms, 55.118 ops/sec
KYBER512    128 	key gen   232 ops took 1.004 sec, avg 4.328 ms, 231.076 ops/sec
KYBER512    128 	encap     192 ops took 1.008 sec, avg 5.250 ms, 190.476 ops/sec
KYBER512    128 	decap     178 ops took 1.004 sec, avg 5.640 ms, 177.291 ops/sec
KYBER768    192 	key gen   146 ops took 1.008 sec, avg 6.904 ms, 144.841 ops/sec
KYBER768    192 	encap     118 ops took 1.008 sec, avg 8.542 ms, 117.063 ops/sec
KYBER768    192 	decap     110 ops took 1.000 sec, avg 9.091 ms, 110.000 ops/sec
KYBER1024   256 	key gen   92 ops took 1.011 sec, avg 10.989 ms, 90.999 ops/sec
KYBER1024   256 	encap     76 ops took 1.000 sec, avg 13.158 ms, 76.000 ops/sec
KYBER1024   256 	decap     72 ops took 1.000 sec, avg 13.889 ms, 72.000 ops/sec
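For readers new to this output format, each line reports how many operations completed in a timed window, and the average time and ops/sec figures follow from simple division. The Python sketch below is only an illustration; the helper name `summarize` is hypothetical and not part of the wolfCrypt benchmark tool. Because the printed elapsed time is rounded to three decimals, recomputed values for other rows may differ from the output in the last digit.

```python
def summarize(ops: int, seconds: float) -> tuple[float, float]:
    """Return (average milliseconds per operation, operations per second)."""
    avg_ms = seconds * 1000.0 / ops
    ops_per_sec = ops / seconds
    return avg_ms, ops_per_sec


# RSA 2048 public: 82 ops took 1.020 sec
avg_ms, rate = summarize(82, 1.020)
print(f"avg {avg_ms:.3f} ms, {rate:.3f} ops/sec")
# prints "avg 12.439 ms, 80.392 ops/sec"
```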

The performance of our Kyber implementation looks great compared to the conventional algorithms. It might appear that ECDHE comes close, but not when you consider the mechanics of a key exchange.

Note that ECDHE is a NIKE (Non-Interactive Key Exchange) while Kyber is a KEM (Key Encapsulation Mechanism), so in the context of TLS 1.3, the per-operation numbers as they stand are misleading.

For NIKEs, both the server and the client must perform key generation, and then both must also perform the key agreement step. For KEMs, on the other hand, the client does key generation once, the server does encapsulation once, and the client does decapsulation once. Since each side of a NIKE performs the same two operations to arrive at a shared secret, a fair comparison has to double the average time for ECDHE. In this light, the total time for a key exchange looks like this:

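Algorithm                    Total Time for Key Exchange
ECDH SECP256R1               26.719 ms
Kyber512 (NIST Level 1)      15.218 ms
Kyber768 (NIST Level 3)      24.537 ms
Kyber1024 (NIST Level 5)     38.036 ms

Each Kyber total is the sum of the key gen, encap, and decap averages above. The ECDH figure is the sum of one key gen and one agree average (8.576 ms + 18.143 ms), which is the work done by one side; counting both the client and the server gives 53.438 ms.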
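The totals can be reproduced from the per-operation averages in the benchmark output. The Python sketch below is an illustration only; the function names are hypothetical. It prints the ECDH cost both for one side (one key gen plus one agree) and with both sides counted, so the two ways of reading the comparison are visible side by side.

```python
# Average times in milliseconds, copied from the benchmark output above.
ECC_KEYGEN_MS = 8.576
ECDHE_AGREE_MS = 18.143
KYBER_MS = {  # (key gen, encap, decap)
    "Kyber512": (4.328, 5.250, 5.640),
    "Kyber768": (6.904, 8.542, 9.091),
    "Kyber1024": (10.989, 13.158, 13.889),
}


def kem_total(keygen: float, encap: float, decap: float) -> float:
    """KEM: client key gen + server encap + client decap."""
    return keygen + encap + decap


def nike_one_side(keygen: float, agree: float) -> float:
    """NIKE: work done by a single party (key gen, then agree)."""
    return keygen + agree


def nike_both_sides(keygen: float, agree: float) -> float:
    """NIKE: client and server each do key gen and agree."""
    return 2 * nike_one_side(keygen, agree)


print(f"ECDH one side:   {nike_one_side(ECC_KEYGEN_MS, ECDHE_AGREE_MS):.3f} ms")
print(f"ECDH both sides: {nike_both_sides(ECC_KEYGEN_MS, ECDHE_AGREE_MS):.3f} ms")
for name, times in KYBER_MS.items():
    print(f"{name}: {kem_total(*times):.3f} ms")
# prints:
# ECDH one side:   26.719 ms
# ECDH both sides: 53.438 ms
# Kyber512: 15.218 ms
# Kyber768: 24.537 ms
# Kyber1024: 38.036 ms
```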

Note that Kyber512, from a security perspective, is comparable to ECDH at SECP256R1.

The numbers speak for themselves: at a comparable security level, Kyber wins. That said, you can look forward to future optimizations and even better performance.

As we’ve noted in the past, Kyber has considerably larger artifacts (public keys and ciphertexts) than ECDHE. Depending on your method of transmission, this performance margin can easily be lost if your transmission speeds are slow.
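To get a feel for how quickly a slow link can erase the margin, here is a rough Python sketch. It rests on assumptions that are not part of the benchmarks above: the artifact sizes are the public key and ciphertext sizes from the Kyber round 3 specification, and the ECDH size is two uncompressed P-256 points. The 9600 bit/s link is hypothetical, framing and TLS record overhead are ignored, and compute and transfer are treated as strictly sequential. The compute times are the totals as listed in the table above.

```python
# Bytes on the wire per key exchange (assumed sizes, not measured here).
WIRE_BYTES = {
    "ECDH SECP256R1": 65 + 65,    # two uncompressed P-256 points
    "Kyber512": 800 + 768,        # public key + ciphertext
    "Kyber768": 1184 + 1088,
    "Kyber1024": 1568 + 1568,
}

# Total compute time in milliseconds, as listed in the table above.
COMPUTE_MS = {
    "ECDH SECP256R1": 26.719,
    "Kyber512": 15.218,
    "Kyber768": 24.537,
    "Kyber1024": 38.036,
}


def exchange_ms(name: str, bits_per_second: float) -> float:
    """Compute time plus raw transfer time for one key exchange."""
    transfer_ms = WIRE_BYTES[name] * 8 / bits_per_second * 1000.0
    return COMPUTE_MS[name] + transfer_ms


def break_even_bps(kem: str, nike: str) -> float:
    """Link speed at which the KEM's extra bytes cancel its compute advantage."""
    extra_bits = (WIRE_BYTES[kem] - WIRE_BYTES[nike]) * 8
    saved_seconds = (COMPUTE_MS[nike] - COMPUTE_MS[kem]) / 1000.0
    return extra_bits / saved_seconds


LINK_BPS = 9600  # hypothetical slow serial link
for name in COMPUTE_MS:
    print(f"{name}: {exchange_ms(name, LINK_BPS):.3f} ms at {LINK_BPS} bit/s")

print(f"Kyber512 vs ECDH break-even: "
      f"{break_even_bps('Kyber512', 'ECDH SECP256R1'):.0f} bit/s")
```

Under these assumptions, Kyber512 comes out ahead of ECDH only on links faster than roughly 1 Mbit/s. If the ECDH compute time is counted for both sides, the break-even speed is lower.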

Want to see further optimizations to our Kyber implementation? Interested in wolfSSL’s other post-quantum algorithm implementations? Let us know so we can prioritize the things you are looking for.

If you have questions about any of the above, please contact us at facts@wolfSSL.com or call us at +1 425 245 8247.
