Optimizing Post-Quantum Algorithm Memory Usage on Embedded Systems

Here at wolfSSL, we are intimately aware of the needs of our embedded customers. It is always about the tradeoffs and optimizations that fit their unique use cases and needs. The tradeoffs are typically between speed, footprint size, and memory usage. In many of our blog posts, we like to focus on our speed performance, but in this post, we look at options around memory usage. This is especially important for our post-quantum algorithms, ML-KEM and ML-DSA, as they are generally faster or on par with their conventional counterparts, such as ECDSA and ECDH, but do use more memory.

We are going to focus on some experiments we did on a Raspberry Pi5. We built and ran wolfSSL’s testwolfcrypt and got some statistics. Here are the results:

Configuration Algorithm Stack (bytes) Heap (bytes) Total (bytes) Heap Allocs
Small code MLKEM-512 23,568 7,552 31,120 3
MLKEM-768 32,672 11,968 44,640 3
MLKEM-1024 42,400 17,568 59,968 3
MLDSA-44 15,904 50,304 66,208 2
MLDSA-65 17,440 77,952 95,392 2
MLDSA-87 19,376 120,960 140,336 2
Small code with small mem MLKEM-512 23,696 3,968 27,664 3
MLKEM-768 32,928 5,824 38,752 3
MLKEM-1024 42,656 7,840 50,496 3
MLDSA-44 15,856 15,656 31,512 2
MLDSA-65 17,392 20,776 38,168 2
MLDSA-87 19,328 26,920 46,248 2
Small code with small mem + stack MLKEM-512 2,112 19,306 21,418 17
MLKEM-768 2,112 27,306 29,418 17
MLKEM-1024 2,112 35,786 37,898 17
MLDSA-44 2,112 28,211 30,323 7
MLDSA-65 2,112 33,331 35,443 7
MLDSA-87 2,160 39,475 41,635 7

Here are some interesting points we noticed in the data:

  • Stack vs Heap Trade-off: WOLFSSL_SMALL_STACK configuration dramatically reduces stack usage (from ~23K-42K to 2,112 bytes)
  • ML-KEM Memory Scaling: Memory usage scales predictably with security levels – ML-KEM-512 uses ~31K total, ML-KEM-768 uses ~44K, and ML-KEM-1024 uses ~59K in default configuration
  • ML-DSA Higher Heap Usage: ML-DSA algorithms use significantly more heap memory (50K-120K) compared to MLKEM (7K-17K) in small code configuration
  • Small Memory Optimization: Adding the small mem configuration flags reduces memory usage by 10 to 15 percent for ML-KEM and 50 to 65 percent for ML-DSA. Quite impressive!

If you’re wondering whether you will be able to use post-quantum algorithms on your system then these numbers should help you get an idea of the resource you will need to allocate.

Here are the configurations and commands used:

Configuration Name Configurations and Command
Small code
$ ./configure –enable-dilithium=all,44,small –enable-mlkem=all,512,small –enable-trackmemory=verbose –enable-stacksize=verbose
$ make
$ ./wolfcrypt/test/testwolfcrypt
Small code with small memory
$ ./configure –enable-dilithium=all,44,small –enable-mlkem=all,512,small CFLAGS=”-DWOLFSSL_DILITHIUM_VERIFY_SMALL_MEM -DWOLFSSL_DILITHIUM_SIGN_SMALL_MEM -DWOLFSSL_DILITHIUM_MAKE_KEY_SMALL_MEM -DWOLFSSL_MLKEM_ENCAPSULATE_SMALL_MEM -DWOLFSSL_MLKEM_MAKEKEY_SMALL_MEM” –enable-trackmemory=verbose –enable-stacksize=verbose
$ make
$ ./wolfcrypt/test/testwolfcrypt
Small code with small mem and small stack
$ ./configure –enable-dilithium=all,44,small –enable-mlkem=all,512,small CFLAGS=”-DWOLFSSL_DILITHIUM_VERIFY_SMALL_MEM -DWOLFSSL_DILITHIUM_SIGN_SMALL_MEM -DWOLFSSL_DILITHIUM_MAKE_KEY_SMALL_MEM -DWOLFSSL_MLKEM_ENCAPSULATE_SMALL_MEM -DWOLFSSL_MLKEM_MAKEKEY_SMALL_MEM” –enable-trackmemory=verbose –enable-stacksize=verbose –enable-smallstack
$ make
$ ./wolfcrypt/test/testwolfcrypt

Let us know if you need us to get even tighter in terms of memory usage. Our cryptographers are wizards when it comes to exploiting tradeoffs!

If you have questions about any of the above, please contact us at facts@wolfSSL.com or call us at +1 425 245 8247.

Download wolfSSL Now