Optimizing Post-Quantum Algorithm Memory Usage on Embedded Systems

Here at wolfSSL, we are intimately aware of the needs of our embedded customers. It always comes down to the tradeoffs and optimizations that fit their unique use cases. Those tradeoffs are typically between speed, footprint size, and memory usage. Many of our blog posts focus on speed, but in this post we look at the options around memory usage. This is especially important for our post-quantum algorithms, ML-KEM and ML-DSA: they are generally faster than or on par with their conventional counterparts, such as ECDH and ECDSA, but they do use more memory.

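We are going to focus on some experiments we did on a Raspberry Pi 5. We built wolfSSL in three configurations, ran wolfSSL's testwolfcrypt with memory and stack tracking enabled, and recorded the stack and heap usage of each algorithm. Here are the results: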

Configuration	Algorithm	Stack (bytes)	Heap (bytes)	Total (bytes)	Heap Allocs
Small code	MLKEM-512	23,568	7,552	31,120	3
	MLKEM-768	32,672	11,968	44,640	3
	MLKEM-1024	42,400	17,568	59,968	3
	MLDSA-44	15,904	50,304	66,208	2
	MLDSA-65	17,440	77,952	95,392	2
	MLDSA-87	19,376	120,960	140,336	2
Small code with small mem	MLKEM-512	23,696	3,968	27,664	3
	MLKEM-768	32,928	5,824	38,752	3
	MLKEM-1024	42,656	7,840	50,496	3
	MLDSA-44	15,856	15,656	31,512	2
	MLDSA-65	17,392	20,776	38,168	2
	MLDSA-87	19,328	26,920	46,248	2
Small code with small mem + stack	MLKEM-512	2,112	19,306	21,418	17
	MLKEM-768	2,112	27,306	29,418	17
	MLKEM-1024	2,112	35,786	37,898	17
	MLDSA-44	2,112	28,211	30,323	7
	MLDSA-65	2,112	33,331	35,443	7
	MLDSA-87	2,160	39,475	41,635	7

Here are some interesting points we noticed in the data:

Stack vs Heap Trade-off: The WOLFSSL_SMALL_STACK configuration dramatically reduces stack usage (from ~16K-43K to about 2.1K bytes). The cost is more heap usage and more allocations (17 instead of 3 for ML-KEM, 7 instead of 2 for ML-DSA), although the total still drops for every algorithm.
ML-KEM Memory Scaling: Memory usage scales predictably with security level: ML-KEM-512 uses ~31K total, ML-KEM-768 uses ~44K, and ML-KEM-1024 uses ~60K in the small code configuration.
ML-DSA Higher Heap Usage: ML-DSA uses significantly more heap memory (50K-120K) than ML-KEM (7K-17K) in the small code configuration.
Small Memory Optimization: Adding the small mem configuration flags reduces total memory usage by roughly 11 to 16 percent for ML-KEM and 52 to 67 percent for ML-DSA. Quite impressive!
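
For readers who want to check the savings themselves, the percentages can be recomputed from the Total column of the results table. The snippet below is a minimal sketch: the totals are copied from the table, while the dictionary names and the reduction_percent helper are our own illustration and not part of wolfSSL.

```python
# Totals (stack + heap, in bytes) copied from the results table above.
# The names SMALL_CODE, SMALL_MEM and reduction_percent are illustrative only.
SMALL_CODE = {
    "MLKEM-512": 31_120, "MLKEM-768": 44_640, "MLKEM-1024": 59_968,
    "MLDSA-44": 66_208, "MLDSA-65": 95_392, "MLDSA-87": 140_336,
}
SMALL_MEM = {
    "MLKEM-512": 27_664, "MLKEM-768": 38_752, "MLKEM-1024": 50_496,
    "MLDSA-44": 31_512, "MLDSA-65": 38_168, "MLDSA-87": 46_248,
}


def reduction_percent(before: int, after: int) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return 100.0 * (before - after) / before


# Print one line per algorithm: small code total, small mem total, saving.
for name, before in SMALL_CODE.items():
    after = SMALL_MEM[name]
    saving = reduction_percent(before, after)
    print(f"{name:<11} {before:>7,} -> {after:>7,}  ({saving:.1f}% smaller)")
```

The same helper works for the small stack column if you want to compare all three configurations.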

If you’re wondering whether you will be able to use post-quantum algorithms on your system, these numbers should give you an idea of the resources you will need to allocate.
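
One way to use the table is to compare it against the stack and heap you can spare. The sketch below does that lookup in Python. The measurements are copied from the results table; the Usage type, the configs_that_fit helper, and the 8 KiB / 64 KiB budget are hypothetical values chosen for illustration. The figures were measured on a Raspberry Pi 5, so treat the answer as a starting estimate and measure on your own target.

```python
from typing import NamedTuple


class Usage(NamedTuple):
    stack: int  # peak stack usage in bytes
    heap: int   # peak heap usage in bytes


# configuration -> algorithm -> measured usage, copied from the results table.
MEASURED = {
    "small code": {
        "MLKEM-512": Usage(23_568, 7_552), "MLKEM-768": Usage(32_672, 11_968),
        "MLKEM-1024": Usage(42_400, 17_568), "MLDSA-44": Usage(15_904, 50_304),
        "MLDSA-65": Usage(17_440, 77_952), "MLDSA-87": Usage(19_376, 120_960),
    },
    "small code with small mem": {
        "MLKEM-512": Usage(23_696, 3_968), "MLKEM-768": Usage(32_928, 5_824),
        "MLKEM-1024": Usage(42_656, 7_840), "MLDSA-44": Usage(15_856, 15_656),
        "MLDSA-65": Usage(17_392, 20_776), "MLDSA-87": Usage(19_328, 26_920),
    },
    "small code with small mem + stack": {
        "MLKEM-512": Usage(2_112, 19_306), "MLKEM-768": Usage(2_112, 27_306),
        "MLKEM-1024": Usage(2_112, 35_786), "MLDSA-44": Usage(2_112, 28_211),
        "MLDSA-65": Usage(2_112, 33_331), "MLDSA-87": Usage(2_160, 39_475),
    },
}


def configs_that_fit(algorithm: str, stack_budget: int, heap_budget: int) -> list[str]:
    """List the configurations whose measured stack and heap both fit the budget."""
    return [
        config
        for config, results in MEASURED.items()
        if results[algorithm].stack <= stack_budget
        and results[algorithm].heap <= heap_budget
    ]


# Hypothetical device: 8 KiB of stack and 64 KiB of heap available for crypto.
for alg in ("MLKEM-768", "MLDSA-65"):
    print(alg, "->", configs_that_fit(alg, 8 * 1024, 64 * 1024))
```

A device with plenty of stack but little heap would land on the other configurations instead, which is the stack versus heap tradeoff the table shows.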

Here are the configurations and commands used:

Configuration	Configure Options and Commands
Small code	$ ./configure --enable-dilithium=all,44,small --enable-mlkem=all,512,small --enable-trackmemory=verbose --enable-stacksize=verbose $ make $ ./wolfcrypt/test/testwolfcrypt
Small code with small mem	$ ./configure --enable-dilithium=all,44,small --enable-mlkem=all,512,small CFLAGS="-DWOLFSSL_DILITHIUM_VERIFY_SMALL_MEM -DWOLFSSL_DILITHIUM_SIGN_SMALL_MEM -DWOLFSSL_DILITHIUM_MAKE_KEY_SMALL_MEM -DWOLFSSL_MLKEM_ENCAPSULATE_SMALL_MEM -DWOLFSSL_MLKEM_MAKEKEY_SMALL_MEM" --enable-trackmemory=verbose --enable-stacksize=verbose $ make $ ./wolfcrypt/test/testwolfcrypt
Small code with small mem + stack	$ ./configure --enable-dilithium=all,44,small --enable-mlkem=all,512,small CFLAGS="-DWOLFSSL_DILITHIUM_VERIFY_SMALL_MEM -DWOLFSSL_DILITHIUM_SIGN_SMALL_MEM -DWOLFSSL_DILITHIUM_MAKE_KEY_SMALL_MEM -DWOLFSSL_MLKEM_ENCAPSULATE_SMALL_MEM -DWOLFSSL_MLKEM_MAKEKEY_SMALL_MEM" --enable-trackmemory=verbose --enable-stacksize=verbose --enable-smallstack $ make $ ./wolfcrypt/test/testwolfcrypt

Let us know if you need us to get even tighter in terms of memory usage. Our cryptographers are wizards when it comes to exploiting tradeoffs!

If you have questions about any of the above, please contact us at facts@wolfSSL.com or call us at +1 425 245 8247.

Download wolfSSL Now