LMS Versus XMSS Versus SLH-DSA

Here at wolfSSL, we don’t just love coding! We love telling the world about what we code. To that end, we want you to understand the differences between LMS, XMSS, and SLH-DSA. Here are their official standard specifications:

  • LMS: RFC 8554 and NIST SP 800-208
  • XMSS: RFC 8391 and NIST SP 800-208
  • SLH-DSA: NIST FIPS 205

The most important similarity of these three algorithms is that they are all hash-based signature schemes. Being hash-based, they are all quantum-safe signature schemes that rely on the tried and true security properties of battle-hardened hashing algorithms. They all use Merkle trees to combine many one-time signature instances under a single public key.
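
To make the Merkle tree idea concrete, here is a minimal sketch of how many leaf public keys collapse into one root, and how a verifier rebuilds that root from a single leaf. It is an illustration only: SHA-256, the 0x00/0x01 prefixes, and the placeholder leaf strings are assumptions made for this example and are not the encodings specified by LMS, XMSS, or SLH-DSA.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256, standing in for whichever hash the real scheme specifies."""
    return hashlib.sha256(data).digest()


def _leaf_hashes(leaves):
    if not leaves or len(leaves) & (len(leaves) - 1):
        raise ValueError("number of leaves must be a power of two")
    # Assumed prefixes: 0x00 marks a leaf hash, 0x01 marks an inner node.
    return [h(b"\x00" + leaf) for leaf in leaves]


def _parents(level):
    return [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Combine all one-time public keys into one root (the long-term public key)."""
    level = _leaf_hashes(leaves)
    while len(level) > 1:
        level = _parents(level)
    return level[0]


def auth_path(leaves, index):
    """Sibling hashes needed to recompute the root from leaf number `index`."""
    level = _leaf_hashes(leaves)
    path = []
    while len(level) > 1:
        path.append(level[index ^ 1])  # sibling of the current node
        level = _parents(level)
        index //= 2
    return path


def root_from_path(leaf, index, path):
    """What a verifier does: rebuild the root from one leaf and its path."""
    node = h(b"\x00" + leaf)
    for sibling in path:
        if index % 2 == 0:
            node = h(b"\x01" + node + sibling)
        else:
            node = h(b"\x01" + sibling + node)
        index //= 2
    return node


# Eight placeholder byte strings stand in for eight one-time public keys.
leaves = [f"one-time-public-key-{i}".encode() for i in range(8)]
root = merkle_root(leaves)
print(root_from_path(leaves[5], 5, auth_path(leaves, 5)) == root)
```

The signer publishes only the root. Each signature then carries the leaf's one-time public key material plus the short authentication path, which is why a single public key can cover many signatures.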

These instances form the leaf nodes of the Merkle tree and are called WOTS (Winternitz One-Time Signature) in LMS, where the variant is named LM-OTS, and WOTS+ in XMSS. WOTS uses a “prefix construction” while WOTS+ uses a “prefix and bitmask construction” with random bitmasks, which lets its security proof rest on weaker assumptions about the hash function.

XMSS uses “L-trees” to compress each WOTS+ public key into a single leaf value, which requires more hashing operations. LMS does not have a corresponding compression scheme.

From the perspective of performance, LMS is consistently better (fewer clock cycles) for key generation, signing, and verification.

Generally speaking, XMSS has higher memory consumption, mostly during signing and verification.

While XMSS has various theoretical optimizations that would hamper interoperability, LMS remains more efficient in practice, though the difference is quite negligible. If the security assurance provided by the bitmask construction is important to you, then you should go with XMSS, but LMS is a better default.

What LMS and XMSS have in common is that both are stateful and have a limited number of available signatures; once that limit is hit, the private key must be discarded. The state is very important because, if it is mismanaged, the signer might reuse a WOTS or WOTS+ key, which would then allow an attacker to forge signatures. With this formidable problem in mind, SLH-DSA was designed to eliminate the pitfall by not requiring state. SLH-DSA takes a randomized approach and relies on the probability of collisions being acceptably low. Note that the SLH-DSA equivalent of WOTS is a “few-time signature”, which tolerates a small number of reuses.
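
The state problem can be sketched in a few lines. The class below is a hypothetical helper written for this post, not part of wolfSSL or of any LMS/XMSS API: it hands out each one-time key index at most once, writes the next index to disk before the current one may be used, and refuses to continue once the key is exhausted. The file name and JSON layout are assumptions.

```python
import json
import os
import tempfile


class OneTimeIndexAllocator:
    """Hypothetical helper: hands out each one-time key index at most once."""

    def __init__(self, state_path, total_signatures):
        self.state_path = state_path
        self.total = total_signatures
        if not os.path.exists(state_path):
            self._store(0)

    def _load(self):
        with open(self.state_path) as f:
            return json.load(f)["next_index"]

    def _store(self, next_index):
        # Write to a temporary file, then rename it over the old state,
        # so a crash cannot leave a half-written state file behind.
        tmp = self.state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"next_index": next_index}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.state_path)

    def reserve(self):
        index = self._load()
        if index >= self.total:
            raise RuntimeError("private key exhausted; it must be discarded")
        self._store(index + 1)  # persist BEFORE the index is used to sign
        return index


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "signer_state.json")
    allocator = OneTimeIndexAllocator(path, total_signatures=4)
    print([allocator.reserve() for _ in range(4)])
    # A fresh object on the same file continues where the old one stopped.
    try:
        OneTimeIndexAllocator(path, total_signatures=4).reserve()
    except RuntimeError as exc:
        print(exc)
```

Persisting first means a crash can waste an index but can never cause one to be used twice. Note that this only protects a single copy of the state: restoring a backup or cloning the key file reintroduces the reuse risk.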

With the elimination of state, SLH-DSA opens the door to parallelization and distributed usage. With LMS and XMSS, by contrast, signing operations are tightly coupled to a single instance of the private key, which limits them to serial signing.

Finally, one of the most important distinctions is that all three algorithms are standardized and recognized by NIST while only LMS and XMSS are approved for use under CNSA 2.0.

This concludes our comparison of LMS, XMSS and SLH-DSA. That said, this has only touched on the surface of these algorithms. Want deeper technical details? Looking to know which is most appropriate for your use-case? Have some more questions? Let us know by sending a message to facts@wolfSSL.com; we are always happy to continue the conversation!

If you have questions about any of the above, please contact us at facts@wolfssl.com or call us at +1 425 245 8247.

Download wolfSSL Now

Updated Post-Quantum Benchmarks for ML-KEM and ML-DSA on STM32

A long, long time ago, we took some benchmarks for Kyber on an STM32 NUCLEO-F446ZE. Back then, it was the NIST submission of Kyber, and we were using the implementation from PQM4 integrated into wolfCrypt. Now, Kyber has evolved into ML-KEM, and we have our own implementation! We decided to take some benchmarks on a newer STM32 hardware platform as well. Note that we now also have our own implementation of ML-DSA, which evolved from Dilithium, so we took benchmark numbers for it too.

Here are the numbers (some formatting changes have been made for readability):

RSA    2048  public        112 ops took 1.012 sec, avg   9.036 ms, 110.672 ops/sec
RSA    2048  private         4 ops took 1.298 sec, avg 324.500 ms,   3.082 ops/sec
DH     2048  key gen         7 ops took 1.150 sec, avg 164.286 ms,   6.087 ops/sec
DH     2048  agree           8 ops took 1.310 sec, avg 163.750 ms,   6.107 ops/sec
ML-KEM 512   key gen       248 ops took 1.000 sec, avg   4.032 ms, 248.000 ops/sec
ML-KEM 512   encap         262 ops took 1.000 sec, avg   3.817 ms, 262.000 ops/sec
ML-KEM 512   decap         198 ops took 1.000 sec, avg   5.051 ms, 198.000 ops/sec
ML-KEM 768   key gen       154 ops took 1.004 sec, avg   6.519 ms, 153.386 ops/sec
ML-KEM 768   encap         154 ops took 1.012 sec, avg   6.571 ms, 152.174 ops/sec
ML-KEM 768   decap         120 ops took 1.000 sec, avg   8.333 ms, 120.000 ops/sec
ML-KEM 1024  key gen        94 ops took 1.008 sec, avg  10.723 ms,  93.254 ops/sec
ML-KEM 1024  encap          94 ops took 1.016 sec, avg  10.809 ms,  92.520 ops/sec
ML-KEM 1024  decap          78 ops took 1.024 sec, avg  13.128 ms,  76.172 ops/sec
ECC   [SECP256R1] key gen  180 ops took 1.007 sec, avg   5.594 ms, 178.749 ops/sec
ECDH  [SECP256R1] agree     86 ops took 1.016 sec, avg  11.814 ms,  84.646 ops/sec
ECDSA [SECP256R1] sign     106 ops took 1.000 sec, avg   9.434 ms, 106.000 ops/sec
ECDSA [SECP256R1] verify    60 ops took 1.012 sec, avg  16.867 ms,  59.289 ops/sec
ML-DSA    44  key gen       52 ops took 1.011 sec, avg  19.442 ms,  51.434 ops/sec
ML-DSA    44  sign          18 ops took 1.086 sec, avg  60.333 ms,  16.575 ops/sec
ML-DSA    44  verify        46 ops took 1.008 sec, avg  21.913 ms,  45.635 ops/sec
ML-DSA    65  key gen       30 ops took 1.035 sec, avg  34.500 ms,  28.986 ops/sec
ML-DSA    65  sign          12 ops took 1.008 sec, avg  84.000 ms,  11.905 ops/sec
ML-DSA    65  verify        28 ops took 1.027 sec, avg  36.679 ms,  27.264 ops/sec
ML-DSA    87  key gen       18 ops took 1.047 sec, avg  58.167 ms,  17.192 ops/sec
ML-DSA    87  sign          10 ops took 1.255 sec, avg 125.500 ms,   7.968 ops/sec
ML-DSA    87  verify        16 ops took 1.003 sec, avg  62.687 ms,  15.952 ops/sec

This was done on an STM32 NUCLEO-F439ZI with an ARM Cortex-M4 running at 168 MHz. The wolfSSL library was built with assembly optimizations, but does not use any hardware-accelerated cryptography. Note: at the time of this writing, the ML-DSA (Dilithium) implementation does not use assembly optimizations, just well-constructed C code.

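  • ML-DSA signing beats RSA-2048 private-key operations quite nicely, and ML-DSA-44 is within an order of magnitude of ECDSA for both signing and verification. RSA-2048 public-key operations remain faster than ML-DSA verification.
  • ML-KEM beats DH by a wide margin at every parameter set, and ML-KEM encapsulation beats ECDH agreement at every parameter set (thanks to assembly code for Thumb2).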
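
For readers who want to check these comparisons themselves, here is a small sketch that turns the average times above into ratios. The millisecond values are copied from the benchmark output; the dictionary labels and the `times_faster` helper are names invented for this example.

```python
# Average time per operation in milliseconds, copied from the output above.
avg_ms = {
    "RSA-2048 private": 324.500,
    "RSA-2048 public": 9.036,
    "DH-2048 agree": 163.750,
    "ECDH P-256 agree": 11.814,
    "ECDSA P-256 sign": 9.434,
    "ECDSA P-256 verify": 16.867,
    "ML-KEM-512 encap": 3.817,
    "ML-KEM-768 encap": 6.571,
    "ML-DSA-44 sign": 60.333,
    "ML-DSA-44 verify": 21.913,
}


def times_faster(fast, slow):
    """How many times quicker `fast` is than `slow`, by average latency."""
    return avg_ms[slow] / avg_ms[fast]


comparisons = [
    ("ML-DSA-44 sign", "RSA-2048 private"),
    ("ML-KEM-768 encap", "DH-2048 agree"),
    ("ML-KEM-512 encap", "ECDH P-256 agree"),
    ("ECDSA P-256 sign", "ML-DSA-44 sign"),
    ("ECDSA P-256 verify", "ML-DSA-44 verify"),
]
for fast, slow in comparisons:
    print(f"{fast} is {times_faster(fast, slow):.1f}x faster than {slow}")
```

Keep in mind that these pairs are not matched by security level, so the ratios are a rough guide rather than a strict like-for-like comparison.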

Here are some special macro flags that were defined:

#define WOLFSSL_SP_ARM_CORTEX_M_ASM
#define WOLFSSL_HAVE_SP_RSA
#define WOLFSSL_HAVE_SP_ECC
#define WOLFSSL_SP_SMALL
#define SP_WORD_SIZE 32
#define GCM_TABLE_4BIT

#define HAVE_DILITHIUM
#define WOLFSSL_WC_DILITHIUM
#define WOLFSSL_DILITHIUM_SMALL

#define WOLFSSL_ARMASM
#define WOLFSSL_ARMASM_INLINE
#define WOLFSSL_ARMASM_NO_HW_CRYPTO
#define WOLFSSL_ARMASM_NO_NEON
#define WOLFSSL_ARMASM_THUMB2
#define WOLFSSL_ARM_ARCH 7

We support assembly optimizations on most algorithms and key sizes with Intel x86/x64, ARM Cortex-A/M/R, RISC-V and PowerPC.

If you are interested in seeing other algorithms benchmarked, or have questions about any of the above, please reach out to us at facts@wolfssl.com or call us at +1 425 245 8247 to let us know which ones!


Optimizing Post-Quantum Algorithm Memory Usage on Embedded Systems

Here at wolfSSL, we are intimately aware of the needs of our embedded customers. It is always about the tradeoffs and optimizations that fit their unique use cases and needs. The tradeoffs are typically between speed, footprint size, and memory usage. In many of our blog posts, we like to focus on our speed performance, but in this post, we look at options around memory usage. This is especially important for our post-quantum algorithms, ML-KEM and ML-DSA, as they are generally faster or on par with their conventional counterparts, such as ECDSA and ECDH, but do use more memory.

We are going to focus on some experiments we did on a Raspberry Pi 5. We built and ran wolfSSL’s testwolfcrypt and got some statistics. Here are the results:

Configuration                      Algorithm   Stack (bytes)  Heap (bytes)  Total (bytes)  Heap Allocs
Small code                         MLKEM-512          23,568         7,552         31,120            3
                                   MLKEM-768          32,672        11,968         44,640            3
                                   MLKEM-1024         42,400        17,568         59,968            3
                                   MLDSA-44           15,904        50,304         66,208            2
                                   MLDSA-65           17,440        77,952         95,392            2
                                   MLDSA-87           19,376       120,960        140,336            2
Small code with small mem          MLKEM-512          23,696         3,968         27,664            3
                                   MLKEM-768          32,928         5,824         38,752            3
                                   MLKEM-1024         42,656         7,840         50,496            3
                                   MLDSA-44           15,856        15,656         31,512            2
                                   MLDSA-65           17,392        20,776         38,168            2
                                   MLDSA-87           19,328        26,920         46,248            2
Small code with small mem + stack  MLKEM-512           2,112        19,306         21,418           17
                                   MLKEM-768           2,112        27,306         29,418           17
                                   MLKEM-1024          2,112        35,786         37,898           17
                                   MLDSA-44            2,112        28,211         30,323            7
                                   MLDSA-65            2,112        33,331         35,443            7
                                   MLDSA-87            2,160        39,475         41,635            7

Here are some interesting points we noticed in the data:

  • Stack vs Heap Trade-off: The WOLFSSL_SMALL_STACK configuration dramatically reduces stack usage (from ~16K-42K to about 2.1K bytes) by moving buffers to the heap, so heap usage and the number of heap allocations go up
  • ML-KEM Memory Scaling: Memory usage scales predictably with security levels: ML-KEM-512 uses ~31K total, ML-KEM-768 uses ~45K, and ML-KEM-1024 uses ~60K in the small code configuration
  • ML-DSA Higher Heap Usage: ML-DSA algorithms use significantly more heap memory (about 50K-121K) compared to ML-KEM (about 8K-18K) in the small code configuration
  • Small Memory Optimization: Adding the small mem configuration flags reduces total memory usage by roughly 11 to 16 percent for ML-KEM and 52 to 67 percent for ML-DSA. Quite impressive!
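
The percentages for the small mem flags can be reproduced directly from the totals in the table. The sketch below does that arithmetic; the byte counts are copied from the table, while the variable and function names are assumptions made for this example.

```python
# Total (stack + heap) bytes per algorithm, copied from the table above.
small_code = {
    "MLKEM-512": 31120, "MLKEM-768": 44640, "MLKEM-1024": 59968,
    "MLDSA-44": 66208, "MLDSA-65": 95392, "MLDSA-87": 140336,
}
small_code_small_mem = {
    "MLKEM-512": 27664, "MLKEM-768": 38752, "MLKEM-1024": 50496,
    "MLDSA-44": 31512, "MLDSA-65": 38168, "MLDSA-87": 46248,
}


def reduction_percent(before, after):
    """Percentage of memory saved when going from `before` to `after` bytes."""
    return 100.0 * (before - after) / before


for name, before in small_code.items():
    after = small_code_small_mem[name]
    pct = reduction_percent(before, after)
    print(f"{name}: {before} -> {after} bytes ({pct:.1f}% less)")
```

The same helper can be pointed at the stack or heap columns separately if one of those is the tighter constraint on your target.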

If you’re wondering whether you will be able to use post-quantum algorithms on your system, then these numbers should help you get an idea of the resources you will need to allocate.

Here are the configurations and commands used:

Small code:

$ ./configure --enable-dilithium=all,44,small --enable-mlkem=all,512,small \
              --enable-trackmemory=verbose --enable-stacksize=verbose
$ make
$ ./wolfcrypt/test/testwolfcrypt

Small code with small memory:

$ ./configure --enable-dilithium=all,44,small --enable-mlkem=all,512,small \
              CFLAGS="-DWOLFSSL_DILITHIUM_VERIFY_SMALL_MEM -DWOLFSSL_DILITHIUM_SIGN_SMALL_MEM -DWOLFSSL_DILITHIUM_MAKE_KEY_SMALL_MEM -DWOLFSSL_MLKEM_ENCAPSULATE_SMALL_MEM -DWOLFSSL_MLKEM_MAKEKEY_SMALL_MEM" \
              --enable-trackmemory=verbose --enable-stacksize=verbose
$ make
$ ./wolfcrypt/test/testwolfcrypt

Small code with small mem and small stack:

$ ./configure --enable-dilithium=all,44,small --enable-mlkem=all,512,small \
              CFLAGS="-DWOLFSSL_DILITHIUM_VERIFY_SMALL_MEM -DWOLFSSL_DILITHIUM_SIGN_SMALL_MEM -DWOLFSSL_DILITHIUM_MAKE_KEY_SMALL_MEM -DWOLFSSL_MLKEM_ENCAPSULATE_SMALL_MEM -DWOLFSSL_MLKEM_MAKEKEY_SMALL_MEM" \
              --enable-trackmemory=verbose --enable-stacksize=verbose --enable-smallstack
$ make
$ ./wolfcrypt/test/testwolfcrypt

Let us know if you need us to get even tighter in terms of memory usage. Our cryptographers are wizards when it comes to exploiting tradeoffs!


ML-KEM Versus HQC KEM

ML-KEM (Module-Lattice Key Encapsulation Mechanism) and HQC (Hamming Quasi-Cyclic) are both post-quantum cryptographic key encapsulation mechanisms (KEMs) designed to provide secure key exchange in the presence of CRQCs (Cryptographically Relevant Quantum Computers).

ML-KEM is based on the Module Learning With Errors (M-LWE) problem, which is closely related to finding short vectors in a lattice, a regular grid of points in a high-dimensional vector space. HQC is based on the hardness of decoding random quasi-cyclic codes, which are described by matrices whose columns are cyclic shifts of the first column, with some modifications. These domains of mathematics are both considered to be well studied in our modern times.

In terms of numbers, ML-KEM has smaller key sizes and ciphertext sizes when compared to HQC at the same security levels. ML-KEM is generally faster than HQC for all the KEM operations (key generation, encapsulation, decapsulation).

In terms of the status of the algorithms, ML-KEM has already been standardized by NIST, and code points for TLS 1.3 are already in draft standards at the IETF. HQC was recently picked for standardization by NIST, but NIST has yet to issue a FIPS document specifying and standardizing it.

wolfSSL’s perspective is that you should start your post-quantum migration journey today and use algorithms that are already standardized such as ML-KEM. That said, if anyone out there wants to take HQC for a spin, please let us know!! As always, wolfSSL is a customer driven organization and when we hear enough interest, we will make it happen!


Coming soon: HQC KEM

Hello there! You! We know you are out there. You learned about Hamming codes in college or university, and maybe even use them in your professional career in consumer electronics or telecommunications. Now you are wondering how simple error correcting codes can be transformed into a KEM (Key Encapsulation Mechanism) for doing secure key transport. To you, we present the quantum-safe HQC (Hamming Quasi-Cyclic) KEM.

Are you hoping to see a professional production level implementation of HQC KEM? You are in luck. We want to make one! Make sure to register your interest in a wolfCrypt implementation of HQC KEM by sending a feature request for it to facts@wolfssl.com.


ML-KEM Versus ML-DSA

ML-KEM (Module Lattice Key Encapsulation Mechanism) is for secure key exchange. ML-KEM enables two parties to establish a shared secret key over an insecure channel.

ML-DSA (Module Lattice Digital Signature Algorithm) is for authentication. ML-DSA allows a signer to generate a digital signature that can be verified by others, ensuring the authenticity and integrity of a message.

Both ML-KEM and ML-DSA are public key algorithms; that is to say, both have a key generation operation that generates a public key and private key.

For ML-KEM, an encapsulation operation uses the public key to generate a secret and ciphertext. The decapsulation operation uses the private key and ciphertext to get the same secret.
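
The shape of those three operations can be shown with a toy. To be clear about the assumptions: the sketch below is NOT ML-KEM and is NOT quantum-safe. It swaps in a classical Diffie-Hellman style construction over a small prime purely so that key generation, encapsulation, and decapsulation can be run end to end; the parameters and function names are invented for this example.

```python
import hashlib
import secrets

# Toy parameters for illustration only: a 127-bit Mersenne prime and a small
# base. These are far too small for real use and are not post-quantum.
P = 2**127 - 1
G = 3


def keygen():
    """Return (public_key, private_key)."""
    sk = secrets.randbelow(P - 3) + 2  # private exponent in [2, P - 2]
    pk = pow(G, sk, P)
    return pk, sk


def _kdf(shared: int) -> bytes:
    """Hash the shared group element down to a 32-byte secret."""
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def encapsulate(pk):
    """Use only the public key to produce (ciphertext, shared_secret)."""
    eph = secrets.randbelow(P - 3) + 2
    ciphertext = pow(G, eph, P)
    secret = _kdf(pow(pk, eph, P))
    return ciphertext, secret


def decapsulate(sk, ciphertext):
    """Use the private key and ciphertext to recover the same shared secret."""
    return _kdf(pow(ciphertext, sk, P))


pk, sk = keygen()
ciphertext, sender_secret = encapsulate(pk)
receiver_secret = decapsulate(sk, ciphertext)
print(sender_secret == receiver_secret)
```

The point is the data flow: only the public key and the ciphertext ever cross the wire, and both sides end up holding the same secret. ML-KEM exposes the same three-operation interface, but builds it on the M-LWE problem instead.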

For ML-DSA, the private key and a message are used to generate a signature. The public key, message, and signature are used in an operation to verify that the signature of the message was generated by the corresponding private key.

The most important thing they have in common is that they are both post-quantum algorithms that have already been standardized by NIST and can be used TODAY!

In summary, ML-KEM and ML-DSA serve different purposes in cryptography, with ML-KEM focused on secure key transport and ML-DSA focused on digital signatures and authentication, but both protecting against a CRQC (Cryptographically Relevant Quantum Computer).


Post-Quantum Cryptography with curl

At wolfSSL, our commitment to advancing post-quantum cryptography (PQC) is stronger than ever. With the rise of quantum computing threats, securing data from “harvest now, decrypt later” attacks is a critical focus for us. That’s why we are actively enhancing curl with robust PQC support to safeguard your communications well into the quantum era.

wolfSSL implements NIST-standardized post-quantum algorithms such as ML-KEM (Kyber) for key encapsulation and ML-DSA (Dilithium) for digital signatures, documented as FIPS 203 and FIPS 204. These algorithms are optimized for both high performance and strong security.

When built with wolfSSL, curl supports quantum-resistant key exchange with ML-KEM under TLS 1.3, protecting long-term confidentiality against future decryption threats from cryptographically relevant quantum computers. To facilitate a smooth transition, wolfSSL also enables hybrid cryptography, blending classical and post-quantum algorithms for enhanced security in curl-based applications.

For details on building curl with wolfSSL’s post-quantum support, check out our GitHub pull request. To explore our broader efforts in post-quantum cryptography, check out our Post-Quantum page.


Post-Quantum Benchmark Comparison: ML-KEM wolfSSL 5.8.0 vs. OpenSSL 3.5

Recently, both OpenSSL 3.5 and wolfSSL 5.8.0 have been released. We thought we’d run some benchmarks on an x86_64 Linux PC.

Note: output has been edited for brevity and clarity.

OpenSSL

Configuration and build:

$ ./Configure
$ make all

Benchmarking Output:

47317 ML-KEM-512 KEM keygen ops in 0.99s
72114 ML-KEM-512 KEM encaps ops in 1.00s
46625 ML-KEM-512 KEM decaps ops in 1.00s
31811 ML-KEM-768 KEM keygen ops in 1.00s
55855 ML-KEM-768 KEM encaps ops in 0.99s
35390 ML-KEM-768 KEM decaps ops in 1.00s
20942 ML-KEM-1024 KEM keygen ops in 1.00s
42164 ML-KEM-1024 KEM encaps ops in 0.99s
27043 ML-KEM-1024 KEM decaps ops in 1.00s

wolfSSL

Configuration and build:

$ ./configure  --enable-mlkem=yes,cache-a --enable-dilithium \
               --enable-all-asm
$ make all

Benchmarking Output:

ML-KEM 512    128  key gen    293900 ops took 1.000 sec
ML-KEM 512    128    encap    271900 ops took 1.000 sec
ML-KEM 512    128    decap    237300 ops took 1.000 sec
ML-KEM 768    192  key gen    163900 ops took 1.000 sec
ML-KEM 768    192    encap    152500 ops took 1.000 sec
ML-KEM 768    192    decap    200700 ops took 1.000 sec
ML-KEM 1024   256  key gen    109200 ops took 1.000 sec
ML-KEM 1024   256    encap    106200 ops took 1.000 sec
ML-KEM 1024   256    decap    143600 ops took 1.001 sec

Analysis & Conclusions

In these runs, wolfSSL is faster than OpenSSL at every operation and parameter set, by roughly 2.5x to 6x. Here at wolfSSL, we are extremely proud of our long tradition of excellence when it comes to efficiency and performance.
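
The gap can be quantified by normalizing both outputs to operations per second and dividing. The sketch below does this with the numbers copied from the two benchmark listings; the dictionary layout and names are assumptions made for this example.

```python
# (operations, seconds) per benchmark line, copied from the outputs above.
openssl = {
    ("ML-KEM-512", "keygen"): (47317, 0.99),
    ("ML-KEM-512", "encaps"): (72114, 1.00),
    ("ML-KEM-512", "decaps"): (46625, 1.00),
    ("ML-KEM-768", "keygen"): (31811, 1.00),
    ("ML-KEM-768", "encaps"): (55855, 0.99),
    ("ML-KEM-768", "decaps"): (35390, 1.00),
    ("ML-KEM-1024", "keygen"): (20942, 1.00),
    ("ML-KEM-1024", "encaps"): (42164, 0.99),
    ("ML-KEM-1024", "decaps"): (27043, 1.00),
}
wolfssl = {
    ("ML-KEM-512", "keygen"): (293900, 1.000),
    ("ML-KEM-512", "encaps"): (271900, 1.000),
    ("ML-KEM-512", "decaps"): (237300, 1.000),
    ("ML-KEM-768", "keygen"): (163900, 1.000),
    ("ML-KEM-768", "encaps"): (152500, 1.000),
    ("ML-KEM-768", "decaps"): (200700, 1.000),
    ("ML-KEM-1024", "keygen"): (109200, 1.000),
    ("ML-KEM-1024", "encaps"): (106200, 1.000),
    ("ML-KEM-1024", "decaps"): (143600, 1.001),
}


def ops_per_sec(ops, seconds):
    return ops / seconds


# Speedup factor of wolfSSL over OpenSSL for each (parameter set, operation).
speedup = {
    key: ops_per_sec(*wolfssl[key]) / ops_per_sec(*openssl[key])
    for key in openssl
}
for (params, operation), factor in speedup.items():
    print(f"{params} {operation}: {factor:.2f}x")
```

Encapsulation shows the smallest gap and key generation the largest, which is useful to know when estimating the benefit for a client-heavy versus a server-heavy workload.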

Now, it is worth pointing out that this is not an apples-to-apples comparison. The wolfSSL build enables assembly optimizations, while, to date, OpenSSL does not have such optimizations. Similarly, we are enabling the “Cache A” optimization, which is described as:

Stores the matrix A during key generation for use in encapsulation when performing decapsulation. The key is 8KB larger but decapsulation is significantly faster. Turn on when performing make key and decapsulation with the same object.

We would be happy to re-run these comparisons once OpenSSL has such optimizations enabled.


LMS in wolfPKCS11

wolfSSL is excited to announce upcoming support for the Leighton-Micali Signature (LMS) scheme in wolfPKCS11. This implementation builds upon our existing LMS support in wolfCrypt to provide a complete PKCS#11 API interface for LMS operations.

LMS, a stateful hash-based signature scheme standardized in RFC 8554 and approved in NIST SP 800-208, is already incorporated into the latest version of the PKCS#11 specification. This signature scheme is designed to resist attacks from quantum computers and is best used in offline signing operations such as firmware signing.

The addition of LMS support to wolfPKCS11 will enable applications using the PKCS#11 interface to leverage wolfSSL’s proven LMS implementation.

Key Features:

  • Complete PKCS#11 API support for LMS operations
  • Quantum-resistant stateful hash based signature scheme
  • Compliant with NIST SP 800-208 specifications

This enhancement demonstrates wolfSSL’s continued commitment to providing comprehensive support for post-quantum cryptography across our product line.


Our Post-Quantum Value Proposition

Research-focused cryptography startups deserve a lot of credit for the innovative work they do. They enrich the community and introduce solutions that may become crucial in the future. But their expertise is largely theoretical and academic, not practical and customer-aligned. wolfSSL, in contrast, is staffed by dedicated engineers with decades of experience delivering production quality solutions for critical infrastructure, crafting performant and portable code, often on short notice, for dozens of commercially significant architectures.

Given these hard-won advantages at wolfSSL, some cryptography providers have tried to differentiate themselves with custom hardware, promising a performance boost. Let’s test that proposition with a look at performance on lattice cryptography. The software implementation we’ll show is wolfSSL software production release 5.7.6, throughput per core on a commodity high performance CPU, in this case an AMD 7960X. The hardware-accelerated implementation we’ll show is PQShield’s PQPerform-Lattice, in a pre-production realization on Xilinx Zynq UltraScale+ at 322 MHz (see https://doi.org/10.1145/3689939.3695785).

                                wolfSSL    wolfSSL   PQShield   PQShield
Algorithm  key size  operation  ops/sec  cycles/op    ops/sec  cycles/op
KYBER512        128  key gen     422907     9955.0     140000       2300
KYBER512        128  encap       231528    18184.5     100625       3200
KYBER512        128  decap       230252    18225.0      68511       4700
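
The two column pairs are linked by the clock rate: ops/sec equals clock frequency divided by cycles/op. As a quick sanity check, the sketch below recomputes the PQShield throughput from the 322 MHz clock quoted above and back-computes the CPU clock implied by the wolfSSL rows. The function names are invented for this example, and the implied clock is an inference from the table rather than a measured value.

```python
def throughput(clock_hz, cycles_per_op):
    """Operations per second for a given clock rate and cycle count."""
    return clock_hz / cycles_per_op


def implied_clock_hz(ops_sec, cycles_per_op):
    """Clock rate that would be consistent with the two reported figures."""
    return ops_sec * cycles_per_op


FPGA_CLOCK_HZ = 322e6  # 322 MHz, as quoted for the FPGA realization

for operation, cycles in [("key gen", 2300), ("encap", 3200), ("decap", 4700)]:
    print(f"PQShield {operation}: {throughput(FPGA_CLOCK_HZ, cycles):.0f} ops/sec")

rows = [("key gen", 422907, 9955.0), ("encap", 231528, 18184.5), ("decap", 230252, 18225.0)]
for operation, ops_sec, cycles in rows:
    ghz = implied_clock_hz(ops_sec, cycles) / 1e9
    print(f"wolfSSL {operation}: implied clock {ghz:.2f} GHz")
```

This makes the trade visible: the FPGA needs fewer cycles per operation, but the CPU's much higher clock rate more than makes up the difference in throughput.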

Even when application-specific proprietary silicon has a performance advantage (which, in terms of throughput, PQShield’s pre-production FPGA realization does not), it complicates platform design and production timelines, introduces supply-chain vulnerabilities, increases BoM expenses, and complicates parallelization. And crucially, it restricts crypto-agility, given hardware resources that are specific to a narrow class of cryptographic algorithm. This matters. It is widely acknowledged that Kyber/ML-KEM is based on a fairly new and under-studied body of mathematics, and further investigation may yet uncover a fatal flaw in this, or any of the other novel algorithms working their way through the standards-making process.

wolfSSL demonstrates superior performance with an open source software solution, without tying your design to a particular class of cryptographic algorithm. Indeed, our latest software implementation of ML-KEM is even faster than pre-standardization Kyber, attaining well over 300k encapsulation and decapsulation ops/s per core on the CPU shown above.

When you work with wolfSSL, your priorities become our priorities. We have always focused our resources on development, guided and enabled by our proud history of organic growth and customer-centric philosophy.

We provide the best tested code, worldwide 24×7 technical support that is second to none, and on-site interactions to ensure your goals are met. Our technical prowess and decades of experience let us operate across the whole spectrum of runtimes, from bare metal microcontrollers to data center big iron, with hand-crafted assembly optimizations fully leveraging vector instruction extensions.

Beyond the technical dimension, the professionals at wolfSSL focus on making sure you fully understand your options for licensing, support, and consulting, tailoring plans for your specific requirements and preferences. We draw up NDAs, SOWs and legal contracts so that you as a business have everything you need to secure your operational necessities.

Our team will see your project through, not only to delivery, but for the entire lifecycle after delivery. We are your reliable partner through the entire process, laser-focused on delivered results. This is what we do, and we do it better than anyone. This makes wolfSSL your ideal partner as you embark on the transition to quantum-resistant cryptography.

