wolfSSL and Static Memory on FreeRTOS

We are proud to announce that wolfSSL’s static memory feature with FreeRTOS received an update in our latest 3.14.0 release. This feature serves memory allocations from a fixed buffer (such as one on the stack) instead of from the heap. In previous versions of the wolfSSL embedded TLS library, the library would not compile with both FreeRTOS and static memory enabled. With this update, when FREERTOS is defined and WOLFSSL_NO_MALLOC is not, the static memory feature calls pvPortMalloc() instead of malloc() whenever a heap hint is not used.

With this new allocation behavior in an RTOS environment, wolfSSL can now run using only stack memory where the platform supports it.

For more information about building wolfSSL for embedded, IoT, and RTOS environments with static memory enabled, please visit our static buffer allocation documentation page.
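The selection rule described above can be modelled as a small decision function. This is only a Python sketch of the stated conditions, not wolfSSL source code. The function name `pick_allocator` and the returned labels are hypothetical, and the "static buffer" and "none" branches are assumptions about cases the post does not spell out.

```python
def pick_allocator(freertos, no_malloc, heap_hint):
    """Return a label for the allocator chosen under the given build flags.

    freertos  -- models FREERTOS being defined
    no_malloc -- models WOLFSSL_NO_MALLOC being defined
    heap_hint -- True when the caller supplied a heap hint
    """
    if heap_hint:
        # Assumption: a heap hint routes the request to the static buffer.
        return "static buffer"
    if no_malloc:
        # Assumption: with no hint and no malloc there is no fallback.
        return "none"
    # From the post: FreeRTOS builds fall back to pvPortMalloc(), not malloc().
    return "pvPortMalloc" if freertos else "malloc"


# Walk the flag combinations that matter for a FreeRTOS build.
for hint in (True, False):
    for no_malloc in (True, False):
        print(hint, no_malloc, pick_allocator(True, no_malloc, hint))
```

Writing the rule as a table like this makes it easy to see that pvPortMalloc() only comes into play on the one path where neither a heap hint nor WOLFSSL_NO_MALLOC applies.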

wolfSSL Performance on Intel x86_64 (Part 6)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made and are being discussed over six blog posts of which this is the last part. In this blog, we will talk about the performance of Elliptic Curve (EC) operations over the P-256 curve.

Elliptic curve cryptography (ECC) is the alternative to finite field (FF) cryptography, which has algorithms like RSA, DSA and DH. ECDSA is the elliptic curve variant of DSA and fills the same signing role as RSA, while ECDH is the elliptic curve variant of DH. ECDSA and ECDH can be used anywhere their FF counterparts can be used. ECC requires a pre-defined curve to perform the operations on. The most commonly used curve is P-256, as it has 128-bit strength and is in many standards including TLS, for certificates in IETF, and NIST’s FIPS 186-4. Browsers and web servers prefer ECDH over DH as it is much faster.

wolfSSL 3.13 and later have completely new implementations of the EC algorithms over the P-256 curve. The implementation is constant-time with respect to private key operations. The implementations include variants in C and assembly code targeted at Intel x86_64 and x86_64 with BMI2 and ADX. There is a small code size variant of the assembly code that is about one third of the size (smaller pre-computed tables) yet remains very fast.

The two charts below show the relative performance of the old wolfSSL code, new small wolfSSL assembly code, new fast wolfSSL assembly code and OpenSSL as compared to the new wolfSSL C implementation on Ivy Bridge and Skylake CPUs. Note that the OpenSSL super-app does not measure the speed of the ECDH key generation operation. The new C implementation is a lot faster than the old generic C/ASM code for both CPUs. The assembly code is many times better than the C code mostly due to the use of larger pre-computed tables of elliptic curve points. The OpenSSL code is around 10% slower than the new fast wolfSSL assembly code using the generic x86_64 code and between 5% and 35% slower than wolfSSL assembly code for x86_64 with BMI2 and ADX instructions.

Contact us at support@wolfssl.com with questions about the performance of the wolfSSL embedded TLS library.

Charts: P-256 (x86_64); P-256 (BMI2 and ADX)

References:

ECDSA (Elliptic Curve Digital Signature Algorithm)
ECDH (Elliptic-curve Diffie–Hellman)

wolfSSL Performance on Intel x86_64 (Part 5)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made and are being discussed over six blog posts of which this is part 5. In this blog, we will talk about the performance of RSA and Diffie-Hellman (DH).

RSA is the most commonly used public key algorithm for certificates. When performing a TLS handshake, the server will sign a hash of the messages seen so far and the client will verify the signature of certificates in the certificate chain and verify the hash of messages with the public key in the certificate. Signing and verifying are the most time-consuming operations in a handshake.

DH has been the key exchange algorithm of choice in handshakes but is falling out of favor as the Elliptic Curve variants are considerably faster at the same security level. Performing the key exchange is the second most time-consuming operation in a TLS handshake.
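For readers who want to see what the key exchange computes, here is a minimal sketch of finite-field DH in Python. It is an illustration only and not the wolfSSL implementation. The prime, the base, and the function names `dh_keypair` and `dh_shared` are assumptions chosen for this example, and the group is far too small for real use (the post discusses 2048-bit and 3072-bit keys).

```python
import secrets

# Toy parameters for illustration only: 2**127 - 1 is prime, but much too
# small for real use, and 3 is an arbitrary base chosen for this sketch.
P = 2**127 - 1
G = 3


def dh_keypair():
    """Return (private, public) for the toy group above."""
    private = secrets.randbelow(P - 3) + 2   # 2 <= private <= P - 2
    public = pow(G, private, P)              # modular exponentiation
    return private, public


def dh_shared(private, peer_public):
    """Combine our private value with the peer's public value."""
    return pow(peer_public, private, P)


server_priv, server_pub = dh_keypair()
client_priv, client_pub = dh_keypair()

# Each side only ever sees the other's public value, yet both arrive
# at the same shared secret.
server_secret = dh_shared(server_priv, client_pub)
client_secret = dh_shared(client_priv, server_pub)
assert server_secret == client_secret
```

Each side performs two modular exponentiations, one to create its public value and one to derive the secret, which is the work the optimized code speeds up.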

wolfSSL 3.13 and later have completely new implementations of RSA and DH targeted at specific key sizes: 2048 and 3072 bits. The implementation is constant-time with respect to private key operations. The implementations include variants in C and assembly code targeted at Intel x86_64 and x86_64 with BMI2 and ADX. The new code is significantly better than the old generic code and is about the same speed as OpenSSL on older CPUs and a little faster on newer CPUs.

The two charts below show the relative performance of the old wolfSSL code, new wolfSSL assembly code and OpenSSL as compared to the new wolfSSL C implementation on Ivy Bridge and Skylake CPUs. Note that the OpenSSL super-app does not measure the speed of DH operations. The new C implementation is a lot faster than the old generic C/ASM code for both CPUs. The assembly code is faster than the C code by between 23% and 46% on generic x86_64 and by between 92% and 144% using BMI2 and ADX instructions. The OpenSSL code is about the same speed as the wolfSSL assembly code.

Contact us at support@wolfssl.com with questions about the performance of the wolfSSL embedded TLS library, using it on your platform, or about our TLS 1.3 support!

Charts: RSA and DH (x86_64); RSA and DH (BMI2 and ADX)

References:

RSA (Wikipedia)
Diffie-Hellman (Wikipedia)

wolfSSL Performance on Intel x86_64 (Part 4)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made and are being discussed over six blog posts of which this is part 4. In this blog, we will talk about the performance of Curve25519 and Ed25519.

Curve25519 is a set of parameters for a Montgomery elliptic curve and has ~128-bit security. It is used in key exchange and has become popular due to its speed and inclusion in standards. The algorithm is included as part of TLS v1.3 and NIST is considering it as part of SP 800-186. Ed25519 is a set of parameters for a Twisted Edwards curve that is mathematically related to Curve25519 and has the same security properties. A new signature scheme, EdDSA, has been designed over Twisted Edwards curves; it is fast and is included as part of TLS v1.3. A draft specification has been written describing digital certificates using EdDSA with Ed25519.

In a TLS handshake, a key exchange operation should always be performed to ensure forward-secrecy. When used, it will be a significant amount of the processing time during the handshake. Improving the performance of Curve25519, therefore, increases the number of TLS connections that can be made per second.

Older releases of wolfSSL have a C implementation of the algorithms. While the C code was quite fast, the new assembly code is significantly better. There is assembly code for generic Intel x86_64 CPUs, and for CPUs with BMI2 and ADX (Broadwell and newer CPUs).

The two charts below show the relative performance of wolfSSL and OpenSSL compared to the C implementation on Ivy Bridge and Skylake CPUs. On the Ivy Bridge CPU, the new assembly code is between 20% and 60% better than the C code and is better than OpenSSL in the one operation that can be measured. On the Skylake CPU, the assembly code is between 60% and 86% faster. The OpenSSL code has not been optimized for this CPU and is significantly slower.

Contact us at support@wolfssl.com with questions about the performance of the wolfSSL embedded TLS library.

Charts: Curve25519 and Ed25519 (Intel x86_64); Curve25519 and Ed25519 (Intel BMI2 and ADX)

References:

Curve25519: high-speed elliptic-curve cryptography
Ed25519

wolfSSL Performance on Intel x86_64 (Part 3)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made and are being discussed over six blog posts of which this is part 3. In this blog, we will talk about the performance of SHA-256 and SHA-512.

The most commonly used digest algorithms are SHA-256 and SHA-384. With the introduction of AES-GCM in TLS, SHA-256 and SHA-384 are less commonly used for application data authentication. But, they are still used for handshake message authentication, as a one-way function (as required in a pseudo-random number generator) and digital signatures.

The assembly code has been rewritten to take best advantage of the AVX1 and AVX2 instructions. The performance of SHA-256 and SHA-512 is now as good as or better than OpenSSL. The four charts below show that the performance of wolfSSL has significantly improved from small up to big block sizes. On AVX1, the performance has increased by between 19% and 60% for SHA-256 and between 25% and 53% for SHA-512. Similarly, on AVX2, the performance has increased by between 22% and 40% for SHA-256 and between 23% and 37% for SHA-512. The new wolfSSL assembly code is also significantly better than OpenSSL for small blocks and is about the same at the largest block size. SHA-384 uses the same algorithm as SHA-512 and therefore has the same underlying implementation and thus the same performance improvements.
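To get a feel for how digest throughput varies with block size, a rough measurement can be sketched in Python. This is an assumption-laden illustration: it times the interpreter's own `hashlib` backend, not wolfSSL, so the absolute numbers say nothing about the charts in this post. The helper name `throughput_mb_s` and the chosen block sizes are ours.

```python
import hashlib
import time


def throughput_mb_s(name, block_size, total_bytes=1 << 20):
    """Hash total_bytes of zeros in block_size chunks and return MB/s."""
    block = bytes(block_size)
    digest = hashlib.new(name)
    start = time.perf_counter()
    for _ in range(total_bytes // block_size):
        digest.update(block)
    digest.digest()
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1e6


# SHA-384 and SHA-512 share the same 128-byte internal block size;
# SHA-384 simply produces a shorter (48-byte) digest.
print(hashlib.sha384().block_size, hashlib.sha512().block_size)
print(hashlib.sha384().digest_size, hashlib.sha512().digest_size)

# Small blocks spend proportionally more time on per-call overhead,
# which is why small-block and large-block results are reported separately.
for size in (16, 256, 1024, 8192):
    for name in ("sha256", "sha384", "sha512"):
        print(f"{name} {size:5d} B: {throughput_mb_s(name, size):8.1f} MB/s")
```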

Please contact us at support@wolfssl.com with any questions about the performance of the wolfSSL embedded TLS library.

Charts: SHA-256 (AVX1); SHA-256 (AVX2 with BMI2); SHA-512 (AVX1); SHA-512 (AVX2 with BMI2)

References:

Introduction to Intel® Advanced Vector Extensions
Advanced Vector Extensions (Wikipedia)

wolfSSL Performance on Intel x86_64 (Part 2)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made and are being discussed over six blog posts of which this is part 2. In this blog, we will talk about the performance of ChaCha20-Poly1305.

ChaCha20-Poly1305 is a relatively new authenticated encryption algorithm. It was designed as an alternative to AES-GCM. The algorithm is simple and fast on CPUs that do not have hardware acceleration for AES and GCM.
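Part of that simplicity is that the ChaCha20 core uses only 32-bit additions, XORs and rotations. The sketch below shows the ChaCha quarter round as specified in RFC 8439, written in Python for readability. It is not the wolfSSL C or assembly code, and the helper names `rotl32` and `quarter_round` are ours.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) & MASK32) | (x >> (32 - n))


def quarter_round(a, b, c, d):
    """One ChaCha quarter round on four 32-bit words."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Test vector from RFC 8439, section 2.1.1.
result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
print([hex(word) for word in result])
```

Because the quarter round needs no table lookups or special AES instructions, it performs well on CPUs without hardware acceleration and also maps naturally onto vector instructions.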

Older releases of wolfSSL did not have assembly code implementations of ChaCha20 or Poly1305. So, adding assembly code that uses AVX1 and AVX2 instructions has made a significant difference. The two charts below show the performance of wolfSSL with respect to OpenSSL on AVX1 and AVX2 chipsets. In both charts, the new assembly code is a clear improvement over the C code. Compared to OpenSSL, wolfSSL is between 2.5% and 23% faster on AVX1, and on AVX2 the results range from the same speed to wolfSSL being 16% faster!

If you have questions about the performance of the wolfSSL embedded TLS library, please contact us at support@wolfssl.com!

ChaCha-Poly1305 - AVX1

ChaCha-Poly1305 - AVX2

References:

ChaCha Stream Cipher
Poly1305 (Wikipedia)

wolfSSL Performance on Intel x86_64 (Part 1)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made and are being discussed over a series of six blog posts. In this first blog, we will talk about the performance of AES-GCM.

The assembly code for AES-GCM has been rewritten to take best advantage of the AVX1 and AVX2 instructions. The performance of AES-GCM is now as good as or better than OpenSSL.

The two charts below show the relative performance of AES-128-GCM encryption on Intel AVX1 and AVX2 chipsets. They compare the performance of wolfSSL and OpenSSL with an older version of wolfSSL (before the assembly code changes).

Small block size performance is important when dealing with locally stored data like keys or data in a database. Meanwhile, large block size performance is important for large data transfers in TLS.

The performance of wolfSSL has significantly improved from small up to big block sizes. On AVX1, the smallest block size performance has increased by over 130% and at the top end, there is a 42% improvement. Similarly, on AVX2, the improvement is over 150% for small block sizes to 11% for large block sizes. The new wolfSSL assembly code is also significantly better than OpenSSL for small blocks and is about the same at the largest block size. Similar performance improvements have been achieved for AES-256-GCM as well.

Charts: AES-128-GCM Enc (AVX1); AES-128-GCM Enc (AVX2 with RORX)

If you have questions about using the wolfSSL embedded TLS library on your platform, or about performance optimization of the library, contact us at support@wolfssl.com.

References:

Introduction to Intel® Advanced Vector Extensions
Advanced Vector Extensions (Wikipedia)

wolfCrypt v4.0 is on the CMVP Implementation Under-Test List (#TLS13)

We are excited to announce that wolfCrypt v4.0 is currently in process for CMVP validation for FIPS 140-2!

We are adding more algorithms to our security boundary including ECDSA, ECDHE, AES-GCM, AES-CCM, SHA-3, and RSA-PSS. Also included is FIPS 186-4 compliant key generation for both RSA and ECC. We will be able to offer TLSv1.3 with FIPS-validated cryptography for embedded TLS and embedded IoT devices!

For more information about our upcoming wolfCrypt v4.0 FIPS validation or about the wolfSSL embedded TLS library, please email fips@wolfssl.com.

AES CFB and XTS

Two modes of AES have been added to the embedded TLS library wolfSSL: AES-CFB and AES-XTS.

AES-CFB (Cipher Feedback) mode is a stream cipher mode of AES. For the first 16 bytes, it encrypts the IV with AES and XORs the result with the plaintext for encryption or with the ciphertext for decryption. For each later block, the previous 16 bytes of ciphertext are encrypted with AES and then XORed with either the plaintext or the ciphertext.

AES-XTS (XEX encryption with Tweak and ciphertext Stealing) mode is a block-oriented mode with a per-block tweak. It is used for disk encryption and follows an XOR-encrypt-XOR model, with a Galois field multiplication updating the tweak from one block to the next. When the input is not a multiple of the AES block size (16 bytes), ciphertext stealing is used to fill out the final partial block to a complete AES block. This is done by copying bytes over from the last full AES block produced.
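The feedback structure described above can be sketched in a few lines of Python. To keep the example self-contained, the block function here is a hash-based stand-in and is NOT AES and not secure. It works for the illustration because CFB only ever uses the block cipher in the encrypt direction. All function names are hypothetical and none of this is wolfSSL code.

```python
import hashlib

BLOCK = 16  # AES block size in bytes


def toy_block_encrypt(key, block):
    """Stand-in for AES block encryption (NOT AES, NOT secure)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def xor_bytes(a, b):
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def cfb_encrypt(key, iv, plaintext):
    out = bytearray()
    feedback = iv
    for i in range(0, len(plaintext), BLOCK):
        keystream = toy_block_encrypt(key, feedback)
        chunk = xor_bytes(plaintext[i:i + BLOCK], keystream)
        out += chunk
        feedback = chunk          # the ciphertext feeds the next block
    return bytes(out)


def cfb_decrypt(key, iv, ciphertext):
    out = bytearray()
    feedback = iv
    for i in range(0, len(ciphertext), BLOCK):
        keystream = toy_block_encrypt(key, feedback)
        chunk = ciphertext[i:i + BLOCK]
        out += xor_bytes(chunk, keystream)
        feedback = chunk          # again the ciphertext, not the plaintext
    return bytes(out)


key = b"k" * 16
iv = b"\x00" * 16
message = b"forty bytes of example plaintext data..."
assert cfb_decrypt(key, iv, cfb_encrypt(key, iv, message)) == message
```

Note that the output is the same length as the input even when the last block is partial, which is what makes CFB behave like a stream cipher.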
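The Galois field multiplication and the XOR-encrypt-XOR step can be sketched as follows. This is a Python illustration under the usual XTS convention (the tweak is treated as a little-endian 128-bit value and reduced with the constant 0x87 when it overflows). It is not the wolfSSL implementation. The function names are hypothetical, the block cipher is passed in as a parameter, and the derivation of the first tweak and the ciphertext-stealing step are left out.

```python
def xts_next_tweak(tweak):
    """Multiply a 16-byte tweak by x in GF(2^128) to get the next one."""
    value = int.from_bytes(tweak, "little") << 1
    if value >> 128:
        # Drop the overflow bit and fold it back in with the reduction constant.
        value = (value & ((1 << 128) - 1)) ^ 0x87
    return value.to_bytes(16, "little")


def xex_block(block_encrypt, tweak, block):
    """XOR with the tweak, encrypt, then XOR with the tweak again."""
    masked = bytes(b ^ t for b, t in zip(block, tweak))
    encrypted = block_encrypt(masked)
    return bytes(e ^ t for e, t in zip(encrypted, tweak))


# Without overflow the multiplication is a plain one-bit shift.
print(xts_next_tweak(b"\x01" + bytes(15)).hex())
# With the top bit set, the overflow is reduced back into the low byte.
print(xts_next_tweak(bytes(15) + b"\x80").hex())
```

Each 16-byte block of a sector is processed with `xex_block` using its own tweak, and `xts_next_tweak` advances the tweak between blocks.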

Both of these modes can be used in IoT applications and take advantage of existing AES hardware acceleration supported by wolfSSL.

For more information about AES modes in wolfSSL, contact facts@wolfssl.com.

wolfSSL SGX Updates (Including FIPS!)

wolfSSL is pleased to announce we are in the process of adding FIPS + SGX to our FIPS certificate!

We have updated our SGX-Linux support and are working on adding an example client and server to the existing SGX-Windows project for a complete solution.

If you are working with SGX and need FIPS-validated crypto running in an Enclave, contact us at fips@wolfssl.com or support@wolfssl.com with any questions. We would love the opportunity to field your questions and hear about your project!
