wolfSSL Performance on Intel x86_64 (Part 3)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made and are being discussed over six blog posts of which this is part 3. In this blog, we will talk about the performance of SHA-256 and SHA-512.

The most commonly used digest algorithms are SHA-256 and SHA-384. With the introduction of AES-GCM in TLS, SHA-256 and SHA-384 are less commonly used for application data authentication. However, they are still used for handshake message authentication, as one-way functions (as required in a pseudo-random number generator), and in digital signatures.

The assembly code has been rewritten to take best advantage of the AVX1 and AVX2 instructions. The performance of SHA-256 and SHA-512 is now as good as or better than OpenSSL. The four charts below show that the performance of wolfSSL has significantly improved from small up to big block sizes. On AVX1, the performance has increased by between 19% and 60% for SHA-256 and between 25% and 53% for SHA-512. Similarly, on AVX2, the performance has increased by between 22% and 40% for SHA-256 and between 23% and 37% for SHA-512. The new wolfSSL assembly code is also significantly better than OpenSSL for small blocks and is about the same at the largest block size. SHA-384 uses the same algorithm as SHA-512 and therefore has the same underlying implementation and thus the same performance improvements.
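To illustrate why SHA-384 shares an implementation with SHA-512, the short sketch below uses Python's standard hashlib module (not wolfSSL) to compare block and digest sizes. SHA-384 is defined as SHA-512 with different initial values and a truncated output, so both process 128-byte blocks, while SHA-256 processes 64-byte blocks. The helper name hash_params is only an illustration.

```python
import hashlib

def hash_params(name: str) -> tuple[int, int]:
    """Return (block size, digest size) in bytes for a hashlib algorithm name."""
    h = hashlib.new(name)
    return h.block_size, h.digest_size

if __name__ == "__main__":
    for name in ("sha256", "sha384", "sha512"):
        block, digest = hash_params(name)
        print(f"{name}: block={block} bytes, digest={digest} bytes")
```

SHA-384 and SHA-512 report the same block size, which is why an optimized SHA-512 compression function benefits both.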

Please contact us at support@wolfssl.com with any questions about the performance of the wolfSSL embedded TLS library.

Charts: SHA-256-AVX1, SHA-256-AVX2-BMI2, SHA-512-AVX1, SHA-512-AVX2-BMI2

References:

Introduction to Intel® Advanced Vector Extensions
Advanced Vector Extensions (Wikipedia)

wolfSSL Performance on Intel x86_64 (Part 2)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made and are being discussed over six blog posts of which this is part 2. In this blog, we will talk about the performance of ChaCha20-Poly1305.

ChaCha20-Poly1305 is a relatively new authenticated encryption algorithm. It was designed as an alternative to AES-GCM. The algorithm is simple and fast on CPUs that do not have hardware acceleration for AES and GCM.

Older releases of wolfSSL did not have assembly code implementations of ChaCha20 or Poly1305. So, adding assembly code that uses AVX1 and AVX2 instructions has made a significant difference. The two charts below show the performance of wolfSSL with respect to OpenSSL on AVX1 and AVX2 chipsets. In both charts, the new assembly code is a clear improvement over the C code. Compared to OpenSSL, wolfSSL is between 2.5% and 23% faster on AVX1, and on AVX2 it ranges from the same speed to 16% faster!

If you have questions about the performance of the wolfSSL embedded TLS library, please contact us at support@wolfssl.com!

ChaCha-Poly1305 - AVX1

ChaCha-Poly1305 - AVX2

References:

ChaCha Stream Cipher
Poly1305 (Wikipedia)

wolfSSL Performance on Intel x86_64 (Part 1)

Recent releases of wolfSSL have included new assembly code targeted at the Intel x86_64 platform. Large performance gains have been made which are being discussed over a six blog post series. In this first blog, we will talk about the performance of AES-GCM.

The assembly code for AES-GCM has been rewritten to take best advantage of the AVX1 and AVX2 instructions. The performance of AES-GCM is now as good as or better than OpenSSL.

The two charts below show the relative performance of AES-128-GCM encryption on Intel AVX1 and AVX2 chipsets. They compare the performance of wolfSSL and OpenSSL with an older version of wolfSSL (before the assembly code changes).

Small block size performance is important when dealing with locally stored data like keys or data in a database. Meanwhile, large block size performance is important for large data transfers in TLS.

The performance of wolfSSL has significantly improved from small up to big block sizes. On AVX1, the smallest block size performance has increased by over 130% and at the top end, there is a 42% improvement. Similarly, on AVX2, the improvement is over 150% for small block sizes to 11% for large block sizes. The new wolfSSL assembly code is also significantly better than OpenSSL for small blocks and is about the same at the largest block size. Similar performance improvements have been achieved for AES-256-GCM as well.
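As a reading aid for the percentages above, here is a minimal sketch of how a relative improvement is computed from two throughput measurements. The throughput values in the usage lines are hypothetical and are not taken from the charts; they only show that an increase of 130% means the new code runs at 2.3 times the old throughput, not 1.3 times.

```python
def percent_gain(old_throughput: float, new_throughput: float) -> float:
    """Relative improvement of new over old, expressed in percent."""
    if old_throughput <= 0:
        raise ValueError("old_throughput must be positive")
    return 100.0 * (new_throughput - old_throughput) / old_throughput

if __name__ == "__main__":
    # Hypothetical MB/s figures, chosen only to illustrate the arithmetic.
    print(percent_gain(100.0, 230.0))  # a 130% gain is 2.3x the old throughput
    print(percent_gain(100.0, 142.0))  # a 42% gain is 1.42x the old throughput
```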

Charts: AES-128-GCM Enc - AVX1, AES-128-GCM Enc - AVX2 (with RORX)

If you have questions about using the wolfSSL embedded TLS library on your platform, or about performance optimization of the library, contact us at support@wolfssl.com.

References:

Introduction to Intel® Advanced Vector Extensions
Advanced Vector Extensions (Wikipedia)

wolfCrypt v4.0 is on the CMVP Implementation Under-Test List (#TLS13)

We are excited to announce that wolfCrypt v4.0 is currently in process for CMVP validation for FIPS 140-2!

We are adding more algorithms to our security boundary including ECDSA, ECDHE, AES-GCM, AES-CCM, SHA-3, and RSA-PSS. Also included is FIPS 186-4 compliant key generation for both RSA and ECC. We will be able to offer TLSv1.3 with FIPS-validated cryptography for embedded TLS and embedded IoT devices!

For more information about our upcoming wolfCrypt v4.0 FIPS validation or about the wolfSSL embedded TLS library, please email fips@wolfssl.com.

AES CFB and XTS

Two modes of AES have been added to the embedded TLS library wolfSSL: AES-CFB and AES-XTS.

AES CFB (Cipher FeedBack) mode is a stream cipher mode of AES. For the first 16 bytes, it encrypts the IV using AES and XORs the result with the plaintext for encryption or with the ciphertext for decryption. For each subsequent 16 bytes of output, the previous 16 bytes of ciphertext are encrypted with AES and then XORed with either the plaintext or the ciphertext.
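The chaining described above can be sketched in a few lines. This is an illustration of the CFB structure only, not wolfSSL's implementation: toy_block_encrypt is a hypothetical stand-in built from SHA-256 so the example runs with the Python standard library alone. It is not AES and must not be used for real encryption. CFB only ever uses the block cipher in the encrypt direction, which is why a one-way stand-in is enough to show the data flow.

```python
import hashlib

BLOCK = 16  # AES block size in bytes

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for AES block encryption (NOT AES, NOT secure)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

def cfb_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    feedback = iv  # the first keystream block comes from the IV
    out = bytearray()
    for i in range(0, len(plaintext), BLOCK):
        keystream = toy_block_encrypt(key, feedback)
        chunk = xor_bytes(plaintext[i:i + BLOCK], keystream)
        out += chunk
        feedback = chunk  # the next keystream block comes from this ciphertext
    return bytes(out)

def cfb_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    feedback = iv
    out = bytearray()
    for i in range(0, len(ciphertext), BLOCK):
        keystream = toy_block_encrypt(key, feedback)
        chunk = ciphertext[i:i + BLOCK]
        out += xor_bytes(chunk, keystream)
        feedback = chunk  # decryption also feeds back the ciphertext
    return bytes(out)

if __name__ == "__main__":
    key, iv = b"k" * 16, b"\x00" * 16
    message = b"CFB handles lengths that are not a multiple of 16."
    assert cfb_decrypt(key, iv, cfb_encrypt(key, iv, message)) == message
```

Because the keystream is XORed with the data, the final partial block needs no padding, which is what makes CFB behave like a stream cipher.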

AES XTS (XEX encryption with Tweak and ciphertext Stealing) mode is a tweakable block cipher mode. It is used for disk encryption and follows an xor-encrypt-xor model, with a Galois field multiplication updating the tweak from one block to the next. When the input is not a multiple of the AES block size (16 bytes), ciphertext stealing fills out the final partial block to a complete AES block by copying bytes from the last full block of ciphertext produced.
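As a sketch of the Galois field step, the function below advances a 16-byte XTS tweak to the value used for the next block. It follows the standard XTS definition (the tweak is read as a little-endian 128-bit value and multiplied by x in GF(2^128), reducing with the polynomial x^128 + x^7 + x^2 + x + 1, i.e. XOR with 0x87 on overflow). The function name xts_next_tweak is hypothetical, and this is not wolfSSL's code.

```python
MASK_128 = (1 << 128) - 1
REDUCTION = 0x87  # x^7 + x^2 + x + 1, the low terms of the XTS field polynomial

def xts_next_tweak(tweak: bytes) -> bytes:
    """Multiply a 16-byte XTS tweak by x in GF(2^128), little-endian convention."""
    if len(tweak) != 16:
        raise ValueError("tweak must be exactly 16 bytes")
    value = int.from_bytes(tweak, "little") << 1
    if value >> 128:  # a bit was shifted out of the 128-bit field
        value = (value & MASK_128) ^ REDUCTION
    return value.to_bytes(16, "little")

if __name__ == "__main__":
    t = bytes([1]) + bytes(15)
    for block_index in range(3):
        print(block_index, t.hex())
        t = xts_next_tweak(t)
```

In XTS the initial tweak is the encrypted sector number, and each block's input and output are XORed with the current tweak around the AES encryption.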

Both of these modes can be used in IoT applications and take advantage of existing AES hardware acceleration supported by wolfSSL.

For more information about AES modes in wolfSSL contact facts@wolfssl.com.

wolfSSL SGX Updates (Including FIPS!)

wolfSSL is pleased to announce we are in the process of adding FIPS + SGX to our FIPS certificate!

We have updated our SGX-Linux support and are working on adding an example client and server to the existing SGX-Windows project for a complete solution.
If you are working with SGX and need FIPS-validated crypto running in an Enclave, contact us at fips@wolfssl.com or support@wolfssl.com with any questions. We would love the opportunity to field your questions and hear about your project!

Job Posting: Embedded Systems Software Engineer

wolfSSL is a growing company looking to add a top notch embedded systems software engineer to our organization. wolfSSL develops, markets and sells the leading Open Source embedded SSL/TLS protocol implementation, wolfSSL. Our users are primarily building devices or applications that need security. Other products include wolfCrypt embedded cryptography engine, wolfMQTT client library, and wolfSSH.

Job Description:

Currently, we are seeking to add a senior level C software engineer with 5-10 years of experience who is interested in a fun company with tremendous upside. Backgrounds that are useful to our team include networking, security, and hardware optimizations. Assembly experience is a plus. Experience with encryption software is a plus. RTOS experience is a plus. Experience with hardware-based cryptography is a plus.

Operating environments of particular interest to us include Linux, Windows, Embedded Linux and RTOS varieties (VxWorks, QNX, ThreadX, uC/OS, MQX, FreeRTOS, etc). Experience with mobile environments such as Android and iOS is also a plus, but not required.

Location is flexible. For the right candidate, we’re open to this individual working from virtually any location.

How To Apply

To apply or discuss, please send your resume and cover letter to facts@wolfssl.com.

SHA-3 Support in wolfSSL #TLS13

We’ve fully added support for SHA-3 to the wolfSSL embedded TLS library. We have also added SHA-3 support to HMAC and HKDF. Our SHA-3 offering includes 224, 256, 384, and 512-bit digests. It is tied into our hashing and signature infrastructure, so it will be available to TLS v1.2 or TLS v1.3 when the IETF adds cipher suites using SHA-3. There are also two build flavors that trade off size against speed, one suited to large server environments and one to small embedded applications. If you are a FIPS user, we shall have SHA-3 available inside of our FIPS boundary later this year.
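For readers unfamiliar with the four SHA-3 digest sizes and their use in HMAC, the following sketch uses Python's standard hashlib and hmac modules. It does not call wolfSSL; the helper names sha3_digest and sha3_hmac are hypothetical and exist only for this illustration.

```python
import hashlib
import hmac

# Map digest size in bits to the corresponding hashlib constructor.
SHA3_VARIANTS = {
    224: hashlib.sha3_224,
    256: hashlib.sha3_256,
    384: hashlib.sha3_384,
    512: hashlib.sha3_512,
}

def sha3_digest(bits: int, data: bytes) -> bytes:
    """Hash data with the SHA-3 variant of the given digest size."""
    return SHA3_VARIANTS[bits](data).digest()

def sha3_hmac(bits: int, key: bytes, data: bytes) -> bytes:
    """Compute HMAC over data using the SHA-3 variant of the given digest size."""
    return hmac.new(key, data, SHA3_VARIANTS[bits]).digest()

if __name__ == "__main__":
    for bits in SHA3_VARIANTS:
        print(bits, sha3_digest(bits, b"abc").hex())
    print("HMAC-SHA3-256:", sha3_hmac(256, b"key", b"message").hex())
```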

For more information please email us at facts@wolfssl.com.

Nginx with wolfSSL #TLS13

At wolfSSL, we are dedicated to 3rd party integration and have been improving our support for Nginx. wolfSSL now has tested patches for Nginx 1.13.8, 1.12.2 and other point releases.

Nginx builds with OpenSSL by default and this makes getting FIPS 140-2 compliance difficult. Compiling Nginx with wolfSSL is simple and we can help you through the validation process for your platform.

No code changes to Nginx are required for FIPS but make sure your configuration is set appropriately. This includes using:

  • RSA with keys of 2048-bits or more
  • ECC with P-256 or P-384
  • Key exchange with (EC) Diffie-Hellman ephemeral over static
  • Ciphers AES-128 or AES-256 in GCM over CBC mode
  • Digest and MAC with SHA-256 or SHA-384

The recommended cipher suites are:

  • ECDHE-ECDSA-AES128-GCM-SHA256
  • ECDHE-RSA-AES128-GCM-SHA256
  • DHE-RSA-AES128-GCM-SHA256
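The guidance above can be turned into a quick sanity check on cipher suite names before they go into a configuration. The sketch below is a name-based filter over OpenSSL-style suite names and nothing more: the helper follows_guidance is hypothetical, it does not inspect key sizes or curves, and it is not a FIPS compliance test. The joined string is in the colon-separated form that Nginx's ssl_ciphers directive accepts.

```python
RECOMMENDED = [
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES128-GCM-SHA256",
    "DHE-RSA-AES128-GCM-SHA256",
]

def follows_guidance(suite: str) -> bool:
    """Check an OpenSSL-style suite name against the bullet points above."""
    parts = suite.split("-")
    ephemeral = parts[0] in ("ECDHE", "DHE")        # ephemeral (EC)DH key exchange
    aes = any(p in ("AES128", "AES256") for p in parts)
    gcm = "GCM" in parts                            # GCM rather than CBC
    digest = parts[-1] in ("SHA256", "SHA384")
    return ephemeral and aes and gcm and digest

def cipher_string(suites: list[str]) -> str:
    """Join the suites that pass the check into a colon-separated list."""
    return ":".join(s for s in suites if follows_guidance(s))

if __name__ == "__main__":
    print(cipher_string(RECOMMENDED + ["AES128-SHA", "ECDHE-RSA-AES128-SHA256"]))
```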

Nginx has enabled support for TLS 1.3 and this is also available with wolfSSL. Note that the new draft revision of SP 800-52 requires, for government-only applications, the use of TLS v1.2 and states that they should be configured to use TLS v1.3. wolfSSL has been implementing the TLS v1.3 drafts and has performed interoperability testing. We are on track to support the final release of the TLS v1.3 specification.

STM32F Support Expanded

We’ve expanded our STM32F series support in the wolfSSL embedded TLS library to include the STM32F1, STM32F2, STM32F4 and STM32F7. This supports using either the CubeMX HAL or the Standard Peripheral Library. If the chip supports symmetric hardware crypto such as AES (CBC/GCM), 3DES, MD5, SHA1 or SHA256, we support using this from wolfCrypt native APIs or naturally through wolfSSL’s TLS client/server. The performance is about 10 times greater with the symmetric crypto hardware, making it a perfect fit for IoT TLS and performance-constrained devices. If the chip supports hardware-based Random Number Generation (RNG), we support that as well.

You can find a list of build-time options for configuring this here:
https://github.com/wolfSSL/wolfssl/blob/master/wolfssl/wolfcrypt/settings.h#L988

You can find an example STM32Cube project here:
https://github.com/wolfSSL/wolfssl/tree/master/IDE/STM32Cube

For more information please email us at facts@wolfssl.com.
