TLS 1.3 Performance Part 2 – Full Handshake

Significant changes from TLS 1.2 have been made in TLS 1.3 that are targeted at performance. This is the second part of six blogs discussing the performance differences observed between TLS 1.2 and TLS 1.3 in wolfSSL and how to make the most of them in your applications. This blog discusses the performance differences with regard to full handshake with server authentication using certificates.

Let’s start with a look at the TLS 1.2 full handshake performing server-only authentication with certificates below.

A TLS 1.3 full handshake (without HelloRetryRequest) performing server-only authentication with certificates is below.

Notice that there is one less round trip until Application Data can be sent in TLS 1.3 as compared to TLS 1.2. This significantly improves performance especially on high latency networks. But, there is another source of performance improvement arising from the ordering of the handshake messages and when lengthy cryptographic operations are performed.

In the TLS handshake, the server waits on the ClientHello and then sends handshake messages as it produces them in separate packets. When packets are sent is dependent on the amount of processing required to produce the data. For example, to copy a chain of certificates into the Certificate messages is quick, while generating a TLS 1.2 ServerKeyExchange message is slow as it requires multiple public key operations.

The client receives the messages at various time deltas and also requires differing amounts of processing. For example, the Certificate message is likely to require at least one signature verification operation on the leaf certificate. This asymmetric processing of messages means that some handshake messages will be processed on arrival and some will have to wait for processing of previous messages to be completed.

The table below restates the TLS 1.2 handshake but includes processing of messages and the major cryptographic operations that are performed. Operations are on the same line if the they are performed at the same time relative to network latency.

From this we can see that for RSA, where Verify is very fast relative to Sign, a TLS 1.2 handshake is dependent on: 2 x Key Gen, 2 x Secret Gen, 1 x Sign and 1 x Verify. For ECDSA, where Verify is slower than Key Gen plus Sign: 1 x Key Gen, 2 x Secret Gen and 2 x Verify.

The table below is a restating of the TLS 1.3 handshake including processing of message and the major cryptographic operations.

From this we can see that a TLS 1.3 handshake with RSA, where Verify is a lot faster than Sign, is dependent on: 2 x Key Gen, 1 x Secret Gen, 1 x Sign. Therefore, a Secret Gen and Verify in TLS 1.2 are saved. For ECDSA, where Verify is a lot slower than Sign, the TLS 1.3 handshake is dependent on: 2 x Key Gen, 1 x Secret Gen, 2 x Verify. Therefore, a Secret Gen in TLS 1.2 is traded for a faster Key Gen.

Running both the client and server on the same computer results in about a 15% improvement in the performance of ephemeral DH with RSA handshakes – mostly due to the parallel operations. With ephemeral ECDH and RSA there is about a 6% improvement, and with ECDHE and ECDSA there is about a 7% improvement – mostly due to the saving in round-trips.

These improvements come for free when using TLS 1.3 without the HelloRetryRequest. The next blog will discuss handshakes using pre-shared keys.