wolfHSM and Concurrency

Systems that integrate an HSM often have multiple threads or subsystems performing cryptographic operations at the same time. wolfHSM is designed to support this kind of workload while keeping the request/response protocol simple and predictable.

wolfHSM achieves concurrency primarily by having the server process requests from multiple client sessions in parallel. Each session handles its requests sequentially, but multiple sessions can be serviced at the same time.

This post describes the concurrency patterns on both the client and server side, and the locking support added in wolfHSM v1.4.0 that enables multithreaded server designs.

Client-Side Concurrency

The wolfHSM protocol allows one in-flight request per client session. A request must complete and produce a response before the next request for that session is processed.

Because of this, concurrency on the client side is typically implemented by creating multiple client contexts, each representing an independent session with the server.

For example, a multithreaded application might assign one client context per worker thread:

  • Thread A uses whClientContext A
  • Thread B uses whClientContext B
  • Thread C uses whClientContext C

Each thread submits requests independently, allowing multiple operations to execute concurrently on the server.

This approach avoids complicated synchronization on the client side. Since each thread owns its own context, request ordering and response matching remain straightforward.

In general:

  • A whClientContext should be used by one thread at a time
  • Multiple contexts may exist concurrently
  • Each context maps to a single server session

Server-Side Concurrency

On the server side, each client connection is represented by a whServerContext. This structure contains the state associated with a single client/server session.

A server may create multiple whServerContext instances—one for each client—and process incoming requests using:

wh_Server_HandleRequestMessage()

The reference servers included with most wolfHSM platform ports use a single-threaded round-robin loop. In this design, the server iterates over each context and processes one request at a time.

This approach is simple and works well for many embedded deployments.

However, some systems may prefer more advanced scheduling strategies, such as:

  • event-driven dispatch
  • interrupt-driven handling
  • dedicated worker threads per client
  • other application-defined task scheduling

The wolfHSM API does not impose a particular scheduling model. Custom server applications are free to implement any mechanism that repeatedly calls wh_Server_HandleRequestMessage() for each active server context.

Multithreaded Server Pattern

One common design is to run a dedicated thread for each server context.

#include <wolfhsm/wh_server.h>

/* Worker thread: services a single whServerContext. Each context must be
 * handled by exactly one thread at a time. */
void* serverThread(void* arg)
{
    whServerContext* server = (whServerContext*)arg;
    int ret = WH_ERROR_OK;

    /* Process requests until an error occurs. A production server would
     * typically also distinguish recoverable conditions (e.g. no request
     * currently pending) from fatal transport errors. */
    while (ret == WH_ERROR_OK) {
        ret = wh_Server_HandleRequestMessage(server);
    }

    return NULL;
}

In this model:

  • each client session has its own whServerContext
  • each context is serviced by a worker thread
  • multiple client requests may be processed concurrently

The underlying operating system determines scheduling and execution order.

Server Locking Support (wolfHSM v1.4.0)

When multiple server threads are active, certain internal resources must be protected from concurrent access. wolfHSM v1.4.0 introduces a generic locking framework that allows platform ports to register synchronization primitives.

Examples include:

  • atomic spinlocks
  • POSIX mutexes
  • FreeRTOS mutexes
  • other platform-specific locks

When configured, wolfHSM uses these locks internally to serialize access to shared resources such as:

  • non-volatile storage
  • global key caches
  • other shared server state

This allows multiple server threads to safely process requests while maintaining correct access to shared data.

Transport Independence

Server concurrency is independent of the transport layer. The transport only moves request and response messages between client and server.

Whether the server processes requests in:

  • a round-robin loop
  • an event-driven system
  • multiple threads

is entirely controlled by the application.

What About Crypto?

How cryptographic operations behave under concurrency depends on whether the implementation is software-based or uses hardware acceleration.

When operations are performed using software crypto, they execute entirely inside wolfCrypt using ephemeral operation contexts. These contexts are created per request and do not rely on shared global state, which means they work naturally with concurrent server threads.

When hardware crypto is involved, the behavior depends on how the platform integrates the accelerator.

One option is to rely on wolfCrypt’s hardware abstraction layer, which protects hardware access using its internal mutex mechanisms. In this configuration, multiple threads may invoke hardware-backed operations while wolfCrypt serializes access to the device. This is generally the preferred option when the platform’s synchronization primitives can reliably prevent deadlock scenarios within the application’s concurrency model.

Another option is for the server application to restrict hardware usage to a specific client context. This can be done by registering the hardware crypto callback (cryptoCb) for only a single privileged client session. This approach is useful when a high-priority client requires deterministic and uninterrupted access to the hardware accelerator, such as in safety-critical or real-time environments.

A third option is to use wolfHSM’s crypto affinity feature, allowing client applications to decide whether a request should prefer hardware or software crypto. In this model, the clients themselves coordinate hardware usage and avoid contention. This approach works well when client applications are trusted and able to enforce a cooperative usage policy or follow a predefined allocation scheme.

Overall, wolfHSM is designed to support a wide range of deployment models and provides the flexibility needed to accommodate many different system architectures and hardware environments. Ultimately, the correct approach depends on the specific requirements and constraints of the system as well as the capabilities of the underlying hardware platform. If you are unsure which approach is best for your deployment, the wolfSSL team can work with you to evaluate your architecture and help determine the most appropriate solution.

Summary

wolfHSM concurrency follows a straightforward model:

  • Each client session allows one request at a time
  • Concurrency is achieved by using multiple client sessions
  • The reference server for most platforms uses a single-threaded round-robin scheduler
  • Custom server applications may implement multithreaded designs to increase concurrency and responsiveness, provided each context is serviced by only one thread at a time and shared resources are protected, for example via the v1.4.0 locking framework
  • wolfHSM does not dictate a specific threading model or runtime environment, allowing applications to choose what best fits their requirements

This design allows wolfHSM to support a wide range of deployment models, from simple embedded servers to multithreaded systems that combine multiple client sessions with shared hardware cryptographic accelerators.

If you have questions about any of the above, please contact us at facts@wolfssl.com or call us at +1 425 245 8247.

Download wolfSSL Now