Topic: Connection losses due to segmentation errors

Hi there,

(and sorry for the wall of text, but I have put quite some time into this issue and want to share my discoveries)

Problem: I am regularly experiencing connection losses to the MQTT broker.

I found out that the reason for that is a payload mix-up in the TCP segments (or malformed packets?).

As you can see in the attached Wireshark trace, the last segment before the reset (the packet starts at No. 191) is incorrect.
First, some data from within the payload is sent before the start of the payload (No. 194).
Second, packet No. 195 is marked "malformed" in Wireshark. It is shown as a "Publish Release" packet, but it should only be a normal "Publish Message" packet. That is not always the case! Sometimes it is shown as a normal TCP packet, sometimes as something different.

I am using: MQTT v1.4, Microchip Harmony 2.6, Processor PIC32MZ2048EFM144, FreeRTOS

I discovered that the issue is not present when I put all the web-tasks into the same FreeRTOS task. (See example below)
The issue also does not happen when I only send packets with a payload smaller than 1460 bytes (the maximum segment size).
So this seems to be a multitasking issue. (Using WOLFMQTT_MULTITHREAD does not fix the problem but introduces others on top.)
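
To make the suspicion concrete, here is a small host-side toy model in C. It is not wolfMQTT or Harmony code, and the names segments_needed and simulate_stream are made up for this illustration. It computes how many segments a payload needs at an MSS of 1460 bytes, and shows how the segments of two multi-segment messages end up mixed on one stream when the writers are not serialized. Whether this is what actually happens on the PIC is an assumption on my side.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MSS 1460u  /* maximum segment size quoted above */

/* Number of MSS-sized segments needed to carry payload_len bytes. */
size_t segments_needed(size_t payload_len, size_t mss)
{
    assert(mss > 0);
    return (payload_len + mss - 1) / mss;
}

/*
 * Toy model: two writers, 'A' and 'B', each send one message made of
 * seg_a / seg_b segments to a single shared stream. The "scheduler"
 * switches to the other writer after every segment, unless `serialized`
 * is non-zero, in which case a writer keeps the stream until its whole
 * message is out. One character per segment is stored in `out`.
 * Returns the number of segments written.
 */
size_t simulate_stream(size_t seg_a, size_t seg_b, int serialized,
                       char *out, size_t out_cap)
{
    size_t left[2] = { seg_a, seg_b };
    size_t n = 0;
    int cur = 0;

    assert(out != NULL && out_cap > 0);
    memset(out, 0, out_cap);            /* keeps `out` NUL-terminated */

    while ((left[0] > 0 || left[1] > 0) && n + 1 < out_cap) {
        if (left[cur] == 0) {
            cur = 1 - cur;              /* this writer is done, use the other */
        }
        out[n++] = (char)('A' + cur);
        left[cur]--;
        if (!serialized || left[cur] == 0) {
            cur = 1 - cur;              /* task switch */
        }
    }
    return n;
}
```

With a 4000 byte payload, segments_needed(4000, MSS) gives 3. Calling simulate_stream(3, 2, 0, buf, sizeof buf) yields "ABABA" (segments mixed), while the same call with serialized set to 1 yields "AAABB". A payload below 1460 bytes fits into one segment, so there is nothing to interleave in this model, which would match the observation above.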


I am writing this to you because I made a test project with plain BSD sockets that does not show any signs of this issue.
Also, in my real application we use TCP and HTTP quite often with the same payloads as for MQTT and have never experienced those issues.
I am still unsure whether the problem is in the wolfMQTT library or in Microchip Harmony. Maybe it is only an issue with my specific configuration?

I dug into the wolfMQTT implementation and could not find anything suspicious, but maybe something is happening in the background (callbacks, interrupts)?

I would be glad if you could look into it.


I have modified a Harmony example using the MQTT example for Azure, and I got it running with a local mosquitto broker.
To reproduce:
* Build the attached project (it uses the "PIC32MZ EF Starter Kit"). DHCP is enabled.
* Install mosquitto (https://mosquitto.org/) and run it (the standard configuration is OK).
* Set the correct IP in "local_broker_ip" in the attached python script and run it.

The python script will continuously trigger a publish on the PIC and will stop if the answer is not correct. This can take a few minutes - the error typically happens within 15 minutes, sometimes much faster.
If errors happen, the PIC needs to be restarted so it can connect to the broker again.
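
The following is not the attached script. It is only a minimal sketch, in C, of one way such an "is the answer correct" check can be done for payloads that span several segments. The function names and the byte pattern are assumptions chosen for this illustration: every byte depends on its position, so data that shows up in the wrong place is detected.

```c
#include <assert.h>
#include <stddef.h>

/* Period of the test pattern. 251 is prime, so the pattern does not
 * line up with 256-byte or 1460-byte boundaries. */
#define PATTERN_PERIOD 251u

/* Fill buf with a position-dependent pattern. */
void fill_pattern(unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        buf[i] = (unsigned char)(i % PATTERN_PERIOD);
    }
}

/* Return the offset of the first byte that deviates from the pattern,
 * or len if the whole buffer is intact. */
size_t first_mismatch(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != (unsigned char)(i % PATTERN_PERIOD)) {
            return i;
        }
    }
    return len;
}
```

A sender would call fill_pattern() on the payload before publishing, and the receiver would call first_mismatch() on what comes back. A return value smaller than the payload length gives the offset where the mix-up starts, which can then be compared with the segment boundaries in the Wireshark trace.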

system_tasks.c contains a define (#define COMBINE_TASKS) that enables a configuration that does not have those issues.

Post's attachments

wolfmqtt_forum.zip 154.49 kb, 2 downloads since 2020-03-19 


Re: Connection losses due to segmentation errors

Hi DanielGruber,

Thanks for the detailed report for wolfMQTT and multithreading. I will see if I can reproduce and resolve.

David Garske, wolfSSL

Re: Connection losses due to segmentation errors

Hi DanielGruber,

We can reproduce the issue and are working on a fix. I will post a link here once it is available.

Thanks,
David Garske, wolfSSL

Re: Connection losses due to segmentation errors

Hi DanielGruber,

The issue turned out to be a thread synchronization problem, and switching to a binary semaphore resolved things. The code has been pushed here and is under peer review:
https://github.com/wolfSSL/wolfMQTT/pull/146

Thanks,
David Garske, wolfSSL

Re: Connection losses due to segmentation errors

Hi David,

Thanks for the quick response!
But, well - it does not work.

Using the code from the pull request, the call to MqttClient_Connect never returns.
That happens because (from the FreeRTOS Reference manual):

Binary semaphores created using the xSemaphoreCreateBinary() function are created ‘empty’, so the semaphore must first be given before the semaphore can be taken (obtained) using a call to xSemaphoreTake().

Besides, I don't think FreeRTOS mutexes and binary semaphores are interchangeable:

Binary Semaphores – A binary semaphore used for synchronization does not need to be ‘given’ back after it has been successfully ‘taken’ (obtained). Task synchronization is implemented by having one task or interrupt ‘give’ the semaphore, and another task ‘take’ the semaphore (see the xSemaphoreGiveFromISR() documentation).

As far as I understand them, FreeRTOS binary semaphores work "the other way around" compared to FreeRTOS mutexes (I have never used them, though).
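
To illustrate the difference, here is a tiny host-side model in C. It is not FreeRTOS code: model_lock_t and the model_* functions are made up for this illustration, and the take is non-blocking so the model can be run on a PC. It only mirrors the two facts that matter here: a binary semaphore starts out 'empty' (as quoted above), while a mutex can be taken right after it is created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a lock object, not the FreeRTOS implementation. */
typedef struct {
    bool available;   /* true when a take would succeed */
} model_lock_t;

/* Binary semaphore as described in the quoted manual: created 'empty'. */
model_lock_t model_binary_semaphore_create(void)
{
    model_lock_t l = { false };
    return l;
}

/* Mutex: available immediately after creation. */
model_lock_t model_mutex_create(void)
{
    model_lock_t l = { true };
    return l;
}

/* Non-blocking take, returns true on success. A real take with an
 * infinite timeout would wait here instead of returning false. */
bool model_take(model_lock_t *l)
{
    assert(l != NULL);
    if (!l->available) {
        return false;
    }
    l->available = false;
    return true;
}

/* Give the lock back (or, for a semaphore, signal it). */
void model_give(model_lock_t *l)
{
    assert(l != NULL);
    l->available = true;
}
```

In this model, the first model_take() on a fresh binary semaphore fails until model_give() has been called once, while the first model_take() on a fresh mutex succeeds. With a blocking take and nobody giving the semaphore first, the caller waits forever, which is how I explain that MqttClient_Connect never returns.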

I have added "wm_SemUnlock(s);" after the semaphore is created in wm_SemInit() just to see whether it would run anyway. It does not. The issue is still there.

Furthermore, MqttClient_Ping() returns -1 every time I call it. But that was always the case when I tried to use WOLFMQTT_MULTITHREAD.
