Hi,

The latest code modification (GitHub #240) did FIX the "QoS 2 early exit" issue.

However, I found another issue in the GitHub code committed after the v1.9.0 release.
I did not want to flood this forum with numerous issues in not-yet-released code.
I am just trying to find the right procedure for reporting these issues to you.

Sorry if I am not using the correct procedures.
Kelvin

Hi,

I consider code development to be an iterative process.

Since I am now examining your "development code" in GitHub, I think it is unfair and inappropriate to air my findings in this forum, which is more about the stable releases.

If you do not mind, I will report my findings against GitHub #239 and give you a chance to assess and fix any issues that I raise. This will also allow the issues to be tracked, the code to be modified and then tested by your QA system, and stabilised releases to be provided once the issues are resolved.

The only downside is that other forum users may come across the same issues without having checked GitHub to see whether they have already been resolved in a pending release candidate.

At a later date, I will come back to each of my issues raised in this forum and indicate their successful conclusion.

I hope that this suggested approach is acceptable to you and the forum users.
Kelvin

Hi,

I am still bleeding !!
I tested the library code from GitHub master pull request #238.
Unfortunately, the code changes up to this merge did not resolve the QoS 2 early exit issue.

Hint:
The "waitMatchFound" flag is still being set by the PUBREC message, as shown in this code snippet (to show you what I am testing):

            if ((wait_type == MQTT_PACKET_TYPE_ANY ||
                 wait_type == packet_type ||
                 (MqttIsPubRespPacket(packet_type) &&
                  MqttIsPubRespPacket(wait_type))) &&
                (wait_packet_id == 0 || wait_packet_id == packet_id))
            {
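                /* note: an arriving PUBREC satisfies this match even while
                   waiting for PUBCOMP, because MqttIsPubRespPacket() is true
                   for both packet types */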
                use_packet_obj = packet_obj;
                waitMatchFound = 1;
            }

Is there another update that I can try?
Kelvin

Hi,

My apologies.
We only received the MQTT v1.9.0 library at the beginning of August 2021, so it is this new official release that I have been testing.

There did not seem to be any other posts with similar issues, but it is good to know that you are improving the library in the background.

I have been trying to track down a potentially nasty issue, where an incoming message arrives during an outgoing process and subsequently crashes the library.
Consequently, I have been isolating the processes and coming across these other side issues along the way as I closely examine the operation of the Wolf library.

Once again, my apologies if these issues have already been resolved.
I am a simple implementer and put my faith in the official releases.
I was unaware that I had to track your other activities.
Anyway, I can acquire your latest GitHub changes and retest (hopefully all is good, no more posts from me, and you can rest !!).

I guess that Wolf will do another official release soon - do you have an expected release date?

Kelvin

The QoS level 2 protocol sequence for publishing is:
PUBLISH -> PUBREC -> PUBREL -> PUBCOMP

The MQTT v1.9.0 library exits (with SUCCESS) after sending the PUBREL and does not wait for the arrival of the PUBCOMP.
This early exit opens up the opportunity for the system to begin other transfers and then experience the unexpected arrival of the PUBCOMP message.
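
For context, this is roughly how my application drives the publish. It is only a minimal sketch using the standard MqttClient_Publish() call; the topic, payload and helper names are placeholders, and 'client' is assumed to be an already-connected MqttClient. My expectation is that the call only returns MQTT_CODE_SUCCESS once the full PUBLISH -> PUBREC -> PUBREL -> PUBCOMP exchange has completed:

    /* sketch only: 'client' is an already-connected MqttClient */
    static int publish_qos2_example(MqttClient *client, word16 packet_id)
    {
        MqttPublish publish;
        int rc;

        XMEMSET(&publish, 0, sizeof(MqttPublish));
        publish.qos        = MQTT_QOS_2;             /* exactly-once delivery */
        publish.topic_name = "wolfMQTT/example";     /* placeholder topic */
        publish.packet_id  = packet_id;
        publish.buffer     = (byte*)"test payload";  /* placeholder payload */
        publish.total_len  = 12;

        /* expected: success only after PUBCOMP has been received;
           observed with v1.9.0: the call returns straight after the PUBREL */
        rc = MqttClient_Publish(client, &publish);
        return rc;
    }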

As a gentle hint, the "waitMatchFound" (line 900 in mqtt_client.c) is set by the arrival of the PUBREC message,
so the library does not "wait_again" (line 996 in mqtt_client.c).

At the time of the premature exit, there are insufficient indications (in the library variables) to allow me to do an external work-around.
Could I kindly ask you to look into and resolve this issue, as it does lead to instability in our system.
Kelvin

I came across this issue as I was trying to hunt down a different potential issue in the library.
Normally, I would not bother the MQTT guys about a configuration issue,
except for the fact that the other potential issue also manifests itself when using large payloads.
So there may be an associated aspect (concerning client buffers) between this issue and the more serious potential issue that I am encountering.

Back to this issue:
I was testing the new non-blocking "Publish Continue" feature in the MQTT v1.9.0 release.
This feature allows the application to identify when more tx data is pending for transmission to the network.

Some buffer sizes were changed to help invoke the new feature.
1. network maximum tx size   (defined by the application stack)
2. client tx buffer length   (defined by the library's MAX_BUFFER_SIZE)
3. publish payload size      (test publish message, with a large amount of data)

The full transfer size is the MQTT header (40 bytes) + payload (2000 bytes).

Summary of investigation:
Net size   Buf len   Payload   Transfer sequence   (all sizes in bytes)
1024       1000      2000      1000 + 1000 + pubContinue + 40
1000       1000      2000      1000 + 1000 + pubContinue + 40
980        1000      2000      980 + 20 + 980 + pubContinue + error(-2)

With certain buffer sizes and large payload sizes, the library returned error (-2), created at line 155 in the mqtt_socket.c file.
I retested the above conditions with the MQTT v1.8.0 library, which also exhibited the same results (with pubContinue replaced with the normal Continue).

The simple solution is to ensure that the "Network Tx Size is large enough to consume the entire Client Buffer Size".
It could be a Feature Request, but I doubt if it is really worthwhile modifying the library to deal with the small "Network Tx Size" case.
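
If it helps, this is the kind of compile-time guard that can be added at the application level. NET_MAX_TX_SIZE is a hypothetical name for whatever limit your network stack exposes; MAX_BUFFER_SIZE is the client tx buffer length mentioned above:

    /* NET_MAX_TX_SIZE is a hypothetical application define for the largest
       single network write; MAX_BUFFER_SIZE is the client tx buffer length */
    #if defined(NET_MAX_TX_SIZE) && defined(MAX_BUFFER_SIZE)
        #if NET_MAX_TX_SIZE < MAX_BUFFER_SIZE
            #error "Network tx size must not be smaller than the client tx buffer size"
        #endif
    #endif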

So this post is just an Advisory Note for users to "carefully" configure their system with this aspect in mind, when sending large payloads.
Kelvin

For a system, which has more than ONE client, it is desirable that the MQTT library resources are independent for each client.

There are many advantages to this approach, and for the most part, the library achieves this goal through its use of session contexts (eg the MQTTCtx context in examples).
However the current MQTT library (v1.9.0) has several static data resources which are shared across all clients.

1.  the "static MqttProp clientPropStack[MQTT_MAX_PROPS]" and its associated semaphore lock.
When running just TWO clients, this resource was depleted and the MQTT_MAX_PROPS value had to be increased as a work-around, by multiplying the default value by the number of clients (see the sketch after this list).
Although this is an acceptable solution, it is dependent on a single non-specified initialisation of the array (not done by the library)
and is reliant on "MqttProps_Free" being executed on ungraceful disconnects to clean up the array.
So the library has to be compiled for the highest expected number of clients in any system or compiled on a per-system basis where the number of clients is uniquely specified.

As a possible feature request, the library could be changed to have the "clientPropStack" (and semaphore) included within the session context, where it can be initialised and used independently on a per-client basis.
Each client would then be in control of its own private prop stack, just like all the other variables in the current session context structure.

2.  the "static byte will_done" within the SN_Client_Connect routine.
I am not using SN operation, but point this variable out as a potential issue if ever multiple clients are used.
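
For reference, the work-around mentioned in item 1 is just a compile-time change. NUM_MQTT_CLIENTS is a hypothetical application define, and the stock default of 30 for MQTT_MAX_PROPS is from my copy of mqtt_packet.h, so please treat the exact value as an assumption:

    /* work-around: size the shared property pool for the worst case,
       defined before the wolfMQTT headers (or via the build system) */
    #define NUM_MQTT_CLIENTS   2
    #define MQTT_MAX_PROPS     (30 * NUM_MQTT_CLIENTS)  /* default * clients */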

I hope this note will help users to avoid the prop stack issue when running multiple clients,
and this feature request will help generalise the library.

Kelvin

I am not sure about my earlier reply, it seems to have gone missing.
If it re-appears, then this is a briefer replication of that post.

I have tested the code (wolfMQTT-83d3725d365c33981bbc7fcb794b9bd3bacad4d7),
and it is working.
The new code:
1. continues to process the packet when no props are included in the PUBREC packet
2. responds with a valid reason code (actually with no reason code) in the PUBREL packet
3. has removed the condition that raised a compiler warning, so clean build is achieved

Discussion:
It is proposed that the reporting of the "reason code" to the application layer would be handled as a "feature request".
This is an acceptable suggestion.
The "reason code" just provides additional information for the higher layers to make informed decisions, such as Re-authenticate etc. It also provides extra information when errors occur outside of the MQTT protocol, such as No subscribers to your published topic. It could be useful, but it is not a show stopper.

So, many thanks for resolving the fundamental issues.
Kelvin

I have tried your code (wolfMQTT-1541ea071de48d6cf3b4664d814cb0883d041850).
The issue of reading non-existent props is now resolved, but there are still some other issues that need attention.

Summary:
1. The PUBREC reason code is not reported to the application
2. The PUBREL contained an invalid reason code.
    (cause: the PUBREC reason code was sent back in the PUBREL packet, instead of one of the allowed codes).
3. A compiler warning on line 511 of mqtt_packet.c

For information:
1. The PUBREC reason code is lost as the code uses a temporary PublishResp,
   instead of referencing the session's MqttPublish.resp or the MqttClient.msg.publish_resp, both of which could have reported the reason code back to the application.
   The final stage PUBCOMP would report a reason code of 0 (success) if the PUBREL did not fail, so this is probably acceptable when the sequence completes successfully.
   
2. The MQTT v5.0 standard mentions, in Table 3.6, that only values of 0 and 146 can be used as reason codes for PUBREL,
   although there might be some confusion in the specification, as section 4.3.3 appears to allow any value under 0x80 (128).
   However, the HiveMQ broker rejected the PUBREC reason code of 16, so I guess it should not have been used in the PUBREL (see the sketch after this list).

3. The compiler warning is illustrated in the attached picture.
   I am not overly concerned, as long as error conditions are reliably captured, e.g. for values like MQTT_PROP_TYPE_MAX.
   Also, it is always nice to have an absolutely clean build.
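
To illustrate item 2, this is the sort of rule I would expect to be applied somewhere on the PUBREL encode path. It is just a sketch of the constraint from Table 3.6, not a suggestion for where it should live in the library:

    /* per MQTT v5.0 Table 3.6, a PUBREL may only carry reason code
       0x00 (Success) or 0x92 (146, Packet Identifier not found);
       copying the PUBREC code (e.g. 0x10 "No matching subscribers")
       into the PUBREL is what the HiveMQ broker rejected */
    static int PubRelReasonCodeIsValid(unsigned char reason_code)
    {
        return (reason_code == 0x00) || (reason_code == 0x92);
    }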

Details for recreating issue:
  Broker was "broker.hivemq.com"
  Publish to any non-subscribed topic.
  PUBREC data received was 50 03 00 01 10,
        where 50=PUBREC, 03=Remaining Length, 0001=packet ID, 10=reason code
  PUBREL data sent was 62 03 00 01 10,
        where 62=PUBREL, 03=Remaining Length, 0001=packet ID, 10=reason code
  DISCONNECT data received was decoded as
       "property (31) reason: PUBREL with invalid reason code was sent. This is a protocol violation."

Many thanks
Kelvin

Many thanks for looking into this issue.

I believe that there is a simple fix,
but I must refrain from providing suggestions as I am not familiar with the history and implications of the code.

With the MQTT v5.0 specification trying to provide backward compatibility with earlier versions,
you might check, whilst you are on the case, for other instances of similar loopholes that need plugging (or hopefully not).
Just a suggestion, but I am always trying to get more for the price of one !!

Once again, many thanks.

Kind regards,
Kelvin

This is a rather special case, but I believe it is allowed in the MQTT standards.

During our testing of the Wolf Library, we published to a topic which did not have any subscribers (that is, the subscribers had not yet been set up).
The Broker responded with a shortened PUBREC packet, with "reason code" (16) of "No matching subscribers".

However, the library did not support the shortened packet, discarded the reason code, and returned "result code" (-2), MQTT_CODE_ERROR_OUT_OF_BUFFER.
The library actually tried to process "props" that were not present in the shortened packet.

The MQTT-v5.0 standard mentions (in section 3.5.2.2.1) that "If the Remaining Length is less than 4 there is no Property Length and the value of 0 is used."

The reason code is discarded as the Library uses a NULL packet object for this part of the processing during MQTT_MSG_WAIT.
For reference, see line 820 of mqtt_client.c, where
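   /* packet_obj is NULL here, so the decoded PUBREC reason code has
      nowhere to be stored and is dropped */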
   rc = MqttClient_DecodePacket(client, client->rx_buf,
          client->packet.buf_len, NULL, &packet_type, NULL, &packet_id);

Consequently, the broker's valid response is rejected, without telling the application the "reason code".
Can you please advise how to handle this situation?

Details for recreating issue:
  Broker was "broker.hivemq.com"
  Publish to any non-subscribed topic.
  PUBREC data received was 50 03 00 01 10,
        where 50=PUBREC, 03=Remaining Length, 0001=packet ID, 10=reason code

Many thanks
Kelvin