Multicast group deletion during FUOTA

Hi everyone,

I’m looking into FUOTA,
and I’m facing issues completing it for a “heavy” firmware file ( > 6.5 KB );
for small binaries instead (about 400 bytes, ~2 fragments on DR 5) it runs smoothly to the end.
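
Just to put rough numbers on it (the 232-byte fragment size below is only my assumption for illustration; the real value comes from the fragmentation settings of the deployment), the expected fragment count is a simple ceiling division:

#include <cstdio>

int main() {
    const int fragment_size = 232;             // assumed fragment payload size in bytes
    const int firmware_sizes[] = {400, 6500};  // small test binary vs. the "heavy" firmware
    for (int size : firmware_sizes) {
        // ceiling division: fragments needed to carry the whole image
        int fragments = (size + fragment_size - 1) / fragment_size;
        std::printf("%5d bytes -> %d fragments\n", size, fragments);
    }
    return 0;
}

With these assumed numbers the heavy image would need roughly 29 fragments, while the small one fits in 2, which matches the ~2 fragments I see on DR 5.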

This is our lab setup:

  • dockerized loraserver components on an x64 machine;

  • a Raspberry Pi gateway running the Semtech packet forwarder;

  • an ST end-device with the Arm Mbed FUOTA firmware;

  • posting the new firmware through the API ( /api/devices/my_dev_deveui/fuota-deployments )

When uploading bigger firmware, the node, once switched to class C, receives at most 10-15 fragments and nothing more;
in the loraserver logs I can see it sending these first 10-15 fragments but not the remaining ones,
and then the multicast group is deleted (prematurely, in my opinion), as you can see in the log below:


time="2019-10-17T15:58:19Z" level=info msg="multicast queue-item created" f_cnt=35 gateway_id=aa555a0000000108 id=2279 multicast_group_id=6c3f6251-620f-46e2-9af0-d2065c47dfd6

time="2019-10-17T15:58:19Z" level=info msg="multicast queue-item created" f_cnt=36 gateway_id=aa555a0000000108 id=2280 multicast_group_id=6c3f6251-620f-46e2-9af0-d2065c47dfd6

time="2019-10-17T15:58:19Z" level=info msg="multicast queue-item created" f_cnt=37 gateway_id=aa555a0000000108 id=2281 multicast_group_id=6c3f6251-620f-46e2-9af0-d2065c47dfd6

time="2019-10-17T15:58:22Z" level=info msg="multicast queue-item deleted" id=2244

time="2019-10-17T15:58:22Z" level=info msg="backend/gateway: downlink tx acknowledgement received" gateway_id=aa555a0000000108

time="2019-10-17T15:58:24Z" level=info msg="multicast queue-item deleted" id=2245

time="2019-10-17T15:58:24Z" level=info msg="backend/gateway: downlink tx acknowledgement received" gateway_id=aa555a0000000108

time="2019-10-17T15:58:26Z" level=info msg="multicast queue-item deleted" id=2246

time="2019-10-17T15:58:26Z" level=info msg="backend/gateway: downlink tx acknowledgement received" gateway_id=aa555a0000000108

time="2019-10-17T15:58:28Z" level=info msg="multicast queue-item deleted" id=2247

time="2019-10-17T15:58:28Z" level=info msg="backend/gateway: downlink tx acknowledgement received" gateway_id=aa555a0000000108

time="2019-10-17T15:58:30Z" level=info msg="multicast queue-item deleted" id=2248

time="2019-10-17T15:58:30Z" level=info msg="backend/gateway: downlink tx acknowledgement received" gateway_id=aa555a0000000108

time="2019-10-17T15:58:32Z" level=info msg="multicast queue-item deleted" id=2249

time="2019-10-17T15:58:32Z" level=info msg="backend/gateway: downlink tx acknowledgement received" gateway_id=aa555a0000000108

time="2019-10-17T15:58:33Z" level=info msg="multicast queue-item deleted" id=2250

time="2019-10-17T15:58:33Z" level=info msg="backend/gateway: downlink tx acknowledgement received" gateway_id=aa555a0000000108

time="2019-10-17T15:58:35Z" level=info msg="multicast queue-item deleted" id=2251

time="2019-10-17T15:58:35Z" level=info msg="backend/gateway: downlink tx acknowledgement received" gateway_id=aa555a0000000108

time="2019-10-17T15:58:37Z" level=info msg="multicast queue-item deleted" id=2252

time="2019-10-17T15:58:37Z" level=info msg="backend/gateway: downlink tx acknowledgement received" gateway_id=aa555a0000000108

time="2019-10-17T15:58:39Z" level=info msg="multicast queue-item deleted" id=2253

time="2019-10-17T15:58:39Z" level=info msg="backend/gateway: downlink tx acknowledgement received" gateway_id=aa555a0000000108

time="2019-10-17T15:58:41Z" level=info msg="multicast queue-item deleted" id=2254

time="2019-10-17T15:58:41Z" level=info msg="backend/gateway: downlink tx acknowledgement received" gateway_id=aa555a0000000108

time="2019-10-17T15:58:42Z" level=info msg="multicast-group deleted" id=6c3f6251-620f-46e2-9af0-d2065c47dfd6

time="2019-10-17T15:59:04Z" level=info msg="backend/gateway: downlink tx acknowledgement received" gateway_id=aa555a0000000108

Meanwhile the node, after a pending status (multicast timeout), returns to class A.

Initially I thought the multicast group was deleted because of incorrect FUOTA params on my side, but trying different params (unicast timeout, multicast timeout and DR) didn’t help either.
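
Just as a sanity check on my side (this is only my own sketch, not how the server computes anything; the numbers are assumptions based on the log above, where a downlink goes out roughly every 2 seconds): as far as I understand, the class C session window has to stay open long enough for all fragments, otherwise the device falls back to class A before the last ones arrive.

#include <cstdio>
#include <cmath>

int main() {
    const int fragments = 29;        // assumed: ~6.5 KB split into ~232-byte fragments
    const double spacing_s = 2.0;    // spacing between downlinks observed in the log
    const double needed_s = fragments * spacing_s;
    // Remote Multicast Setup encodes the class C session timeout as 2^Timeout seconds
    const int timeout_exp = 6;       // example value, an assumption
    const double window_s = std::pow(2.0, timeout_exp);
    std::printf("need ~%.0f s, session window = %.0f s -> %s\n",
                needed_s, window_s, needed_s <= window_s ? "window OK" : "window too short");
    return 0;
}

With these numbers the window would be large enough (58 s needed vs. 64 s), so the timeout alone doesn’t seem to explain the early deletion.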

Another thing I noticed when FUOTA fails:
sometimes there is no TX_ACK for the last fragment sent from the server before the MC group is deleted (I first suspected a problem on our network),
but I’ve also seen the MC group being deleted when no node is active at all.

So, I’m aware of the experimental status of the FUOTA capability,
but can anybody clarify the above situation, or share some tips about it?

It would be very much appreciated.

Thanks in advance.

I had these exact symptoms. Turns out there is a bug in mbed’s semtech radio driver when in continuous reception mode (class C). The driver assumes the buffer starting address is always the same, when that isn’t the case. Eventually this will cause the payloads to become corrupt, first drifting off by one byte, then 2, etc.

Find the two lines that say:

_rf_settings.lora_packet_handler.size = read_register(REG_LR_RXNBBYTES);
read_fifo(_data_buffer, _rf_settings.lora_packet_handler.size);

And add a line between them:

_rf_settings.lora_packet_handler.size = read_register(REG_LR_RXNBBYTES);
write_to_register(REG_LR_FIFOADDRPTR, read_register(REG_LR_FIFORXCURRENTADDR));
read_fifo(_data_buffer, _rf_settings.lora_packet_handler.size);

Hi Eric,

thank you for the suggestion,
I also reported this problem to the original mbed fuota repo (no answer so far, though).

Unfortunately, at the moment I don’t have the hardware on which to try the fix you found,
but as soon as I can try again I will update here.

Thanks again

The fixes should now be added to the current version of the rf-drivers for mbed 5.x and mbed 6.x (which recently absorbed the drivers):


Hi Eric,

thank you for reporting this, I’ll try it as soon as I can.
Just a related question: did you manage to do a FUOTA with more than 15 fragments? (242 B each, about 4 KB)

Because when I tried, also using another library, I could not get past 15 packets received from the NS.