ChirpStack reception
t+60ms *** 2024-03-01T13:16:57.242308Z INFO chirpstack::gateway::backend::mqtt: Message received from gateway region_id="us915_1" topic="us915_1/gateway/0cb…a/event/up" qos=0 json=false
My main issue is that the time between ChirpStack reception and processing can grow above 1s, causing missed downlink RX windows. The CPU load for ChirpStack is low: around 20%, peaking at 40%, sometimes 90% (of 1 CPU; the system has 4 CPUs and a load average of 2.6).
Questions:
Is there a way to increase parallelism in ChirpStack packet processing?
Reducing logging seemed to have a positive impact, but I can't measure it anymore ;). Could logging be the cause?
Between reception and processing, I had other devices generating metrics. Can that be related?
Do you see anything else that could make it take this long, which I have not looked at?
Please note that ChirpStack v4.7 (currently a test release) contains many performance improvements:
Return database connections immediately back to the pool
Reduced number of database queries
Implementation of async PostgreSQL / Redis
Migration of device-sessions to PostgreSQL reduces the number of db connections even more, depending on DevAddr re-usage
Another tuning option is the maximum number of database connections that are kept in the connection pool. More connections means more work can be done in parallel (once the max number of connections has been reached, the next request needs to wait until a connection has been returned to the pool).
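For example, the pool size can be raised in `chirpstack.toml`. This fragment assumes ChirpStack v4's configuration layout; the values shown are illustrative, not recommendations:

```toml
[postgresql]
  dsn="postgres://chirpstack:chirpstack@localhost/chirpstack?sslmode=disable"

  # Default is 10; raising this allows more database work in parallel,
  # at the cost of more open connections on the PostgreSQL side.
  max_open_connections=20
  min_idle_connections=0
```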
Yes, I've seen that 4.7 has many improvements. It will also impact the core code for the Helium integration, so I won't jump on it too fast.
Most of the problem seems related to Mosquitto generating a lot of I/O, impacting ChirpStack's internal performance (strangely, not the Java performance). I'm not sure why it impacts ChirpStack.
Mosquitto seemed to sync its persistence store roughly every 2 hours, queue some packets for about an hour, and apparently finally drop them. I assume it's related to persistence and message lifetime settings. I will investigate this more.
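The relevant knobs here would be Mosquitto's persistence settings. A sketch of a `mosquitto.conf` fragment, assuming stock Mosquitto option names (values are illustrative; check your distribution's defaults):

```
# How often (in seconds) the in-memory database is saved to disk.
# The default is 1800 (30 min); a large value could explain bursty I/O.
autosave_interval 1800

persistence true
persistence_location /var/lib/mosquitto/

# Cap the per-client queue for offline or slow clients, so stale
# packets are dropped early instead of piling up in persistence.
max_queued_messages 1000
```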
I assume that you mean code related to your Helium integration that is interacting with ChirpStack?
Most of the problem seems related to Mosquitto generating a lot of I/O, impacting ChirpStack's internal performance (strangely, not the Java performance). I'm not sure why it impacts ChirpStack.
I've been doing a lot of performance testing over the last months, and one observation I made is that under high load, depending on the configuration, there might not be enough Redis connections in the connection pool to handle the uplinks. Then you basically get a big queue of tasks that are waiting for a connection before they can proceed. E.g. this get_redis_conn would hang until a connection becomes available:
You could increase the default max_open_connections=100 to a higher value. One indication of this problem is getting many empty de-duplication set errors: by the time a Redis connection becomes available to read the set, the key has already expired.
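Assuming ChirpStack v4's `chirpstack.toml` layout, the Redis pool size mentioned above would be raised like this (the value 200 is only an example):

```toml
[redis]
  servers=["redis://localhost/"]

  # Default is 100; raise it if uplink tasks are queuing up
  # waiting for a free Redis connection under high load.
  max_open_connections=200
  min_idle_connections=0
```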
Again, ChirpStack v4.7 contains some optimizations that reduce the number of queries; upgrading might eventually be the best solution.
If you see any other parts in the code that could be optimized, then please let me know.