ChirpStack reception
t+60ms *** 2024-03-01T13:16:57.242308Z INFO chirpstack::gateway::backend::mqtt: Message received from gateway region_id="us915_1" topic="us915_1/gateway/0cb…a/event/up" qos=0 json=false
My main issue is that the time between ChirpStack reception and processing can grow above 1s, causing missed downlink RX windows. The CPU load for ChirpStack is low: around 20%, peaking at 40%, sometimes 90% (of 1 CPU; the system has 4 CPUs and a load average of 2.6).
Questions:
Is there a way to increase parallelism in ChirpStack packet processing?
Reducing logging seemed to have a positive impact, but I can't measure it anymore ;). Could logging be the cause?
Between reception and processing, I had other devices generating metrics. Can that be related?
Do you see anything else that could make it take this long, which I have not looked at?
Please note that ChirpStack v4.7 (currently a test release) contains many performance improvements:
Return database connections immediately back to the pool
Reduced number of database queries
Implementation of async PostgreSQL / Redis
Migration of device-sessions to PostgreSQL reduces the number of db connections even more, depending on DevAddr re-usage
Another tuning option is the maximum number of database connections that are kept in the connection pool. More connections means more work can be done in parallel (once the max number of connections has been reached, the next request needs to wait until a connection has been returned to the pool).
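For example, the pool size can be raised in `chirpstack.toml`. This fragment assumes ChirpStack v4's configuration layout; the values shown are illustrative, not recommendations:

```toml
[postgresql]
  dsn="postgres://chirpstack:chirpstack@localhost/chirpstack?sslmode=disable"

  # Default is 10; raising this allows more database work in parallel,
  # at the cost of more open connections on the PostgreSQL side.
  max_open_connections=20
  min_idle_connections=0
```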
Yes, I've seen that 4.7 has many improvements. It will also impact the core code for the Helium integration, so I won't jump on it too fast.
Most of the problem seems related to Mosquitto generating a lot of I/O, impacting ChirpStack's internal performance (strangely, not the Java performance). I'm not sure why it impacts ChirpStack.
Mosquitto seemed to sync its persistence store roughly every 2 hours, queue some packets for about an hour, and apparently finally drop them. I assume it's related to persistence and message lifetime settings. I will investigate this more.
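The relevant knobs here would be Mosquitto's persistence settings. A sketch of a `mosquitto.conf` fragment, assuming stock Mosquitto option names (values are illustrative; check your distribution's defaults):

```
# How often (in seconds) the in-memory database is saved to disk.
# The default is 1800 (30 min); a large value could explain bursty I/O.
autosave_interval 1800

persistence true
persistence_location /var/lib/mosquitto/

# Cap the per-client queue for offline or slow clients, so stale
# packets are dropped early instead of piling up in persistence.
max_queued_messages 1000
```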
I assume that you mean code related to your Helium integration that is interacting with ChirpStack?
Most of the problem seems related to Mosquitto generating a lot of I/O, impacting ChirpStack's internal performance (strangely, not the Java performance). I'm not sure why it impacts ChirpStack.
I've been doing a lot of performance testing over the last months, and one observation I made is that under high load, depending on the configuration, there might not be enough Redis connections in the connection pool to handle the uplinks. Then you basically get a big queue of tasks that are waiting for a connection before they can proceed. E.g. this get_redis_conn would hang until a connection becomes available:
You could increase the default max_open_connections=100 to a higher value. One indication of this problem is getting many empty de-duplication set errors: by the time a Redis connection becomes available to read the set, the key has already expired.
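Assuming ChirpStack v4's `chirpstack.toml` layout, the Redis pool size mentioned above would be raised like this (the value 200 is only an example):

```toml
[redis]
  servers=["redis://localhost/"]

  # Default is 100; raise it if uplink tasks are queuing up
  # waiting for a free Redis connection under high load.
  max_open_connections=200
  min_idle_connections=0
```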
Again, ChirpStack v4.7 contains some optimizations that reduce the number of queries; upgrading might eventually be the best solution.
If you see any other parts in the code that could be optimized, then please let me know.