Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BLE service discovery fails on low BLE connections : lld_pdu_get_tx_flush_nb HCI packet count mismatch (1, 2) (IDFGH-6671) #8303

Closed
QuentinFarizon opened this issue Jan 27, 2022 · 19 comments
Assignees
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally

Comments

@QuentinFarizon
Copy link

QuentinFarizon commented Jan 27, 2022

Environment

  • Development Kit: M5stack Atom
  • Module or chip used: ESP32-PICO-V3
  • IDF version : v4.3.2 and v4.4 (tested on both)
  • Build System: idf.py
  • Compiler version : xtensa-esp32-elf-gcc (crosstool-NG esp-2021r2) 8.4.0
  • Operating System: Linux
  • Using an IDE?: No
  • Power Supply: USB

Problem Description

I'm using Nimble on a esp32 (ESP32-PICO-V3) acting as central, connecting to a BLE peripheral (nRF 52832).
This project is used in production on a few hundreds devices.

When the peripheral is close, everything works fine, connect, discovery, writes, reads, and does long exchanges of data.
When the device to connect to is far (RSSI < -86), sometomes all goes well, but I often (randomly) get this error :

lld_pdu_get_tx_flush_nb HCI packet count mismatch (1, 2)
E (251969) NimBLE: Error: Service discovery failed; status=7 conn_handle=0
I (251969) NimBLE: GAP procedure initiated: terminate connection; conn_handle=0 hci_reason=19

See full log below

I have tried increasing connection max interval and supervision timeout, on the central (initial parameters) and peripheral, with no luck.
I have disabled data length extension in Nimble, with no luck.
Note that this error line happens relatively quickly after service discovery starts, and is not impacted by connection intervals, which does not point to a timeout.

Expected Behavior

No error, service discovery proceeds correctly

Actual Behavior

Disconnect during service discovery

Steps to reproduce

I have separated the two boards by a few meters, separated by a wall, to be able to reproduce the issue.
The more the RSSI drops, the more issue I encounter

Code to reproduce this issue

My code is based on the blecent example (examples/bluetooth/nimble/blecent/main)

I'm just pasting here the part where we launch the service discovery when connnected :

static int
blecent_gap_event(struct ble_gap_event *event, void *arg)
{
(...)
    case BLE_GAP_EVENT_CONNECT:
        /* A new connection was established or a connection attempt failed. */
        if (event->connect.status == 0) {
            /* Connection successfully established. */
            MODLOG_DFLT(INFO, "Connection established ");

            rc = ble_gap_conn_find(event->connect.conn_handle, &desc);
            assert(rc == 0);
            print_conn_desc(&desc);
            MODLOG_DFLT(INFO, "\n");

            /* Remember peer. */
            rc = peer_add(event->connect.conn_handle);
            if (rc != 0) {
                MODLOG_DFLT(ERROR, "Failed to add peer; rc=%d\n", rc);
                return 0;
            }

            /* Perform service discovery. */
            rc = peer_disc_all(event->connect.conn_handle,
                               blecent_on_disc_complete, NULL);
            if (rc != 0) {
                MODLOG_DFLT(ERROR, "Failed to discover services; rc=%d\n", rc);
                return 0;
            }
        } else {
            /* Connection attempt failed; resume scanning. */
            MODLOG_DFLT(ERROR, "Error: Connection failed; status=%d\n",
                        event->connect.status);
            blecent_scan();
        }

        return 0;
    (...)
}

Debug Logs

When error occurs :

I (129309) NimBLE_BLE_CENT: Scan finished, RSSI -92, connecting...
I (129319) NimBLE: GAP procedure initiated: connect; 
I (129329) NimBLE: peer_addr_type=1 peer_addr=
I (129329) NimBLE: ef:0a:ab:e4:b8:a5
I (129329) NimBLE:  scan_itvl=16 scan_window=16 itvl_min=10 itvl_max=250 latency=0 supervision_timeout=800 min_ce_len=0 max_ce_len=0 own_addr_type=0
I (129349) NimBLE: 

I (130049) NimBLE: Connection established 
I (130049) NimBLE: 

I (130049) NimBLE_BLE_CENT: Added peer
I (130049) NimBLE: GATT procedure initiated: discover service by uuid; uuid=
I (130059) NimBLE: a7630001-f491-4f21-95ea-846ba586e361
I (130059) NimBLE: 

**lld_pdu_get_tx_flush_nb HCI packet count mismatch (1, 2)
E (131909) NimBLE: Error: Service discovery failed; status=7 conn_handle=0**

I (131909) NimBLE: GAP procedure initiated: terminate connection; conn_handle=0 hci_reason=19

I (131949) NimBLE: disconnect; reason=574

When all goes well :

I (100629) NimBLE_BLE_CENT: Scan finished, RSSI -85, connecting...
I (100629) NimBLE: GAP procedure initiated: connect; 
I (100639) NimBLE: peer_addr_type=1 peer_addr=
I (100639) NimBLE: ef:0a:ab:e4:b8:a5
I (100649) NimBLE:  scan_itvl=16 scan_window=16 itvl_min=10 itvl_max=250 latency=0 supervision_timeout=800 min_ce_len=0 max_ce_len=0 own_addr_type=0
I (100659) NimBLE: 

I (100769) NimBLE: Connection established 
I (100769) NimBLE: 

I (100769) NimBLE_BLE_CENT: Added peer
I (100769) NimBLE: GATT procedure initiated: discover service by uuid; uuid=
I (100779) NimBLE: a7630001-f491-4f21-95ea-846ba586e361
I (100779) NimBLE: 

I (103509) NimBLE: GATT procedure initiated: discover all characteristics; 
I (103509) NimBLE: start_handle=14 end_handle=22

I (108199) NimBLE: GAP procedure initiated: 
I (108199) NimBLE: connection parameter update; conn_handle=0 itvl_min=6 itvl_max=60 latency=0 supervision_timeout=400 min_ce_len=0 max_ce_len=0
I (108209) NimBLE: 

I (110379) NimBLE: connection params update event; conn_handle=0 status=0

I (110389) NimBLE: 

I (110529) NimBLE: GATT procedure initiated: discover all descriptors; 
I (110529) NimBLE: chr_val_handle=18 end_handle=19

I (111209) NimBLE: GATT procedure initiated: discover all descriptors; 
I (111209) NimBLE: chr_val_handle=21 end_handle=22

I (111429) NimBLE: Service discovery complete; status=0 conn_handle=0

I (111429) NimBLE_BLE_CENT: Subscribing to notifications
I (111439) NimBLE: GATT procedure initiated: write; 
I (111439) NimBLE: att_handle=22 len=2

I (111809) NimBLE: Subscribe complete; status=0 conn_handle=0 attr_handle=22
@espressif-bot espressif-bot added the Status: Opened Issue is new label Jan 27, 2022
@github-actions github-actions bot changed the title BLE service discovery fails on low BLE connections : lld_pdu_get_tx_flush_nb HCI packet count mismatch (1, 2) BLE service discovery fails on low BLE connections : lld_pdu_get_tx_flush_nb HCI packet count mismatch (1, 2) (IDFGH-6671) Jan 27, 2022
@QuentinFarizon
Copy link
Author

This is not linked to my custom service, the error occurs while discovering a single standard service (0x1800, Generic Access) :

I (71727) NimBLE_BLE_CENT: Scan finished, RSSI -90, connecting...
I (71737) NimBLE: GAP procedure initiated: connect; 
I (71737) NimBLE: peer_addr_type=1 peer_addr=
I (71747) NimBLE: ef:0a:ab:e4:b8:a5
I (71747) NimBLE:  scan_itvl=16 scan_window=16 itvl_min=10 itvl_max=250 latency=0 supervision_timeout=800 min_ce_len=0 max_ce_len=0 own_addr_type=0
I (71767) NimBLE: 

I (71867) NimBLE: Connection established 
I (71867) NimBLE: 

I (71867) NimBLE_BLE_CENT: Added peer
I (71877) NimBLE: GATT procedure initiated: discover service by uuid; uuid=
I (71877) NimBLE: 0x1800
I (71887) NimBLE: 

lld_pdu_get_tx_flush_nb HCI packet count mismatch (1, 2)
E (73737) NimBLE: Error: Service discovery failed; status=7 conn_handle=0

I (73737) NimBLE: GAP procedure initiated: terminate connection; conn_handle=0 hci_reason=19

@QuentinFarizon
Copy link
Author

I found that the timing between "GATT procedure initiated: discover service by uuid" and "Error: Service discovery failed" is always equal to 6 times the max connection interval.

This looks like the connection is created and not established ? Quoting "Intro to Bluetooth Low energy" : "The one exception - where the supervision timeout does not apply - is after a connection is created, but not yet established. In this case, the master will consider the connection to be lost if it does not receive the first packet from the slave within: 6 * connInterval"

@QuentinFarizon
Copy link
Author

QuentinFarizon commented Jan 28, 2022

It seems to be indeed that the peripheral doesn't always receive the connection packet.

The event BLE_GAP_EVENT_CONNECT should only be sent if the connection is established, meaning that we received a response from the peripheral. Today, it seems that this BLE_GAP_EVENT_CONNECT is sent to controller as soon as the connection packet is sent, without waiting for a response.
When the peripheral doesn't respond, instead of getting a disconnnect like today, we should get a BLE_GAP_EVENT_CONNECT with a staus failed.

@QuentinFarizon
Copy link
Author

QuentinFarizon commented Feb 14, 2022

After finer analysis and data from our installed devices, I think there really is a an issue inside the closed-source Bluetooth part of ESP-IDF (https://github.com/espressif/esp32-bt-lib), with the HCI layer randomly dropping or missing the connection packet.

I have added a connection retry to 5 times, and some devices have this behaviour now :

  • connection fails 1-4 times with the lld_pdu_get_tx_flush_nb error
  • sometimes the connection succeeds
  • at this point, everything works well, we can exchange a lot of data, and we never get an unexpected disconnection. This suggests that it's a bug, not only something due to low signal.

It happens on devices with RSSI approximatively below -88, whereas esp should work fine until -94.
However, some devices with very low RSSI never miss a packet.

Could someone from Espressif look into this ? It seems that it's hitting a variety of customers, applications and boards, and of SDKs (ESP-IDF, Micropython, Arduino).

@QuentinFarizon
Copy link
Author

I have already tried with ESP-IDF v4.4 + manually pulling latest development versions of https://github.com/espressif/esp32-bt-lib and https://github.com/espressif/esp-nimble
No luck

@Weijian-Espressif
Copy link
Collaborator

@rahult-github please take a look

@jsw-davidhuang
Copy link

i got this issue too, can anyone tell me how to fix it? now i can only increase the connect retry times

@visicon-masc
Copy link

Oh, this is still not resolved? Could you please write down the version numbers you use?

@jsw-davidhuang
Copy link

Oh, this is still not resolved? Could you please write down the version numbers you use?

i am using esp-idf 4.3 stable version.

@asukiaaa
Copy link
Contributor

I also have problem about the error to communicate with xbox controller but it does not occur with using ESP32C3.
Is it reason that normal ESP32 does not support BLE5?
asukiaaa/arduino-XboxSeriesXControllerESP32#3

@mdenissov
Copy link

@QuentinFarizon Do you have any updates about this problem?

@QuentinFarizon
Copy link
Author

@mdenissov We are still hit hard by this bug, in production. Since then, we updated to esp-idf v5.0, which seemed to reduce the frequency of the errors, but we still encounter it a lot.

There is some new information on this ticket : #5105

@I-Connect
Copy link

same issue here, we are connecting from an ESP32 (with an external puck antenna) to a Nuki smart lock.

I do not know how to use wireshark but it is definetely related to rssi as when I move the lock closer to the esp32 the problem reduces (I see less retries)

Hope this can be resolved soon...

D NimBLEScan: >> stop()
D NimBLEScan: << stop()
[ 82478][D][NukiBle.cpp:139] connectBle(): connecting within: nukiTask
[ 82479][D][NukiBle.cpp:146] connectBle(): connection attempt 0
D NimBLEClient: >> connect(54:d2:72:4f:98:84)
D NimBLEClient: Got Client event 
I NimBLEClient: Connected event
lld_pdu_get_tx_flush_nb HCI packet count mismatch (1, 2)
D NimBLEClient: Got Client event
E NimBLEClient: Connection failed; status=574
D NimBLEClient: >> disconnect()
D NimBLEClient: Not connected to any peers
D NimBLEClient: << disconnect()
[ 83389][W][NukiBle.cpp:168] connectBle(): BLE Connect failed, 4 retries left
[ 83490][D][NukiBle.cpp:146] connectBle(): connection attempt 1
D NimBLEClient: >> connect(54:d2:72:4f:98:84)
D NimBLEClient: Got Client event 
I NimBLEClient: Connected event
lld_pdu_get_tx_flush_nb HCI packet count mismatch (1, 2)
D NimBLEClient: Got Client event
E NimBLEClient: Connection failed; status=574
D NimBLEClient: >> disconnect()
D NimBLEClient: Not connected to any peers
D NimBLEClient: << disconnect()
[ 84127][W][NukiBle.cpp:168] connectBle(): BLE Connect failed, 3 retries left
[ 84228][D][NukiBle.cpp:146] connectBle(): connection attempt 2
D NimBLEClient: >> connect(54:d2:72:4f:98:84)
D NimBLEClient: Got Client event 
I NimBLEClient: Connected event
lld_pdu_get_tx_flush_nb HCI packet count mismatch (1, 2)
D NimBLEClient: Got Client event
E NimBLEClient: Connection failed; status=574
D NimBLEClient: >> disconnect()
D NimBLEClient: Not connected to any peers
D NimBLEClient: << disconnect()
[ 84977][W][NukiBle.cpp:168] connectBle(): BLE Connect failed, 2 retries left
[ 85078][D][NukiBle.cpp:146] connectBle(): connection attempt 3
D NimBLEClient: >> connect(54:d2:72:4f:98:84)
D NimBLEClient: Got Client event 
I NimBLEClient: Connected event
lld_pdu_get_tx_flush_nb HCI packet count mismatch (1, 2)
D NimBLEClient: Got Client event
E NimBLEClient: Connection failed; status=574
D NimBLEClient: >> disconnect()
D NimBLEClient: Not connected to any peers
D NimBLEClient: << disconnect()
[ 85409][W][NukiBle.cpp:168] connectBle(): BLE Connect failed, 1 retries left
[ 85510][D][NukiBle.cpp:146] connectBle(): connection attempt 4
D NimBLEClient: >> connect(54:d2:72:4f:98:84)
D NimBLEClient: Got Client event 
I NimBLEClient: Connected event
lld_pdu_get_tx_flush_nb HCI packet count mismatch (1, 2)
D NimBLEClient: Got Client event 
E NimBLEClient: Connection failed; status=574
D NimBLEClient: >> disconnect()
D NimBLEClient: Not connected to any peers
D NimBLEClient: << disconnect()
[ 86149][W][NukiBle.cpp:168] connectBle(): BLE Connect failed, 0 retries left
[ 86250][W][NukiBle.cpp:182] connectBle(): BLE Connect failed
[ 86250][W][NukiBle.cpp:898] sendEncryptedMessage(): Send encr msg failed due to unable to connect

@I-Connect
Copy link

@ocESP any update on this?

@BrugerX
Copy link

BrugerX commented Nov 13, 2023

Having the same issue. Using NodeMCU 1.1v ESP32-S2 model.

@mdenissov
Copy link

If it helps anyone

Board ESP32 Dev Board CH340 USB-C.

  1. Cut out part of the develop board under the ESP module antenna
  2. Replace 3.3 voltage regulator

After these changes I got a stable and fast Bluetooth connection.

Also some time after flash via USB and restart, have problem with bt connetion, but if unplug usb and restart board with external power supply all works well. So board have some problem with CH340 ic.

@BrugerX
Copy link

BrugerX commented Nov 15, 2023

Having the same issue. Using NodeMCU 1.1v ESP32-S2 model.

So I have done two things, do not know which of them fixed it:

  1. In my ConnectToServer() method, where I do the actual connection, I have added retries with delays in between:
    if(NimBLEDevice::getClientListSize()) {
        /** Special case when we already know this device, we send false as the
         *  second argument in connect() to prevent refreshing the service database.
         *  This saves considerable time and power.
         */
        pClient = NimBLEDevice::getClientByPeerAddress(advDevice->getAddress());
        if(pClient){
            int connection_retries = 0;
            while(!pClient->connect(advDevice, false) & (connection_retries<MAX_CONNECT_TO_SERVER_RETRIES)) {
                connection_retries++;
                log_e("Reconnect failed: Trying again.");
                delay(mS_WAIT_BETWEEN_CONNECT_TO_SERVER_TRY);
            }
            if(connection_retries == MAX_CONNECT_TO_SERVER_RETRIES)
            {
                std::string error_message = "Failed to reconnect to " + advDevice->getAddress().toString() + ", with RSSI: " + std::to_string(advDevice->getRSSI()) + "\n";
                log_e("%s", error_message.c_str());
                throw std::runtime_error(error_message);
            }
            Serial.println("Reconnected client");
  1. Furthermore, I was using the NimBLE client example, where in the client callbacks, I have commented out the settings:
 void onConnect(NimBLEClient* pClient) {
        Serial.println("Connected");

        //Comment out this part: pClient->updateConnParams(120,120,0,60);
    };

Either way, my code now appears to work.

@dernasherbrezon
Copy link

It seems that calling service discovery too fast can cause such issue. Bluetooth connection needs some time to settle. Additionally BLE expect MTU negotiation to happen. So if you start sending service discovery before MTU, then it might cause such issue. I had the same issue and fixed it by waiting for BLE_GAP_EVENT_MTU:

static int ble_client_gap_event(struct ble_gap_event *event, void *arg) {
  ble_client *client = (ble_client *) arg;
  ESP_LOGD(TAG, "gap event: %d", event->type);
  switch (event->type) {
    case BLE_GAP_EVENT_CONNECT: {
      int status = event->connect.status;
      if (status == 0) {
        ESP_LOGI(TAG, "connection established");
        client->conn_handle = event->connect.conn_handle;
      } else {
        ESP_LOGE(TAG, "connection failed. ble code: %d", status);
        client->semaphore_result = ble_client_convert_ble_code(status);
        client->connected = false;
        xSemaphoreGive(client->semaphore);
      }
      break;
    }
      // notify connected only after MTU negotiation completes
    case BLE_GAP_EVENT_MTU: {
      ESP_LOGI(TAG, "MTU negotiated");
      client->semaphore_result = ESP_OK;
      client->connected = true;
      xSemaphoreGive(client->semaphore);
      break;
    }
    default:
      break;
  }
  return 0;
}

@espressif-bot espressif-bot assigned rahult-github and unassigned ocESP Feb 23, 2024
@rahult-github
Copy link
Collaborator

Release v4.4 is EOL. Also 5.x release versions of nimble have migrated to newer nimble versions. So closing this issue. Please open a new one if issue still observed on newer IDF

@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: Done Issue is done internally and removed Status: Opened Issue is new labels Sep 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Resolution: Done Issue is done internally Status: Done Issue is done internally
Projects
None yet
Development

No branches or pull requests