USB fails when ESP32-S3 board is plugged directly to a host port; using a hub succeeds #2943

Open
1 task done
dhalbert opened this issue Jan 13, 2025 · 18 comments

@dhalbert
Contributor

Operating System

Linux

Board

Adafruit Metro ESP32-S3 and other ESP32-S3 boards

Firmware

Works: CircuitPython 9.2.1 using TinyUSB at 5217cee
Problem: CircuitPython 9.2.2 using TinyUSB at eabf68b (and also tip of master, right now 2495663)

What happened ?

USB devices do not appear when an ESP32-S3 board running CircuitPython 9.2.2 is plugged directly into a host computer port. When a USB hub is interposed it works.

This seems like some kind of timing problem.

I tried a few things unsuccessfully:

  • latest TinyUSB master commit: not fixed
  • tried bisecting TinyUSB but there are too many API changes and I could not do a successful bisect
  • Turned off MAX3421E support in CircuitPython
  • Turned off ESP32-S3 DMA support in tusb_config.h. I removed the || defined(CONFIG_IDF_TARGET_ESP32S3) below.
// Use DMA with the USB peripheral.
#if defined(CONFIG_IDF_TARGET_ESP32P4) || defined(CONFIG_IDF_TARGET_ESP32S2) || defined(CONFIG_IDF_TARGET_ESP32S3)
#define CFG_TUD_DWC2_DMA_ENABLE (1)
#define CFG_TUH_DWC2_DMA_ENABLE (1)
#endif
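For reference, with the ESP32-S3 term removed the guard would read roughly as below. This is only a sketch: the `#else` branch and the `dwc2_dma_enabled()` helper are hypothetical additions (not part of TinyUSB or CircuitPython), included so the resulting setting can be checked at run time.

```c
#include <stdbool.h>

// Use DMA with the USB peripheral, but no longer on ESP32-S3 (test change).
#if defined(CONFIG_IDF_TARGET_ESP32P4) || defined(CONFIG_IDF_TARGET_ESP32S2)
#define CFG_TUD_DWC2_DMA_ENABLE (1)
#define CFG_TUH_DWC2_DMA_ENABLE (1)
#else
// Assumption for this sketch: spell out the "off" case explicitly.
#define CFG_TUD_DWC2_DMA_ENABLE (0)
#define CFG_TUH_DWC2_DMA_ENABLE (0)
#endif

// Hypothetical helper: reports the device-side DMA setting.
static inline bool dwc2_dma_enabled(void) {
  return CFG_TUD_DWC2_DMA_ENABLE != 0;
}
```

When built for an S3 target (or with no target macro defined at all), `dwc2_dma_enabled()` returns false.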

How to reproduce ?

  1. Install CircuitPython 9.2.2 (or latest main build) on ESP32-S3 board (try Metro or Feather ESP32-S3 4/2, though others may fail also).
  2. CIRCUITPY will probably appear.
  3. Unplug and replug board. CIRCUITPY and other USB devices will not appear.

Step 3 happens when the board is plugged directly into a host computer port. USB2 vs USB3 does not seem to matter.
This problem does not happen if a USB hub is between the host computer port and the board. The USB hub I am using is a fancier one with a USB-C connection to the host, 3 USB3.1 A ports, Ethernet, HDMI, etc.

Other users have tested on Windows and macOS, and seen the same problem, so it's not peculiar to Linux.

See adafruit/circuitpython#9956

Debug Log as txt file (LOG/CFG_TUSB_DEBUG=2)

Here are two Beagle traces: 921.tdc (works), and 922.tdc (doesn't work)
921-tdc-and-922-tdc.zip

Screenshots

No response

I have checked existing issues, discussion and documentation

  • I confirm I have checked existing issues, discussion and documentation.
@dhalbert
Contributor Author

I was able to get TinyUSB logging. At logging level 3, when the board does not connect to USB, only this is printed (output from CircuitPython was edited out here):

USBD init on controller 0, speed = Full
sizeof(usbd_device_t) = 48
sizeof(dcd_event_t) = 12
sizeof(tu_fifo_t) = 20
sizeof(tu_edpt_stream_t) = 112
CDC init
MSC init
HID init
MIDI init
guid, gsnpsid, ghwcfg1, ghwcfg2, ghwcfg3, ghwcfg4
0x00000000, 0x4F54400A, 0x00000000, 0x224DD930, 0x00C804B5, 0xD3F0A030
Fullspeed PHY init
DMA = 1

The very first time the UF2 is loaded, without a power cycle, there is much more output, so logging is working well otherwise.

@dhalbert
Contributor Author

This apparently also affects STM, since STM uses Synopsys too: adafruit/circuitpython#9971. That may be another test platform.

Based on the recent post in #2074, I tried moving the AHBIDL test to the top and adding a delay. But that did not help this problem:

static void reset_core(dwc2_regs_t* dwc2) {
  while (!(dwc2->grstctl & GRSTCTL_AHBIDL)) {} // wait for AHB master IDLE    ///// MOVED from end of this routine

  // reset core
  dwc2->grstctl |= GRSTCTL_CSRST;

  if ((dwc2->gsnpsid & DWC2_CORE_REV_MASK) < (DWC2_CORE_REV_4_20a & DWC2_CORE_REV_MASK)) {
    // prior to v4.20a, CSRST is self-clearing
    while (dwc2->grstctl & GRSTCTL_CSRST) {}
  } else {
    // From v4.20a CSRST bit is write only, CSRST_DONE (w1c) is introduced for checking.
    // CSRST must also be explicitly cleared
    while (!(dwc2->grstctl & GRSTCTL_CSRST_DONE)) {}
    dwc2->grstctl =  (dwc2->grstctl & ~GRSTCTL_CSRST) | GRSTCTL_CSRST_DONE;
  }

  tusb_time_delay_ms_api(3);         ////// NEW
}
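To see which branch of reset_core() applies to the board in the log above, the revision check can be run against the logged gsnpsid value (0x4F54400A). A minimal sketch, assuming `DWC2_CORE_REV_MASK` is 0x0000FFFF and `DWC2_CORE_REV_4_20a` is 0x4F54420A; both values, and the two helper functions, are assumptions for illustration and should be checked against the TinyUSB headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

// Assumed values for the constants used in reset_core().
#define DWC2_CORE_REV_MASK   0x0000FFFFu
#define DWC2_CORE_REV_4_20a  0x4F54420Au

// Hypothetical helper: true when the core is older than v4.20a,
// i.e. when reset_core() takes the "CSRST is self-clearing" branch.
static bool csrst_is_self_clearing(uint32_t gsnpsid) {
  return (gsnpsid & DWC2_CORE_REV_MASK) < (DWC2_CORE_REV_4_20a & DWC2_CORE_REV_MASK);
}

// Hypothetical helper: prints the low 16 bits as hex digits "M.mmL",
// so 0x420A becomes "4.20a".
static void format_core_rev(uint32_t gsnpsid, char *out, size_t out_len) {
  unsigned rev = (unsigned)(gsnpsid & DWC2_CORE_REV_MASK);
  snprintf(out, out_len, "%x.%02x%x", (rev >> 12) & 0xFu, (rev >> 4) & 0xFFu, rev & 0xFu);
}
```

Under those assumptions the logged value decodes to revision 4.00a, which is below 4.20a, so the self-clearing branch would be the one taken on this board.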

@HiFiPhile
Collaborator

HiFiPhile commented Jan 17, 2025

Hi @dhalbert,

What's your host platform? Apparently it's a platform-dependent issue which I haven't encountered yet.

I have 4 platforms:

  • Ryzen 5800x + X570
  • RTX2080 Ti USB-C
  • Intel 1195G7
  • PCIe card with Fresco FL1100 (I just realized the HIL test runs through a hub)

@dhalbert
Contributor Author

@HiFiPhile It fails on various platforms:

  • Dell Optiplex Intel i7-8700, Q370 chipset. Ubuntu 24.04.
  • Dell Optiplex Intel i5-9500T, Windows 11. Interestingly (?) when plugged in, the board shows up in Device Manager as "USB JTAG/serial debug unit" (!).
  • Mac Mini M1 running Sequoia 15.2.

I have more I could try: no AMD, but more Intel.

I am plugging directly into the front or rear panel USB2 or USB3 ports when the problem manifests. Interposing a fancy USB3 or an old USB2 hub fixes the problem. So try to get as close to the internal USB hub as possible without any intervening USB hubs.

@hathach
Owner

hathach commented Jan 17, 2025

I am currently working on the fix for this and am able to reproduce it on macOS (it works on my AMD Debian 12 machine, though). I have narrowed down the root cause: the S3 powers up with DP/DM muxed to the USB Serial/JTAG controller, so by the time CircuitPython invokes dcd_init(), the JTAG USB device is already enumerated. dcd_init() does a disconnect/connect, but that only causes a reset lasting several µs, and macOS decides to just keep the existing enumeration. I am trying to find the best way to fix this. The latest IDF may also play a part, since I noticed they made lots of changes in the USB PHY init code. @dhalbert I am updating tinyuf2 to IDF 5.3.2 as well (tinyuf2 is currently built on 5.1.4; maybe there is some mismatch in USB PHY init between tinyuf2 and CircuitPython).


@roma-jam
Contributor

Hi @dhalbert,

Based on the info, you are using 4 classes and 7 channels.
The config descriptor states that the device is bus-powered and requires 100 mA.

Could your device be drawing more than this from the bus?

If so, that would fit the observations that:

  • an external hub helps
  • the problem is platform dependent (maybe not all host manufacturers monitor the power drawn from the bus against the declared value)

Maybe it is worth trying to increase bMaxPower to 500 mA?
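For anyone trying this: bMaxPower in the configuration descriptor is expressed in units of 2 mA, so 100 mA is encoded as 50 and 500 mA as 250. A small sketch of the conversion; the `bmaxpower_from_ma()` helper is hypothetical and not part of TinyUSB.

```c
#include <stdint.h>

// Hypothetical helper: converts a current in mA to the bMaxPower field,
// which counts in 2 mA units. Clamps to the 500 mA USB 2.0 maximum.
static uint8_t bmaxpower_from_ma(unsigned milliamps) {
  if (milliamps > 500u) {
    milliamps = 500u;
  }
  return (uint8_t)(milliamps / 2u);
}
```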

@hathach
Owner

hathach commented Jan 17, 2025

Thanks @roma-jam. I am pretty sure this has something to do with the new PHY init code in the latest IDF. After updating tinyuf2 from IDF v5.1.4 to v5.3.2, tinyuf2 behaves the same as CircuitPython: it cannot enumerate properly on my Intel macOS machine (the JTAG Serial device is enumerated instead). I am pretty sure the USB PHY change in the latest IDF dropped some code, maybe the RTC CONF USB write that changes the mux from JTAG to OTG, or it does not do it quickly enough, so the JTAG device is enumerating or already enumerated when it does.

@dhalbert at least I have an easy and fast way to reproduce the issue now (it was a bit confusing earlier since it works with my main PC, an AMD 5950X running Debian 12). I will do more testing; I think we are close enough :)

@dhalbert
Contributor Author

The S3 powers up with DP/DM muxed to the USB Serial/JTAG controller, so by the time CircuitPython invokes dcd_init(), the JTAG USB device is already enumerated.

This explains perfectly why I saw JTAG in Windows Device Manager. 😄

I am pretty sure the USB PHY change in the latest IDF dropped some code, maybe the RTC CONF USB write that changes the mux from JTAG to OTG, or it does not do it quickly enough, so the JTAG device is enumerating or already enumerated when it does.

I wonder if an issue will need to be raised in ESP-IDF about this. I think I tried the latest TinyUSB with 5.3.1 and had the same issue, so it may be 5.3.x vs 5.2.x.

@hathach
Owner

hathach commented Jan 17, 2025

I wonder if an issue will need to be raised in ESP-IDF about this. I think I tried the latest TinyUSB with 5.3.1 and had the same issue, so it may be 5.3.x vs 5.2.x.

I tested with 5.2.3, 5.3.1 and 5.4 as well, and it still has the same issue. This quick hack solves the problem by using the good old PHY init code:

#include "soc/rtc_cntl_struct.h"
#include "soc/usb_wrap_struct.h"

void init_usb_hardware(void) {
    #if CIRCUITPY_USB_DEVICE
    // Configure USB PHY
    #if 1
    (void) phy_hdl;
    periph_module_reset(PERIPH_USB_MODULE);
    periph_module_enable(PERIPH_USB_MODULE);

    USB_WRAP.otg_conf.pad_enable = 1;
    // USB_OTG use internal PHY
    USB_WRAP.otg_conf.phy_sel = 0;
    // phy_sel is controlled by the following register value
    RTCCNTL.usb_conf.sw_hw_usb_phy_sel = 1;
    // phy_sel=sw_usb_phy_sel=1, USB_OTG is connected with internal PHY
    RTCCNTL.usb_conf.sw_usb_phy_sel = 1;

    gpio_set_drive_capability(USBPHY_DM_NUM, GPIO_DRIVE_CAP_3);
    gpio_set_drive_capability(USBPHY_DP_NUM, GPIO_DRIVE_CAP_3);
    #else

    usb_phy_config_t phy_conf = {
        .controller = USB_PHY_CTRL_OTG,
        .target = USB_PHY_TARGET_INT,

        .otg_mode = USB_OTG_MODE_DEVICE,
        #ifdef CONFIG_IDF_TARGET_ESP32P4
        .otg_speed = USB_PHY_SPEED_HIGH,
        #else
        .otg_speed = USB_PHY_SPEED_FULL,
        #endif
    };
    usb_new_phy(&phy_conf, &phy_hdl);
    #endif

    // Pin the USB task to the same core as CircuitPython. This way we leave
    // the other core for networking.
    (void)xTaskCreateStaticPinnedToCore(usb_device_task,
        "usbd",
        USBD_STACK_SIZE,
        NULL,
        5,
        usb_device_stack,
        &usb_device_taskdef,
        xPortGetCoreID());
    #endif
}

I guess the new code does not switch from JTAG to OTG fast enough. Maybe @roma-jam can look more into this; it is rather easy to reproduce with an S3 + Intel macOS using https://github.com/adafruit/tinyuf2/tree/update-idf-5.3.2/ports/espressif
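The two RTCCNTL writes in the hack can be read as a small mux: one bit hands control of the PHY selection to software, the other picks the owner. The sketch below is a simplified software model of that, not register-accurate code. The struct, enum and function are hypothetical, and the power-up default (Serial/JTAG) is taken from the diagnosis in this thread rather than from the reference manual.

```c
#include <stdbool.h>

typedef enum { PHY_OWNER_SERIAL_JTAG, PHY_OWNER_USB_OTG } phy_owner_t;

// Simplified model of the two RTCCNTL.usb_conf fields written in the hack.
typedef struct {
  bool sw_hw_usb_phy_sel;  // 1: software selects the internal PHY owner
  bool sw_usb_phy_sel;     // 1: internal PHY is connected to USB_OTG
} usb_conf_model_t;

// Hypothetical helper: which controller owns the internal PHY in this model.
static phy_owner_t internal_phy_owner(usb_conf_model_t conf) {
  if (!conf.sw_hw_usb_phy_sel) {
    // Assumption from this thread: the S3 powers up muxed to Serial/JTAG.
    return PHY_OWNER_SERIAL_JTAG;
  }
  return conf.sw_usb_phy_sel ? PHY_OWNER_USB_OTG : PHY_OWNER_SERIAL_JTAG;
}
```

In this model the host sees the Serial/JTAG device until both bits are set, which is why the time between power-up and these writes matters.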

@hathach
Owner

hathach commented Jan 20, 2025

I finally spotted the troublesome code:

https://github.com/espressif/esp-idf/blob/0f0068fff3ab159f082133aadfa9baf4fc0c7b8d/components/usb/usb_phy.c#L337-L339

    if (config->otg_speed != USB_PHY_SPEED_UNDEFINED) {
        ESP_ERROR_CHECK(usb_phy_otg_dev_set_speed(*handle_ret, config->otg_speed));
    }

I am not entirely sure, but the DM/DP pull override probably messes up the dwc2/JTAG reset sequence or something. This is a timing/race issue, since it does not appear when a LOGE is added before the set_speed() call, e.g.

    if (config->otg_speed != USB_PHY_SPEED_UNDEFINED) {
        ESP_LOGE(USBPHY_TAG, "otg_speed: %d", config->otg_speed);
        ESP_ERROR_CHECK(usb_phy_otg_dev_set_speed(*handle_ret, config->otg_speed));
    }

@hathach
Owner

hathach commented Jan 20, 2025

@dhalbert The current workaround is to set the speed to USB_PHY_SPEED_UNDEFINED, which skips the usb_phy_otg_dev_set_speed() call in question.

  usb_phy_config_t phy_conf = {
    .controller = USB_PHY_CTRL_OTG,
    .target = USB_PHY_TARGET_INT,
    .otg_mode = USB_OTG_MODE_DEVICE,
    // https://github.com/hathach/tinyusb/issues/2943#issuecomment-2601888322
    // Set speed to undefined (auto-detect) to avoid timing/race issue on S3 with hosts such as macOS
    .otg_speed = USB_PHY_SPEED_UNDEFINED,
  };

  usb_new_phy(&phy_conf, &phy_hdl);
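Why this works can be shown with a mock of the guard quoted from usb_phy.c earlier in this thread. Everything here is a stand-in: the `mock_*` and `MOCK_*` names are hypothetical and only mirror the shape of the ESP-IDF code, which is not reproduced.

```c
// Mock types for illustration only; the real ones live in ESP-IDF.
typedef enum {
  MOCK_SPEED_UNDEFINED = 0,
  MOCK_SPEED_LOW,
  MOCK_SPEED_FULL
} mock_speed_t;

typedef struct {
  mock_speed_t otg_speed;
} mock_phy_config_t;

static int set_speed_calls = 0;  // counts how often the speed step ran

// Stand-in for the speed step suspected of racing with enumeration.
static void mock_set_speed(mock_speed_t speed) {
  (void)speed;
  set_speed_calls++;
}

// Mirrors the quoted guard: the speed step is skipped entirely when
// otg_speed is left undefined.
static void mock_new_phy(const mock_phy_config_t *config) {
  if (config->otg_speed != MOCK_SPEED_UNDEFINED) {
    mock_set_speed(config->otg_speed);
  }
}
```

With `MOCK_SPEED_UNDEFINED` the counter stays at zero; with `MOCK_SPEED_FULL` the step runs once, which corresponds to the failing configuration.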

@tore-espressif @roma-jam it is rather easy to reproduce the issue using tinyuf2 from the PR adafruit/tinyuf2#426 (to be merged to master once CI passes) and a macOS host. Let me know if you think it makes sense for us to file an issue on IDF for a proper investigation and fix.

@roma-jam
Contributor

Hi @hathach,

no, no need to file an issue, we are already aware of the problem.

Thank you so much for such a detailed investigation and description of the problem; we'll fix it ASAP.

@hathach
Owner

hathach commented Jan 20, 2025

Hi @hathach,

no, no need to file an issue, we are already aware of the problem.

Thank you so much for such a detailed investigation and description of the problem; we'll fix it ASAP.

Thank you, that is a pretty quick response 👍

@Zychon

Zychon commented Jan 22, 2025

This apparently also affects STM, since STM uses Synopsys too: adafruit/circuitpython#9971. That may be another test platform.

Based on the recent post in #2074, I tried moving the AHBIDL test to the top and adding a delay. But that did not help this problem:

static void reset_core(dwc2_regs_t* dwc2) {
  while (!(dwc2->grstctl & GRSTCTL_AHBIDL)) {} // wait for AHB master IDLE    ///// MOVED from end of this routine

  // reset core
  dwc2->grstctl |= GRSTCTL_CSRST;

  if ((dwc2->gsnpsid & DWC2_CORE_REV_MASK) < (DWC2_CORE_REV_4_20a & DWC2_CORE_REV_MASK)) {
    // prior to v4.20a, CSRST is self-clearing
    while (dwc2->grstctl & GRSTCTL_CSRST) {}
  } else {
    // From v4.20a CSRST bit is write only, CSRST_DONE (w1c) is introduced for checking.
    // CSRST must also be explicitly cleared
    while (!(dwc2->grstctl & GRSTCTL_CSRST_DONE)) {}
    dwc2->grstctl =  (dwc2->grstctl & ~GRSTCTL_CSRST) | GRSTCTL_CSRST_DONE;
  }

  tusb_time_delay_ms_api(3);         ////// NEW
}

Hi @dhalbert, you have put the delay in the wrong place. Based on my workaround in #2074, it must come after dwc2_phy_init and before reset_core.
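The ordering described here can be sketched with stubs. None of this is the actual TinyUSB init code: the `stub_*` functions only record the call order, and the 3 ms value is simply carried over from the earlier attempt, not taken from #2074.

```c
#include <stddef.h>

#define MAX_STEPS 8
static const char *steps[MAX_STEPS];  // names of the steps, in call order
static size_t step_count = 0;

static void record(const char *name) {
  if (step_count < MAX_STEPS) {
    steps[step_count++] = name;
  }
}

// Stubs standing in for the real routines; they only log that they ran.
static void stub_dwc2_phy_init(void) { record("phy_init"); }
static void stub_delay_ms(unsigned ms) { (void)ms; record("delay"); }
static void stub_reset_core(void) { record("reset_core"); }

// Suggested order: PHY init first, then the delay, then the core reset.
static void stub_core_init(void) {
  stub_dwc2_phy_init();
  stub_delay_ms(3);  // assumed value; tune for the actual hardware
  stub_reset_core();
}
```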

@dhalbert
Contributor Author

@Zychon Thanks for pointing out my mistake. It turns out our STM problem was a miscounting of endpoints: see #2891 and #2901. But your diagnosis may be yet another problem. Maybe you could submit a PR to tinyusb?

@roma-jam
Contributor

@hathach @HiFiPhile

it rather easy to reproduce with S3 + intel macos using https://github.com/adafruit/tinyuf2/tree/update-idf-5.3.2/ports/espressif

Guys, apparently I need some assistance from your side. I tried the tinyuf2 self_update example with the espressif_esp32s3_devkitc_1 board on a MacBook (Intel i7), and the drive "S3DKC1BOOT" appears at every connection.

Do I need to provide a valid application? Based on the example, I have this debug output:

TinyUF2
App invalid
Start DFU mode

but I don't see any difference with and without the workaround from here.

Any notes on how to reproduce the problem?

@hathach
Owner

hathach commented Jan 22, 2025

@roma-jam I am glad you are giving it a try. I haven't tried the self-update target. The PR is merged to tinyuf2 master now, so maybe try again with the tinyuf2 target using this CMakeLists.txt (you can compile and flash with idf.py -DBOARD=espressif_esp32s3_devkitc_1) with the following modification at https://github.com/adafruit/tinyuf2/blob/master/ports/espressif/boards/boards.c#L271

  usb_phy_config_t phy_conf = {
    .controller = USB_PHY_CTRL_OTG,
    .target = USB_PHY_TARGET_INT,
    .otg_mode = USB_OTG_MODE_DEVICE,
    // https://github.com/hathach/tinyusb/issues/2943#issuecomment-2601888322
    // Set speed to undefined (auto-detect) to avoid timing/race issue on S3 with hosts such as macOS
    // .otg_speed = USB_PHY_SPEED_UNDEFINED,  // comment this line out and
    .otg_speed = USB_PHY_SPEED_FULL,          // add this line instead
  };

Since it is a race, maybe it is not reproducible with your macOS machine. Here are my binaries that cause the issue on my macOS machine:
tinyuf2-s3-race.zip (flash with --flash_mode dio --flash_freq 80m --flash_size 8MB 0x0 bootloader/bootloader.bin 0x410000 tinyuf2.bin 0x8000 partition_table/partition-table.bin 0xe000 ota_data_initial.bin)

@dhalbert
Contributor Author

@roma-jam Connect directly to the host USB port, without an intervening hub.
