Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

esp_bignum aborts randomly when in task with no CPU affinity / Timed out waiting for RSA operation (op_reg 0x3ff02810 int_reg 0x1) #239

Closed
nathanjel opened this issue Jan 15, 2017 · 6 comments

Comments

@nathanjel
Copy link
Contributor

nathanjel commented Jan 15, 2017

Hi,

A heavy duty program (every 10ms UART IO, continious messaging with available speed thru MQTT over WebSockets over TLS), using mongoose over the esp-idf net/ssl stack.

After several minutes of operation under load, the device aborts resulting in error messages as below, so far noticed only during re-establishing connection to broker. Any guidance towards where to look for resolution welcomed.

The surrounding code in esp_bignum.c says /* indicates a fundamental problem with driver */ ;-)

Resulting abort:

E (235656) bignum: Timed out waiting for RSA operation (op_reg 0x3ff02810 int_reg 0x1)

Stack trace:

0x40081884 in esp_log_write (level=<optimized out>, tag=<optimized out>, 
    format=0x3f40ffec "\033[0;31mE (%d) %s: Timed out waiting for RSA operation (op_reg 0x%x int_reg 0x%x)\033[0m\n")
    at /home/nathan/xnode/esp-idf/components/log/./log.c:188
188	    (*s_log_print_func)(format, list);
(gdb) bt
#0  0x40081884 in esp_log_write (level=<optimized out>, tag=<optimized out>, 
    format=0x3f40ffec "\033[0;31mE (%d) %s: Timed out waiting for RSA operation (op_reg 0x%x int_reg 0x%x)\033[0m\n")
    at /home/nathan/xnode/esp-idf/components/log/./log.c:188
#1  0x4012f7b8 in gettimeofday (ptimeval=0x1, ptimezone=0x3ffd0b08) at ../../../.././newlib/libc/syscalls/sysgettod.c:12
#2  0x40113095 in ecp_double_jac (grp=0x3ffde854, R=0x3ffed17c, P=0x3ffed17c)
    at /home/nathan/xnode/esp-idf/components/mbedtls/library/ecp.c:924
#3  0x40113f2b in ecp_precompute_comb (grp=0x3ffde854, T=0x3ffed158, P=<optimized out>, w=<optimized out>, d=64)
    at /home/nathan/xnode/esp-idf/components/mbedtls/library/ecp.c:1212
#4  0x40114142 in ecp_mul_comb (grp=0x3ffde854, R=0x3ffd0d80, m=0x3ffde8d0, P=0x3ffde900, 
    f_rng=0x401396c8 <mbedtls_ctr_drbg_random>, p_rng=0x3ffb9b38 <ctr_drbg>)
    at /home/nathan/xnode/esp-idf/components/mbedtls/library/ecp.c:1377
#5  0x40114614 in mbedtls_ecp_mul (grp=0x3ffde854, R=0x3ffd0d80, m=0x3ffde8d0, P=0x3ffde900, 
    f_rng=0x401396c8 <mbedtls_ctr_drbg_random>, p_rng=0x3ffb9b38 <ctr_drbg>)
    at /home/nathan/xnode/esp-idf/components/mbedtls/library/ecp.c:1631
#6  0x401425e6 in mbedtls_ecdh_compute_shared (grp=0x3ffde854, z=0x3ffde924, Q=0x3ffde900, d=0x3ffde8d0, 
    f_rng=0x401396c8 <mbedtls_ctr_drbg_random>, p_rng=0x3ffb9b38 <ctr_drbg>)
    at /home/nathan/xnode/esp-idf/components/mbedtls/library/ecdh.c:69
#7  0x401427f4 in mbedtls_ecdh_calc_secret (ctx=0x3ffde854, olen=0x3ffdebf0, buf=0x3ffdec34 "", blen=1024, 
    f_rng=0x401396c8 <mbedtls_ctr_drbg_random>, p_rng=<optimized out>)
    at /home/nathan/xnode/esp-idf/components/mbedtls/library/ecdh.c:251
#8  0x40144cb7 in ssl_write_client_key_exchange (ssl=0x3ffd7750)
    at /home/nathan/xnode/esp-idf/components/mbedtls/library/ssl_cli.c:2779
#9  0x4014528b in mbedtls_ssl_handshake_client_step (ssl=0x3ffd7750)
    at /home/nathan/xnode/esp-idf/components/mbedtls/library/ssl_cli.c:3330
#10 0x4013b8d0 in mbedtls_ssl_handshake_step (ssl=0x3ffd7750)
    at /home/nathan/xnode/esp-idf/components/mbedtls/library/ssl_tls.c:6286
#11 0x4013b926 in mbedtls_ssl_handshake (ssl=0x3ffd7750)
    at /home/nathan/xnode/esp-idf/components/mbedtls/library/ssl_tls.c:6310
#12 0x4012a112 in mg_ssl_if_handshake (nc=0x3ffd6908) at /home/nathan/xnode/SNESP32/components/mongoose/./mongoose.c:4535
#13 0x4012acc0 in mg_ssl_begin (nc=0x3ffd6908) at /home/nathan/xnode/SNESP32/components/mongoose/./mongoose.c:3466
#14 0x4012af41 in mg_write_to_socket (nc=0x3ffd6908) at /home/nathan/xnode/SNESP32/components/mongoose/./mongoose.c:3358
#15 0x4012b080 in mg_mgr_handle_conn (nc=0x3ffd6908, fd_flags=<optimized out>, now=1484451057.7952161)
    at /home/nathan/xnode/SNESP32/components/mongoose/./mongoose.c:3556
#16 0x4012b2c4 in mg_socket_if_poll (iface=<optimized out>, timeout_ms=50)
    at /home/nathan/xnode/SNESP32/components/mongoose/./mongoose.c:3742
#17 0x401299f1 in mg_mgr_poll (m=0x3ffb6db4 <mgr>, timeout_ms=50)
    at /home/nathan/xnode/SNESP32/components/mongoose/./mongoose.c:2287
---Type <return> to continue, or q <return> to quit---
#18 0x40109734 in NetworkTask (pvParameters=<optimized out>) at /home/nathan/xnode/SNESP32/main/./network.c:632


@nathanjel
Copy link
Contributor Author

I start to believe the fundamental problem there is, actually lies elsewhere, noticing now the surprising call to gettimeofday in the details of the backtrace. Indeed my software quite frequently (every few hundred ms or so) will call gettimeofday in one of the threads. Will explore this, anyhow any guidance welcomed.

@nathanjel
Copy link
Contributor Author

Another backtrace, this time without gettimeofday

Remote debugging using /dev/ttyUSB1
0x40081890 in esp_log_write (level=<optimized out>, tag=<optimized out>, 
    format=0x3f40ffec "\033[0;31mE (%d) %s: Timed out waiting for RSA operation (op_reg 0x%x int_reg 0x%x)\033[0m\n")
    at /home/nathan/xnode/esp-idf/components/log/./log.c:188
188	    (*s_log_print_func)(format, list);
(gdb) bt
#0  0x40081890 in esp_log_write (level=<optimized out>, tag=<optimized out>, 
    format=0x3f40ffec "\033[0;31mE (%d) %s: Timed out waiting for RSA operation (op_reg 0x%x int_reg 0x%x)\033[0m\n")
    at /home/nathan/xnode/esp-idf/components/log/./log.c:188
#1  0x40084e14 in vTaskExitCritical (mux=0x1) at /home/nathan/xnode/esp-idf/components/freertos/./tasks.c:4127
#2  0x401130a1 in ecp_double_jac (grp=0x3ffddc30, R=0x3ffdf8f0, P=0x3ffdf8f0)
    at /home/nathan/xnode/esp-idf/components/mbedtls/library/ecp.c:924
#3  0x40113f37 in ecp_precompute_comb (grp=0x3ffddc30, T=0x3ffdf860, P=<optimized out>, w=<optimized out>, d=52)
    at /home/nathan/xnode/esp-idf/components/mbedtls/library/ecp.c:1212
#4  0x4011414e in ecp_mul_comb (grp=0x3ffddc30, R=0x3ffddcb8, m=0x3ffddcac, P=0x3ffddc58, 
    f_rng=0x401396d4 <mbedtls_ctr_drbg_random>, p_rng=0x3ffb9b50 <ctr_drbg>)
    at /home/nathan/xnode/esp-idf/components/mbedtls/library/ecp.c:1377
.....

@nathanjel nathanjel changed the title esp_bignum aborts with E (235656) bignum: Timed out waiting for RSA operation (op_reg 0x3ff02810 int_reg 0x1) esp_bignum aborts with E (235656) bignum when in task with no CPU affinity / Timed out waiting for RSA operation (op_reg 0x3ff02810 int_reg 0x1) Jan 15, 2017
@nathanjel
Copy link
Contributor Author

nathanjel commented Jan 15, 2017

Resolved this problem by locking the Task core affinity to single core. My best bet is that the BIGNUM "co-procesor" perhipal is suffering the same issue as the generic FPU co-procesor, which is it can't start and stop at a different core.

I am pretty sure this is NOT a re-entrancy issue, as this is a) well protected in the library, b) I call these functions from a signle task only, so there should be no parallel execution of that code.

So there is indeed a fundamental problem in the driver. Need to add similar somewhere in the lines of execute_op in esp_bignum similar code to as in components/freertos/xtensa_vectors.S which blocks the task to single core, starting from line 935.

    /* TODO: Remove this as soon as coprocessor state moving works across cores - JD */
    /* FPU operations are incompatible with non-pinned tasks. If we have a FPU operation
       here, to keep the entire thing from crashing, it's better to pin the task to whatever
       core we're running on now. */
    movi    a2, pxCurrentTCB
    getcoreid a3
    addx4     a2,  a3, a2
    l32i    a2, a2, 0                       /* a2 = start of pxCurrentTCB[cpuid] */
    addi    a2, a2, TASKTCB_XCOREID_OFFSET  /* offset to xCoreID in tcb struct */
    s32i    a3, a2, 0                       /* store current cpuid */

@nathanjel nathanjel changed the title esp_bignum aborts with E (235656) bignum when in task with no CPU affinity / Timed out waiting for RSA operation (op_reg 0x3ff02810 int_reg 0x1) esp_bignum aborts randomly when in task with no CPU affinity / Timed out waiting for RSA operation (op_reg 0x3ff02810 int_reg 0x1) Jan 15, 2017
@nathanjel
Copy link
Contributor Author

The above resolution method does not remove the occurence of this problem. What seems to help is #256

@projectgus
Copy link
Contributor

Sorry I didn't reply to this earlier, I just commented on the PR linked above.

@nathanjel
Copy link
Contributor Author

This did not re-occur after introduction of recent SPI flash fixes.

0xFEEDC0DE64 pushed a commit to 0xFEEDC0DE64/esp-idf that referenced this issue May 5, 2021
* add DFRobot FireBeetle-ESP32 support

* add variant for fireBeetle-ESP32

* The name of the env variant is changed to firebeetle32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants