OTA boot partition selection & coredump not working (IDFGH-1717) #3954
Comments
Hi @ammaree! As I see, you are not using the ROLLBACK and ANTI-ROLLBACK options from IDF, right? What is the size of your FW? And could you show your partition_table.csv? Thanks.
@KonstantinKondrashov We are not using ROLLBACK nor ANTI-ROLLBACK from IDF. We have separate logic that monitors the health of the device with the running firmware, primarily using the time since 1st boot and number of reboots since to determine whether we should roll back. Size of firmware is 967,344 bytes
Output of partitions from running system is:
If you suspect that the hang occurred in esp_ota_set_boot_partition(), please add logs to narrow down the exact place. The function is:

    esp_err_t esp_ota_set_boot_partition(const esp_partition_t *partition)
    {
        if (partition == NULL) {
            return ESP_ERR_INVALID_ARG;
        }
        if (image_validate(partition, ESP_IMAGE_VERIFY) != ESP_OK) {
            return ESP_ERR_OTA_VALIDATE_FAILED;
        }
        // If the boot partition is set to the factory app, just erase the OTA data partition
        if (partition->type == ESP_PARTITION_TYPE_APP) {
            if (partition->subtype == ESP_PARTITION_SUBTYPE_APP_FACTORY) {
                const esp_partition_t *find_partition = esp_partition_find_first(ESP_PARTITION_TYPE_DATA, ESP_PARTITION_SUBTYPE_DATA_OTA, NULL);
                if (find_partition != NULL) {
                    return esp_partition_erase_range(find_partition, 0, find_partition->size);
                } else {
                    return ESP_ERR_NOT_FOUND;
                }
            } else {
                return esp_rewrite_ota_data(partition->subtype);
            }
        } else {
            return ESP_ERR_INVALID_ARG;
        }
    }
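One way to find where a function like this hangs is to print a flushed checkpoint before and after each step, so the last line seen identifies the step that never returned. The sketch below is a host-side illustration only: `checkpoint`, `CHECKPOINT`, and the two `*_stub` functions are hypothetical stand-ins, not ESP-IDF APIs. On the target the same idea would presumably use the IDF logging macros instead of `fprintf`.

```c
#include <assert.h>
#include <stdio.h>

static int checkpoints_passed = 0;

/* Print a checkpoint and flush immediately, so the message is not
 * lost in a buffer if the next step hangs. */
static void checkpoint(const char *func, int line, const char *label)
{
    assert(func != NULL && label != NULL);
    fprintf(stderr, "[%s:%d] %s\n", func, line, label);
    fflush(stderr);
    checkpoints_passed++;
}
#define CHECKPOINT(label) checkpoint(__func__, __LINE__, (label))

/* Hypothetical stand-ins for image validation and the OTA data rewrite. */
static int validate_image_stub(void)   { return 0; }
static int rewrite_ota_data_stub(void) { return 0; }

/* Same overall shape as the function above, with a checkpoint around each step. */
int set_boot_partition_traced(void)
{
    CHECKPOINT("enter");
    if (validate_image_stub() != 0) {
        CHECKPOINT("image validation failed");
        return -1;
    }
    CHECKPOINT("image validated");
    int err = rewrite_ota_data_stub();
    CHECKPOINT("ota data rewritten");
    return err;
}
```

If the output stops after "image validated", for example, the hang is inside the OTA data rewrite step.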
@ammaree Thanks for reporting the issue, would you please help provide more logs as suggested by @KonstantinKondrashov? Thanks.
@Alvin1Zhang @KonstantinKondrashov Gents, apologies. I have found the cause: the same task was trying to take the same mutex twice while logging messages during the reboot setup. Solved by checking whether the current task already owns the mutex. The issue of coredump not working still remains. Shall I close this issue and open a new one for the coredump problem, or shall I leave this open as a reminder?
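The fix described here, checking whether the current task already owns the mutex before taking it, can be sketched as follows. This is a portable illustration using POSIX threads and a thread-local ownership flag, not the reporter's code; the names `log_lock`, `log_unlock`, `log_line`, and `prepare_reboot` are hypothetical. On FreeRTOS the equivalent check would likely compare xSemaphoreGetMutexHolder() with xTaskGetCurrentTaskHandle(), or a recursive mutex could be used instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;
/* Per-thread flag: true while this thread holds log_mutex. */
static _Thread_local bool log_mutex_held_by_me = false;

/* Take the log mutex unless this thread already owns it.
 * Returns true only if this call took the mutex, in which case
 * the caller is responsible for releasing it. */
bool log_lock(void)
{
    if (log_mutex_held_by_me) {
        return false;  /* already the owner: do not block on ourselves */
    }
    pthread_mutex_lock(&log_mutex);
    log_mutex_held_by_me = true;
    return true;
}

/* Release the mutex only if the matching log_lock() call took it. */
void log_unlock(bool took)
{
    assert(!took || log_mutex_held_by_me);
    if (took) {
        log_mutex_held_by_me = false;
        pthread_mutex_unlock(&log_mutex);
    }
}

/* Log one line under the mutex; returns the number of characters printed. */
int log_line(const char *msg)
{
    bool took = log_lock();
    int n = printf("%s\n", msg);
    log_unlock(took);
    return n;
}

/* Holds the mutex and logs from inside the critical section.
 * Without the ownership check, the nested log_line() would deadlock. */
int prepare_reboot(void)
{
    bool took = log_lock();
    int n = log_line("rebooting");
    log_unlock(took);
    return n;
}
```

The thread-local flag keeps the ownership check free of data races, at the cost of tracking only one mutex per flag.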
Hi @ammaree!
Just an update. I have traced the problem as far as possible. The problem is related to the IDF 4.0 SPI Flash driver implementation. Using the legacy implementation the coredump works: 17 tasks are dumped to flash and the application uploads the coredump to the cloud on reboot. PERFECT. When using the IDF 4.0 SPI Flash driver, the core dump hangs at line 130 of core_dump_flash.c, where spi_flash_erase_range() is called. I did not debug further into spi_flash_erase_range(), but the fault lies there. After a short delay the system reboots due to a watchdog timeout as below.
I hope this helps to narrow down the problem.
Environment
IDF version (run git describe --tags to find it): v4.0-dev-1443-g39f090a4f
Compiler version (run xtensa-esp32-elf-gcc --version to find it): (crosstool-NG esp32-2019r1) 8.2.0
Problem Description
The normal OTA functionality continues to work perfectly and we can download new FW via OTA as and when required. But, due to the stability of our FW we have not needed to use rollback during the last ~5 months, until very recently when it failed.
From tracing the flow we found that the process hangs in esp_ota_set_boot_partition(). Is there any clear reason, possibly related to the significant changes that have been made over the last ~5 months, that would cause the hangup?
Expected Behavior
Actual Behavior
Steps to reproduce
Very easy and consistently reproduced in our dev environment. Only have to request a FW revert or select a specific boot image to trigger the hangup.
Also easily reproduced, just hard reboot a device in our test environment.
Code to reproduce this issue
Very difficult to provide sample code since the code is part of a very large code base, BUT the exact same code has been running since Jan 2018. The only update was made some months ago, on March 13 (#1650), to calculate the core dump image size and discard the magic number that was removed.
Debug Logs
esp_ota_set_boot_partition() related:
Nothing to share here other than that it hangs in the function. No logs are shown.
Coredump related:
Other items if possible
sdkconfig.h.txt