
Running custom classifier - Slow Inference (TFMIC-13) #25

Closed
william-hazem opened this issue Nov 11, 2022 · 7 comments

Comments

william-hazem commented Nov 11, 2022

Hi Everyone!

I built my own classifier and embedded it on an ESP32, but every time it runs an inference the task watchdog is triggered.

I (13060) Projeto: img 1 -> Inference 0 = car
E (18070) task_wdt: Task watchdog got triggered. The following tasks did not reset the watchdog in time:
E (18070) task_wdt:  - IDLE (CPU 0)
E (18070) task_wdt: Tasks currently running:
E (18070) task_wdt: CPU 0: main
E (18070) task_wdt: CPU 1: IDLE
E (18070) task_wdt: Print CPU 0 (current core) backtrace


Backtrace: 0x400E33EF:0x3FFB0B90 0x400827F9:0x3FFB0BB0 0x400E17D1:0x3FFDC310 0x400DBF6A:0x3FFDC3B0 0x400DF0A6:0x3FFDC750 0x400D65F6:0x3FFDC780 0x400D56E3:0x3FFDC7A0 0x400D55D2:0x3FFDC7D0 0x400F283B:0x3FFDC7F0 0x40087D31:0x3FFDC810
0x400e33ef: task_wdt_isr at C:/Espressif/frameworks/esp-idf-v4.4-2/components/esp_system/task_wdt.c:183 (discriminator 3)

0x400827f9: _xt_lowint1 at C:/Espressif/frameworks/esp-idf-v4.4-2/components/freertos/port/xtensa/xtensa_vectors.S:1111

0x400e17d1: esp_nn_conv_s8_opt at C:/Users/willi/OneDrive/Documentos/PlatformIO/IDF/projeto/components/esp-nn/src/convolution/esp_nn_conv_opt.c:157 (discriminator 2)

0x400dbf6a: tflite::(anonymous namespace)::Eval(TfLiteContext*, TfLiteNode*) at C:/Users/willi/OneDrive/Documentos/PlatformIO/IDF/projeto/components/tflite-lib/tensorflow/lite/micro/kernels/esp_nn/conv.cc:230
 (inlined by) Eval at C:/Users/willi/OneDrive/Documentos/PlatformIO/IDF/projeto/components/tflite-lib/tensorflow/lite/micro/kernels/esp_nn/conv.cc:293

0x400df0a6: tflite::MicroGraph::InvokeSubgraph(int) at C:/Users/willi/OneDrive/Documentos/PlatformIO/IDF/projeto/components/tflite-lib/tensorflow/lite/micro/micro_graph.cc:172

0x400d65f6: tflite::MicroInterpreter::Invoke() at C:/Users/willi/OneDrive/Documentos/PlatformIO/IDF/projeto/components/tflite-lib/tensorflow/lite/micro/micro_interpreter.cc:285

0x400d56e3: loop at C:/Users/willi/OneDrive/Documentos/PlatformIO/IDF/projeto/main/main_functions.cc:165

0x400d55d2: app_main at C:/Users/willi/OneDrive/Documentos/PlatformIO/IDF/projeto/main/main.cc:21 (discriminator 1)

0x400f283b: main_task at C:/Espressif/frameworks/esp-idf-v4.4-2/components/freertos/port/port_common.c:141 (discriminator 2)

0x40087d31: vPortTaskWrapper at C:/Espressif/frameworks/esp-idf-v4.4-2/components/freertos/port/xtensa/port.c:131


E (18070) task_wdt: Print CPU 1 backtrace


Backtrace: 0x40084221:0x3FFB1190 0x400827F9:0x3FFB11B0 0x4000BFED:0x3FFDD610 0x40087FE2:0x3FFDD620 0x400E364C:0x3FFDD640 0x400E3657:0x3FFDD670 0x400D20A5:0x3FFDD690 0x400866BA:0x3FFDD6B0 0x40087D31:0x3FFDD6D0
0x40084221: esp_crosscore_isr at C:/Espressif/frameworks/esp-idf-v4.4-2/components/esp_system/crosscore_int.c:92

0x400827f9: _xt_lowint1 at C:/Espressif/frameworks/esp-idf-v4.4-2/components/freertos/port/xtensa/xtensa_vectors.S:1111

0x40087fe2: vPortClearInterruptMaskFromISR at C:/Espressif/frameworks/esp-idf-v4.4-2/components/freertos/port/xtensa/include/freertos/portmacro.h:571
 (inlined by) vPortExitCritical at C:/Espressif/frameworks/esp-idf-v4.4-2/components/freertos/port/xtensa/port.c:319

0x400e364c: esp_task_wdt_reset at C:/Espressif/frameworks/esp-idf-v4.4-2/components/esp_system/task_wdt.c:330

0x400e3657: idle_hook_cb at C:/Espressif/frameworks/esp-idf-v4.4-2/components/esp_system/task_wdt.c:80

0x400d20a5: esp_vApplicationIdleHook at C:/Espressif/frameworks/esp-idf-v4.4-2/components/esp_system/freertos_hooks.c:51 (discriminator 1)

0x400866ba: prvIdleTask at C:/Espressif/frameworks/esp-idf-v4.4-2/components/freertos/tasks.c:3987 (discriminator 1)

0x40087d31: vPortTaskWrapper at C:/Espressif/frameworks/esp-idf-v4.4-2/components/freertos/port/xtensa/port.c:131


I (19400) Projeto: img 2 -> Inference 0 = car

My loop function:

void loop() {

  static int x = 0;

  if(imBuffer == NULL)
  {
    imBuffer = (float*) malloc(sizeof(float)*WIDTH*HEIGHT);
    if(!imBuffer)
    {
      ESP_LOGE(TAG, "Buffer not allocated - ABORTING");
      return;
    }

    ESP_LOGI(TAG, "Buffer allocated");
  }
  // retrieve image from flash
  getImage(imBuffer, x);
  for(int i = 0; i < WIDTH*HEIGHT; i++)
  {
    input->data.f[i] = imBuffer[i];
  }

  vTaskDelay(1);
  
  TfLiteStatus statusInvoke = interpreter->Invoke();
  if(statusInvoke != TfLiteStatus::kTfLiteOk)
  {
    ESP_LOGE(TAG, "Inference error (%d)", statusInvoke);
    return;
  }
  
  float result = output->data.f[0];
  int pred = result > 0.5 ? 1 : 0;
  ESP_LOGI(TAG, "img %d -> Inference %d = %s", x, pred, labels[pred]);

  x = x < 4 ? x+1 : 0;
  vTaskDelay(10 / portTICK_PERIOD_MS);
}

Is there any way to improve the speed without triggering the watchdog?

@william-hazem william-hazem changed the title Running custom classifier Running custom classifier - Slow Inference Nov 13, 2022
vikramdattu (Collaborator) commented:

Hi @william-hazem. The code you are using is already optimized and takes advantage of esp-nn under the hood.

What is the size of the model you're using, and what inference time are you getting?
Please make sure the QIO flash option is turned on and that the CPU is clocked at 240 MHz, if it isn't already.

william-hazem (Author) commented:

Hi @vikramdattu, thanks for answering.

My model is 14,832 bytes and the tensor arena consumes around 155 kB. Currently I'm getting an inference time between 4 and 5 seconds. My model uses float data and only a few layers.

My esp is running with following:

  • CPU 240MHz
  • Flash Mode: QIO
  • Flash SPI speed 40MHz
  • ESP-NN: Optimized versions

vikramdattu (Collaborator) commented:

@william-hazem can you please change the SPI speed to 80MHz and check? Also, can you please use internal memory for Arena if that's a possibility?

Optionally, via menuconfig, you may want to set Task Watchdog timeout to 5 seconds.
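
For reference, the settings discussed in this thread map to sdkconfig options roughly like the fragment below (option names as they appear in ESP-IDF v4.4 menuconfig; double-check against your IDF version):

```
# CPU clocked at 240 MHz
CONFIG_ESP_DEFAULT_CPU_FREQ_MHZ_240=y
# QIO flash mode at 80 MHz SPI speed
CONFIG_ESPTOOLPY_FLASHMODE_QIO=y
CONFIG_ESPTOOLPY_FLASHFREQ_80M=y
# Task watchdog timeout raised to 5 seconds
CONFIG_ESP_TASK_WDT_TIMEOUT_S=5
```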

william-hazem (Author) commented:

Changing the SPI speed to 80 MHz didn't improve the speed, but I set the compiler optimization level to -O2 ("Performance") and it ran better: the inference time dropped from ~4400 ms to around 2400 ms per inference.

I'm currently using the operators below, and Conv2D is very expensive in inference time and uses a lot of memory.

static tflite::MicroMutableOpResolver<7> resolver;
resolver.AddFullyConnected();
resolver.AddQuantize();
resolver.AddDequantize();
resolver.AddAveragePool2D();
resolver.AddConv2D();
resolver.AddLogistic();
resolver.AddMean();

I think I need to make the convolutional layers lighter, and keep compiling my IDF project with the -O2 flag for performance.

vikramdattu (Collaborator) commented:

Hi @william-hazem, thanks for sharing the results of the optimization-flag experiment.

Can you please analyze and share the breakdown of each layer type's contribution to the total inference time? You can find a reference here: https://github.com/espressif/tflite-micro-esp-examples/blob/master/examples/person_detection/main/main_functions.cc#L175

Please note that ESP-NN optimizations are not as effective on the ESP32 as on the ESP32-S3, which has AI instructions.

SaketNer commented:

Hi @william-hazem, have you tried pruning the model and reducing its size by quantizing to int8? In my case it helped improve the inference speed.

@github-actions github-actions bot changed the title Running custom classifier - Slow Inference Running custom classifier - Slow Inference (TFMIC-13) Dec 31, 2023
william-hazem (Author) commented:

I achieved better performance by adding some pooling layers to reduce the model size and by changing the convolution kernel to reduce the computational cost. The model was already quantized to int8, but I did notice a slight latency reduction when using it.

Thanks for the support @vikramdattu @SaketNer
