Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICE, insn does not satisfy its constraints #14

Open
igrr opened this issue Nov 20, 2020 · 14 comments
Open

ICE, insn does not satisfy its constraints #14

igrr opened this issue Nov 20, 2020 · 14 comments

Comments

@igrr
Copy link

igrr commented Nov 20, 2020

Recently we ran into this issue on a couple of programs, when compiling code which uses floating point at -O2 optimization level:

$ xtensa-esp32-elf-gcc -c test.c  -mlongcalls -O2 -Wall -Werror -Wextra -Wpedantic -fdump-rtl-all
test.c: In function 'd':
test.c:15:1: error: insn does not satisfy its constraints:
 }
 ^
(insn 74 27 19 4 (set (reg:SF 19 f0 [orig:43 iftmp$0_6 ] [43])
        (mem/u/c:SF (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [0  S4 A32])) "test.c":11 47 {movsf_internal}
     (nil))
during RTL pass: postreload
dump file: test.c.279r.postreload
test.c:15:1: internal compiler error: in extract_constrain_insn, at recog.c:2210

test.c (as minimal as i could c-reduce it):

extern unsigned char a();
extern float b;
extern unsigned char c, h;

void d(signed char unused)
{
  (void) unused;
  const unsigned char e = a();
  for (unsigned char f = c; f; f++) {
    for (unsigned char g = 0; g <= h; g++) {
      b = e ? b : 0;
      d(f);
    }
  }
}

I'm slowly learning my way through the RTL dumps, hoping to find the cause, but wanted to post just in case there is any obvious issue here.

GCC branch I'm using is https://github.com/espressif/gcc/tree/esp_based_on_8_4_0 (a few patches on top of GCC 8.4).

@jcmvbkbc
Copy link
Owner

Thanks for the report and reproducer. I can reproduce it. Looking at it.

@jcmvbkbc
Copy link
Owner

AFAICS movsf_internal pattern for xtensa does not have suitable constraints to match loading SF mode constant from a literal pool into an FP register, that's what offending RTL pattern tries to do. I'm not sure how to fix it properly yet. I guess movsf_internal need to be split into patterns with more restrictive predicates that don't allow loading FP registers from memory designated by symbol. Looking into it...

@bhcuong2008
Copy link

Hi, is there any update on this? I've also faced this when compiling ulab transform.c:67, movsf_internal, with esp-idf 3.3.5

Thank you.

mocleiri pushed a commit to mocleiri/tensorflow-micropython-examples that referenced this issue Aug 19, 2021
Fixes #25

At the -O2 optimization level we get gcc compiler errors which break the
build:

root@907bbbd0af42:/opt/tflite-micro-micropython/tensorflow# xtensa-esp32-elf-g++ -DNDEBUG -std=c++11  -fstrict-vol
atile-bitfields -mlongcalls -nostdlib -fno-rtti -fno-exceptions -fno-threadsafe-statics -Werror -fno-unwind-tables
 -ffunction-sections -fdata-sections -fmessage-length=0 -DTF_LITE_STATIC_MEMORY -DTF_LITE_DISABLE_X86_NEON -Wsign-
compare -Wdouble-promotion -Wshadow -Wunused-variable -Wmissing-field-initializers -Wunused-function -Wswitch -Wvl
a -Wall -Wextra -Wstrict-aliasing -Wno-unused-parameter -DESP -Wno-return-type -Wno-strict-aliasing -Wno-ignored-q
ualifiers -Wno-return-type -Wno-strict-aliasing -O2 -I. -Itensorflow/lite/micro/tools/make/downloads/gemmlowp -Ite
nsorflow/lite/micro/tools/make/downloads/flatbuffers/include -Itensorflow/lite/micro/tools/make/downloads/ruy -Ite
nsorflow/lite/micro/tools/make/gen/esp_xtensa-esp32_default/genfiles/ -Itensorflow/lite/micro/tools/make/downloads
/kissfft -c tensorflow/lite/micro/kernels/l2_pool_2d.cc -o tensorflow/lite/micro/tools/make/gen/esp_xtensa-esp32_d
efault/obj/kernels/tensorflow/lite/micro/kernels/l2_pool_2d.o
tensorflow/lite/micro/kernels/l2_pool_2d.cc: In function 'TfLiteStatus tflite::{anonymous}::L2Eval(TfLiteContext*,
 TfLiteNode*)':
tensorflow/lite/micro/kernels/l2_pool_2d.cc:128:1: error: insn does not satisfy its constraints:
 }
 ^
(insn 274 18 183 28 (set (reg/v:SF 20 f1 [orig:77 sum_squares ] [77])
        (mem/u/c:SF (symbol_ref/u:SI ("*.LC28") [flags 0x2]) [0  S4 A32])) "./tensorflow/lite/kernels/internal/ref
erence/pooling.h":169 47 {movsf_internal}
     (expr_list:REG_EQUAL (const_double:SF 0.0 [0x0.0p+0])
        (nil)))
during RTL pass: postreload
tensorflow/lite/micro/kernels/l2_pool_2d.cc:128:1: internal compiler error: in extract_constrain_insn, at recog.c:
2210
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://gcc.gnu.org/bugs/> for instructions.

Also refer to: jcmvbkbc/gcc-xtensa#14

This failure doesn't happen without the micropython specific build
options:
-fstrict-volatile-bitfields
-mlongcalls
-nostdlib

and it only happens at the -O2 optimization level.

This commit works around the problem by switching to the -O3
optimization level.
jcmvbkbc pushed a commit that referenced this issue May 8, 2023
Currently on xstormy16 SImode shifts by a single bit require two
instructions, and shifts by other non-zero integer immediate constants
require five instructions.  This patch implements the obvious optimization
that shifts by two bits can be done in four instructions, by using two
single-bit sequences.

Hence, ashift_2 was previously generated as:
        mov r7,r2 | shl r2,#2 | shl r3,#2 | shr r7,#14 | or r3,r7
        ret
and with this patch we now generate:
        shl r2,#1 | rlc r3,#1 | shl r2,#1 | rlc r3,#1
        ret

2023-04-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/stormy16/stormy16.cc (xstormy16_output_shift): Implement
	SImode shifts by two by performing a single bit SImode shift twice.

gcc/testsuite/ChangeLog
	* gcc.target/xstormy16/shiftsi.c: New test case.
@kisvegabor
Copy link

Facing the same issue with ESP-IDF v5.2.1 when trying to compile ThorVG for ESP32-S3.

@igrr
Copy link
Author

igrr commented Mar 13, 2024

@kisvegabor As a workaround, please try adding -fno-if-conversion (espressif/esp-idf#11696 (comment))

@kisvegabor
Copy link

I didn't help in my case, however espressif/esp-idf#11696 (comment) suggests that it's any optimization issue. So I added

set_source_files_properties(components/lvgl/src/libts/thorvg/tvgSwImage.cpp PROPERTIES COMPILE_FLAGS "-O0")

but it didn't help either.

However with -O0 optimization in menuconfig it worked. It seems set_source_files_properties has no affect. I've tried with absolute path too.

@igrr
Copy link
Author

igrr commented Mar 13, 2024

Most likely you are not using the set_source_files_properties command in the same directory where the library containing the file is defined.

From CMake manual:

By default, source file properties are only visible to targets added in the same directory (CMakeLists.txt).

If you give a pointer to the branch where you have tried using set_source_files_properties, I can suggest a way to fix this.

@kisvegabor
Copy link

Most likely you are not using the set_source_files_properties command in the same directory where the library containing the file is defined.

Ah, I didn't know that. Here is my lvgl forg with ThorVG included: https://github.com/kisvegabor/lvgl_upstream/tree/demo/ebike

It's required to add add_compile_options("-Wno-dangling-pointer") due to a ThorVG warning.

jcmvbkbc pushed a commit that referenced this issue Mar 22, 2024
Consider

  constexpr int VAL = 1;
  struct foo {
      template <int B>
      void bar(typename std::conditional<B==VAL, int, float>::type arg) { }
  };
  template void foo::bar<1>(int arg);

where we since r11-291 fail to emit the code for the explicit
instantiation.  That's because cp_walk_subtrees/TYPENAME_TYPE now
walks TYPE_CONTEXT ('conditional' here) as well, and in a template
finds the B==VAL template argument.  VAL is constexpr, which implies const,
which in the global scope implies static.  constrain_visibility_for_template
then makes "struct conditional<(B == VAL), int, float>" non-TREE_PUBLIC.
Then symtab_node::needed_p checks TREE_PUBLIC, sees it's 0, and we don't
emit any code.

I thought the fix would be some ODR-esque check to not consider
constexpr variables/fns that are used just for their value.  But
it turned out to be tricky.  For instance, we can't skip
determine_visibility in a template; we can't even skip it for value-dep
expressions.  For example, no-linkage-expr1.C has

  using P = struct {}*;
  template <int N>
  void f(int(*)[((P)0, N)]) {}

where ((P)0, N) is value-dep, but N is not relevant here: we have to
ferret out the anonymous type.  When instantiating, it's already gone.

This patch uses decl_constant_var_p.  This is to implement (an
approximation) [basic.def.odr]#14.5.1 and [basic.def.odr]#5.2.

	PR c++/110323

gcc/cp/ChangeLog:

	* decl2.cc (min_vis_expr_r) <case VAR_DECL>: Do nothing for
	decl_constant_var_p VAR_DECLs.

gcc/testsuite/ChangeLog:

	* g++.dg/template/explicit-instantiation6.C: New test.
	* g++.dg/template/explicit-instantiation7.C: New test.
@jjsuwa-sys3175
Copy link

@igrr
I was interested in this issue, so I tried the above sample in my local environment (based on 10.3, with as many diffs as possible applied from master), but it seems that the problem cannot be reproduced:

[test.c.282r.ira]

(insn 22 446 26 2 (set (reg:SF 167 [ iftmp$0_97 ])
        (mem/u/c:SF (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [0  S4 A32])) "test.c":11:17 57 {movsf_internal}
     (expr_list:REG_EQUIV (mem/u/c:SF (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [0  S4 A32])
        (nil)))
...
(insn 516 187 179 36 (set (reg:SF 75 [ iftmp$0_97 ])
        (reg:SF 167 [ iftmp$0_97 ])) "test.c":11:17 57 {movsf_internal}
     (nil))

[test.c.284r.postreload]

(insn 516 187 749 36 (set (reg:SF 10 a10 [orig:75 iftmp$0_97 ] [75])
        (mem/u/c:SF (symbol_ref/u:SI ("*.LC0") [flags 0x2]) [0  S4 A32])) "test.c":11:17 57 {movsf_internal}
     (nil))

Of course TARGET_HARD_FLOAT was enabled :)

The most impactful change recently is transition from reload to LRA, so IMHO this issue should be resolved with that.

@igrr
Copy link
Author

igrr commented Nov 11, 2024

Hi @jjsuwa-sys3175, thanks for testing and letting me know.

The most impactful change recently is transition from reload to LRA, so IMHO this issue should be resolved with that.

Can you please point me to the commit where this change has occurred? I'll try to build ESP toolchain based on that and try to reproduce.

@jjsuwa-sys3175
Copy link

jjsuwa-sys3175 commented Nov 11, 2024

@igrr

The most impactful change recently is transition from reload to LRA, so IMHO this issue should be resolved with that.

Can you please point me to the commit where this change has occurred? I'll try to build ESP toolchain based on that and try to reproduce.

24d5e0b
The register allocation logic has been completely revamped before and after this commit.

See also: GCC Wiki - reload GCC Wiki - LRAIsDefault

Q: What is reload?
A: Reload is the GCC equivalent of Satan.

Addendum: It would be better to also apply fb7b829.

@jjsuwa-sys3175
Copy link

@jcmvbkbc
A question about something that probably doesn't apply to the esp32, but is technically related:

The ISA refman (Cadence version) describes an instruction (CONST.S instruction) that assigns some immediate constants to HW FP registers, and requires only the "Floating-Point Coprocessor Option" as configuration option, but which HW version is this instruction available from?

If this instruction were available, the above movsf_internal would turn into legitimate and look like this:

(insn 16 7 19 2 (set (reg:SF 19 f0 [45])
        (const_double:SF 0.0 [0x0.0p+0])) "test.c":3:1 62 {movsf_internal}
     (expr_list:REG_EQUIV (const_double:SF 0.0 [0x0.0p+0])
        (nil)))

then, will result in the assembler output const.s f0, 0.

@jcmvbkbc
Copy link
Owner

which HW version is this instruction available from?

It's associated with the "Floating-Point Coprocessor option" (as opposed to the "Floating-Point 2000 Coprocessor option" which doesn't have it). The distinction between these two options is not really represented in the include/xtensa-config.h, but in the full configuration overlay it is marked by the macros XCHAL_HAVE_DFP, XCHAL_HAVE_DFPU_SINGLE_ONLY or XCHAL_HAVE_DFPU_SINGLE_DOUBLE.

@jameszah
Copy link

jameszah commented Jan 1, 2025

I have bumped into this with Arduino 1.8.19 using some 1 year old code (board library 2.??), being recompiled with esp32 board library 3.04.

The problem was a function call with 2 x int and 4 x 1-byte parameters -- the code that calls that function gets the "insn does not satisfy constraints"

So far I gave copied the code and dropped the function call to work-around the problem.
Written up over here: https://gist.github.com/jameszah/c8b03712ff9b286541811ac10437a645

Newer Update:

This innocuous piece of code is the problem --- got rid of the float.

      float factor = 2.0;
      if (k == 0 ) {
        factor = 2.0;
      } else if (k == 1) {
        factor = 1.0;
      } else {
        factor = 0.5;
      }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants