Update build time to utc 0am for gccefi2linux #2

hanzhan1 · 2021-03-12T03:08:14Z

Update build time to utc 0am for gccefi2linux nightly build.

…f SUBS/ADDS-immediate In this PR we end up generating an invalid instruction: adds x1,xzr,#2 because the pattern accepts zero as an operand in the comparison, but the instruction doesn't. Fix it by adjusting the predicate and constraints. gcc/ChangeLog: PR target/99822 * config/aarch64/aarch64.md (sub<mode>3_compare1_imm): Do not allow zero in operand 1. gcc/testsuite/ChangeLog: PR target/99822 * gcc.c-torture/compile/pr99822.c: New test.

gcc/ada/ * init.c (__gnat_raise_program_error): Fix parameter type. (Raise_From_Signal_Handler): Likewise and mark as no-return. * raise-gcc.c (__gnat_others_value): Fix type. (__gnat_all_others_value): Likewise. (__gnat_unhandled_others_value): Likewise. * seh_init.c (Raise_From_Signal_Handler): Fix parameter type. * libgnat/a-except.ads (Raise_From_Signal_Handler): Use convention C and new symbol name, move declaration to... (Raise_From_Controlled_Operation): Minor tweak. * libgnat/a-except.adb (Raise_From_Signal_Handler): ...here. * libgnat/a-exexpr.adb (bool): New C compatible boolean type. (Is_Handled_By_Others): Use it as return type for the function.

The fixed error is: ==21166==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x60300000d900 #0 0x7367d7 in operator delete(void*, unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:172 #1 0x3b82e6e in pointer_equiv_analyzer::~pointer_equiv_analyzer() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:161 #2 0x3b83387 in hybrid_folder::~hybrid_folder() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:517 #3 0x3b83387 in execute_early_vrp /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:686 #4 0x1790611 in execute_one_pass(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2567 gcc-mirror#5 0x1792003 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2656 gcc-mirror#6 0x1792029 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2657 gcc-mirror#7 0x179209f in execute_pass_list(function*, opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2667 gcc-mirror#8 0x178a5f3 in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:1773 gcc-mirror#9 0x1792fac in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/plugin.h:191 gcc-mirror#10 0x1792fac in execute_ipa_pass_list(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:3001 gcc-mirror#11 0xc525fc in ipa_passes /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2154 gcc-mirror#12 0xc525fc in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2289 gcc-mirror#13 0xc5a096 in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2269 gcc-mirror#14 0xc5a096 in symbol_table::finalize_compilation_unit() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2537 gcc-mirror#15 0x1a7a17c in compile_file /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:482 gcc-mirror#16 0x69c758 in do_compile /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2210 gcc-mirror#17 0x69c758 in toplev::main(int, char**) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2349 gcc-mirror#18 0x6a932a in main /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/main.c:39 gcc-mirror#19 0x7ffff7820b34 in __libc_start_main ../csu/libc-start.c:332 gcc-mirror#20 0x6aa5fd in _start (/home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/objdir/gcc/cc1+0x6aa5fd) 0x60300000d900 is located 0 bytes inside of 32-byte region [0x60300000d900,0x60300000d920) allocated by thread T0 here: #0 0x735ab7 in operator new[](unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:102 #1 0x3b82dac in pointer_equiv_analyzer::pointer_equiv_analyzer(gimple_ranger*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:156 gcc/ChangeLog: * gimple-ssa-evrp.c (pointer_equiv_analyzer::~pointer_equiv_analyzer): Use delete[].

This extends the type name conflict detection mechanism to variables. gcc/c-family/ * c-ada-spec.c (check_name): Rename into... (check_type_name_conflict): ...this. Minor tweak. (dump_ada_function_declaration): Adjust to above renaming. (dump_ada_array_domains): Fix oversight. (dump_ada_declaration): Call check_type_name_conflict for variables.

gcc/ada/ * Make-generated.in: Add -f switch to ensure cp will never fail.

gcc/ada/ * libgnat/s-osprim__x32.adb: Add missing with clause.

As observed by Jakub in comment #2 of PR 98865, the expression -(a>>63) is optimized in GENERIC but not in GIMPLE. Investigating further it turns out that this is one of a few transformations performed by fold_negate_expr in fold-const.c that aren't yet performed by match.pd. This patch moves/duplicates them there, and should be relatively safe as these transformations are already performed by the compiler, but just in different passes. This revised patch adds a Boolean simplify argument to tree-ssa-sccvn.c's vn_nary_build_or_lookup_1 to control whether simplification should be performed before value numbering, updating the callers, but then avoiding simplification when constructing/value-numbering NEGATE_EXPR. This avoids the regression of gcc.dg/tree-ssa/ssa-free-88.c, and enables the new test case(s) to pass. 2021-09-22 Roger Sayle <roger@nextmovesoftware.com> Richard Biener <rguenther@suse.de> gcc/ChangeLog * match.pd (negation simplifications): Implement some negation folding transformations from fold-const.c's fold_negate_expr. * tree-ssa-sccvn.c (vn_nary_build_or_lookup_1): Add a SIMPLIFY argument, to control whether the op should be simplified prior to looking up/assigning a value number. (vn_nary_build_or_lookup): Update call to vn_nary_build_or_lookup_1. (vn_nary_simplify): Likewise. (visit_nary_op): Likewise, but when constructing a NEGATE_EXPR now call vn_nary_build_or_lookup_1 disabling simplification. gcc/testsuite/ChangeLog * gcc.dg/fold-negate-1.c: New test case.

Fixes: ==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000666ca5c at pc 0x000000ef094b bp 0x7fffffff8180 sp 0x7fffffff8178 READ of size 4 at 0x00000666ca5c thread T0 #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855 #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916 #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887 #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829 #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427 gcc-mirror#5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346 gcc-mirror#6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967 gcc-mirror#7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808 gcc-mirror#8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146 for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d. gcc/d/ChangeLog: * d-attribs.cc (parse_optimize_options): Check index before accessing cl_options.

As described in detail in the PR, in C++20 implicitly capturing 'this' via a '=' capture default is deprecated, and in C++17 adding an explicit 'this' capture alongside a '=' capture default is diagnosed as redundant (and is strictly speaking ill-formed). This means it's impossible to write, in a forward-compatible way, a C++17 lambda that has a '=' capture default and that also captures 'this' (implicitly or explicitly): [=] { this; } // #1 deprecated in C++20, OK in C++17 // GCC issues a -Wdeprecated warning in C++20 mode [=, this] { } // #2 ill-formed in C++17, OK in C++20 // GCC issues an unconditional warning in C++17 mode This patch resolves this dilemma by downgrading the warning for #2 into a -pedantic one. In passing, move it into the -Wc++20-extensions class of warnings and adjust its wording accordingly. PR c++/100493 gcc/cp/ChangeLog: * parser.c (cp_parser_lambda_introducer): In C++17, don't diagnose a redundant 'this' capture alongside a by-copy capture default unless -pedantic. Move the diagnostic into -Wc++20-extensions and adjust wording accordingly. gcc/testsuite/ChangeLog: * g++.dg/cpp1z/lambda-this1.C: Adjust expected diagnostics. * g++.dg/cpp1z/lambda-this8.C: New test. * g++.dg/cpp2a/lambda-this3.C: Compile with -pedantic in C++17 to continue to diagnose redundant 'this' captures.

…imize or target pragmas [PR103012] The following testcases ICE when an optimize or target pragma is followed by a long line (4096+ chars). This is because on such long lines we can't use columns anymore, but the cpp_define calls performed by c_cpp_builtins_optimize_pragma or from the backend hooks for target pragma are done on temporary buffers and expect to get columns from whatever line they appear on (which happens to be the long line after optimize/target pragma), and we run into: #0 fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986 #1 0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502 #2 0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827 #3 0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898 #4 0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592 gcc-mirror#5 0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394 gcc-mirror#6 0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601 gcc-mirror#7 0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639 gcc-mirror#8 0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589 gcc-mirror#9 0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513 gcc-mirror#10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522 gcc-mirror#11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>) at ../../gcc/c-family/c-cppbuiltin.c:600 assertion that LC_RENAME doesn't happen first. I think the right fix is emit those predefined macros upon optimize/target pragmas with BUILTINS_LOCATION, like we already do for those macros at the start of the TU, they don't appear in columns of the next line after it. Another possibility would be to force them at the location of the pragma. 2021-12-30 Jakub Jelinek <jakub@redhat.com> PR c++/103012 gcc/ * config/i386/i386-c.c (ix86_pragma_target_parse): Perform cpp_define/cpp_undef calls with forced token locations BUILTINS_LOCATION. * config/arm/arm-c.c (arm_pragma_target_parse): Likewise. * config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise. * config/s390/s390-c.c (s390_pragma_target_parse): Likewise. gcc/c-family/ * c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform cpp_define_unused/cpp_undef calls with forced token locations BUILTINS_LOCATION. gcc/testsuite/ PR c++/103012 * g++.dg/cpp/pr103012.C: New test. * g++.target/i386/pr103012.C: New test.

This is a "canonical types differ for identical types" ICE, which started with r11-4682. It's a bit tricky to explain. Consider: template <typename T> struct S { S<T> bar() noexcept(T::value); // #1 S<T> foo() noexcept(T::value); // #2 }; template <typename T> S<T> S<T>::foo() noexcept(T::value) {} // #3 We ICE because #3 and #2 have the same type, but their canonical types differ: TYPE_CANONICAL (#3) == #2 but TYPE_CANONICAL (#2) == #1. The member functions #1 and #2 have the same type. However, since their noexcept-specifier is deferred, when parsing them, we create a variant for both of them, because DEFERRED_PARSE cannot be compared. In other words, build_cp_fntype_variant's tree v = TYPE_MAIN_VARIANT (type); for (; v; v = TYPE_NEXT_VARIANT (v)) if (cp_check_qualified_type (v, type, type_quals, rqual, raises, late)) return v; will *not* find an existing variant when creating a method_type for #2, so we have to create a new one. But then we perform delayed parsing and call fixup_deferred_exception_variants for #1 and #2. f_d_e_v will replace TYPE_RAISES_EXCEPTIONS with the newly parsed noexcept-specifier. It also sets TYPE_CANONICAL (#2) to #1. Both noexcepts turned out to be the same, so now we have two equivalent variants in the list! I.e., +-----------------+ +-----------------+ +-----------------+ | main | | #2 | | #1 | | S S::<T379>(S*) |----->| S S::<T37c>(S*) |----->| S S::<T37a>(S*) |----->NULL | - | | noex(T::value) | | noex(T::value) | +-----------------+ +-----------------+ +-----------------+ Then we get to #3. As for #1 and #2, grokdeclarator calls build_memfn_type, which ends up calling build_cp_fntype_variant, which will use the loop above to look for an existing variant. The first one that matches cp_check_qualified_type will be used, so we use #2 rather than #1, and the TYPE_CANONICAL mismatch follows. Hopefully that makes sense. As for the fix, I didn't think I could rewrite the method_type #2 with #1 because the type may have escaped via decltype. So my approach is to elide #2 from the list, so when looking for a matching variant, we always find #1 (#2 remains live though, which admittedly sounds sort of dodgy). PR c++/101715 gcc/cp/ChangeLog: * tree.cc (fixup_deferred_exception_variants): Remove duplicate variants after parsing the exception specifications. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/noexcept72.C: New test. * g++.dg/cpp0x/noexcept73.C: New test.

gcc/ada/ PR ada/104027 * gnat1drv.adb (Gnat1drv): Only call Exit_Program when not generating code, otherwise instead go to End_Of_Program.

The following adjusts the earlier change to still allow an uncritical replacement. 2022-02-09 Richard Biener <rguenther@suse.de> PR middle-end/104464 * gimple-isel.cc (gimple_expand_vec_cond_expr): Postpone throwing check to after unproblematic replacement. * gcc.dg/pr104464.c: New testcase.

…04617] On #define A(n) int foo1##n(void) { return 1##n; } #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n#gcc-mirror#5) A(n#gcc-mirror#6) A(n#gcc-mirror#7) A(n#gcc-mirror#8) A(n#gcc-mirror#9) #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n#gcc-mirror#5) B(n#gcc-mirror#6) B(n#gcc-mirror#7) B(n#gcc-mirror#8) B(n#gcc-mirror#9) #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n#gcc-mirror#5) C(n#gcc-mirror#6) C(n#gcc-mirror#7) C(n#gcc-mirror#8) C(n#gcc-mirror#9) #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n#gcc-mirror#5) D(n#gcc-mirror#6) D(n#gcc-mirror#7) D(n#gcc-mirror#8) D(n#gcc-mirror#9) E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325) B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642) testcase with ./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto -O0 -o foo1.o foo1.c -ffunction-sections ./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o /tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized (testcase too slow to be included into testsuite). The problem is clearly reported by readelf: readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321 readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321 readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323 readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section. readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section. readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section. because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE inclusive. Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be used instead and .symtab_shndx section should contain the real section index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >= SHN_LORESERVE value is needed it should put those into Shdr[0].sh_{size,link}. But, sh_{link,info} are 32-bit fields which can contain any section index. Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before 2011) used to mishandle the > 63.75K sections case and assumed there is a hole in between the sections, but what simple_object_elf_copy_lto_debug_sections does wouldn't help in that case for the debug temp object creation, we'd need to detect the case also in that routine and take it into account in the remapping etc. I think it is not worth it given that it is over 10 years, if somebody needs 63.75K or more sections, better use more recent binutils. 2022-02-22 Jakub Jelinek <jakub@redhat.com> PR lto/104617 * simple-object-elf.c (simple_object_elf_match): Fix up URL in comment. (simple_object_elf_copy_lto_debug_sections): Remap sh_info and sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE range (inclusive).

Testing cc1 on pr93032-mztools-unsigned-char.c Benchmark #1: (without patch) Time (mean ± σ): 338.8 ms ± 13.6 ms [User: 323.2 ms, System: 14.2 ms] Range (min … max): 326.7 ms … 363.1 ms 10 runs Benchmark #2: (with patch) Time (mean ± σ): 332.3 ms ± 12.8 ms [User: 316.6 ms, System: 14.3 ms] Range (min … max): 322.5 ms … 357.4 ms 10 runs Summary ./cc1.new ran 1.02 ± 0.06 times faster than ./cc1.old gcc/analyzer/ChangeLog: * store.cc (store::store): Presize m_cluster_map. Signed-off-by: David Malcolm <dmalcolm@redhat.com>

DR 2352 changed the definitions of reference-related (so that it uses "similar type" instead of "same type") and of reference-compatible (use a standard conversion sequence). That means that reference-related is now more broad, which means that we will be binding more things directly. The original patch for DR 2352 caused some problems, which were fixed in r276251 by creating a "fake" ck_qual in direct_reference_binding, so that in void f(int *); // #1 void f(const int * const &); // #2 int *x; int main() { f(x); // call #1 } we call #1. The extra ck_qual in #2 causes compare_ics to select #1, which is a better match for "int *" because then we don't have to do a qualification conversion. Let's turn to the problem in this PR. We have void f(const int * const &); // #1 void f(const int *); // #2 int *x; int main() { f(x); } We arrive in compare_ics to decide which one is better. The ICS for #1 looks like ck_ref_bind <- ck_qual <- ck_identity const int *const & const int *const int * and the ICS for #2 is ck_qual <- ck_rvalue <- ck_identity const int * int * int * We strip the reference and then comp_cv_qual_signature when comparing two ck_quals sees that "const int *" is a proper subset of "const int *const" and we return -1. But that's wrong; presumably the top-level "const" should be ignored and the call should be ambiguous. This patch adjust the type of the "fake" ck_qual so that this problem doesn't arise. PR c++/97296 gcc/cp/ChangeLog: * call.cc (direct_reference_binding): strip_top_quals when creating a ck_qual. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/ref-bind4.C: Add dg-error. * g++.dg/cpp0x/ref-bind8.C: New test.

Here ever since r12-6022-gbb2a7f80a98de3 we stopped deeming the partial specialization #2 to be more specialized than #1 ultimately because dependent operator expressions now have a DEPENDENT_OPERATOR_TYPE type instead of an empty type, and this made unify stop deducing T(2) == 1 for K during partial ordering for #1 and #2. This minimal patch fixes this by making the relevant logic in unify treat DEPENDENT_OPERATOR_TYPE like an empty type. PR c++/105425 gcc/cp/ChangeLog: * pt.cc (unify) <case TEMPLATE_PARM_INDEX>: Treat DEPENDENT_OPERATOR_TYPE like an empty type. gcc/testsuite/ChangeLog: * g++.dg/template/partial-specialization13.C: New test.

Here during cp_parser_single_declaration for #2, we were calling associate_classtype_constraints for TPL<T> (the primary template type) before maybe_process_partial_specialization could get a chance to notice that we're in fact declaring a distinct constrained partial spec and not redeclaring the primary template. This caused us to emit a bogus error about differing constraints b/t the primary template and #2's constraints. This patch fixes this by moving the call to associate_classtype_constraints after the call to shadow_tag (which calls maybe_process_partial_specialization) and adjusting shadow_tag to use the return value of m_p_p_s. Moreover, if we later try to define a constrained partial specialization that's been declared earlier (as in the third testcase), then maybe_new_partial_specialization correctly notices it's a redeclaration and returns NULL_TREE. But in this case we also need to update TYPE to point to the redeclared partial spec (it'll otherwise continue pointing to the primary template type, eventually leading to a bogus error). PR c++/96363 gcc/cp/ChangeLog: * decl.cc (shadow_tag): Use the return value of maybe_process_partial_specialization. * parser.cc (cp_parser_single_declaration): Call shadow_tag before associate_classtype_constraints. * pt.cc (maybe_new_partial_specialization): Change return type to bool. Take 'type' argument by mutable reference. Set 'type' to point to the correct constrained specialization when appropriate. (maybe_process_partial_specialization): Adjust accordingly. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-partial-spec12.C: New test. * g++.dg/cpp2a/concepts-partial-spec12a.C: New test. * g++.dg/cpp2a/concepts-partial-spec13.C: New test.

As explained in r11-4959-gde6f64f9556ae3, the atom cache assumes two equivalent expressions (according to cp_tree_equal) must use the same template parameters (according to find_template_parameters). This assumption turned out to not hold for TARGET_EXPR, which was addressed by that commit. But this assumption apparently doesn't hold for PARM_DECL either: find_template_parameters walks its DECL_CONTEXT but cp_tree_equal by default doesn't consider DECL_CONTEXT unless comparing_specializations is set. Thus in the first testcase below, the atomic constraints of #1 and #2 are equivalent according to cp_tree_equal, but according to find_template_parameters the former uses T and the latter uses both T and U (surprisingly). We could fix this assumption violation by setting comparing_specializations in the atom_hasher, which would make cp_tree_equal return false for the two atoms, but that seems overly pessimistic here. Ideally the atoms should continue being considered equivalent and we instead fix find_template_paremeters to return just T for #2's atom. To that end this patch makes for_each_template_parm_r stop walking the DECL_CONTEXT of a PARM_DECL. This should be safe to do because tsubst_copy / tsubst_decl only substitutes the TREE_TYPE of a PARM_DECL and doesn't bother substituting the DECL_CONTEXT, thus the only relevant template parameters are those used in its type. any_template_parm_r is currently responsible for walking its TREE_TYPE, but I suppose it now makes sense for for_each_template_parm_r to do so instead. In passing this patch also makes for_each_template_parm_r stop walking the DECL_CONTEXT of a VAR_/FUNCTION_DECL since doing so after walking DECL_TI_ARGS is redundant, I think. I experimented with not walking DECL_CONTEXT for CONST_DECL, but the second testcase below demonstrates it's necessary to walk it. PR c++/105797 gcc/cp/ChangeLog: * pt.cc (for_each_template_parm_r) <case FUNCTION_DECL, VAR_DECL>: Don't walk DECL_CONTEXT. <case PARM_DECL>: Likewise. Walk TREE_TYPE. <case CONST_DECL>: Simplify. (any_template_parm_r) <case PARM_DECL>: Don't walk TREE_TYPE. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-decltype4.C: New test. * g++.dg/cpp2a/concepts-memfun3.C: New test.

This patch implements C++23 P2255R2, which adds two new type traits to detect reference binding to a temporary. They can be used to detect code like std::tuple<const std::string&> t("meow"); which is incorrect because it always creates a dangling reference, because the std::string temporary is created inside the selected constructor of std::tuple, and not outside it. There are two new compiler builtins, __reference_constructs_from_temporary and __reference_converts_from_temporary. The former is used to simulate direct- and the latter copy-initialization context. But I had a hard time finding a test where there's actually a difference. Under DR 2267, both of these are invalid: struct A { } a; struct B { explicit B(const A&); }; const B &b1{a}; const B &b2(a); so I had to peruse [over.match.ref], and eventually realized that the difference can be seen here: struct G { operator int(); // #1 explicit operator int&&(); // #2 }; int&& r1(G{}); // use #2 (no temporary) int&& r2 = G{}; // use #1 (a temporary is created to be bound to int&&) The implementation itself was rather straightforward because we already have the conv_binds_ref_to_prvalue function. The main function here is ref_xes_from_temporary. I've changed the return type of ref_conv_binds_directly to tristate, because previously the function didn't distinguish between an invalid conversion and one that binds to a prvalue. Since it no longer returns a bool, I removed the _p suffix. The patch also adds the relevant class and variable templates to <type_traits>. PR c++/104477 gcc/c-family/ChangeLog: * c-common.cc (c_common_reswords): Add __reference_constructs_from_temporary and __reference_converts_from_temporary. * c-common.h (enum rid): Add RID_REF_CONSTRUCTS_FROM_TEMPORARY and RID_REF_CONVERTS_FROM_TEMPORARY. gcc/cp/ChangeLog: * call.cc (ref_conv_binds_directly_p): Rename to ... (ref_conv_binds_directly): ... this. Add a new bool parameter. Change the return type to tristate. * constraint.cc (diagnose_trait_expr): Handle CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY. * cp-tree.h: Include "tristate.h". (enum cp_trait_kind): Add CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY. (ref_conv_binds_directly_p): Rename to ... (ref_conv_binds_directly): ... this. (ref_xes_from_temporary): Declare. * cxx-pretty-print.cc (pp_cxx_trait_expression): Handle CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY. * method.cc (ref_xes_from_temporary): New. * parser.cc (cp_parser_primary_expression): Handle RID_REF_CONSTRUCTS_FROM_TEMPORARY and RID_REF_CONVERTS_FROM_TEMPORARY. (cp_parser_trait_expr): Likewise. (warn_for_range_copy): Adjust to call ref_conv_binds_directly. * semantics.cc (trait_expr_value): Handle CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY. (finish_trait_expr): Likewise. libstdc++-v3/ChangeLog: * include/std/type_traits (reference_constructs_from_temporary, reference_converts_from_temporary): New class templates. (reference_constructs_from_temporary_v, reference_converts_from_temporary_v): New variable templates. (__cpp_lib_reference_from_temporary): Define for C++23. * include/std/version (__cpp_lib_reference_from_temporary): Define for C++23. * testsuite/20_util/variable_templates_for_traits.cc: Test reference_constructs_from_temporary_v and reference_converts_from_temporary_v. * testsuite/20_util/reference_from_temporary/value.cc: New test. * testsuite/20_util/reference_from_temporary/value2.cc: New test. * testsuite/20_util/reference_from_temporary/version.cc: New test. gcc/testsuite/ChangeLog: * g++.dg/ext/reference_constructs_from_temporary1.C: New test. * g++.dg/ext/reference_converts_from_temporary1.C: New test.

This patch implements some additional zero-extension and sign-extension related optimizations in simplify-rtx.cc. The original motivation comes from PR rtl-optimization/71775, where in comment #2 Andrew Pinksi sees: Failed to match this instruction: (set (reg:DI 88 [ _1 ]) (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0))) On many platforms the result of DImode CTZ is constrained to be a small unsigned integer (between 0 and 64), hence the truncation to 32-bits (using a SUBREG) and the following sign extension back to 64-bits are effectively a no-op, so the above should ideally (often) be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))". To implement this, and some closely related transformations, we build upon the existing val_signbit_known_clear_p predicate. In the first chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit bit set, so the simplification of of ABS (ABS (x)) and ABS (FFS (x)) can itself be simplified. The second transformation is that we can canonicalized SIGN_EXTEND to ZERO_EXTEND (as in the PR 71775 case above) when the operand's sign-bit is known to be clear. The final two chunks are for SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a truncating SUBREG respectively. The nonzero_bits of a truncating SUBREG pessimistically thinks that the upper bits may have an arbitrary value (by taking the SUBREG), so we need look deeper at the SUBREG's operand to confirm that the high bits are known to be zero. Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with default architecture options is undefined at zero, so we can't be sure the upper bits of reg:DI 88 will be sign extended (all zeros or all ones). nonzero_bits knows this, so the above transformations don't trigger, but the transformations themselves are perfectly valid for other operations such as FFS, POPCOUNT and PARITY, and on other targets/-march settings where CTZ is defined at zero. 2022-08-03 Roger Sayle <roger@nextmovesoftware.com> Segher Boessenkool <segher@kernel.crashing.org> Richard Sandiford <richard.sandiford@arm.com> gcc/ChangeLog * simplify-rtx.cc (simplify_unary_operation_1) <ABS>: Add optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS and LSHIFTRT that are all positive to complement the existing FFS and idempotent ABS simplifications. <SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when val_signbit_known_clear_p is true of the operand. Simplify sign extensions of SUBREG truncations of operands that are already suitably (zero) extended. <ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations of operands that are already suitably zero extended.

In my previous patches I've been extending our std::move warnings, but this tweak actually dials it down a little bit. As reported in bug 89780, it's questionable to warn about expressions in templates that were type-dependent, but aren't anymore because we're instantiating the template. As in, template <typename T> Dest withMove() { T x; return std::move(x); } template Dest withMove<Dest>(); // #1 template Dest withMove<Source>(); // #2 Saying that the std::move is pessimizing for #1 is not incorrect, but it's not useful, because removing the std::move would then pessimize #2. So the user can't really win. At the same time, disabling the warning just because we're in a template would be going too far, I still want to warn for template <typename> Dest withMove() { Dest x; return std::move(x); } because the std::move therein will be pessimizing for any instantiation. So I'm using the suppress_warning machinery to that effect. Problem: I had to add a new group to nowarn_spec_t, otherwise suppressing the -Wpessimizing-move warning would disable a whole bunch of other warnings, which we really don't want. PR c++/89780 gcc/cp/ChangeLog: * pt.cc (tsubst_copy_and_build) <case CALL_EXPR>: Maybe suppress -Wpessimizing-move. * typeck.cc (maybe_warn_pessimizing_move): Don't issue warnings if they are suppressed. (check_return_expr): Disable -Wpessimizing-move when returning a dependent expression. gcc/ChangeLog: * diagnostic-spec.cc (nowarn_spec_t::nowarn_spec_t): Handle OPT_Wpessimizing_move and OPT_Wredundant_move. * diagnostic-spec.h (nowarn_spec_t): Add NW_REDUNDANT enumerator. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/Wpessimizing-move3.C: Remove dg-warning. * g++.dg/cpp0x/Wredundant-move2.C: Likewise.

The eliminate reg-reg move by inverting the condition of a cmove #2 peephole2 converts the following sequence: 473: bx:DI=[r14:DI*0x8+r12:DI] 960: r15:DI=r8:DI 485: {flags:CCC=cmp(r15:DI+bx:DI,bx:DI);r15:DI=r15:DI+bx:DI;} 737: r15:DI={(geu(flags:CCC,0))?r15:DI:bx:DI} to: 1110: {flags:CCC=cmp(r8:DI+bx:DI,bx:DI);r8:DI=r8:DI+bx:DI;} 1111: r15:DI=[r14:DI*0x8+r12:DI] 1112: r15:DI={(geu(flags:CCC,0))?r8:DI:r15:DI} Please note that(insn 1110) uses register BX, but its initialization was eliminated. Avoid conversion if eliminated move intialized a register, used in the moved instruction. 2022-11-03 Uroš Bizjak <ubizjak@gmail.com> gcc/ChangeLog: PR target/107404 * config/i386/i386.md (eliminate reg-reg move by inverting the condition of a cmove #2 peephole2): Check if eliminated move initialized a register, used in the moved instruction. gcc/testsuite/ChangeLog: PR target/107404 * g++.target/i386/pr107404.C: New test.

While looking at PR 105549, which is about fixing the ABI break introduced in GCC 9.1 in parameter alignment with bit-fields, we noticed that the GCC 9.1 warning is not emitted in all the cases where it should be. This patch fixes that and the next patch in the series fixes the GCC 9.1 break. We split this into two patches since patch #2 introduces a new ABI break starting with GCC 13.1. This way, patch #1 can be back-ported to release branches if needed to fix the GCC 9.1 warning issue. The main idea is to add a new global boolean that indicates whether we're expanding the start of a function, so that aarch64_layout_arg can emit warnings for callees as well as callers. This removes the need for aarch64_function_arg_boundary to warn (with its incomplete information). However, in the first patch there are still cases where we emit warnings were we should not; this is fixed in patch #2 where we can distinguish between GCC 9.1 and GCC.13.1 ABI breaks properly. The fix in aarch64_function_arg_boundary (replacing & with &&) looks like an oversight of a previous commit in this area which changed 'abi_break' from a boolean to an integer. We also take the opportunity to fix the comment above aarch64_function_arg_alignment since the value of the abi_break parameter was changed in a previous commit, no longer matching the description. 2022-11-28 Christophe Lyon <christophe.lyon@arm.com> Richard Sandiford <richard.sandiford@arm.com> gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_function_arg_alignment): Fix comment. (aarch64_layout_arg): Factorize warning conditions. (aarch64_function_arg_boundary): Fix typo. * function.cc (currently_expanding_function_start): New variable. (expand_function_start): Handle currently_expanding_function_start. * function.h (currently_expanding_function_start): Declare. gcc/testsuite/ChangeLog: * gcc.target/aarch64/bitfield-abi-warning-align16-O2.c: New test. * gcc.target/aarch64/bitfield-abi-warning-align16-O2-extra.c: New test. * gcc.target/aarch64/bitfield-abi-warning-align32-O2.c: New test. * gcc.target/aarch64/bitfield-abi-warning-align32-O2-extra.c: New test. * gcc.target/aarch64/bitfield-abi-warning-align8-O2.c: New test. * gcc.target/aarch64/bitfield-abi-warning.h: New test. * g++.target/aarch64/bitfield-abi-warning-align16-O2.C: New test. * g++.target/aarch64/bitfield-abi-warning-align16-O2-extra.C: New test. * g++.target/aarch64/bitfield-abi-warning-align32-O2.C: New test. * g++.target/aarch64/bitfield-abi-warning-align32-O2-extra.C: New test. * g++.target/aarch64/bitfield-abi-warning-align8-O2.C: New test. * g++.target/aarch64/bitfield-abi-warning.h: New test.

Here the ahead-of-time overload set pruning in finish_call_expr is unintentionally returning a CALL_EXPR whose (pruned) callee is wrapped in an ADDR_EXPR, despite the original callee not being wrapped in an ADDR_EXPR. This ends up causing a bogus declaration mismatch error in the below testcase because the call to min in #1 gets expressed as a CALL_EXPR of ADDR_EXPR of FUNCTION_DECL, whereas the level-lowered call to min in #2 gets expressed instead as a CALL_EXPR of FUNCTION_DECL. This patch fixes this by stripping the spurious ADDR_EXPR appropriately. Thus the first call to min now also gets expressed as a CALL_EXPR of FUNCTION_DECL, matching the behavior before r12-6075-g2decd2cabe5a4f. PR c++/107461 gcc/cp/ChangeLog: * semantics.cc (finish_call_expr): Strip ADDR_EXPR from the selected callee during overload set pruning. gcc/testsuite/ChangeLog: * g++.dg/template/call9.C: New test.

After r13-5684-g59e0376f607805 the (pruned) callee of a non-dependent CALL_EXPR is a bare FUNCTION_DECL rather than ADDR_EXPR of FUNCTION_DECL. This innocent change revealed that cp_tree_equal doesn't first check dependence of a CALL_EXPR before treating a FUNCTION_DECL callee as a dependent name, which leads to us incorrectly accepting the first two testcases below and rejecting the third: * In the first testcase, cp_tree_equal incorrectly returns true for the two non-dependent CALL_EXPRs f(0) and f(0) (whose CALL_EXPR_FN are different FUNCTION_DECLs) which causes us to treat #2 as a redeclaration of #1. * Same issue in the second testcase, for f<int*>() and f<char>(). * In the third testcase, cp_tree_equal incorrectly returns true for f<int>() and f<void(*)(int)>() which causes us to conflate the two dependent specializations A<decltype(f<int>()(U()))> and A<decltype(f<void(*)(int)>()(U()))>. This patch fixes this by making called_fns_equal treat two callees as dependent names only if the overall CALL_EXPRs are dependent, via a new convenience function call_expr_dependent_name that is like dependent_name but also checks dependence of the overall CALL_EXPR. PR c++/107461 gcc/cp/ChangeLog: * cp-tree.h (call_expr_dependent_name): Declare. * pt.cc (iterative_hash_template_arg) <case CALL_EXPR>: Use call_expr_dependent_name instead of dependent_name. * tree.cc (call_expr_dependent_name): Define. (called_fns_equal): Adjust to take two CALL_EXPRs instead of CALL_EXPR_FNs thereof. Use call_expr_dependent_name instead of dependent_name. (cp_tree_equal) <case CALL_EXPR>: Adjust call to called_fns_equal. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/overload5.C: New test. * g++.dg/cpp0x/overload5a.C: New test. * g++.dg/cpp0x/overload6.C: New test.

… in asm in different mode See gcc.c-torture/execute/20030222-1.c. Consider the code for 32-bit (e.g. BE) target: int i, v; long x; x = v; asm ("" : "=r" (i) : "0" (x)); We generate the following RTL with reload insns: 1. subreg:si(x:di, 0) = 0; 2. subreg:si(x:di, 4) = v:si; 3. t:di = x:di, dead x; 4. asm ("" : "=r" (subreg:si(t:di,4)) : "0" (t:di)) 5. i:si = subreg:si(t:di,4); If we assign hard reg of x to t, dead code elimination will remove insn #2 and we will use unitialized hard reg. So exclude the hard reg of x for t. We could ignore this problem for non-empty asm using all x value but it is hard to check that the asm are expanded into insn realy using x and setting r. The old reload pass used the same approach. gcc/ChangeLog * lra-constraints.cc (match_reload): Exclude some hard regs for multi-reg inout reload pseudos used in asm in different mode.

Currently on xstormy16 SImode shifts by a single bit require two instructions, and shifts by other non-zero integer immediate constants require five instructions. This patch implements the obvious optimization that shifts by two bits can be done in four instructions, by using two single-bit sequences. Hence, ashift_2 was previously generated as: mov r7,r2 | shl r2,#2 | shl r3,#2 | shr r7,gcc-mirror#14 | or r3,r7 ret and with this patch we now generate: shl r2,#1 | rlc r3,#1 | shl r2,#1 | rlc r3,#1 ret 2023-04-23 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/stormy16/stormy16.cc (xstormy16_output_shift): Implement SImode shifts by two by performing a single bit SImode shift twice. gcc/testsuite/ChangeLog * gcc.target/xstormy16/shiftsi.c: New test case.

I noticed that for member class templates of a class template we were unnecessarily substituting both the template and its type. Avoiding that duplication speeds compilation of this silly testcase from ~12s to ~9s on my laptop. It's unlikely to make a difference on any real code, but the simplification is also nice. We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation of the template class, but it makes more sense to do that in tsubst_template_decl anyway. #define NC(X) \ template <class U> struct X##1; \ template <class U> struct X##2; \ template <class U> struct X##3; \ template <class U> struct X##4; \ template <class U> struct X#gcc-mirror#5; \ template <class U> struct X#gcc-mirror#6; #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f) #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E) template <int I> struct A { NC3(am) }; template <class...Ts> void sink(Ts...); template <int...Is> void g() { sink(A<Is>()...); } template <int I> void f() { g<__integer_pack(I)...>(); } int main() { f<1000>(); } gcc/cp/ChangeLog: * pt.cc (instantiate_class_template): Skip the RECORD_TYPE of a class template. (tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.

…i parts [2] [part #2 of PR/109279] SPEC2017 deepsjeng uses large constants which currently generates less than ideal code. This fix improves codegen for large constants which have same low and hi parts: e.g. long long f(void) { return 0x0101010101010101ull; } Before li a5,0x1010000 addi a5,a5,0x101 mv a0,a5 slli a5,a5,32 add a0,a5,a0 ret With patch li a5,0x1010000 addi a5,a5,0x101 slli a0,a5,32 add a0,a0,a5 ret This is testsuite clean. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_split_integer): if loval is equal to hival, ASHIFT the corresponding regs. Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>

This patch decreses one machine instruction from "single bit extraction with shifting" operation, and tries to eliminate the conditional branch if CST2_POW2 doesn't fit into signed 12 bits with the help of ifcvt optimization. /* example #1 */ int test0(int x) { return (x & 1048576) != 0 ? 1024 : 0; } extern int foo(void); int test1(void) { return (foo() & 1048576) != 0 ? 16777216 : 0; } ;; before test0: movi a9, 0x400 srai a2, a2, 10 and a2, a2, a9 ret.n test1: addi sp, sp, -16 s32i.n a0, sp, 12 call0 foo extui a2, a2, 20, 1 slli a2, a2, 20 beqz.n a2, .L2 movi.n a2, 1 slli a2, a2, 24 .L2: l32i.n a0, sp, 12 addi sp, sp, 16 ret.n ;; after test0: extui a2, a2, 20, 1 slli a2, a2, 10 ret.n test1: addi sp, sp, -16 s32i.n a0, sp, 12 call0 foo l32i.n a0, sp, 12 extui a2, a2, 20, 1 slli a2, a2, 24 addi sp, sp, 16 ret.n In addition, if the left shift amount ('exact_log2(CST2_POW2)') is between 1 through 3 and a either addition or subtraction with another register follows, emit a ADDX[248] or SUBX[248] machine instruction instead of separate left shift and add/subtract ones. /* example #2 */ int test2(int x, int y) { return ((x & 1048576) != 0 ? 4 : 0) + y; } int test3(int x, int y) { return ((x & 2) != 0 ? 8 : 0) - y; } ;; before test2: movi.n a9, 4 srai a2, a2, 18 and a2, a2, a9 add.n a2, a2, a3 ret.n test3: movi.n a9, 8 slli a2, a2, 2 and a2, a2, a9 sub a2, a2, a3 ret.n ;; after test2: extui a2, a2, 20, 1 addx4 a2, a2, a3 ret.n test3: extui a2, a2, 1, 1 subx8 a2, a2, a3 ret.n gcc/ChangeLog: * config/xtensa/predicates.md (addsub_operator): New. * config/xtensa/xtensa.md (*extzvsi-1bit_ashlsi3, *extzvsi-1bit_addsubx): New insn_and_split patterns. * config/xtensa/xtensa.cc (xtensa_rtx_costs): Add a special case about ifcvt 'noce_try_cmove()' to handle constant loads that do not fit into signed 12 bits in the patterns added above.

This patch is inspired by Jakub's work on PR rtl-optimization/110717. The bitfield example described in comment #2, looks like: struct S { __int128 a : 69; }; unsigned type bar (struct S *p) { return p->a; } which on x86_64 with -O2 currently generates: bar: movzbl 8(%rdi), %ecx movq (%rdi), %rax andl $31, %ecx movq %rcx, %rdx salq $59, %rdx sarq $59, %rdx ret The ANDL $31 is interesting... we first extract an unsigned 69-bit bitfield by masking/clearing the top bits of the most significant word, and then it gets sign-extended, by left shifting and arithmetic right shifting. Obviously, this bit-wise AND is redundant, for signed bit-fields, we don't require these bits to be cleared, if we're about to set them appropriately. This patch eliminates this redundancy in the middle-end, during RTL expansion, but extending the extract_bit_field APIs so that the integer UNSIGNEDP argument takes a special value; 0 indicates the field should be sign extended, 1 (any non-zero value) indicates the field should be zero extended, but -1 indicates a third option, that we don't care how or whether the field is extended. By passing and checking this sentinel value at the appropriate places we avoid the useless bit masking (on all targets). For the test case above, with this patch we now generate: bar: movzbl 8(%rdi), %ecx movq (%rdi), %rax movq %rcx, %rdx salq $59, %rdx sarq $59, %rdx ret 2023-08-04 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * expmed.cc (extract_bit_field_1): Document that an UNSIGNEDP value of -1 is equivalent to don't care. (extract_integral_bit_field): Indicate that we don't require the most significant word to be zero extended, if we're about to sign extend it. (extract_fixed_bit_field_1): Document that an UNSIGNEDP value of -1 is equivalent to don't care. Don't clear the most significant bits with AND mask when UNSIGNEDP is -1. gcc/testsuite/ChangeLog * gcc.target/i386/pr110717-2.c: New test case.

Given below example for VLS mode void test (vl_t *u) { vl_t t; long long *p = (long long *)&t; p[0] = p[1] = 2; *u = t; } The vec_set will simplify the insn to vmv.s.x when index is 0, without merged operand. That will result in some problems in DCE, aka: 1: 137[DI] = a0 2: 138[V2DI] = 134[V2DI] // deleted by DCE 3: 139[DI] = #2 // deleted by DCE 4: 140[DI] = #2 // deleted by DCE 5: 141[V2DI] = vec_dup:V2DI (139[DI]) // deleted by DCE 6: 138[V2DI] = vslideup_imm (138[V2DI], 141[V2DI], 1) // deleted by DCE 7: 135[V2DI] = 138[V2DI] // deleted by DCE 8: 142[V2DI] = 135[V2DI] // deleted by DCE 9: 143[DI] = #2 10: 142[V2DI] = vec_dup:V2DI (143[DI]) 11: (137[DI]) = 142[V2DI] The higher 64 bits of 142[V2DI] is unknown here and it generated incorrect code when store back to memory. This patch would like to fix this issue by adding a new SCALAR_MOVE_MERGED_OP for vec_set. Please note this patch doesn't enable VLS for vec_set, the underlying patches will support this soon. gcc/ChangeLog: * config/riscv/autovec.md: Bugfix. * config/riscv/riscv-protos.h (SCALAR_MOVE_MERGED_OP): New enum. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/scalar-move-merged-run-1.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>

Improve stack protector patterns and peephole2s even more: a. Use unrelated register clears with integer mode size <= word mode size to clear stack protector scratch register. b. Use unrelated register initializations in front of stack protector sequence to clear stack protector scratch register. c. Use unrelated register initializations using LEA instructions to clear stack protector scratch register. These stack protector improvements reuse 6914 unrelated register initializations to substitute the clear of stack protector scratch register in 12034 instances of stack protector sequence in recent linux defconfig build. gcc/ChangeLog: * config/i386/i386.md (@stack_protect_set_1_<PTR:mode>_<W:mode>): Use W mode iterator instead of SWI48. Output MOV instead of XOR for TARGET_USE_MOV0. (stack_protect_set_1 peephole2): Use integer modes with mode size <= word mode size for operand 3. (stack_protect_set_1 peephole2 #2): New peephole2 pattern to substitute stack protector scratch register clear with unrelated register initialization, originally in front of stack protector sequence. (*stack_protect_set_3_<PTR:mode>_<SWI48:mode>): New insn pattern. (stack_protect_set_1 peephole2): New peephole2 pattern to substitute stack protector scratch register clear with unrelated register initialization involving LEA instruction.

Use unrelated register initializations using zero/sign-extend instructions to clear stack protector scratch register. Hanlde only SI -> DImode extensions for 64-bit targets, as this is the only extension that triggers the peephole in a non-negligible number. Also use explicit check for word_mode instead of mode iterator in peephole2 patterns to avoid pattern explosion. gcc/ChangeLog: * config/i386/i386.md (stack_protect_set_1 peephole2): Explicitly check operand 2 for word_mode. (stack_protect_set_1 peephole2 #2): Ditto. (stack_protect_set_2 peephole2): Ditto. (stack_protect_set_3 peephole2): Ditto. (*stack_protect_set_4z_<mode>_di): New insn patter. (*stack_protect_set_4s_<mode>_di): Ditto. (stack_protect_set_4 peephole2): New peephole2 pattern to substitute stack protector scratch register clear with unrelated register initialization involving zero/sign-extend instruction.

Since the last import from upstream libsanitizer, the output has changed and now looks more like this: READ of size 6 at 0x7ff7beb2a144 thread T0 #0 0x101cf7796 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long) sanitizer_common_interceptors.inc:813 #1 0x101cf7b99 in memcmp sanitizer_common_interceptors.inc:840 #2 0x108a0c39f in __stack_chk_guard+0xf (dyld:x86_64+0x8039f) so let's adjust the pattern accordingly. gcc/testsuite/ChangeLog: * c-c++-common/asan/memcmp-1.c: Adjust pattern on darwin.

…-int (PR target/112413) On m68k the compiler assumes that the PC-relative jump-via-jump-table instruction and the jump table are adjacent with no padding in between. When -mlong-jump-table-offsets is combined with -malign-int, a 2-byte nop may be inserted before the jump table, causing the jump to add the fetched offset to the wrong PC base and thus jump to the wrong address. Fixed by referencing the jump table via its label. On the test case in the PR the object code change is (the moveal at 16 is the nop): a: 6536 bcss 42 <f+0x42> c: e588 lsll #2,%d0 e: 203b 0808 movel %pc@(18 <f+0x18>,%d0:l),%d0 - 12: 4efb 0802 jmp %pc@(16 <f+0x16>,%d0:l) + 12: 4efb 0804 jmp %pc@(18 <f+0x18>,%d0:l) 16: 284c moveal %a4,%a4 18: 0000 0020 orib gcc-mirror#32,%d0 1c: 0000 002c orib gcc-mirror#44,%d0 Bootstrapped and tested on m68k-linux-gnu, no regressions. Note: I don't have commit rights to I would need assistance applying this. PR target/112413 gcc/ * config/m68k/linux.h (ASM_RETURN_CASE_JUMP): For TARGET_LONG_JUMP_TABLE_OFFSETS, reference the jump table via its label. * config/m68k/m68kelf.h (ASM_RETURN_CASE_JUMP): Likewise. * config/m68k/netbsd-elf.h (ASM_RETURN_CASE_JUMP): Likewise.

During partial ordering, we want to look through dependent alias template specializations within template arguments and otherwise treat them as opaque in other contexts (see e.g. r7-7116-g0c942f3edab108 and r11-7011-g6e0a231a4aa240). To that end template_args_equal was given a partial_order flag that controls this behavior. This flag does the right thing when a dependent alias template specialization appears as template argument of the partial specialization, e.g. in template<class T, class...> using first_t = T; template<class T> struct traits; template<class T> struct traits<first_t<T, T&>> { }; // #1 template<class T> struct traits<first_t<const T, T&>> { }; // #2 we correctly consider #2 to be more specialized than #1. But if the alias specialization appears as a nested template argument of another class template specialization, e.g. in template<class T> struct traits<A<first_t<T, T&>>> { }; // #1 template<class T> struct traits<A<first_t<const T, T&>>> { }; // #2 then we incorrectly consider #1 and #2 to be unordered. This is because 1. we don't propagate the flag to recursive template_args_equal calls 2. we don't use structural equality for class template specializations written in terms of dependent alias template specializations This patch fixes the first issue by turning the partial_order flag into a global. This patch fixes the second issue by making us propagate structural equality appropriately when building a class template specialization. In passing this patch also improves hashing of specializations that use structural equality. PR c++/90679 gcc/cp/ChangeLog: * cp-tree.h (comp_template_args): Remove partial_order parameter. (template_args_equal): Likewise. * pt.cc (comparing_for_partial_ordering): New global flag. (iterative_hash_template_arg) <case tcc_type>: Hash the template and arguments for specializations that use structural equality. (template_args_equal): Remove partial order parameter and use comparing_for_partial_ordering instead. (comp_template_args): Likewise. (comp_template_args_porder): Set comparing_for_partial_ordering instead. Make static. (any_template_arguments_need_structural_equality_p): Return true for an argument that's a dependent alias template specialization or a class template specialization that itself needs structural equality. * tree.cc (cp_tree_equal) <case TREE_VEC>: Adjust call to comp_template_args. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/alias-decl-75a.C: New test. * g++.dg/cpp0x/alias-decl-75b.C: New test.

This patch adjusts the costs so that we treat REG and SUBREG expressions the same for costing. This was motivated by bt_skip_func and bt_find_func in xz and results in nearly a 5% improvement in the dynamic instruction count for input #2 and smaller, but definitely visible improvements pretty much across the board. Exceptions would be perlbench input #1 and exchange2 which showed very small regressions. In the bt_find_func and bt_skip_func cases we have something like this: > (insn 10 7 11 2 (set (reg/v:DI 136 [ x ]) > (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))) "zz.c":6:21 387 {*zero_extendsidi2_bitmanip} > (nil)) > (insn 11 10 12 2 (set (reg:DI 142 [ _1 ]) > (plus:DI (reg/v:DI 136 [ x ]) > (reg/v:DI 139 [ b ]))) "zz.c":7:23 5 {adddi3} > (nil)) [ ... ]> (insn 13 12 14 2 (set (reg:DI 143 [ _2 ]) > (plus:DI (reg/v:DI 136 [ x ]) > (reg/v:DI 141 [ c ]))) "zz.c":8:23 5 {adddi3} > (nil)) Note the two uses of (reg 136). The best way to handle that in combine might be a 3->2 split. But there's a much better approach if we look at fwprop... (set (reg:DI 142 [ _1 ]) (plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0)) (reg/v:DI 139 [ b ]))) change not profitable (cost 4 -> cost 8) So that should be the same cost as a regular DImode addition when the ZBA extension is enabled. But it ends up costing more because the clause to cost this variant isn't prepared to handle a SUBREG. That results in the RTL above having too high a cost and fwprop gives up. One approach would be to replace the REG_P with REG_P || SUBREG_P in the costing code. I ultimately decided against that and instead check if the operand in question passes register_operand. By far the most important case to handle is the DImode PLUS. But for the sake of consistency, I changed the other instances in riscv_rtx_costs as well. For those other cases we're talking about improvements in the .000001% range. While we are into stage4, this just hits cost modeling which we've generally agreed is still appropriate (though we were mostly talking about vector). So I'm going to extend that general agreement ever so slightly and include scalar cost modeling :-) gcc/ * config/riscv/riscv.cc (riscv_rtx_costs): Handle SUBREG and REG similarly. gcc/testsuite/ * gcc.target/riscv/reg_subreg_costs.c: New test. Co-authored-by: Jivan Hakobyan <jivanhakobyan9@gmail.com>

…o_debug_section [PR116614] cat abc.C #define A(n) struct T##n {} t##n; #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n#gcc-mirror#5) A(n#gcc-mirror#6) A(n#gcc-mirror#7) A(n#gcc-mirror#8) A(n#gcc-mirror#9) #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n#gcc-mirror#5) B(n#gcc-mirror#6) B(n#gcc-mirror#7) B(n#gcc-mirror#8) B(n#gcc-mirror#9) #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n#gcc-mirror#5) C(n#gcc-mirror#6) C(n#gcc-mirror#7) C(n#gcc-mirror#8) C(n#gcc-mirror#9) #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n#gcc-mirror#5) D(n#gcc-mirror#6) D(n#gcc-mirror#7) D(n#gcc-mirror#8) D(n#gcc-mirror#9) E(1) E(2) E(3) int main () { return 0; } ./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c ./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2 (not included in testsuite as it takes a while to compile) FAILs with lto-wrapper: fatal error: Too many copied sections: Operation not supported compilation terminated. /usr/bin/ld: error: lto-wrapper failed collect2: error: ld returned 1 exit status The following patch fixes that. Most of the 64K+ section support for reading and writing was already there years ago (and especially reading used quite often already) and a further bug fixed in it in the PR104617 fix. Yet, the fix isn't solely about removing the if (new_i - 1 >= SHN_LORESERVE) { *err = ENOTSUP; return "Too many copied sections"; } 5 lines, the missing part was that the function only handled reading of the .symtab_shndx section but not copying/updating of it. If the result has less than 64K-epsilon sections, that actually wasn't needed, but e.g. with -fdebug-types-section one can exceed that pretty easily (reported to us on WebKitGtk build on ppc64le). Updating the section is slightly more complicated, because it basically needs to be done in lock step with updating the .symtab section, if one doesn't need to use SHN_XINDEX in there, the section should (or should be updated to) contain SHN_UNDEF entry, otherwise needs to have whatever would be overwise stored but couldn't fit. But repeating due to that all the symtab decisions what to discard and how to rewrite it would be ugly. So, the patch instead emits the .symtab_shndx section (or sections) last and prepares the content during the .symtab processing and in a second pass when going just through .symtab_shndx sections just uses the saved content. 2024-09-07 Jakub Jelinek <jakub@redhat.com> PR lto/116614 * simple-object-elf.c (SHN_COMMON): Align comment with neighbouring comments. (SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for consistency. (simple_object_elf_find_sections): Formatting fixes. (simple_object_elf_fetch_attributes): Likewise. (simple_object_elf_attributes_merge): Likewise. (simple_object_elf_start_write): Likewise. (simple_object_elf_write_ehdr): Likewise. (simple_object_elf_write_shdr): Likewise. (simple_object_elf_write_to_file): Likewise. (simple_object_elf_copy_lto_debug_section): Likewise. Don't fail for new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy over .symtab_shndx sections, though emit those last and compute their section content when processing associated .symtab sections. Handle simple_object_internal_read failure even in the .symtab_shndx reading case.

This is another case of load hoisting breaking UID order in the preheader, this time between two hoistings. The easiest way out is to do what we do for the main stmt - copy instead of move. PR tree-optimization/116902 PR tree-optimization/116842 * tree-vect-stmts.cc (sort_after_uid): Remove again. (hoist_defs_of_uses): Copy defs instead of hoisting them so we can zero their UID. (vectorizable_load): Separate analysis and transform call, do transform on the stmt copy. * g++.dg/torture/pr116902.C: New testcase.

Whenever C1 and C2 are integer constants, X is of a wrapping type, and cmp is a relational operator, the expression X +- C1 cmp C2 can be simplified in the following cases: (a) If cmp is <= and C2 -+ C1 == +INF(1), we can transform the initial comparison in the following way: X +- C1 <= C2 -INF <= X +- C1 <= C2 (add left hand side which holds for any X, C1) -INF -+ C1 <= X <= C2 -+ C1 (add -+C1 to all 3 expressions) -INF -+ C1 <= X <= +INF (due to (1)) -INF -+ C1 <= X (eliminate the right hand side since it holds for any X) (b) By analogy, if cmp if >= and C2 -+ C1 == -INF(1), use the following sequence of transformations: X +- C1 >= C2 +INF >= X +- C1 >= C2 (add left hand side which holds for any X, C1) +INF -+ C1 >= X >= C2 -+ C1 (add -+C1 to all 3 expressions) +INF -+ C1 >= X >= -INF (due to (1)) +INF -+ C1 >= X (eliminate the right hand side since it holds for any X) (c) The > and < cases are negations of (a) and (b), respectively. This transformation allows to occasionally save add / sub instructions, for instance the expression 3 + (uint32_t)f() < 2 compiles to cmn w0, #4 cset w0, ls instead of add w0, w0, 3 cmp w0, 2 cset w0, ls on aarch64. Testcases that go together with this patch have been split into two separate files, one containing testcases for unsigned variables and the other for wrapping signed ones (and thus compiled with -fwrapv). Additionally, one aarch64 test has been adjusted since the patch has caused the generated code to change from cmn w0, #2 csinc w0, w1, wzr, cc (x < -2) to cmn w0, #3 csinc w0, w1, wzr, cs (x <= -3) This patch has been bootstrapped and regtested on aarch64, x86_64, and i386, and additionally regtested on riscv32. gcc/ChangeLog: PR tree-optimization/116024 * match.pd: New transformation around integer comparison. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr116024-2.c: New test. * gcc.dg/tree-ssa/pr116024-2-fwrapv.c: Ditto. * gcc.target/aarch64/gtu_to_ltu_cmp_1.c: Adjust.

PR jit/117275 reports various jit test failures seen on powerpc64le-unknown-linux-gnu due to hitting this assertion in varasm.cc on the 2nd compilation in a process: #2 0x00007ffff63e67d0 in assemble_external_libcall (fun=0x7ffff2a4b1d8) at ../../src/gcc/varasm.cc:2650 2650 gcc_assert (!pending_assemble_externals_processed); (gdb) p pending_assemble_externals_processed $1 = true We're not properly resetting state in varasm.cc after a compile for libgccjit. Fixed thusly. gcc/ChangeLog: PR jit/117275 * toplev.cc (toplev::finalize): Call varasm_cc_finalize. * varasm.cc (varasm_cc_finalize): New. * varasm.h (varasm_cc_finalize): New decl. Signed-off-by: David Malcolm <dmalcolm@redhat.com>

We currently crash upon the following invalid code (notice the "void void**" parameter) === cut here === using size_t = decltype(sizeof(int)); void *operator new(size_t, void void **p) noexcept { return p; } int x; void f() { int y; new (&y) int(x); } === cut here === The problem is that in this case, we end up with a NULL_TREE parameter list for the new operator because of the error, and (1) coerce_new_type wrongly complains about the first parameter type not being size_t, (2) std_placement_new_fn_p blindly accesses the parameter list, hence a crash. This patch does NOT address #1 since we can't easily distinguish between a new operator declaration without parameters from one with erroneous parameters (and it's not worth the risk to refactor and break things for an error recovery issue) hence a dg-bogus in new52.C, but it does address #2 and the ICE by simply checking the first parameter against NULL_TREE. It also adds a new testcase checking that we complain about new operators with no or invalid first parameters, since we did not have any. PR c++/117101 gcc/cp/ChangeLog: * init.cc (std_placement_new_fn_p): Check first_arg against NULL_TREE. gcc/testsuite/ChangeLog: * g++.dg/init/new52.C: New test. * g++.dg/init/new53.C: New test.

The second source register of insn "*extzvsi-1bit_addsubx" cannot be the same as the destination register, because that register will be overwritten with an intermediate value after insn splitting. /* example #1 */ int test1(int b, int a) { return ((a & 1024) ? 4 : 0) + b; } ;; result #1 (incorrect) test1: extui a2, a3, 10, 1 ;; overwrites A2 before used addx4 a2, a2, a2 ret.n This patch fixes that. ;; result #1 (correct) test1: extui a3, a3, 10, 1 ;; uses A3 and then overwrites addx4 a2, a3, a2 ret.n However, it should be noted that the first source register can be the same as the destination without any problems. /* example #2 */ int test2(int a, int b) { return ((a & 1024) ? 4 : 0) + b; } ;; result (correct) test2: extui a2, a2, 10, 1 ;; uses A2 and then overwrites addx4 a2, a2, a3 ret.n gcc/ChangeLog: * config/xtensa/xtensa.md (*extzvsi-1bit_addsubx): Add '&' to the destination register constraint to indicate that it is 'earlyclobber', append '0' to the first source register constraint to indicate that it can be the same as the destination register, and change the split condition from 1 to reload_completed so that the insn will be split only after RA in order to obtain allocated registers that satisfy the above constraints.

…R117557] The testcase #include <stdint.h> #include <string.h> #define N 8 #define L 8 void f(const uint8_t * restrict seq1, const uint8_t *idx, uint8_t *seq_out) { for (int i = 0; i < L; ++i) { uint8_t h = idx[i]; memcpy((void *)&seq_out[i * N], (const void *)&seq1[h * N / 2], N / 2); } } compiled at -O3 -mcpu=neoverse-n1+sve miscompiles to: ld1w z31.s, p3/z, [x23, z29.s, sxtw] ld1w z29.s, p7/z, [x23, z30.s, sxtw] st1w z29.s, p7, [x24, z12.s, sxtw] st1w z31.s, p7, [x24, z12.s, sxtw] rather than ld1w z31.s, p3/z, [x23, z29.s, sxtw] ld1w z29.s, p7/z, [x23, z30.s, sxtw] st1w z29.s, p7, [x24, z12.s, sxtw] addvl x3, x24, #2 st1w z31.s, p3, [x3, z12.s, sxtw] Where two things go wrong, the wrong mask is used and the address pointers to the stores are wrong. This issue is happening because the codegen loop in vectorizable_store is a nested loop where in the outer loop we iterate over ncopies and in the inner loop we loop over vec_num. For SLP ncopies == 1 and vec_num == SLP_NUM_STMS, but the loop mask is determined by only the outerloop index and the pointer address is only updated in the outer loop. As such for SLP we always use the same predicate and the same memory location. This patch flattens the two loops and instead iterates over ncopies * vec_num and simplified the indexing. This does not fully fix the gcc_r miscompile error in SPECCPU 2017 as the error moves somewhere else. I will look at that next but fixes some other libraries that also started failing. gcc/ChangeLog: PR tree-optimization/117557 * tree-vect-stmts.cc (vectorizable_store): Flatten the ncopies and vec_num loops. gcc/testsuite/ChangeLog: PR tree-optimization/117557 * gcc.target/aarch64/pr117557.c: New test.

This PR reports a missed optimization. When we have: Str str{"Test"}; callback(str); as in the test, we're able to evaluate the Str::Str() call at compile time. But when we have: callback(Str{"Test"}); we are not. With this patch (in fact, it's Patrick's patch with a little tweak), we turn callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr 5 __ct_comp D.2890 (struct Str *) <<< Unknown tree: void_cst >>> (const char *) "Test" >>>>) into callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>) I explored the idea of calling maybe_constant_value for the whole TARGET_EXPR in cp_fold. That has three problems: - we can't always elide a TARGET_EXPR, so we'd have to make sure the result is also a TARGET_EXPR; - the resulting TARGET_EXPR must have the same flags, otherwise Bad Things happen; - getting a new slot is also problematic. I've seen a test where we had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer D.2680, we can't replace it with D.2681, and things break. With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C. FAIL: g++.dg/tree-ssa/pr90883.C scan-tree-dump dse1 "Deleted redundant store: .*.a = {}" is easy. Previously, we would call C::C, so .gimple has: D.2590 = {}; C::C (&D.2590); D.2597 = D.2590; return D.2597; Then .einline inlines the C::C call: D.2590 = {}; D.2590.a = {}; // #1 D.2590.b = 0; // #2 D.2597 = D.2590; D.2590 ={v} {CLOBBER(eos)}; return D.2597; then #2 is removed in .fre1, and #1 is removed in .dse1. So the test passes. But with the patch, .gimple won't have that C::C call, so the IL is of course going to look different. The .optimized dump looks the same though so there's no problem. pr78687.C is XFAILed because the test passes with r15-5746 but not with r15-5747 as well. I opened <https://gcc.gnu.org/PR117971>. PR c++/116416 gcc/cp/ChangeLog: * cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold TARGET_EXPR_INITIAL and replace it with the folded result if it's TREE_CONSTANT. gcc/testsuite/ChangeLog: * g++.dg/analyzer/pr97116.C: Adjust dg-message. * g++.dg/tree-ssa/pr78687.C: Add XFAIL. * g++.dg/tree-ssa/pr90883.C: Adjust dg-final. * g++.dg/cpp0x/constexpr-prvalue1.C: New test. * g++.dg/cpp1y/constexpr-prvalue1.C: New test. Co-authored-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Jason Merrill <jason@redhat.com>

The ASRD instruction on SVE performs an arithmetic shift right by an immediate for divide. This patch enables the use of ASRD with Neon modes. For example: int in[N], out[N]; void foo (void) { for (int i = 0; i < N; i++) out[i] = in[i] / 4; } compiles to: ldr q31, [x1, x0] cmlt v30.16b, v31.16b, #0 and z30.b, z30.b, 3 add v30.16b, v30.16b, v31.16b sshr v30.16b, v30.16b, 2 str q30, [x0, x2] add x0, x0, 16 cmp x0, 1024 but can just be: ldp q30, q31, [x0], 32 asrd z31.b, p7/m, z31.b, #2 asrd z30.b, p7/m, z30.b, #2 stp q30, q31, [x1], 32 cmp x0, x2 This patch also adds the following overload: aarch64_ptrue_reg (machine_mode pred_mode, machine_mode data_mode) Depending on the data mode, the function returns a predicate with the appropriate bits set. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_ptrue_reg): New overload. * config/aarch64/aarch64-protos.h (aarch64_ptrue_reg): Likewise. * config/aarch64/aarch64-sve.md: Extended sdiv_pow2<mode>3 and *sdiv_pow2<mode>3 to support Neon modes. gcc/testsuite/ChangeLog: * gcc.target/aarch64/sve/sve-asrd.c: New test. Co-authored-by: Richard Sandiford <richard.sandiford@arm.com> Signed-off-by: Soumya AR <soumyaa@nvidia.com>

This crash started with my r12-7803 but I believe the problem lies elsewhere. build_vec_init has cleanup_flags whose purpose is -- if I grok this correctly -- to avoid destructing an object multiple times. Let's say we are initializing an array of A. Then we might end up in a scenario similar to initlist-eh1.C: try { call A::A in a loop // #0 try { call a fn using the array } finally { // #1 call A::~A in a loop } } catch { // #2 call A::~A in a loop } cleanup_flags makes us emit a statement like D.3048 = 2; at #0 to disable performing the cleanup at #2, since #1 will take care of the destruction of the array. But if we are not emitting the loop because we can use a constant initializer (and use a single { a, b, ...}), we shouldn't generate the statement resetting the iterator to its initial value. Otherwise we crash in gimplify_var_or_parm_decl because it gets the stray decl D.3048. PR c++/117985 gcc/cp/ChangeLog: * init.cc (build_vec_init): Pop CLEANUP_FLAGS if we're not generating the loop. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/initlist-array23.C: New test. * g++.dg/cpp0x/initlist-array24.C: New test.

This patch removes the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS tunable and use_new_vector_costs entry in aarch64-tuning-flags.def and makes the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS paths in the backend the default. To that end, the function aarch64_use_new_vector_costs_p and its uses were removed. To prevent costing vec_to_scalar operations with 0, as described in https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665481.html, we adjusted vectorizable_store such that the variable n_adjacent_stores also covers vec_to_scalar operations. This way vec_to_scalar operations are not costed individually, but as a group. As suggested by Richard Sandiford, the "known_ne" in the multilane-check was replaced by "maybe_ne" in order to treat nunits==1+1X as a vector rather than a scalar. Two tests were adjusted due to changes in codegen. In both cases, the old code performed loop unrolling once, but the new code does not: Example from gcc.target/aarch64/sve/strided_load_2.c (compiled with -O2 -ftree-vectorize -march=armv8.2-a+sve -mtune=generic -moverride=tune=none): f_int64_t_32: cbz w3, .L92 mov x4, 0 uxtw x3, w3 + cntd x5 + whilelo p7.d, xzr, x3 + mov z29.s, w5 mov z31.s, w2 - whilelo p6.d, xzr, x3 - mov x2, x3 - index z30.s, #0, #1 - uqdecd x2 - ptrue p5.b, all - whilelo p7.d, xzr, x2 + index z30.d, #0, #1 + ptrue p6.b, all .p2align 3,,7 .L94: - ld1d z27.d, p7/z, [x0, #1, mul vl] - ld1d z28.d, p6/z, [x0] - movprfx z29, z31 - mul z29.s, p5/m, z29.s, z30.s - incw x4 - uunpklo z0.d, z29.s - uunpkhi z29.d, z29.s - ld1d z25.d, p6/z, [x1, z0.d, lsl 3] - ld1d z26.d, p7/z, [x1, z29.d, lsl 3] - add z25.d, z28.d, z25.d + ld1d z27.d, p7/z, [x0, x4, lsl 3] + movprfx z28, z31 + mul z28.s, p6/m, z28.s, z30.s + ld1d z26.d, p7/z, [x1, z28.d, uxtw 3] add z26.d, z27.d, z26.d - st1d z26.d, p7, [x0, #1, mul vl] - whilelo p7.d, x4, x2 - st1d z25.d, p6, [x0] - incw z30.s - incb x0, all, mul #2 - whilelo p6.d, x4, x3 + st1d z26.d, p7, [x0, x4, lsl 3] + add z30.s, z30.s, z29.s + incd x4 + whilelo p7.d, x4, x3 b.any .L94 .L92: ret Example from gcc.target/aarch64/sve/strided_store_2.c (compiled with -O2 -ftree-vectorize -march=armv8.2-a+sve -mtune=generic -moverride=tune=none): f_int64_t_32: cbz w3, .L84 - addvl x5, x1, #1 mov x4, 0 uxtw x3, w3 - mov z31.s, w2 + cntd x5 whilelo p7.d, xzr, x3 - mov x2, x3 - index z30.s, #0, #1 - uqdecd x2 - ptrue p5.b, all - whilelo p6.d, xzr, x2 + mov z29.s, w5 + mov z31.s, w2 + index z30.d, #0, #1 + ptrue p6.b, all .p2align 3,,7 .L86: - ld1d z28.d, p7/z, [x1, x4, lsl 3] - ld1d z27.d, p6/z, [x5, x4, lsl 3] - movprfx z29, z30 - mul z29.s, p5/m, z29.s, z31.s - add z28.d, z28.d, #1 - uunpklo z26.d, z29.s - st1d z28.d, p7, [x0, z26.d, lsl 3] - incw x4 - uunpkhi z29.d, z29.s + ld1d z27.d, p7/z, [x1, x4, lsl 3] + movprfx z28, z30 + mul z28.s, p6/m, z28.s, z31.s add z27.d, z27.d, #1 - whilelo p6.d, x4, x2 - st1d z27.d, p7, [x0, z29.d, lsl 3] - incw z30.s + st1d z27.d, p7, [x0, z28.d, uxtw 3] + incd x4 + add z30.s, z30.s, z29.s whilelo p7.d, x4, x3 b.any .L86 .L84: ret The patch was bootstrapped and tested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> gcc/ * tree-vect-stmts.cc (vectorizable_store): Extend the use of n_adjacent_stores to also cover vec_to_scalar operations. * config/aarch64/aarch64-tuning-flags.def: Remove use_new_vector_costs as tuning option. * config/aarch64/aarch64.cc (aarch64_use_new_vector_costs_p): Remove. (aarch64_vector_costs::add_stmt_cost): Remove use of aarch64_use_new_vector_costs_p. (aarch64_vector_costs::finish_cost): Remove use of aarch64_use_new_vector_costs_p. * config/aarch64/tuning_models/cortexx925.h: Remove AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS. * config/aarch64/tuning_models/fujitsu_monaka.h: Likewise. * config/aarch64/tuning_models/generic_armv8_a.h: Likewise. * config/aarch64/tuning_models/generic_armv9_a.h: Likewise. * config/aarch64/tuning_models/neoverse512tvb.h: Likewise. * config/aarch64/tuning_models/neoversen2.h: Likewise. * config/aarch64/tuning_models/neoversen3.h: Likewise. * config/aarch64/tuning_models/neoversev1.h: Likewise. * config/aarch64/tuning_models/neoversev2.h: Likewise. * config/aarch64/tuning_models/neoversev3.h: Likewise. * config/aarch64/tuning_models/neoversev3ae.h: Likewise. gcc/testsuite/ * gcc.target/aarch64/sve/strided_load_2.c: Adjust expected outcome. * gcc.target/aarch64/sve/strided_store_2.c: Likewise.

Update build time to utc 0am for gccefi2linux

6ec1945

nstester approved these changes Mar 12, 2021

View reviewed changes

nstester merged commit fbd52a4 into nstester:master Mar 12, 2021

nstester pushed a commit that referenced this pull request Jul 5, 2021

[Ada] Use runtime from base compiler during stage1 #2

796b616

gcc/ada/ * Make-generated.in: Add -f switch to ensure cp will never fail.

nstester pushed a commit that referenced this pull request Jul 25, 2021

[Ada] Declare time_t uniformly based on a system parameter #2

b454c40

gcc/ada/ * libgnat/s-osprim__x32.adb: Add missing with clause.

nstester pushed a commit that referenced this pull request Jan 31, 2022

[Ada] Fix up handling of ghost units PR104027 #2

2dbc237

gcc/ada/ PR ada/104027 * gnat1drv.adb (Gnat1drv): Only call Exit_Program when not generating code, otherwise instead go to End_Of_Program.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update build time to utc 0am for gccefi2linux #2

Update build time to utc 0am for gccefi2linux #2

hanzhan1 commented Mar 12, 2021

Update build time to utc 0am for gccefi2linux #2

Update build time to utc 0am for gccefi2linux #2

Conversation

hanzhan1 commented Mar 12, 2021