Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update build time to utc 0am for gccefi2linux #2

Merged

Conversation

hanzhan1
Copy link

Update build time to utc 0am for gccefi2linux nightly build.

@nstester nstester merged commit fbd52a4 into nstester:master Mar 12, 2021
nstester pushed a commit that referenced this pull request Mar 30, 2021
…f SUBS/ADDS-immediate

In this PR we end up generating an invalid instruction:
adds x1,xzr,#2

because the pattern accepts zero as an operand in the comparison, but the instruction doesn't.
Fix it by adjusting the predicate and constraints.

gcc/ChangeLog:

	PR target/99822
	* config/aarch64/aarch64.md (sub<mode>3_compare1_imm): Do not allow zero
	in operand 1.

gcc/testsuite/ChangeLog:

	PR target/99822
	* gcc.c-torture/compile/pr99822.c: New test.
nstester pushed a commit that referenced this pull request May 7, 2021
gcc/ada/

	* init.c (__gnat_raise_program_error): Fix parameter type.
	(Raise_From_Signal_Handler): Likewise and mark as no-return.
	* raise-gcc.c (__gnat_others_value): Fix type.
	(__gnat_all_others_value): Likewise.
	(__gnat_unhandled_others_value): Likewise.
	* seh_init.c (Raise_From_Signal_Handler): Fix parameter type.
	* libgnat/a-except.ads (Raise_From_Signal_Handler): Use convention C
	and new symbol name, move declaration to...
	(Raise_From_Controlled_Operation): Minor tweak.
	* libgnat/a-except.adb (Raise_From_Signal_Handler): ...here.
	* libgnat/a-exexpr.adb (bool): New C compatible boolean type.
	(Is_Handled_By_Others): Use it as return type for the function.
nstester pushed a commit that referenced this pull request Jun 14, 2021
The fixed error is:

==21166==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x60300000d900
    #0 0x7367d7 in operator delete(void*, unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:172
    #1 0x3b82e6e in pointer_equiv_analyzer::~pointer_equiv_analyzer() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:161
    #2 0x3b83387 in hybrid_folder::~hybrid_folder() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:517
    #3 0x3b83387 in execute_early_vrp /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:686
    #4 0x1790611 in execute_one_pass(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2567
    gcc-mirror#5 0x1792003 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2656
    gcc-mirror#6 0x1792029 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2657
    gcc-mirror#7 0x179209f in execute_pass_list(function*, opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2667
    gcc-mirror#8 0x178a5f3 in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:1773
    gcc-mirror#9 0x1792fac in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/plugin.h:191
    gcc-mirror#10 0x1792fac in execute_ipa_pass_list(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:3001
    gcc-mirror#11 0xc525fc in ipa_passes /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2154
    gcc-mirror#12 0xc525fc in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2289
    gcc-mirror#13 0xc5a096 in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2269
    gcc-mirror#14 0xc5a096 in symbol_table::finalize_compilation_unit() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2537
    gcc-mirror#15 0x1a7a17c in compile_file /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:482
    gcc-mirror#16 0x69c758 in do_compile /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2210
    gcc-mirror#17 0x69c758 in toplev::main(int, char**) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2349
    gcc-mirror#18 0x6a932a in main /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/main.c:39
    gcc-mirror#19 0x7ffff7820b34 in __libc_start_main ../csu/libc-start.c:332
    gcc-mirror#20 0x6aa5fd in _start (/home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/objdir/gcc/cc1+0x6aa5fd)

0x60300000d900 is located 0 bytes inside of 32-byte region [0x60300000d900,0x60300000d920)
allocated by thread T0 here:
    #0 0x735ab7 in operator new[](unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:102
    #1 0x3b82dac in pointer_equiv_analyzer::pointer_equiv_analyzer(gimple_ranger*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:156

gcc/ChangeLog:

	* gimple-ssa-evrp.c (pointer_equiv_analyzer::~pointer_equiv_analyzer): Use delete[].
nstester pushed a commit that referenced this pull request Jul 1, 2021
This extends the type name conflict detection mechanism to variables.

gcc/c-family/
	* c-ada-spec.c (check_name): Rename into...
	(check_type_name_conflict): ...this.  Minor tweak.
	(dump_ada_function_declaration): Adjust to above renaming.
	(dump_ada_array_domains): Fix oversight.
	(dump_ada_declaration): Call check_type_name_conflict for variables.
nstester pushed a commit that referenced this pull request Jul 5, 2021
gcc/ada/

	* Make-generated.in: Add -f switch to ensure cp will never fail.
nstester pushed a commit that referenced this pull request Jul 25, 2021
gcc/ada/

	* libgnat/s-osprim__x32.adb: Add missing with clause.
nstester pushed a commit that referenced this pull request Sep 22, 2021
As observed by Jakub in comment #2 of PR 98865, the expression -(a>>63)
is optimized in GENERIC but not in GIMPLE.  Investigating further it
turns out that this is one of a few transformations performed by
fold_negate_expr in fold-const.c that aren't yet performed by match.pd.
This patch moves/duplicates them there, and should be relatively safe
as these transformations are already performed by the compiler, but
just in different passes.

This revised patch adds a Boolean simplify argument to tree-ssa-sccvn.c's
vn_nary_build_or_lookup_1 to control whether simplification should be
performed before value numbering, updating the callers, but then
avoiding simplification when constructing/value-numbering NEGATE_EXPR.
This avoids the regression of gcc.dg/tree-ssa/ssa-free-88.c, and enables
the new test case(s) to pass.

2021-09-22  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	* match.pd (negation simplifications): Implement some negation
	folding transformations from fold-const.c's fold_negate_expr.
	* tree-ssa-sccvn.c (vn_nary_build_or_lookup_1): Add a SIMPLIFY
	argument, to control whether the op should be simplified prior
	to looking up/assigning a value number.
	(vn_nary_build_or_lookup): Update call to vn_nary_build_or_lookup_1.
	(vn_nary_simplify): Likewise.
	(visit_nary_op): Likewise, but when constructing a NEGATE_EXPR
	now call vn_nary_build_or_lookup_1 disabling simplification.

gcc/testsuite/ChangeLog
	* gcc.dg/fold-negate-1.c: New test case.
nstester pushed a commit that referenced this pull request Nov 26, 2021
Fixes:

==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000666ca5c at pc 0x000000ef094b bp 0x7fffffff8180 sp 0x7fffffff8178
READ of size 4 at 0x00000666ca5c thread T0
    #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855
    #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916
    #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887
    #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829
    #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427
    gcc-mirror#5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346
    gcc-mirror#6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967
    gcc-mirror#7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808
    gcc-mirror#8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146

for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d.

gcc/d/ChangeLog:

	* d-attribs.cc (parse_optimize_options): Check index before
	accessing cl_options.
nstester pushed a commit that referenced this pull request Nov 29, 2021
As described in detail in the PR, in C++20 implicitly capturing 'this'
via a '=' capture default is deprecated, and in C++17 adding an explicit
'this' capture alongside a '=' capture default is diagnosed as redundant
(and is strictly speaking ill-formed).  This means it's impossible to
write, in a forward-compatible way, a C++17 lambda that has a '=' capture
default and that also captures 'this' (implicitly or explicitly):

  [=] { this; }      // #1 deprecated in C++20, OK in C++17
		     // GCC issues a -Wdeprecated warning in C++20 mode

  [=, this] { }      // #2 ill-formed in C++17, OK in C++20
		     // GCC issues an unconditional warning in C++17 mode

This patch resolves this dilemma by downgrading the warning for #2 into
a -pedantic one.  In passing, move it into the -Wc++20-extensions class
of warnings and adjust its wording accordingly.

	PR c++/100493

gcc/cp/ChangeLog:

	* parser.c (cp_parser_lambda_introducer): In C++17, don't
	diagnose a redundant 'this' capture alongside a by-copy
	capture default unless -pedantic.  Move the diagnostic into
	-Wc++20-extensions and adjust wording accordingly.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp1z/lambda-this1.C: Adjust expected diagnostics.
	* g++.dg/cpp1z/lambda-this8.C: New test.
	* g++.dg/cpp2a/lambda-this3.C: Compile with -pedantic in C++17
	to continue to diagnose redundant 'this' captures.
nstester pushed a commit that referenced this pull request Dec 30, 2021
…imize or target pragmas [PR103012]

The following testcases ICE when an optimize or target pragma
is followed by a long line (4096+ chars).
This is because on such long lines we can't use columns anymore,
but the cpp_define calls performed by c_cpp_builtins_optimize_pragma
or from the backend hooks for target pragma are done on temporary
buffers and expect to get columns from whatever line they appear on
(which happens to be the long line after optimize/target pragma),
and we run into:
 #0  fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986
 #1  0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502
 #2  0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827
 #3  0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898
 #4  0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592
 gcc-mirror#5  0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394
 gcc-mirror#6  0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601
 gcc-mirror#7  0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639
 gcc-mirror#8  0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589
 gcc-mirror#9  0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513
 gcc-mirror#10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522
 gcc-mirror#11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>)
     at ../../gcc/c-family/c-cppbuiltin.c:600
assertion that LC_RENAME doesn't happen first.

I think the right fix is emit those predefined macros upon
optimize/target pragmas with BUILTINS_LOCATION, like we already do
for those macros at the start of the TU, they don't appear in columns
of the next line after it.  Another possibility would be to force them
at the location of the pragma.

2021-12-30  Jakub Jelinek  <jakub@redhat.com>

	PR c++/103012
gcc/
	* config/i386/i386-c.c (ix86_pragma_target_parse): Perform
	cpp_define/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
	* config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
	* config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise.
	* config/s390/s390-c.c (s390_pragma_target_parse): Likewise.
gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform
	cpp_define_unused/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
gcc/testsuite/
	PR c++/103012
	* g++.dg/cpp/pr103012.C: New test.
	* g++.target/i386/pr103012.C: New test.
nstester pushed a commit that referenced this pull request Jan 21, 2022
This is a "canonical types differ for identical types" ICE, which started
with r11-4682.  It's a bit tricky to explain.  Consider:

  template <typename T> struct S {
    S<T> bar() noexcept(T::value);  // #1
    S<T> foo() noexcept(T::value);  // #2
  };

  template <typename T> S<T> S<T>::foo() noexcept(T::value) {}  // #3

We ICE because #3 and #2 have the same type, but their canonical types
differ: TYPE_CANONICAL (#3) == #2 but TYPE_CANONICAL (#2) == #1.

The member functions #1 and #2 have the same type.  However, since their
noexcept-specifier is deferred, when parsing them, we create a variant for
both of them, because DEFERRED_PARSE cannot be compared.  In other words,
build_cp_fntype_variant's

  tree v = TYPE_MAIN_VARIANT (type);
  for (; v; v = TYPE_NEXT_VARIANT (v))
    if (cp_check_qualified_type (v, type, type_quals, rqual, raises, late))
      return v;

will *not* find an existing variant when creating a method_type for #2, so we
have to create a new one.

But then we perform delayed parsing and call fixup_deferred_exception_variants
for #1 and #2.  f_d_e_v will replace TYPE_RAISES_EXCEPTIONS with the newly
parsed noexcept-specifier.  It also sets TYPE_CANONICAL (#2) to #1.  Both
noexcepts turned out to be the same, so now we have two equivalent variants in
the list!  I.e.,

+-----------------+      +-----------------+      +-----------------+
|      main       |      |      #2         |      |      #1         |
| S S::<T379>(S*) |----->| S S::<T37c>(S*) |----->| S S::<T37a>(S*) |----->NULL
|    -            |      |  noex(T::value) |      |  noex(T::value) |
+-----------------+      +-----------------+      +-----------------+

Then we get to #3.  As for #1 and #2, grokdeclarator calls build_memfn_type,
which ends up calling build_cp_fntype_variant, which will use the loop
above to look for an existing variant.  The first one that matches
cp_check_qualified_type will be used, so we use #2 rather than #1, and the
TYPE_CANONICAL mismatch follows.  Hopefully that makes sense.

As for the fix, I didn't think I could rewrite the method_type #2 with #1
because the type may have escaped via decltype.  So my approach is to
elide #2 from the list, so when looking for a matching variant, we always
find #1 (#2 remains live though, which admittedly sounds sort of dodgy).

	PR c++/101715

gcc/cp/ChangeLog:

	* tree.cc (fixup_deferred_exception_variants): Remove duplicate
	variants after parsing the exception specifications.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/noexcept72.C: New test.
	* g++.dg/cpp0x/noexcept73.C: New test.
nstester pushed a commit that referenced this pull request Jan 31, 2022
gcc/ada/

	PR ada/104027
	* gnat1drv.adb (Gnat1drv): Only call Exit_Program when not
	generating code, otherwise instead go to End_Of_Program.
nstester pushed a commit that referenced this pull request Feb 9, 2022
The following adjusts the earlier change to still allow an
uncritical replacement.

2022-02-09  Richard Biener  <rguenther@suse.de>

	PR middle-end/104464
	* gimple-isel.cc (gimple_expand_vec_cond_expr): Postpone
	throwing check to after unproblematic replacement.

	* gcc.dg/pr104464.c: New testcase.
nstester pushed a commit that referenced this pull request Feb 22, 2022
…04617]

On
 #define A(n) int foo1##n(void) { return 1##n; }
 #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n#gcc-mirror#5) A(n#gcc-mirror#6) A(n#gcc-mirror#7) A(n#gcc-mirror#8) A(n#gcc-mirror#9)
 #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n#gcc-mirror#5) B(n#gcc-mirror#6) B(n#gcc-mirror#7) B(n#gcc-mirror#8) B(n#gcc-mirror#9)
 #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n#gcc-mirror#5) C(n#gcc-mirror#6) C(n#gcc-mirror#7) C(n#gcc-mirror#8) C(n#gcc-mirror#9)
 #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n#gcc-mirror#5) D(n#gcc-mirror#6) D(n#gcc-mirror#7) D(n#gcc-mirror#8) D(n#gcc-mirror#9)
 E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
 B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive.  Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
used instead and .symtab_shndx section should contain the real section
index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
SHN_LORESERVE value is needed it should put those into
Shdr[0].sh_{size,link}.  But, sh_{link,info} are 32-bit fields which can
contain any section index.

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, we'd need to detect the case also in
that routine and take it into account in the remapping etc.  I think
it is not worth it given that it is over 10 years, if somebody needs
63.75K or more sections, better use more recent binutils.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

	PR lto/104617
	* simple-object-elf.c (simple_object_elf_match): Fix up URL
	in comment.
	(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
	sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
	range (inclusive).
nstester pushed a commit that referenced this pull request Mar 15, 2022
Testing cc1 on pr93032-mztools-unsigned-char.c

Benchmark #1: (without patch)
  Time (mean ± σ):     338.8 ms ±  13.6 ms    [User: 323.2 ms, System: 14.2 ms]
  Range (min … max):   326.7 ms … 363.1 ms    10 runs

Benchmark #2: (with patch)
  Time (mean ± σ):     332.3 ms ±  12.8 ms    [User: 316.6 ms, System: 14.3 ms]
  Range (min … max):   322.5 ms … 357.4 ms    10 runs

Summary
  ./cc1.new ran 1.02 ± 0.06 times faster than ./cc1.old

gcc/analyzer/ChangeLog:
	* store.cc (store::store): Presize m_cluster_map.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
nstester pushed a commit that referenced this pull request Apr 13, 2022
DR 2352 changed the definitions of reference-related (so that it uses
"similar type" instead of "same type") and of reference-compatible (use
a standard conversion sequence).  That means that reference-related is
now more broad, which means that we will be binding more things directly.

The original patch for DR 2352 caused some problems, which were fixed in
r276251 by creating a "fake" ck_qual in direct_reference_binding, so
that in

  void f(int *); // #1
  void f(const int * const &); // #2
  int *x;
  int main()
  {
    f(x); // call #1
  }

we call #1.  The extra ck_qual in #2 causes compare_ics to select #1,
which is a better match for "int *" because then we don't have to do
a qualification conversion.

Let's turn to the problem in this PR.  We have

  void f(const int * const &); // #1
  void f(const int *); // #2
  int *x;
  int main()
  {
    f(x);
  }

We arrive in compare_ics to decide which one is better. The ICS for #1
looks like

    ck_ref_bind      <-    ck_qual         <-   ck_identity
  const int *const &     const int *const         int *

and the ICS for #2 is

    ck_qual     <-  ck_rvalue   <-  ck_identity
  const int *          int *           int *

We strip the reference and then comp_cv_qual_signature when comparing two
ck_quals sees that "const int *" is a proper subset of "const int *const"
and we return -1.  But that's wrong; presumably the top-level "const"
should be ignored and the call should be ambiguous.  This patch adjust
the type of the "fake" ck_qual so that this problem doesn't arise.

	PR c++/97296

gcc/cp/ChangeLog:

	* call.cc (direct_reference_binding): strip_top_quals when creating
	a ck_qual.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/ref-bind4.C: Add dg-error.
	* g++.dg/cpp0x/ref-bind8.C: New test.
nstester pushed a commit that referenced this pull request Apr 28, 2022
Here ever since r12-6022-gbb2a7f80a98de3 we stopped deeming the partial
specialization #2 to be more specialized than #1 ultimately because
dependent operator expressions now have a DEPENDENT_OPERATOR_TYPE type
instead of an empty type, and this made unify stop deducing T(2) == 1
for K during partial ordering for #1 and #2.

This minimal patch fixes this by making the relevant logic in unify
treat DEPENDENT_OPERATOR_TYPE like an empty type.

	PR c++/105425

gcc/cp/ChangeLog:

	* pt.cc (unify) <case TEMPLATE_PARM_INDEX>: Treat
	DEPENDENT_OPERATOR_TYPE like an empty type.

gcc/testsuite/ChangeLog:

	* g++.dg/template/partial-specialization13.C: New test.
nstester pushed a commit that referenced this pull request May 26, 2022
Here during cp_parser_single_declaration for #2, we were calling
associate_classtype_constraints for TPL<T> (the primary template type)
before maybe_process_partial_specialization could get a chance to
notice that we're in fact declaring a distinct constrained partial
spec and not redeclaring the primary template.  This caused us to
emit a bogus error about differing constraints b/t the primary template
and #2's constraints.  This patch fixes this by moving the call to
associate_classtype_constraints after the call to shadow_tag (which
calls maybe_process_partial_specialization) and adjusting shadow_tag to
use the return value of m_p_p_s.

Moreover, if we later try to define a constrained partial specialization
that's been declared earlier (as in the third testcase), then
maybe_new_partial_specialization correctly notices it's a redeclaration
and returns NULL_TREE.  But in this case we also need to update TYPE to
point to the redeclared partial spec (it'll otherwise continue pointing
to the primary template type, eventually leading to a bogus error).

	PR c++/96363

gcc/cp/ChangeLog:

	* decl.cc (shadow_tag): Use the return value of
	maybe_process_partial_specialization.
	* parser.cc (cp_parser_single_declaration): Call shadow_tag
	before associate_classtype_constraints.
	* pt.cc (maybe_new_partial_specialization): Change return type
	to bool.  Take 'type' argument by mutable reference.  Set 'type'
	to point to the correct constrained specialization when
	appropriate.
	(maybe_process_partial_specialization): Adjust accordingly.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/concepts-partial-spec12.C: New test.
	* g++.dg/cpp2a/concepts-partial-spec12a.C: New test.
	* g++.dg/cpp2a/concepts-partial-spec13.C: New test.
nstester pushed a commit that referenced this pull request Jun 3, 2022
As explained in r11-4959-gde6f64f9556ae3, the atom cache assumes two
equivalent expressions (according to cp_tree_equal) must use the same
template parameters (according to find_template_parameters).  This
assumption turned out to not hold for TARGET_EXPR, which was addressed
by that commit.

But this assumption apparently doesn't hold for PARM_DECL either:
find_template_parameters walks its DECL_CONTEXT but cp_tree_equal by
default doesn't consider DECL_CONTEXT unless comparing_specializations
is set.  Thus in the first testcase below, the atomic constraints of #1
and #2 are equivalent according to cp_tree_equal, but according to
find_template_parameters the former uses T and the latter uses both T
and U (surprisingly).

We could fix this assumption violation by setting comparing_specializations
in the atom_hasher, which would make cp_tree_equal return false for the
two atoms, but that seems overly pessimistic here.  Ideally the atoms
should continue being considered equivalent and we instead fix
find_template_paremeters to return just T for #2's atom.

To that end this patch makes for_each_template_parm_r stop walking the
DECL_CONTEXT of a PARM_DECL.  This should be safe to do because
tsubst_copy / tsubst_decl only substitutes the TREE_TYPE of a PARM_DECL
and doesn't bother substituting the DECL_CONTEXT, thus the only relevant
template parameters are those used in its type.  any_template_parm_r is
currently responsible for walking its TREE_TYPE, but I suppose it now makes
sense for for_each_template_parm_r to do so instead.

In passing this patch also makes for_each_template_parm_r stop walking
the DECL_CONTEXT of a VAR_/FUNCTION_DECL since doing so after walking
DECL_TI_ARGS is redundant, I think.

I experimented with not walking DECL_CONTEXT for CONST_DECL, but the
second testcase below demonstrates it's necessary to walk it.

	PR c++/105797

gcc/cp/ChangeLog:

	* pt.cc (for_each_template_parm_r) <case FUNCTION_DECL, VAR_DECL>:
	Don't walk DECL_CONTEXT.
	<case PARM_DECL>: Likewise.  Walk TREE_TYPE.
	<case CONST_DECL>: Simplify.
	(any_template_parm_r) <case PARM_DECL>: Don't walk TREE_TYPE.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp2a/concepts-decltype4.C: New test.
	* g++.dg/cpp2a/concepts-memfun3.C: New test.
nstester pushed a commit that referenced this pull request Jul 15, 2022
This patch implements C++23 P2255R2, which adds two new type traits to
detect reference binding to a temporary.  They can be used to detect code
like

  std::tuple<const std::string&> t("meow");

which is incorrect because it always creates a dangling reference, because
the std::string temporary is created inside the selected constructor of
std::tuple, and not outside it.

There are two new compiler builtins, __reference_constructs_from_temporary
and __reference_converts_from_temporary.  The former is used to simulate
direct- and the latter copy-initialization context.  But I had a hard time
finding a test where there's actually a difference.  Under DR 2267, both
of these are invalid:

  struct A { } a;
  struct B { explicit B(const A&); };
  const B &b1{a};
  const B &b2(a);

so I had to peruse [over.match.ref], and eventually realized that the
difference can be seen here:

  struct G {
    operator int(); // #1
    explicit operator int&&(); // #2
  };

int&& r1(G{}); // use #2 (no temporary)
int&& r2 = G{}; // use #1 (a temporary is created to be bound to int&&)

The implementation itself was rather straightforward because we already
have the conv_binds_ref_to_prvalue function.  The main function here is
ref_xes_from_temporary.
I've changed the return type of ref_conv_binds_directly to tristate, because
previously the function didn't distinguish between an invalid conversion and
one that binds to a prvalue.  Since it no longer returns a bool, I removed
the _p suffix.

The patch also adds the relevant class and variable templates to <type_traits>.

	PR c++/104477

gcc/c-family/ChangeLog:

	* c-common.cc (c_common_reswords): Add
	__reference_constructs_from_temporary and
	__reference_converts_from_temporary.
	* c-common.h (enum rid): Add RID_REF_CONSTRUCTS_FROM_TEMPORARY and
	RID_REF_CONVERTS_FROM_TEMPORARY.

gcc/cp/ChangeLog:

	* call.cc (ref_conv_binds_directly_p): Rename to ...
	(ref_conv_binds_directly): ... this.  Add a new bool parameter.  Change
	the return type to tristate.
	* constraint.cc (diagnose_trait_expr): Handle
	CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY.
	* cp-tree.h: Include "tristate.h".
	(enum cp_trait_kind): Add CPTK_REF_CONSTRUCTS_FROM_TEMPORARY
	and CPTK_REF_CONVERTS_FROM_TEMPORARY.
	(ref_conv_binds_directly_p): Rename to ...
	(ref_conv_binds_directly): ... this.
	(ref_xes_from_temporary): Declare.
	* cxx-pretty-print.cc (pp_cxx_trait_expression): Handle
	CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY.
	* method.cc (ref_xes_from_temporary): New.
	* parser.cc (cp_parser_primary_expression): Handle
	RID_REF_CONSTRUCTS_FROM_TEMPORARY and RID_REF_CONVERTS_FROM_TEMPORARY.
	(cp_parser_trait_expr): Likewise.
	(warn_for_range_copy): Adjust to call ref_conv_binds_directly.
	* semantics.cc (trait_expr_value): Handle
	CPTK_REF_CONSTRUCTS_FROM_TEMPORARY and CPTK_REF_CONVERTS_FROM_TEMPORARY.
	(finish_trait_expr): Likewise.

libstdc++-v3/ChangeLog:

	* include/std/type_traits (reference_constructs_from_temporary,
	reference_converts_from_temporary): New class templates.
	(reference_constructs_from_temporary_v,
	reference_converts_from_temporary_v): New variable templates.
	(__cpp_lib_reference_from_temporary): Define for C++23.
	* include/std/version (__cpp_lib_reference_from_temporary): Define for
	C++23.
	* testsuite/20_util/variable_templates_for_traits.cc: Test
	reference_constructs_from_temporary_v and
	reference_converts_from_temporary_v.
	* testsuite/20_util/reference_from_temporary/value.cc: New test.
	* testsuite/20_util/reference_from_temporary/value2.cc: New test.
	* testsuite/20_util/reference_from_temporary/version.cc: New test.

gcc/testsuite/ChangeLog:

	* g++.dg/ext/reference_constructs_from_temporary1.C: New test.
	* g++.dg/ext/reference_converts_from_temporary1.C: New test.
nstester pushed a commit that referenced this pull request Aug 3, 2022
This patch implements some additional zero-extension and sign-extension
related optimizations in simplify-rtx.cc.  The original motivation comes
from PR rtl-optimization/71775, where in comment #2 Andrew Pinksi sees:

Failed to match this instruction:
(set (reg:DI 88 [ _1 ])
    (sign_extend:DI (subreg:SI (ctz:DI (reg/v:DI 86 [ x ])) 0)))

On many platforms the result of DImode CTZ is constrained to be a
small unsigned integer (between 0 and 64), hence the truncation to
32-bits (using a SUBREG) and the following sign extension back to
64-bits are effectively a no-op, so the above should ideally (often)
be simplified to "(set (reg:DI 88) (ctz:DI (reg/v:DI 86 [ x ]))".

To implement this, and some closely related transformations, we build
upon the existing val_signbit_known_clear_p predicate.  In the first
chunk, nonzero_bits knows that FFS and ABS can't leave the sign-bit
bit set, so the simplification of of ABS (ABS (x)) and ABS (FFS (x))
can itself be simplified.  The second transformation is that we can
canonicalized SIGN_EXTEND to ZERO_EXTEND (as in the PR 71775 case above)
when the operand's sign-bit is known to be clear.  The final two chunks
are for SIGN_EXTEND of a truncating SUBREG, and ZERO_EXTEND of a
truncating SUBREG respectively.  The nonzero_bits of a truncating
SUBREG pessimistically thinks that the upper bits may have an
arbitrary value (by taking the SUBREG), so we need look deeper at the
SUBREG's operand to confirm that the high bits are known to be zero.

Unfortunately, for PR rtl-optimization/71775, ctz:DI on x86_64 with
default architecture options is undefined at zero, so we can't be sure
the upper bits of reg:DI 88 will be sign extended (all zeros or all ones).
nonzero_bits knows this, so the above transformations don't trigger,
but the transformations themselves are perfectly valid for other
operations such as FFS, POPCOUNT and PARITY, and on other targets/-march
settings where CTZ is defined at zero.

2022-08-03  Roger Sayle  <roger@nextmovesoftware.com>
	    Segher Boessenkool  <segher@kernel.crashing.org>
	    Richard Sandiford  <richard.sandiford@arm.com>

gcc/ChangeLog
	* simplify-rtx.cc (simplify_unary_operation_1) <ABS>: Add
	optimizations for CLRSB, PARITY, POPCOUNT, SS_ABS and LSHIFTRT
	that are all positive to complement the existing FFS and
	idempotent ABS simplifications.
	<SIGN_EXTEND>: Canonicalize SIGN_EXTEND to ZERO_EXTEND when
	val_signbit_known_clear_p is true of the operand.
	Simplify sign extensions of SUBREG truncations of operands
	that are already suitably (zero) extended.
	<ZERO_EXTEND>: Simplify zero extensions of SUBREG truncations
	of operands that are already suitably zero extended.
nstester pushed a commit that referenced this pull request Aug 17, 2022
In my previous patches I've been extending our std::move warnings,
but this tweak actually dials it down a little bit.  As reported in
bug 89780, it's questionable to warn about expressions in templates
that were type-dependent, but aren't anymore because we're instantiating
the template.  As in,

  template <typename T>
  Dest withMove() {
    T x;
    return std::move(x);
  }

  template Dest withMove<Dest>(); // #1
  template Dest withMove<Source>(); // #2

Saying that the std::move is pessimizing for #1 is not incorrect, but
it's not useful, because removing the std::move would then pessimize #2.
So the user can't really win.  At the same time, disabling the warning
just because we're in a template would be going too far, I still want to
warn for

  template <typename>
  Dest withMove() {
    Dest x;
    return std::move(x);
  }

because the std::move therein will be pessimizing for any instantiation.

So I'm using the suppress_warning machinery to that effect.
Problem: I had to add a new group to nowarn_spec_t, otherwise
suppressing the -Wpessimizing-move warning would disable a whole bunch
of other warnings, which we really don't want.

	PR c++/89780

gcc/cp/ChangeLog:

	* pt.cc (tsubst_copy_and_build) <case CALL_EXPR>: Maybe suppress
	-Wpessimizing-move.
	* typeck.cc (maybe_warn_pessimizing_move): Don't issue warnings
	if they are suppressed.
	(check_return_expr): Disable -Wpessimizing-move when returning
	a dependent expression.

gcc/ChangeLog:

	* diagnostic-spec.cc (nowarn_spec_t::nowarn_spec_t): Handle
	OPT_Wpessimizing_move and OPT_Wredundant_move.
	* diagnostic-spec.h (nowarn_spec_t): Add NW_REDUNDANT enumerator.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/Wpessimizing-move3.C: Remove dg-warning.
	* g++.dg/cpp0x/Wredundant-move2.C: Likewise.
nstester pushed a commit that referenced this pull request Nov 3, 2022
The eliminate reg-reg move by inverting the condition of
a cmove #2 peephole2 converts the following sequence:

  473: bx:DI=[r14:DI*0x8+r12:DI]
  960: r15:DI=r8:DI
  485: {flags:CCC=cmp(r15:DI+bx:DI,bx:DI);r15:DI=r15:DI+bx:DI;}
  737: r15:DI={(geu(flags:CCC,0))?r15:DI:bx:DI}

to:

 1110: {flags:CCC=cmp(r8:DI+bx:DI,bx:DI);r8:DI=r8:DI+bx:DI;}
 1111: r15:DI=[r14:DI*0x8+r12:DI]
 1112: r15:DI={(geu(flags:CCC,0))?r8:DI:r15:DI}

Please note that(insn 1110) uses register BX, but its
initialization was eliminated.

Avoid conversion if eliminated move intialized a register, used
in the moved instruction.

2022-11-03  Uroš Bizjak  <ubizjak@gmail.com>

gcc/ChangeLog:

	PR target/107404
	* config/i386/i386.md (eliminate reg-reg move by inverting the
	condition of a cmove #2 peephole2): Check if eliminated move
	initialized a register, used in the moved instruction.

gcc/testsuite/ChangeLog:

	PR target/107404
	* g++.target/i386/pr107404.C: New test.
nstester pushed a commit that referenced this pull request Jan 12, 2023
While looking at PR 105549, which is about fixing the ABI break
introduced in GCC 9.1 in parameter alignment with bit-fields, we
noticed that the GCC 9.1 warning is not emitted in all the cases where
it should be.  This patch fixes that and the next patch in the series
fixes the GCC 9.1 break.

We split this into two patches since patch #2 introduces a new ABI
break starting with GCC 13.1.  This way, patch #1 can be back-ported
to release branches if needed to fix the GCC 9.1 warning issue.

The main idea is to add a new global boolean that indicates whether
we're expanding the start of a function, so that aarch64_layout_arg
can emit warnings for callees as well as callers.  This removes the
need for aarch64_function_arg_boundary to warn (with its incomplete
information).  However, in the first patch there are still cases where
we emit warnings were we should not; this is fixed in patch #2 where
we can distinguish between GCC 9.1 and GCC.13.1 ABI breaks properly.

The fix in aarch64_function_arg_boundary (replacing & with &&) looks
like an oversight of a previous commit in this area which changed
'abi_break' from a boolean to an integer.

We also take the opportunity to fix the comment above
aarch64_function_arg_alignment since the value of the abi_break
parameter was changed in a previous commit, no longer matching the
description.

2022-11-28  Christophe Lyon  <christophe.lyon@arm.com>
	    Richard Sandiford  <richard.sandiford@arm.com>

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_function_arg_alignment): Fix
	comment.
	(aarch64_layout_arg): Factorize warning conditions.
	(aarch64_function_arg_boundary): Fix typo.
	* function.cc (currently_expanding_function_start): New variable.
	(expand_function_start): Handle
	currently_expanding_function_start.
	* function.h (currently_expanding_function_start): Declare.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/bitfield-abi-warning-align16-O2.c: New test.
	* gcc.target/aarch64/bitfield-abi-warning-align16-O2-extra.c: New
	test.
	* gcc.target/aarch64/bitfield-abi-warning-align32-O2.c: New test.
	* gcc.target/aarch64/bitfield-abi-warning-align32-O2-extra.c: New
	test.
	* gcc.target/aarch64/bitfield-abi-warning-align8-O2.c: New test.
	* gcc.target/aarch64/bitfield-abi-warning.h: New test.
	* g++.target/aarch64/bitfield-abi-warning-align16-O2.C: New test.
	* g++.target/aarch64/bitfield-abi-warning-align16-O2-extra.C: New
	test.
	* g++.target/aarch64/bitfield-abi-warning-align32-O2.C: New test.
	* g++.target/aarch64/bitfield-abi-warning-align32-O2-extra.C: New
	test.
	* g++.target/aarch64/bitfield-abi-warning-align8-O2.C: New test.
	* g++.target/aarch64/bitfield-abi-warning.h: New test.
nstester pushed a commit that referenced this pull request Feb 3, 2023
Here the ahead-of-time overload set pruning in finish_call_expr is
unintentionally returning a CALL_EXPR whose (pruned) callee is wrapped
in an ADDR_EXPR, despite the original callee not being wrapped in an
ADDR_EXPR.  This ends up causing a bogus declaration mismatch error in
the below testcase because the call to min in #1 gets expressed as a
CALL_EXPR of ADDR_EXPR of FUNCTION_DECL, whereas the level-lowered call
to min in #2 gets expressed instead as a CALL_EXPR of FUNCTION_DECL.

This patch fixes this by stripping the spurious ADDR_EXPR appropriately.
Thus the first call to min now also gets expressed as a CALL_EXPR of
FUNCTION_DECL, matching the behavior before r12-6075-g2decd2cabe5a4f.

	PR c++/107461

gcc/cp/ChangeLog:

	* semantics.cc (finish_call_expr): Strip ADDR_EXPR from
	the selected callee during overload set pruning.

gcc/testsuite/ChangeLog:

	* g++.dg/template/call9.C: New test.
nstester pushed a commit that referenced this pull request Feb 6, 2023
After r13-5684-g59e0376f607805 the (pruned) callee of a non-dependent
CALL_EXPR is a bare FUNCTION_DECL rather than ADDR_EXPR of FUNCTION_DECL.
This innocent change revealed that cp_tree_equal doesn't first check
dependence of a CALL_EXPR before treating a FUNCTION_DECL callee as a
dependent name, which leads to us incorrectly accepting the first two
testcases below and rejecting the third:

 * In the first testcase, cp_tree_equal incorrectly returns true for
   the two non-dependent CALL_EXPRs f(0) and f(0) (whose CALL_EXPR_FN
   are different FUNCTION_DECLs) which causes us to treat #2 as a
   redeclaration of #1.

 * Same issue in the second testcase, for f<int*>() and f<char>().

 * In the third testcase, cp_tree_equal incorrectly returns true for
   f<int>() and f<void(*)(int)>() which causes us to conflate the two
   dependent specializations A<decltype(f<int>()(U()))> and
   A<decltype(f<void(*)(int)>()(U()))>.

This patch fixes this by making called_fns_equal treat two callees as
dependent names only if the overall CALL_EXPRs are dependent, via a new
convenience function call_expr_dependent_name that is like dependent_name
but also checks dependence of the overall CALL_EXPR.

	PR c++/107461

gcc/cp/ChangeLog:

	* cp-tree.h (call_expr_dependent_name): Declare.
	* pt.cc (iterative_hash_template_arg) <case CALL_EXPR>: Use
	call_expr_dependent_name instead of dependent_name.
	* tree.cc (call_expr_dependent_name): Define.
	(called_fns_equal): Adjust to take two CALL_EXPRs instead of
	CALL_EXPR_FNs thereof.  Use call_expr_dependent_name instead
	of dependent_name.
	(cp_tree_equal) <case CALL_EXPR>: Adjust call to called_fns_equal.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/overload5.C: New test.
	* g++.dg/cpp0x/overload5a.C: New test.
	* g++.dg/cpp0x/overload6.C: New test.
nstester pushed a commit that referenced this pull request Apr 20, 2023
… in asm in different mode

See gcc.c-torture/execute/20030222-1.c.  Consider the code for 32-bit (e.g. BE) target:
  int i, v; long x; x = v; asm ("" : "=r" (i) : "0" (x));
We generate the following RTL with reload insns:
  1. subreg:si(x:di, 0) = 0;
  2. subreg:si(x:di, 4) = v:si;
  3. t:di = x:di, dead x;
  4. asm ("" : "=r" (subreg:si(t:di,4)) : "0" (t:di))
  5. i:si = subreg:si(t:di,4);
If we assign hard reg of x to t, dead code elimination will remove insn #2
and we will use unitialized hard reg.  So exclude the hard reg of x for t.
We could ignore this problem for non-empty asm using all x value but it is hard to
check that the asm are expanded into insn realy using x and setting r.
The old reload pass used the same approach.

gcc/ChangeLog

	* lra-constraints.cc (match_reload): Exclude some hard regs for
	multi-reg inout reload pseudos used in asm in different mode.
nstester pushed a commit that referenced this pull request Apr 23, 2023
Currently on xstormy16 SImode shifts by a single bit require two
instructions, and shifts by other non-zero integer immediate constants
require five instructions.  This patch implements the obvious optimization
that shifts by two bits can be done in four instructions, by using two
single-bit sequences.

Hence, ashift_2 was previously generated as:
        mov r7,r2 | shl r2,#2 | shl r3,#2 | shr r7,gcc-mirror#14 | or r3,r7
        ret
and with this patch we now generate:
        shl r2,#1 | rlc r3,#1 | shl r2,#1 | rlc r3,#1
        ret

2023-04-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/stormy16/stormy16.cc (xstormy16_output_shift): Implement
	SImode shifts by two by performing a single bit SImode shift twice.

gcc/testsuite/ChangeLog
	* gcc.target/xstormy16/shiftsi.c: New test case.
nstester pushed a commit that referenced this pull request May 2, 2023
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type.  Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop.  It's unlikely to make a difference on any real code, but the
simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.

  #define NC(X)					\
    template <class U> struct X##1;		\
    template <class U> struct X##2;		\
    template <class U> struct X##3;		\
    template <class U> struct X##4;		\
    template <class U> struct X#gcc-mirror#5;		\
    template <class U> struct X#gcc-mirror#6;
  #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
  #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
  template <int I> struct A
  {
    NC3(am)
  };
  template <class...Ts> void sink(Ts...);
  template <int...Is> void g()
  {
    sink(A<Is>()...);
  }
  template <int I> void f()
  {
    g<__integer_pack(I)...>();
  }
  int main()
  {
    f<1000>();
  }

gcc/cp/ChangeLog:

	* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
	of a class template.
	(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
nstester pushed a commit that referenced this pull request May 19, 2023
…i parts [2]

[part #2 of PR/109279]

SPEC2017 deepsjeng uses large constants which currently generates less than
ideal code. This fix improves codegen for large constants which have
same low and hi parts: e.g.

	long long f(void) { return 0x0101010101010101ull; }

Before
        li      a5,0x1010000
        addi    a5,a5,0x101
        mv      a0,a5
        slli    a5,a5,32
        add     a0,a5,a0
        ret

With patch
	li	a5,0x1010000
	addi	a5,a5,0x101
	slli	a0,a5,32
	add	a0,a0,a5
	ret

This is testsuite clean.

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_split_integer): if loval is equal
	to hival, ASHIFT the corresponding regs.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
nstester pushed a commit that referenced this pull request May 23, 2023
This patch decreses one machine instruction from "single bit extraction
with shifting" operation, and tries to eliminate the conditional
branch if CST2_POW2 doesn't fit into signed 12 bits with the help
of ifcvt optimization.

    /* example #1 */
    int test0(int x) {
      return (x & 1048576) != 0 ? 1024 : 0;
    }
    extern int foo(void);
    int test1(void) {
      return (foo() & 1048576) != 0 ? 16777216 : 0;
    }

    ;; before
    test0:
	movi	a9, 0x400
	srai	a2, a2, 10
	and	a2, a2, a9
	ret.n
    test1:
	addi	sp, sp, -16
	s32i.n	a0, sp, 12
	call0	foo
	extui	a2, a2, 20, 1
	slli	a2, a2, 20
	beqz.n	a2, .L2
	movi.n	a2, 1
	slli	a2, a2, 24
    .L2:
	l32i.n	a0, sp, 12
	addi	sp, sp, 16
	ret.n

    ;; after
    test0:
	extui	a2, a2, 20, 1
	slli	a2, a2, 10
	ret.n
    test1:
	addi	sp, sp, -16
	s32i.n	a0, sp, 12
	call0	foo
	l32i.n	a0, sp, 12
	extui	a2, a2, 20, 1
	slli	a2, a2, 24
	addi	sp, sp, 16
	ret.n

In addition, if the left shift amount ('exact_log2(CST2_POW2)') is
between 1 through 3 and a either addition or subtraction with another
register follows, emit a ADDX[248] or SUBX[248] machine instruction
instead of separate left shift and add/subtract ones.

    /* example #2 */
    int test2(int x, int y) {
      return ((x & 1048576) != 0 ? 4 : 0) + y;
    }
    int test3(int x, int y) {
      return ((x & 2) != 0 ? 8 : 0) - y;
    }

    ;; before
    test2:
	movi.n	a9, 4
	srai	a2, a2, 18
	and	a2, a2, a9
	add.n	a2, a2, a3
	ret.n
    test3:
	movi.n	a9, 8
	slli	a2, a2, 2
	and	a2, a2, a9
	sub	a2, a2, a3
	ret.n

    ;; after
    test2:
	extui	a2, a2, 20, 1
	addx4	a2, a2, a3
	ret.n
    test3:
	extui	a2, a2, 1, 1
	subx8	a2, a2, a3
	ret.n

gcc/ChangeLog:

	* config/xtensa/predicates.md (addsub_operator): New.
	* config/xtensa/xtensa.md (*extzvsi-1bit_ashlsi3,
	*extzvsi-1bit_addsubx): New insn_and_split patterns.
	* config/xtensa/xtensa.cc (xtensa_rtx_costs):
	Add a special case about ifcvt 'noce_try_cmove()' to handle
	constant loads that do not fit into signed 12 bits in the
	patterns added above.
nstester pushed a commit that referenced this pull request Aug 4, 2023
This patch is inspired by Jakub's work on PR rtl-optimization/110717.
The bitfield example described in comment #2, looks like:

struct S { __int128 a : 69; };
unsigned type bar (struct S *p) {
  return p->a;
}

which on x86_64 with -O2 currently generates:

bar:    movzbl  8(%rdi), %ecx
        movq    (%rdi), %rax
        andl    $31, %ecx
        movq    %rcx, %rdx
        salq    $59, %rdx
        sarq    $59, %rdx
        ret

The ANDL $31 is interesting... we first extract an unsigned 69-bit bitfield
by masking/clearing the top bits of the most significant word, and then
it gets sign-extended, by left shifting and arithmetic right shifting.
Obviously, this bit-wise AND is redundant, for signed bit-fields, we don't
require these bits to be cleared, if we're about to set them appropriately.

This patch eliminates this redundancy in the middle-end, during RTL
expansion, but extending the extract_bit_field APIs so that the integer
UNSIGNEDP argument takes a special value; 0 indicates the field should
be sign extended, 1 (any non-zero value) indicates the field should be
zero extended, but -1 indicates a third option, that we don't care how
or whether the field is extended.  By passing and checking this sentinel
value at the appropriate places we avoid the useless bit masking (on
all targets).

For the test case above, with this patch we now generate:

bar:    movzbl  8(%rdi), %ecx
        movq    (%rdi), %rax
        movq    %rcx, %rdx
        salq    $59, %rdx
        sarq    $59, %rdx
        ret

2023-08-04  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* expmed.cc (extract_bit_field_1): Document that an UNSIGNEDP
	value of -1 is equivalent to don't care.
	(extract_integral_bit_field): Indicate that we don't require
	the most significant word to be zero extended, if we're about
	to sign extend it.
	(extract_fixed_bit_field_1): Document that an UNSIGNEDP value
	of -1 is equivalent to don't care.  Don't clear the most
	significant bits with AND mask when UNSIGNEDP is -1.

gcc/testsuite/ChangeLog
	* gcc.target/i386/pr110717-2.c: New test case.
nstester pushed a commit that referenced this pull request Sep 18, 2023
Given below example for VLS mode

void
test (vl_t *u)
{
  vl_t t;
  long long *p = (long long *)&t;

  p[0] = p[1] = 2;

  *u = t;
}

The vec_set will simplify the insn to vmv.s.x when index is 0, without
merged operand. That will result in some problems in DCE, aka:

1:  137[DI] = a0
2:  138[V2DI] = 134[V2DI]                              // deleted by DCE
3:  139[DI] = #2                                       // deleted by DCE
4:  140[DI] = #2                                       // deleted by DCE
5:  141[V2DI] = vec_dup:V2DI (139[DI])                 // deleted by DCE
6:  138[V2DI] = vslideup_imm (138[V2DI], 141[V2DI], 1) // deleted by DCE
7:  135[V2DI] = 138[V2DI]                              // deleted by DCE
8:  142[V2DI] = 135[V2DI]                              // deleted by DCE
9:  143[DI] = #2
10: 142[V2DI] = vec_dup:V2DI (143[DI])
11: (137[DI]) = 142[V2DI]

The higher 64 bits of 142[V2DI] is unknown here and it generated incorrect
code when store back to memory. This patch would like to fix this issue
by adding a new SCALAR_MOVE_MERGED_OP for vec_set.

Please note this patch doesn't enable VLS for vec_set, the underlying
patches will support this soon.

gcc/ChangeLog:

	* config/riscv/autovec.md: Bugfix.
	* config/riscv/riscv-protos.h (SCALAR_MOVE_MERGED_OP): New enum.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/base/scalar-move-merged-run-1.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
nstester pushed a commit that referenced this pull request Nov 9, 2023
Improve stack protector patterns and peephole2s even more:

a. Use unrelated register clears with integer mode size <= word
   mode size to clear stack protector scratch register.

b. Use unrelated register initializations in front of stack
   protector sequence to clear stack protector scratch register.

c. Use unrelated register initializations using LEA instructions
   to clear stack protector scratch register.

These stack protector improvements reuse 6914 unrelated register
initializations to substitute the clear of stack protector scratch
register in 12034 instances of stack protector sequence in recent linux
defconfig build.

gcc/ChangeLog:

	* config/i386/i386.md (@stack_protect_set_1_<PTR:mode>_<W:mode>):
	Use W mode iterator instead of SWI48.  Output MOV instead of XOR
	for TARGET_USE_MOV0.
	(stack_protect_set_1 peephole2): Use integer modes with
	mode size <= word mode size for operand 3.
	(stack_protect_set_1 peephole2 #2): New peephole2 pattern to
	substitute stack protector scratch register clear with unrelated
	register initialization, originally in front of stack
	protector sequence.
	(*stack_protect_set_3_<PTR:mode>_<SWI48:mode>): New insn pattern.
	(stack_protect_set_1 peephole2): New peephole2 pattern to
	substitute stack protector scratch register clear with unrelated
	register initialization involving LEA instruction.
nstester pushed a commit that referenced this pull request Nov 10, 2023
Use unrelated register initializations using zero/sign-extend instructions
to clear stack protector scratch register.

Hanlde only SI -> DImode extensions for 64-bit targets, as this is the
only extension that triggers the peephole in a non-negligible number.

Also use explicit check for word_mode instead of mode iterator in peephole2
patterns to avoid pattern explosion.

gcc/ChangeLog:

	* config/i386/i386.md (stack_protect_set_1 peephole2):
	Explicitly check operand 2 for word_mode.
	(stack_protect_set_1 peephole2 #2): Ditto.
	(stack_protect_set_2 peephole2): Ditto.
	(stack_protect_set_3 peephole2): Ditto.
	(*stack_protect_set_4z_<mode>_di): New insn patter.
	(*stack_protect_set_4s_<mode>_di): Ditto.
	(stack_protect_set_4 peephole2): New peephole2 pattern to
	substitute stack protector scratch register clear with unrelated
	register initialization involving zero/sign-extend instruction.
nstester pushed a commit that referenced this pull request Dec 11, 2023
Since the last import from upstream libsanitizer, the output has changed
and now looks more like this:

READ of size 6 at 0x7ff7beb2a144 thread T0
    #0 0x101cf7796 in MemcmpInterceptorCommon(void*, int (*)(void const*, void const*, unsigned long), void const*, void const*, unsigned long) sanitizer_common_interceptors.inc:813
    #1 0x101cf7b99 in memcmp sanitizer_common_interceptors.inc:840
    #2 0x108a0c39f in __stack_chk_guard+0xf (dyld:x86_64+0x8039f)

so let's adjust the pattern accordingly.

gcc/testsuite/ChangeLog:

	* c-c++-common/asan/memcmp-1.c: Adjust pattern on darwin.
nstester pushed a commit that referenced this pull request Dec 11, 2023
…-int (PR target/112413)

On m68k the compiler assumes that the PC-relative jump-via-jump-table
instruction and the jump table are adjacent with no padding in between.

When -mlong-jump-table-offsets is combined with -malign-int, a 2-byte
nop may be inserted before the jump table, causing the jump to add the
fetched offset to the wrong PC base and thus jump to the wrong address.

Fixed by referencing the jump table via its label. On the test case
in the PR the object code change is (the moveal at 16 is the nop):

    a:  6536            bcss 42 <f+0x42>
    c:  e588            lsll #2,%d0
    e:  203b 0808       movel %pc@(18 <f+0x18>,%d0:l),%d0
-  12:  4efb 0802       jmp %pc@(16 <f+0x16>,%d0:l)
+  12:  4efb 0804       jmp %pc@(18 <f+0x18>,%d0:l)
   16:  284c            moveal %a4,%a4
   18:  0000 0020       orib gcc-mirror#32,%d0
   1c:  0000 002c       orib gcc-mirror#44,%d0

Bootstrapped and tested on m68k-linux-gnu, no regressions.

Note: I don't have commit rights to I would need assistance applying this.

	PR target/112413
gcc/

	* config/m68k/linux.h (ASM_RETURN_CASE_JUMP): For
	TARGET_LONG_JUMP_TABLE_OFFSETS, reference the jump table
	via its label.
	* config/m68k/m68kelf.h (ASM_RETURN_CASE_JUMP): Likewise.
	* config/m68k/netbsd-elf.h (ASM_RETURN_CASE_JUMP): Likewise.
nstester pushed a commit that referenced this pull request Dec 19, 2023
During partial ordering, we want to look through dependent alias
template specializations within template arguments and otherwise
treat them as opaque in other contexts (see e.g. r7-7116-g0c942f3edab108
and r11-7011-g6e0a231a4aa240).  To that end template_args_equal was
given a partial_order flag that controls this behavior.  This flag
does the right thing when a dependent alias template specialization
appears as template argument of the partial specialization, e.g. in

  template<class T, class...> using first_t = T;
  template<class T> struct traits;
  template<class T> struct traits<first_t<T, T&>> { }; // #1
  template<class T> struct traits<first_t<const T, T&>> { }; // #2

we correctly consider #2 to be more specialized than #1.  But if the
alias specialization appears as a nested template argument of another
class template specialization, e.g. in

  template<class T> struct traits<A<first_t<T, T&>>> { }; // #1
  template<class T> struct traits<A<first_t<const T, T&>>> { }; // #2

then we incorrectly consider #1 and #2 to be unordered.  This is because

  1. we don't propagate the flag to recursive template_args_equal calls
  2. we don't use structural equality for class template specializations
     written in terms of dependent alias template specializations

This patch fixes the first issue by turning the partial_order flag into
a global.  This patch fixes the second issue by making us propagate
structural equality appropriately when building a class template
specialization.  In passing this patch also improves hashing of
specializations that use structural equality.

	PR c++/90679

gcc/cp/ChangeLog:

	* cp-tree.h (comp_template_args): Remove partial_order parameter.
	(template_args_equal): Likewise.
	* pt.cc (comparing_for_partial_ordering): New global flag.
	(iterative_hash_template_arg) <case tcc_type>: Hash the template
	and arguments for specializations that use structural equality.
	(template_args_equal): Remove partial order parameter and
	use comparing_for_partial_ordering instead.
	(comp_template_args): Likewise.
	(comp_template_args_porder): Set comparing_for_partial_ordering
	instead.  Make static.
	(any_template_arguments_need_structural_equality_p): Return true
	for an argument that's a dependent alias template specialization
	or a class template specialization that itself needs structural
	equality.
	* tree.cc (cp_tree_equal) <case TREE_VEC>: Adjust call to
	comp_template_args.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/alias-decl-75a.C: New test.
	* g++.dg/cpp0x/alias-decl-75b.C: New test.
sys-ceuplift pushed a commit that referenced this pull request Feb 4, 2024
This patch adjusts the costs so that we treat REG and SUBREG expressions the
same for costing.

This was motivated by bt_skip_func and bt_find_func in xz and results in nearly
a 5% improvement in the dynamic instruction count for input #2 and smaller, but
definitely visible improvements pretty much across the board.  Exceptions would
be perlbench input #1 and exchange2 which showed very small regressions.

In the bt_find_func and bt_skip_func cases we have  something like this:

> (insn 10 7 11 2 (set (reg/v:DI 136 [ x ])
>         (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))) "zz.c":6:21 387 {*zero_extendsidi2_bitmanip}
>      (nil))
> (insn 11 10 12 2 (set (reg:DI 142 [ _1 ])
>         (plus:DI (reg/v:DI 136 [ x ])
>             (reg/v:DI 139 [ b ]))) "zz.c":7:23 5 {adddi3}
>      (nil))

[ ... ]> (insn 13 12 14 2 (set (reg:DI 143 [ _2 ])
>         (plus:DI (reg/v:DI 136 [ x ])
>             (reg/v:DI 141 [ c ]))) "zz.c":8:23 5 {adddi3}
>      (nil))

Note the two uses of (reg 136). The best way to handle that in combine might be
a 3->2 split.  But there's a much better approach if we look at fwprop...

(set (reg:DI 142 [ _1 ])
    (plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))
        (reg/v:DI 139 [ b ])))
change not profitable (cost 4 -> cost 8)

So that should be the same cost as a regular DImode addition when the ZBA
extension is enabled.  But it ends up costing more because the clause to cost
this variant isn't prepared to handle a SUBREG.  That results in the RTL above
having too high a cost and fwprop gives up.

One approach would be to replace the REG_P  with REG_P || SUBREG_P in the
costing code.  I ultimately decided against that and instead check if the
operand in question passes register_operand.

By far the most important case to handle is the DImode PLUS.  But for the sake
of consistency, I changed the other instances in riscv_rtx_costs as well.  For
those other cases we're talking about improvements in the .000001% range.

While we are into stage4, this just hits cost modeling which we've generally
agreed is still appropriate (though we were mostly talking about vector).  So
I'm going to extend that general agreement ever so slightly and include scalar
cost modeling :-)

gcc/
	* config/riscv/riscv.cc (riscv_rtx_costs): Handle SUBREG and REG
	similarly.

gcc/testsuite/

	* gcc.target/riscv/reg_subreg_costs.c: New test.

	Co-authored-by: Jivan Hakobyan <jivanhakobyan9@gmail.com>
sys-ceuplift pushed a commit that referenced this pull request Sep 7, 2024
…o_debug_section [PR116614]

cat abc.C
  #define A(n) struct T##n {} t##n;
  #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n#gcc-mirror#5) A(n#gcc-mirror#6) A(n#gcc-mirror#7) A(n#gcc-mirror#8) A(n#gcc-mirror#9)
  #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n#gcc-mirror#5) B(n#gcc-mirror#6) B(n#gcc-mirror#7) B(n#gcc-mirror#8) B(n#gcc-mirror#9)
  #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n#gcc-mirror#5) C(n#gcc-mirror#6) C(n#gcc-mirror#7) C(n#gcc-mirror#8) C(n#gcc-mirror#9)
  #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n#gcc-mirror#5) D(n#gcc-mirror#6) D(n#gcc-mirror#7) D(n#gcc-mirror#8) D(n#gcc-mirror#9)
  E(1) E(2) E(3)
  int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

The following patch fixes that.  Most of the 64K+ section support for
reading and writing was already there years ago (and especially reading used
quite often already) and a further bug fixed in it in the PR104617 fix.

Yet, the fix isn't solely about removing the
  if (new_i - 1 >= SHN_LORESERVE)
    {
      *err = ENOTSUP;
      return "Too many copied sections";
    }
5 lines, the missing part was that the function only handled reading of
the .symtab_shndx section but not copying/updating of it.
If the result has less than 64K-epsilon sections, that actually wasn't
needed, but e.g. with -fdebug-types-section one can exceed that pretty
easily (reported to us on WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section, if one
doesn't need to use SHN_XINDEX in there, the section should (or should be
updated to) contain SHN_UNDEF entry, otherwise needs to have whatever would
be overwise stored but couldn't fit.  But repeating due to that all the
symtab decisions what to discard and how to rewrite it would be ugly.

So, the patch instead emits the .symtab_shndx section (or sections) last
and prepares the content during the .symtab processing and in a second
pass when going just through .symtab_shndx sections just uses the saved
content.

2024-09-07  Jakub Jelinek  <jakub@redhat.com>

	PR lto/116614
	* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
	comments.
	(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
	consistency.
	(simple_object_elf_find_sections): Formatting fixes.
	(simple_object_elf_fetch_attributes): Likewise.
	(simple_object_elf_attributes_merge): Likewise.
	(simple_object_elf_start_write): Likewise.
	(simple_object_elf_write_ehdr): Likewise.
	(simple_object_elf_write_shdr): Likewise.
	(simple_object_elf_write_to_file): Likewise.
	(simple_object_elf_copy_lto_debug_section): Likewise.  Don't fail for
	new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
	over .symtab_shndx sections, though emit those last and compute their
	section content when processing associated .symtab sections.  Handle
	simple_object_internal_read failure even in the .symtab_shndx reading
	case.
sys-ceuplift pushed a commit that referenced this pull request Oct 1, 2024
This is another case of load hoisting breaking UID order in the
preheader, this time between two hoistings.  The easiest way out is
to do what we do for the main stmt - copy instead of move.

	PR tree-optimization/116902
	PR tree-optimization/116842
	* tree-vect-stmts.cc (sort_after_uid): Remove again.
	(hoist_defs_of_uses): Copy defs instead of hoisting them so
	we can zero their UID.
	(vectorizable_load): Separate analysis and transform call,
	do transform on the stmt copy.

	* g++.dg/torture/pr116902.C: New testcase.
sys-ceuplift pushed a commit that referenced this pull request Oct 9, 2024
Whenever C1 and C2 are integer constants, X is of a wrapping type, and
cmp is a relational operator, the expression X +- C1 cmp C2 can be
simplified in the following cases:

(a) If cmp is <= and C2 -+ C1 == +INF(1), we can transform the initial
comparison in the following way:
   X +- C1 <= C2
   -INF <= X +- C1 <= C2 (add left hand side which holds for any X, C1)
   -INF -+ C1 <= X <= C2 -+ C1 (add -+C1 to all 3 expressions)
   -INF -+ C1 <= X <= +INF (due to (1))
   -INF -+ C1 <= X (eliminate the right hand side since it holds for any X)

(b) By analogy, if cmp if >= and C2 -+ C1 == -INF(1), use the following
sequence of transformations:

   X +- C1 >= C2
   +INF >= X +- C1 >= C2 (add left hand side which holds for any X, C1)
   +INF -+ C1 >= X >= C2 -+ C1 (add -+C1 to all 3 expressions)
   +INF -+ C1 >= X >= -INF (due to (1))
   +INF -+ C1 >= X (eliminate the right hand side since it holds for any X)

(c) The > and < cases are negations of (a) and (b), respectively.

This transformation allows to occasionally save add / sub instructions,
for instance the expression

3 + (uint32_t)f() < 2

compiles to

cmn     w0, #4
cset    w0, ls

instead of

add     w0, w0, 3
cmp     w0, 2
cset    w0, ls

on aarch64.

Testcases that go together with this patch have been split into two
separate files, one containing testcases for unsigned variables and the
other for wrapping signed ones (and thus compiled with -fwrapv).
Additionally, one aarch64 test has been adjusted since the patch has
caused the generated code to change from

cmn     w0, #2
csinc   w0, w1, wzr, cc   (x < -2)

to

cmn     w0, #3
csinc   w0, w1, wzr, cs   (x <= -3)

This patch has been bootstrapped and regtested on aarch64, x86_64, and
i386, and additionally regtested on riscv32.

gcc/ChangeLog:

	PR tree-optimization/116024
	* match.pd: New transformation around integer comparison.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/pr116024-2.c: New test.
	* gcc.dg/tree-ssa/pr116024-2-fwrapv.c: Ditto.
	* gcc.target/aarch64/gtu_to_ltu_cmp_1.c: Adjust.
sys-ceuplift pushed a commit that referenced this pull request Oct 23, 2024
PR jit/117275 reports various jit test failures seen on
powerpc64le-unknown-linux-gnu due to hitting this assertion
in varasm.cc on the 2nd compilation in a process:

#2  0x00007ffff63e67d0 in assemble_external_libcall (fun=0x7ffff2a4b1d8)
    at ../../src/gcc/varasm.cc:2650
2650          gcc_assert (!pending_assemble_externals_processed);
(gdb) p pending_assemble_externals_processed
$1 = true

We're not properly resetting state in varasm.cc after a compile
for libgccjit.

Fixed thusly.

gcc/ChangeLog:
	PR jit/117275
	* toplev.cc (toplev::finalize): Call varasm_cc_finalize.
	* varasm.cc (varasm_cc_finalize): New.
	* varasm.h (varasm_cc_finalize): New decl.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
sys-ceuplift pushed a commit that referenced this pull request Nov 5, 2024
We currently crash upon the following invalid code (notice the "void
void**" parameter)

=== cut here ===
using size_t = decltype(sizeof(int));
void *operator new(size_t, void void **p) noexcept { return p; }
int x;
void f() {
    int y;
    new (&y) int(x);
}
=== cut here ===

The problem is that in this case, we end up with a NULL_TREE parameter
list for the new operator because of the error, and (1) coerce_new_type
wrongly complains about the first parameter type not being size_t,
(2) std_placement_new_fn_p blindly accesses the parameter list, hence a
crash.

This patch does NOT address #1 since we can't easily distinguish between
a new operator declaration without parameters from one with erroneous
parameters (and it's not worth the risk to refactor and break things for
an error recovery issue) hence a dg-bogus in new52.C, but it does
address #2 and the ICE by simply checking the first parameter against
NULL_TREE.

It also adds a new testcase checking that we complain about new
operators with no or invalid first parameters, since we did not have
any.

	PR c++/117101

gcc/cp/ChangeLog:

	* init.cc (std_placement_new_fn_p): Check first_arg against
	NULL_TREE.

gcc/testsuite/ChangeLog:

	* g++.dg/init/new52.C: New test.
	* g++.dg/init/new53.C: New test.
sys-ceuplift pushed a commit that referenced this pull request Nov 11, 2024
The second source register of insn "*extzvsi-1bit_addsubx" cannot be the
same as the destination register, because that register will be overwritten
with an intermediate value after insn splitting.

     /* example #1 */
     int test1(int b, int a) {
       return ((a & 1024) ? 4 : 0) + b;
     }

     ;; result #1 (incorrect)
     test1:
     	extui	a2, a3, 10, 1	;; overwrites A2 before used
     	addx4	a2, a2, a2
     	ret.n

This patch fixes that.

     ;; result #1 (correct)
     test1:
     	extui	a3, a3, 10, 1	;; uses A3 and then overwrites
     	addx4	a2, a3, a2
     	ret.n

However, it should be noted that the first source register can be the same
as the destination without any problems.

     /* example #2 */
     int test2(int a, int b) {
       return ((a & 1024) ? 4 : 0) + b;
     }

     ;; result (correct)
     test2:
     	extui	a2, a2, 10, 1	;; uses A2 and then overwrites
     	addx4	a2, a2, a3
     	ret.n

gcc/ChangeLog:

	* config/xtensa/xtensa.md (*extzvsi-1bit_addsubx):
	Add '&' to the destination register constraint to indicate that
	it is 'earlyclobber', append '0' to the first source register
	constraint to indicate that it can be the same as the destination
	register, and change the split condition from 1 to reload_completed
	so that the insn will be split only after RA in order to obtain
	allocated registers that satisfy the above constraints.
sys-ceuplift pushed a commit that referenced this pull request Nov 28, 2024
…R117557]

The testcase

#include <stdint.h>
#include <string.h>

#define N 8
#define L 8

void f(const uint8_t * restrict seq1,
       const uint8_t *idx, uint8_t *seq_out) {
  for (int i = 0; i < L; ++i) {
    uint8_t h = idx[i];
    memcpy((void *)&seq_out[i * N], (const void *)&seq1[h * N / 2], N / 2);
  }
}

compiled at -O3 -mcpu=neoverse-n1+sve

miscompiles to:

    ld1w    z31.s, p3/z, [x23, z29.s, sxtw]
    ld1w    z29.s, p7/z, [x23, z30.s, sxtw]
    st1w    z29.s, p7, [x24, z12.s, sxtw]
    st1w    z31.s, p7, [x24, z12.s, sxtw]

rather than

    ld1w    z31.s, p3/z, [x23, z29.s, sxtw]
    ld1w    z29.s, p7/z, [x23, z30.s, sxtw]
    st1w    z29.s, p7, [x24, z12.s, sxtw]
    addvl   x3, x24, #2
    st1w    z31.s, p3, [x3, z12.s, sxtw]

Where two things go wrong, the wrong mask is used and the address pointers to
the stores are wrong.

This issue is happening because the codegen loop in vectorizable_store is a
nested loop where in the outer loop we iterate over ncopies and in the inner
loop we loop over vec_num.

For SLP ncopies == 1 and vec_num == SLP_NUM_STMS, but the loop mask is
determined by only the outerloop index and the pointer address is only updated
in the outer loop.

As such for SLP we always use the same predicate and the same memory location.
This patch flattens the two loops and instead iterates over ncopies * vec_num
and simplified the indexing.

This does not fully fix the gcc_r miscompile error in SPECCPU 2017 as the error
moves somewhere else.  I will look at that next but fixes some other libraries
that also started failing.

gcc/ChangeLog:

	PR tree-optimization/117557
	* tree-vect-stmts.cc (vectorizable_store): Flatten the ncopies and
	vec_num loops.

gcc/testsuite/ChangeLog:

	PR tree-optimization/117557
	* gcc.target/aarch64/pr117557.c: New test.
sys-ceuplift pushed a commit that referenced this pull request Dec 9, 2024
This PR reports a missed optimization.  When we have:

  Str str{"Test"};
  callback(str);

as in the test, we're able to evaluate the Str::Str() call at compile
time.  But when we have:

  callback(Str{"Test"});

we are not.  With this patch (in fact, it's Patrick's patch with a little
tweak), we turn

  callback (TARGET_EXPR <D.2890, <<< Unknown tree: aggr_init_expr
    5
    __ct_comp
    D.2890
    (struct Str *) <<< Unknown tree: void_cst >>>
    (const char *) "Test" >>>>)

into

  callback (TARGET_EXPR <D.2890, {.str=(const char *) "Test", .length=4}>)

I explored the idea of calling maybe_constant_value for the whole
TARGET_EXPR in cp_fold.  That has three problems:
- we can't always elide a TARGET_EXPR, so we'd have to make sure the
  result is also a TARGET_EXPR;
- the resulting TARGET_EXPR must have the same flags, otherwise Bad
  Things happen;
- getting a new slot is also problematic.  I've seen a test where we
  had "TARGET_EXPR<D.2680, ...>, D.2680", and folding the whole TARGET_EXPR
  would get us "TARGET_EXPR<D.2681, ...>", but since we don't see the outer
  D.2680, we can't replace it with D.2681, and things break.

With this patch, two tree-ssa tests regressed: pr78687.C and pr90883.C.

FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted redundant store: .*.a = {}"
is easy.  Previously, we would call C::C, so .gimple has:

  D.2590 = {};
  C::C (&D.2590);
  D.2597 = D.2590;
  return D.2597;

Then .einline inlines the C::C call:

  D.2590 = {};
  D.2590.a = {}; // #1
  D.2590.b = 0;  // #2
  D.2597 = D.2590;
  D.2590 ={v} {CLOBBER(eos)};
  return D.2597;

then #2 is removed in .fre1, and #1 is removed in .dse1.  So the test
passes.  But with the patch, .gimple won't have that C::C call, so the
IL is of course going to look different.  The .optimized dump looks the
same though so there's no problem.

pr78687.C is XFAILed because the test passes with r15-5746 but not with
r15-5747 as well.  I opened <https://gcc.gnu.org/PR117971>.

	PR c++/116416

gcc/cp/ChangeLog:

	* cp-gimplify.cc (cp_fold_r) <case TARGET_EXPR>: Try to fold
	TARGET_EXPR_INITIAL and replace it with the folded result if
	it's TREE_CONSTANT.

gcc/testsuite/ChangeLog:

	* g++.dg/analyzer/pr97116.C: Adjust dg-message.
	* g++.dg/tree-ssa/pr78687.C: Add XFAIL.
	* g++.dg/tree-ssa/pr90883.C: Adjust dg-final.
	* g++.dg/cpp0x/constexpr-prvalue1.C: New test.
	* g++.dg/cpp1y/constexpr-prvalue1.C: New test.

Co-authored-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
sys-ceuplift pushed a commit that referenced this pull request Dec 11, 2024
The ASRD instruction on SVE performs an arithmetic shift right by an immediate
for divide.

This patch enables the use of ASRD with Neon modes.

For example:

int in[N], out[N];

void
foo (void)
{
  for (int i = 0; i < N; i++)
    out[i] = in[i] / 4;
}

compiles to:

	ldr	q31, [x1, x0]
	cmlt	v30.16b, v31.16b, #0
	and	z30.b, z30.b, 3
	add	v30.16b, v30.16b, v31.16b
	sshr	v30.16b, v30.16b, 2
	str	q30, [x0, x2]
	add	x0, x0, 16
	cmp	x0, 1024

but can just be:

	ldp	q30, q31, [x0], 32
	asrd	z31.b, p7/m, z31.b, #2
	asrd	z30.b, p7/m, z30.b, #2
	stp	q30, q31, [x1], 32
	cmp	x0, x2

This patch also adds the following overload:
	aarch64_ptrue_reg (machine_mode pred_mode, machine_mode data_mode)
Depending on the data mode, the function returns a predicate with the
appropriate bits set.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_ptrue_reg): New overload.
	* config/aarch64/aarch64-protos.h (aarch64_ptrue_reg): Likewise.
	* config/aarch64/aarch64-sve.md: Extended sdiv_pow2<mode>3
	and *sdiv_pow2<mode>3 to support Neon modes.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/sve-asrd.c: New test.

Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
Signed-off-by: Soumya AR <soumyaa@nvidia.com>
sys-ceuplift pushed a commit that referenced this pull request Dec 17, 2024
This crash started with my r12-7803 but I believe the problem lies
elsewhere.

build_vec_init has cleanup_flags whose purpose is -- if I grok this
correctly -- to avoid destructing an object multiple times.  Let's
say we are initializing an array of A.  Then we might end up in
a scenario similar to initlist-eh1.C:

  try
    {
      call A::A in a loop
      // #0
      try
        {
	  call a fn using the array
	}
      finally
	{
	  // #1
	  call A::~A in a loop
	}
    }
  catch
    {
      // #2
      call A::~A in a loop
    }

cleanup_flags makes us emit a statement like

  D.3048 = 2;

at #0 to disable performing the cleanup at #2, since #1 will take
care of the destruction of the array.

But if we are not emitting the loop because we can use a constant
initializer (and use a single { a, b, ...}), we shouldn't generate
the statement resetting the iterator to its initial value.  Otherwise
we crash in gimplify_var_or_parm_decl because it gets the stray decl
D.3048.

	PR c++/117985

gcc/cp/ChangeLog:

	* init.cc (build_vec_init): Pop CLEANUP_FLAGS if we're not
	generating the loop.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/initlist-array23.C: New test.
	* g++.dg/cpp0x/initlist-array24.C: New test.
sys-ceuplift pushed a commit that referenced this pull request Jan 7, 2025
This patch removes the AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS tunable and
use_new_vector_costs entry in aarch64-tuning-flags.def and makes the
AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS paths in the backend the
default. To that end, the function aarch64_use_new_vector_costs_p and its uses
were removed. To prevent costing vec_to_scalar operations with 0, as
described in
https://gcc.gnu.org/pipermail/gcc-patches/2024-October/665481.html,
we adjusted vectorizable_store such that the variable n_adjacent_stores
also covers vec_to_scalar operations. This way vec_to_scalar operations
are not costed individually, but as a group.
As suggested by Richard Sandiford, the "known_ne" in the multilane-check
was replaced by "maybe_ne" in order to treat nunits==1+1X as a vector
rather than a scalar.

Two tests were adjusted due to changes in codegen. In both cases, the
old code performed loop unrolling once, but the new code does not:
Example from gcc.target/aarch64/sve/strided_load_2.c (compiled with
-O2 -ftree-vectorize -march=armv8.2-a+sve -mtune=generic -moverride=tune=none):
f_int64_t_32:
        cbz     w3, .L92
        mov     x4, 0
        uxtw    x3, w3
+       cntd    x5
+       whilelo p7.d, xzr, x3
+       mov     z29.s, w5
        mov     z31.s, w2
-       whilelo p6.d, xzr, x3
-       mov     x2, x3
-       index   z30.s, #0, #1
-       uqdecd  x2
-       ptrue   p5.b, all
-       whilelo p7.d, xzr, x2
+       index   z30.d, #0, #1
+       ptrue   p6.b, all
        .p2align 3,,7
 .L94:
-       ld1d    z27.d, p7/z, [x0, #1, mul vl]
-       ld1d    z28.d, p6/z, [x0]
-       movprfx z29, z31
-       mul     z29.s, p5/m, z29.s, z30.s
-       incw    x4
-       uunpklo z0.d, z29.s
-       uunpkhi z29.d, z29.s
-       ld1d    z25.d, p6/z, [x1, z0.d, lsl 3]
-       ld1d    z26.d, p7/z, [x1, z29.d, lsl 3]
-       add     z25.d, z28.d, z25.d
+       ld1d    z27.d, p7/z, [x0, x4, lsl 3]
+       movprfx z28, z31
+       mul     z28.s, p6/m, z28.s, z30.s
+       ld1d    z26.d, p7/z, [x1, z28.d, uxtw 3]
        add     z26.d, z27.d, z26.d
-       st1d    z26.d, p7, [x0, #1, mul vl]
-       whilelo p7.d, x4, x2
-       st1d    z25.d, p6, [x0]
-       incw    z30.s
-       incb    x0, all, mul #2
-       whilelo p6.d, x4, x3
+       st1d    z26.d, p7, [x0, x4, lsl 3]
+       add     z30.s, z30.s, z29.s
+       incd    x4
+       whilelo p7.d, x4, x3
        b.any   .L94
 .L92:
        ret

Example from gcc.target/aarch64/sve/strided_store_2.c (compiled with
-O2 -ftree-vectorize -march=armv8.2-a+sve -mtune=generic -moverride=tune=none):
f_int64_t_32:
        cbz     w3, .L84
-       addvl   x5, x1, #1
        mov     x4, 0
        uxtw    x3, w3
-       mov     z31.s, w2
+       cntd    x5
        whilelo p7.d, xzr, x3
-       mov     x2, x3
-       index   z30.s, #0, #1
-       uqdecd  x2
-       ptrue   p5.b, all
-       whilelo p6.d, xzr, x2
+       mov     z29.s, w5
+       mov     z31.s, w2
+       index   z30.d, #0, #1
+       ptrue   p6.b, all
        .p2align 3,,7
 .L86:
-       ld1d    z28.d, p7/z, [x1, x4, lsl 3]
-       ld1d    z27.d, p6/z, [x5, x4, lsl 3]
-       movprfx z29, z30
-       mul     z29.s, p5/m, z29.s, z31.s
-       add     z28.d, z28.d, #1
-       uunpklo z26.d, z29.s
-       st1d    z28.d, p7, [x0, z26.d, lsl 3]
-       incw    x4
-       uunpkhi z29.d, z29.s
+       ld1d    z27.d, p7/z, [x1, x4, lsl 3]
+       movprfx z28, z30
+       mul     z28.s, p6/m, z28.s, z31.s
        add     z27.d, z27.d, #1
-       whilelo p6.d, x4, x2
-       st1d    z27.d, p7, [x0, z29.d, lsl 3]
-       incw    z30.s
+       st1d    z27.d, p7, [x0, z28.d, uxtw 3]
+       incd    x4
+       add     z30.s, z30.s, z29.s
        whilelo p7.d, x4, x3
        b.any   .L86
 .L84:
	ret

The patch was bootstrapped and tested on aarch64-linux-gnu, no
regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	* tree-vect-stmts.cc (vectorizable_store): Extend the use of
	n_adjacent_stores to also cover vec_to_scalar operations.
	* config/aarch64/aarch64-tuning-flags.def: Remove
	use_new_vector_costs as tuning option.
	* config/aarch64/aarch64.cc (aarch64_use_new_vector_costs_p):
	Remove.
	(aarch64_vector_costs::add_stmt_cost): Remove use of
	aarch64_use_new_vector_costs_p.
	(aarch64_vector_costs::finish_cost): Remove use of
	aarch64_use_new_vector_costs_p.
	* config/aarch64/tuning_models/cortexx925.h: Remove
	AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS.
	* config/aarch64/tuning_models/fujitsu_monaka.h: Likewise.
	* config/aarch64/tuning_models/generic_armv8_a.h: Likewise.
	* config/aarch64/tuning_models/generic_armv9_a.h: Likewise.
	* config/aarch64/tuning_models/neoverse512tvb.h: Likewise.
	* config/aarch64/tuning_models/neoversen2.h: Likewise.
	* config/aarch64/tuning_models/neoversen3.h: Likewise.
	* config/aarch64/tuning_models/neoversev1.h: Likewise.
	* config/aarch64/tuning_models/neoversev2.h: Likewise.
	* config/aarch64/tuning_models/neoversev3.h: Likewise.
	* config/aarch64/tuning_models/neoversev3ae.h: Likewise.

gcc/testsuite/
	* gcc.target/aarch64/sve/strided_load_2.c: Adjust expected outcome.
	* gcc.target/aarch64/sve/strided_store_2.c: Likewise.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants