Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update libgccjit.c #5

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Update libgccjit.c #5

wants to merge 1 commit into from

Conversation

dxhj
Copy link

@dxhj dxhj commented Apr 21, 2016

Clean code

@czipperz
Copy link
Contributor

First off, thanks for your contribution!

Unfortunately, this repository is read only. You need to post this to the mailing list instead. Send a PLAIN TEXT EMAIL to gcc-patches at gcc dot gnu dot org. Here is a link to the mailing list: https://gcc.gnu.org/ml/gcc-patches/2016-04/. Title the mail [PATCH] Updated formatting of gcc/jit/libgccjit.c or something like that. Add a changelog entry at the end of your email body like:

2016-04-21  Victor Costa  dxhj@hotmail.com.br

<TAB>* jit/libgccjit.c: Update whitespace in macros.

Where <TAB> is a literal tab (\t).

Run git diff > a.diff and attach a.diff to the email.

@chriscz chriscz mentioned this pull request Mar 15, 2017
hubot pushed a commit that referenced this pull request May 29, 2017
	* cp-tree.h (lookup_maybe_add): Add DEDUPING argument.
	* name-lookup.c (name_lookup): Add deduping field.
	(name_lookup::preserve_state, name_lookup::restore_state): Deal
	with deduping.
	(name_lookup::add_overload): New.
	(name_lookup::add_value, name_lookup::add_fns): Call add_overload.
	(name_lookup::search_adl): Set deduping.  Don't unmark here.
	* pt.c (most_specialized_instantiation): Revert previous change,
	Assert not given duplicates.
	* tree.c (lookup_mark): Just mark the underlying decls.
	(lookup_maybe_add): Dedup using marked decls.

	PR c++/80891 (#5)
	* g++.dg/lookup/pr80891-5.C: New.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@248578 138bc75d-0d04-0410-961f-82ee72b054a4
hubot pushed a commit that referenced this pull request Apr 19, 2018
When -fcf-protection -mcet is used, I got

FAIL: g++.dg/eh/sighandle.C

(gdb) bt
 #0  _Unwind_RaiseException (exc=exc@entry=0x416ed0)
    at /export/gnu/import/git/sources/gcc/libgcc/unwind.inc:140
 #1  0x00007ffff7d9936b in __cxxabiv1::__cxa_throw (obj=<optimized out>,
    tinfo=0x403dd0 <typeinfo for int@@CXXABI_1.3>, dest=0x0)
    at /export/gnu/import/git/sources/gcc/libstdc++-v3/libsupc++/eh_throw.cc:90
 #2  0x0000000000401255 in sighandler (signo=11, si=0x7fffffffd6f8,
    uc=0x7fffffffd5c0)
    at /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/eh/sighandle.C:9
 #3  <signal handler called> <<<< Signal frame which isn't on shadow stack
 #4  dosegv ()
    at /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/eh/sighandle.C:14
 #5  0x00000000004012e3 in main ()
    at /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/eh/sighandle.C:30
(gdb) p frames
$6 = 5
(gdb)

frame count should be 4, not 5.  This patch skips signal frames when
unwinding shadow stack.

gcc/testsuite/

	PR libgcc/85334
	* g++.dg/torture/pr85334.C: New test.

libgcc/

	PR libgcc/85334
	* unwind-generic.h (_Unwind_Frames_Increment): New.
	* config/i386/shadow-stack-unwind.h (_Unwind_Frames_Increment):
	Likewise.
	* unwind.inc (_Unwind_RaiseException_Phase2): Increment frame
	count with _Unwind_Frames_Increment.
	(_Unwind_ForcedUnwind_Phase2): Likewise.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@259502 138bc75d-0d04-0410-961f-82ee72b054a4
kraj pushed a commit to kraj/gcc that referenced this pull request Oct 18, 2019
This is the first step of early splitting all the DImode comparison
operations.  We start by factoring the DImode handling out of
arm_gen_compare_reg into its own function.

Simple DImode equality comparisions (such as equality with zero, or
equality with a constant that is zero in one of the two word values
that it comprises) can be done using a single subtract followed by an
ORRS instruction.  This avoids the need for conditional execution.

For example, (r0 != 5) can be written as

	SUB	Rt, R0, gcc-mirror#5
	ORRS	Rt, Rt, R1

The ORRS is now expanded using an SImode pattern that already exists
in the MD file and this gives the register allocator more freedom to
select registers (consecutive pairs are no-longer required).
Furthermore, we can then delete the arm_cmpdi_zero pattern as it is
no-longer required.  We use SUB for the value adjustment as this has a
generally more flexible range of immediates than XOR and what's more
has the opportunity to be relaxed in thumb2 to a 16-bit SUBS
instruction.

	* config/arm/arm.c (arm_select_cc_mode): For DImode equality tests
	return CC_Zmode if comparing against a constant where one word is
	zero.
	(arm_gen_compare_reg): Split DImode handling to ...
	(arm_gen_dicompare_reg): ... here.  Handle equality comparisons
	against simple constants.
	* config/arm/arm.md (arm_cmpdi_zero): Delete pattern.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@277177 138bc75d-0d04-0410-961f-82ee72b054a4
kraj pushed a commit to kraj/gcc that referenced this pull request Oct 18, 2019
This patch improves the expansion of usubvsi4 by allowing suitable
constants to be passed directly.  Unlike normal subtraction, either
operand may be a constant (and indeed I have seen cases where both can
be with LTO enabled).  One interesting testcase that improves as a
result of this is:

unsigned f6 (unsigned a)
{
  unsigned x;
  return __builtin_sub_overflow (5U, a, &x) ? 0 : x;
}

Which previously compiled to:

	rsbs	r3, r0, gcc-mirror#5
	cmp	r0, gcc-mirror#5
	movls	r0, r3
	movhi	r0, #0

but now generates the optimal sequence:

	rsbs	r0, r0, gcc-mirror#5
	movcc	r0, #0

	* config/arm/arm.md (usubv<mode>4): Delete expansion.
	(usubvsi4): New pattern.  Allow some immediate values for inputs.
	(usubvdi4): New pattern.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@277187 138bc75d-0d04-0410-961f-82ee72b054a4
emsr pushed a commit to emsr/gcc that referenced this pull request Nov 2, 2019
This is the first step of early splitting all the DImode comparison
operations.  We start by factoring the DImode handling out of
arm_gen_compare_reg into its own function.

Simple DImode equality comparisions (such as equality with zero, or
equality with a constant that is zero in one of the two word values
that it comprises) can be done using a single subtract followed by an
ORRS instruction.  This avoids the need for conditional execution.

For example, (r0 != 5) can be written as

	SUB	Rt, R0, gcc-mirror#5
	ORRS	Rt, Rt, R1

The ORRS is now expanded using an SImode pattern that already exists
in the MD file and this gives the register allocator more freedom to
select registers (consecutive pairs are no-longer required).
Furthermore, we can then delete the arm_cmpdi_zero pattern as it is
no-longer required.  We use SUB for the value adjustment as this has a
generally more flexible range of immediates than XOR and what's more
has the opportunity to be relaxed in thumb2 to a 16-bit SUBS
instruction.

	* config/arm/arm.c (arm_select_cc_mode): For DImode equality tests
	return CC_Zmode if comparing against a constant where one word is
	zero.
	(arm_gen_compare_reg): Split DImode handling to ...
	(arm_gen_dicompare_reg): ... here.  Handle equality comparisons
	against simple constants.
	* config/arm/arm.md (arm_cmpdi_zero): Delete pattern.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@277177 138bc75d-0d04-0410-961f-82ee72b054a4
emsr pushed a commit to emsr/gcc that referenced this pull request Nov 2, 2019
This patch improves the expansion of usubvsi4 by allowing suitable
constants to be passed directly.  Unlike normal subtraction, either
operand may be a constant (and indeed I have seen cases where both can
be with LTO enabled).  One interesting testcase that improves as a
result of this is:

unsigned f6 (unsigned a)
{
  unsigned x;
  return __builtin_sub_overflow (5U, a, &x) ? 0 : x;
}

Which previously compiled to:

	rsbs	r3, r0, gcc-mirror#5
	cmp	r0, gcc-mirror#5
	movls	r0, r3
	movhi	r0, #0

but now generates the optimal sequence:

	rsbs	r0, r0, gcc-mirror#5
	movcc	r0, #0

	* config/arm/arm.md (usubv<mode>4): Delete expansion.
	(usubvsi4): New pattern.  Allow some immediate values for inputs.
	(usubvdi4): New pattern.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@277187 138bc75d-0d04-0410-961f-82ee72b054a4
kraj pushed a commit to kraj/gcc that referenced this pull request Jun 17, 2020
Made apparent by recent commit dc70315
"openmp: Implement discovery of implicit declare target to clauses":

    +FAIL: libgomp.c/target-39.c (internal compiler error)
    +FAIL: libgomp.c/target-39.c (test for excess errors)
    +UNRESOLVED: libgomp.c/target-39.c compilation failed to produce executable

This is in a '--enable-offload-targets=[...],hsa' build, with '-foffload=hsa'
enabled (by default).

    during GIMPLE pass: hsagen
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c: In function ‘main._omp_fn.0.hsa.0’:
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c:23:11: internal compiler error: Segmentation fault
       23 |   #pragma omp target map(from:err)
          |           ^~~
    [...]

GDB:

    Program received signal SIGSEGV, Segmentation fault.
    fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    6267      return (fndecl_built_in_p (node, BUILT_IN_NORMAL)
    (gdb) bt
    #0  fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5304
    gcc-mirror#2  0x0000000000b1aca7 in gen_hsa_insns_for_gimple_stmt (stmt=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5770
    gcc-mirror#3  0x0000000000b1bd21 in gen_body_from_gimple () at [...]/source-gcc/gcc/hsa-gen.c:5999
    gcc-mirror#4  0x0000000000b1dbd2 in generate_hsa (kernel=<optimized out>) at [...]/source-gcc/gcc/hsa-gen.c:6596
    gcc-mirror#5  0x0000000000b1de66 in (anonymous namespace)::pass_gen_hsail::execute (this=0x2a2aac0) at [...]/source-gcc/gcc/hsa-gen.c:6680
    gcc-mirror#6  0x0000000000d06f90 in execute_one_pass (pass=pass@entry=0x2a2aac0) at [...]/source-gcc/gcc/passes.c:2502
    [...]
    (gdb) up
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at /home/thomas/tmp/source/gcc/build/track-slim-omp/source-gcc/gcc/hsa-gen.c:5304
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    (gdb) print function_decl
    $1 = (tree) 0x0
    (gdb) list
    5299      if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
    5300        {
    5301          tree function_decl = gimple_call_fndecl (stmt);
    5302          /* Prefetch pass can create type-mismatching prefetch builtin calls which
    5303             fail the gimple_call_builtin_p test above.  Handle them here.  */
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    5305            return;
    5306
    5307          if (function_decl == NULL_TREE)
    5308            {

The problem is present already since 2016-11-23 commit
56b1c60 (r242761) "Merge from HSA branch to
trunk", and the fix obvious enough.

	gcc/
	* hsa-gen.c (gen_hsa_insns_for_call): Move 'function_decl ==
	NULL_TREE' check earlier.
	gcc/testsuite/
	* c-c++-common/gomp/hsa-indirect-call-1.c: New file.
kraj pushed a commit to kraj/gcc that referenced this pull request Jun 17, 2020
Made apparent by recent commit dc70315
"openmp: Implement discovery of implicit declare target to clauses":

    +FAIL: libgomp.c/target-39.c (internal compiler error)
    +FAIL: libgomp.c/target-39.c (test for excess errors)
    +UNRESOLVED: libgomp.c/target-39.c compilation failed to produce executable

This is in a '--enable-offload-targets=[...],hsa' build, with '-foffload=hsa'
enabled (by default).

    during GIMPLE pass: hsagen
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c: In function ‘main._omp_fn.0.hsa.0’:
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c:23:11: internal compiler error: Segmentation fault
       23 |   #pragma omp target map(from:err)
          |           ^~~
    [...]

GDB:

    Program received signal SIGSEGV, Segmentation fault.
    fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    6267      return (fndecl_built_in_p (node, BUILT_IN_NORMAL)
    (gdb) bt
    #0  fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5304
    gcc-mirror#2  0x0000000000b1aca7 in gen_hsa_insns_for_gimple_stmt (stmt=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5770
    gcc-mirror#3  0x0000000000b1bd21 in gen_body_from_gimple () at [...]/source-gcc/gcc/hsa-gen.c:5999
    gcc-mirror#4  0x0000000000b1dbd2 in generate_hsa (kernel=<optimized out>) at [...]/source-gcc/gcc/hsa-gen.c:6596
    gcc-mirror#5  0x0000000000b1de66 in (anonymous namespace)::pass_gen_hsail::execute (this=0x2a2aac0) at [...]/source-gcc/gcc/hsa-gen.c:6680
    gcc-mirror#6  0x0000000000d06f90 in execute_one_pass (pass=pass@entry=0x2a2aac0) at [...]/source-gcc/gcc/passes.c:2502
    [...]
    (gdb) up
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at /home/thomas/tmp/source/gcc/build/track-slim-omp/source-gcc/gcc/hsa-gen.c:5304
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    (gdb) print function_decl
    $1 = (tree) 0x0
    (gdb) list
    5299      if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
    5300        {
    5301          tree function_decl = gimple_call_fndecl (stmt);
    5302          /* Prefetch pass can create type-mismatching prefetch builtin calls which
    5303             fail the gimple_call_builtin_p test above.  Handle them here.  */
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    5305            return;
    5306
    5307          if (function_decl == NULL_TREE)
    5308            {

The problem is present already since 2016-11-23 commit
56b1c60 (r242761) "Merge from HSA branch to
trunk", and the fix obvious enough.

	gcc/
	* hsa-gen.c (gen_hsa_insns_for_call): Move 'function_decl ==
	NULL_TREE' check earlier.
	gcc/testsuite/
	* c-c++-common/gomp/hsa-indirect-call-1.c: New file.

(cherry picked from commit 973bce0)
kraj pushed a commit to kraj/gcc that referenced this pull request Jun 17, 2020
Made apparent by recent commit dc70315
"openmp: Implement discovery of implicit declare target to clauses":

    +FAIL: libgomp.c/target-39.c (internal compiler error)
    +FAIL: libgomp.c/target-39.c (test for excess errors)
    +UNRESOLVED: libgomp.c/target-39.c compilation failed to produce executable

This is in a '--enable-offload-targets=[...],hsa' build, with '-foffload=hsa'
enabled (by default).

    during GIMPLE pass: hsagen
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c: In function ‘main._omp_fn.0.hsa.0’:
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c:23:11: internal compiler error: Segmentation fault
       23 |   #pragma omp target map(from:err)
          |           ^~~
    [...]

GDB:

    Program received signal SIGSEGV, Segmentation fault.
    fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    6267      return (fndecl_built_in_p (node, BUILT_IN_NORMAL)
    (gdb) bt
    #0  fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5304
    gcc-mirror#2  0x0000000000b1aca7 in gen_hsa_insns_for_gimple_stmt (stmt=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5770
    gcc-mirror#3  0x0000000000b1bd21 in gen_body_from_gimple () at [...]/source-gcc/gcc/hsa-gen.c:5999
    gcc-mirror#4  0x0000000000b1dbd2 in generate_hsa (kernel=<optimized out>) at [...]/source-gcc/gcc/hsa-gen.c:6596
    gcc-mirror#5  0x0000000000b1de66 in (anonymous namespace)::pass_gen_hsail::execute (this=0x2a2aac0) at [...]/source-gcc/gcc/hsa-gen.c:6680
    gcc-mirror#6  0x0000000000d06f90 in execute_one_pass (pass=pass@entry=0x2a2aac0) at [...]/source-gcc/gcc/passes.c:2502
    [...]
    (gdb) up
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at /home/thomas/tmp/source/gcc/build/track-slim-omp/source-gcc/gcc/hsa-gen.c:5304
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    (gdb) print function_decl
    $1 = (tree) 0x0
    (gdb) list
    5299      if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
    5300        {
    5301          tree function_decl = gimple_call_fndecl (stmt);
    5302          /* Prefetch pass can create type-mismatching prefetch builtin calls which
    5303             fail the gimple_call_builtin_p test above.  Handle them here.  */
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    5305            return;
    5306
    5307          if (function_decl == NULL_TREE)
    5308            {

The problem is present already since 2016-11-23 commit
56b1c60 (r242761) "Merge from HSA branch to
trunk", and the fix obvious enough.

	gcc/
	* hsa-gen.c (gen_hsa_insns_for_call): Move 'function_decl ==
	NULL_TREE' check earlier.
	gcc/testsuite/
	* c-c++-common/gomp/hsa-indirect-call-1.c: New file.

(cherry picked from commit 973bce0)
kraj pushed a commit to kraj/gcc that referenced this pull request Jun 17, 2020
Made apparent by recent commit dc70315
"openmp: Implement discovery of implicit declare target to clauses":

    +FAIL: libgomp.c/target-39.c (internal compiler error)
    +FAIL: libgomp.c/target-39.c (test for excess errors)
    +UNRESOLVED: libgomp.c/target-39.c compilation failed to produce executable

This is in a '--enable-offload-targets=[...],hsa' build, with '-foffload=hsa'
enabled (by default).

    during GIMPLE pass: hsagen
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c: In function ‘main._omp_fn.0.hsa.0’:
    source-gcc/libgomp/testsuite/libgomp.c/target-39.c:23:11: internal compiler error: Segmentation fault
       23 |   #pragma omp target map(from:err)
          |           ^~~
    [...]

GDB:

    Program received signal SIGSEGV, Segmentation fault.
    fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    6267      return (fndecl_built_in_p (node, BUILT_IN_NORMAL)
    (gdb) bt
    #0  fndecl_built_in_p (node=0x0, name=BUILT_IN_PREFETCH) at [...]/source-gcc/gcc/tree.h:6267
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5304
    gcc-mirror#2  0x0000000000b1aca7 in gen_hsa_insns_for_gimple_stmt (stmt=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at [...]/source-gcc/gcc/hsa-gen.c:5770
    gcc-mirror#3  0x0000000000b1bd21 in gen_body_from_gimple () at [...]/source-gcc/gcc/hsa-gen.c:5999
    gcc-mirror#4  0x0000000000b1dbd2 in generate_hsa (kernel=<optimized out>) at [...]/source-gcc/gcc/hsa-gen.c:6596
    gcc-mirror#5  0x0000000000b1de66 in (anonymous namespace)::pass_gen_hsail::execute (this=0x2a2aac0) at [...]/source-gcc/gcc/hsa-gen.c:6680
    gcc-mirror#6  0x0000000000d06f90 in execute_one_pass (pass=pass@entry=0x2a2aac0) at [...]/source-gcc/gcc/passes.c:2502
    [...]
    (gdb) up
    #1  0x0000000000b19739 in gen_hsa_insns_for_call (stmt=stmt@entry=0x7ffff693b200, hbb=hbb@entry=0x2b152c0) at /home/thomas/tmp/source/gcc/build/track-slim-omp/source-gcc/gcc/hsa-gen.c:5304
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    (gdb) print function_decl
    $1 = (tree) 0x0
    (gdb) list
    5299      if (!gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
    5300        {
    5301          tree function_decl = gimple_call_fndecl (stmt);
    5302          /* Prefetch pass can create type-mismatching prefetch builtin calls which
    5303             fail the gimple_call_builtin_p test above.  Handle them here.  */
    5304          if (fndecl_built_in_p (function_decl, BUILT_IN_PREFETCH))
    5305            return;
    5306
    5307          if (function_decl == NULL_TREE)
    5308            {

The problem is present already since 2016-11-23 commit
56b1c60 (r242761) "Merge from HSA branch to
trunk", and the fix obvious enough.

	gcc/
	* hsa-gen.c (gen_hsa_insns_for_call): Move 'function_decl ==
	NULL_TREE' check earlier.
	gcc/testsuite/
	* c-c++-common/gomp/hsa-indirect-call-1.c: New file.

(cherry picked from commit 973bce0)
kraj pushed a commit to kraj/gcc that referenced this pull request Oct 12, 2020
Prevents the following UBSAN error:

./xgcc -B. /home/marxin/Programming/gcc/gcc/testsuite/g++.dg/torture/pr49770.C -O2 -c
/home/marxin/Programming/gcc2/gcc/ipa-modref-tree.h:482:22: runtime error: load of value 2, which is not a valid value for type 'bool'
    #0 0x1fdb4d1 in modref_tree<int>::merge(modref_tree<int>*, vec<modref_parm_map, va_heap, vl_ptr>*) /home/marxin/Programming/gcc2/gcc/ipa-modref-tree.h:482
    #1 0x1fcadaa in merge_call_side_effects(modref_summary*, gimple*, modref_summary*, bool) /home/marxin/Programming/gcc2/gcc/ipa-modref.c:511
    gcc-mirror#2 0x1fcbadd in analyze_call /home/marxin/Programming/gcc2/gcc/ipa-modref.c:642
    gcc-mirror#3 0x1fcc061 in analyze_stmt /home/marxin/Programming/gcc2/gcc/ipa-modref.c:732
    gcc-mirror#4 0x1fccf31 in analyze_function /home/marxin/Programming/gcc2/gcc/ipa-modref.c:823
    gcc-mirror#5 0x1fd17e5 in execute /home/marxin/Programming/gcc2/gcc/ipa-modref.c:1441
    gcc-mirror#6 0x25cca6e in execute_one_pass(opt_pass*) /home/marxin/Programming/gcc2/gcc/passes.c:2509
    gcc-mirror#7 0x25cd39b in execute_pass_list_1 /home/marxin/Programming/gcc2/gcc/passes.c:2597
    gcc-mirror#8 0x25cd450 in execute_pass_list_1 /home/marxin/Programming/gcc2/gcc/passes.c:2598
    gcc-mirror#9 0x25cd4ee in execute_pass_list(function*, opt_pass*) /home/marxin/Programming/gcc2/gcc/passes.c:2608
    gcc-mirror#10 0x25c7a5a in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/Programming/gcc2/gcc/passes.c:1726
    gcc-mirror#11 0x25cfa3f in execute_ipa_pass_list(opt_pass*) /home/marxin/Programming/gcc2/gcc/passes.c:2941
    gcc-mirror#12 0x173572d in ipa_passes /home/marxin/Programming/gcc2/gcc/cgraphunit.c:2642
    gcc-mirror#13 0x17364ee in symbol_table::compile() /home/marxin/Programming/gcc2/gcc/cgraphunit.c:2777
    gcc-mirror#14 0x17372d9 in symbol_table::finalize_compilation_unit() /home/marxin/Programming/gcc2/gcc/cgraphunit.c:3022
    gcc-mirror#15 0x2a1f00a in compile_file /home/marxin/Programming/gcc2/gcc/toplev.c:485
    gcc-mirror#16 0x2a27dc8 in do_compile /home/marxin/Programming/gcc2/gcc/toplev.c:2321
    gcc-mirror#17 0x2a283cc in toplev::main(int, char**) /home/marxin/Programming/gcc2/gcc/toplev.c:2460
    gcc-mirror#18 0x54f21cd in main /home/marxin/Programming/gcc2/gcc/main.c:39
    gcc-mirror#19 0x7ffff6f0de09 in __libc_start_main ../csu/libc-start.c:314
    gcc-mirror#20 0x9eac09 in _start (/home/marxin/Programming/gcc2/objdir/gcc/cc1plus+0x9eac09)

gcc/ChangeLog:

	* ipa-modref.c (merge_call_side_effects): Clear modref_parm_map
	fields in the vector.
kraj pushed a commit to kraj/gcc that referenced this pull request Oct 19, 2020
It fixes:

/home/marxin/Programming/gcc2/gcc/ipa-modref-tree.h:482:22: runtime error: load of value 255, which is not a valid value for type 'bool'
    #0 0x18e5df3 in modref_tree<int>::merge(modref_tree<int>*, vec<modref_parm_map, va_heap, vl_ptr>*) /home/marxin/Programming/gcc2/gcc/ipa-modref-tree.h:482
    #1 0x18dc180 in ipa_merge_modref_summary_after_inlining(cgraph_edge*) /home/marxin/Programming/gcc2/gcc/ipa-modref.c:1779
    gcc-mirror#2 0x18c1c72 in inline_call(cgraph_edge*, bool, vec<cgraph_edge*, va_heap, vl_ptr>*, int*, bool, bool*) /home/marxin/Programming/gcc2/gcc/ipa-inline-transform.c:492
    gcc-mirror#3 0x4a3589c in inline_small_functions /home/marxin/Programming/gcc2/gcc/ipa-inline.c:2216
    gcc-mirror#4 0x4a3b230 in ipa_inline /home/marxin/Programming/gcc2/gcc/ipa-inline.c:2697
    gcc-mirror#5 0x4a3d902 in execute /home/marxin/Programming/gcc2/gcc/ipa-inline.c:3096
    gcc-mirror#6 0x1edf831 in execute_one_pass(opt_pass*) /home/marxin/Programming/gcc2/gcc/passes.c:2509
    gcc-mirror#7 0x1ee26af in execute_ipa_pass_list(opt_pass*) /home/marxin/Programming/gcc2/gcc/passes.c:2936
    gcc-mirror#8 0x103f31b in ipa_passes /home/marxin/Programming/gcc2/gcc/cgraphunit.c:2700
    gcc-mirror#9 0x103fb40 in symbol_table::compile() /home/marxin/Programming/gcc2/gcc/cgraphunit.c:2777
    gcc-mirror#10 0x104092b in symbol_table::finalize_compilation_unit() /home/marxin/Programming/gcc2/gcc/cgraphunit.c:3022
    gcc-mirror#11 0x235723b in compile_file /home/marxin/Programming/gcc2/gcc/toplev.c:485
    gcc-mirror#12 0x235fff9 in do_compile /home/marxin/Programming/gcc2/gcc/toplev.c:2321
    gcc-mirror#13 0x23605fc in toplev::main(int, char**) /home/marxin/Programming/gcc2/gcc/toplev.c:2460
    gcc-mirror#14 0x4e2b93b in main /home/marxin/Programming/gcc2/gcc/main.c:39
    gcc-mirror#15 0x7ffff6f0ae09 in __libc_start_main ../csu/libc-start.c:314
    gcc-mirror#16 0x9a0be9 in _start (/home/marxin/Programming/gcc2/objdir/gcc/cc1+0x9a0be9)

gcc/ChangeLog:

	* ipa-modref.c (compute_parm_map): Clear vector.
nstester pushed a commit to nstester/gcc that referenced this pull request Feb 23, 2021
/home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77:25: runtime error: left shift of 0x0000000000000000fffffffffffffffb by 96 places cannot be represented in type '__int128'
    #0 0x7ffff754edfe in __ubsan::Value::getSIntValue() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.cpp:77
    #1 0x7ffff7548719 in __ubsan::Value::isNegative() const /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_value.h:190
    #2 0x7ffff7542a34 in handleShiftOutOfBoundsImpl /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:338
    #3 0x7ffff75431b7 in __ubsan_handle_shift_out_of_bounds /home/marxin/Programming/gcc2/libsanitizer/ubsan/ubsan_handlers.cpp:370
    #4 0x40067f in main (/home/marxin/Programming/testcases/a.out+0x40067f)
    gcc-mirror#5 0x7ffff72c8b24 in __libc_start_main (/lib64/libc.so.6+0x27b24)
    gcc-mirror#6 0x4005bd in _start (/home/marxin/Programming/testcases/a.out+0x4005bd)

Differential Revision: https://reviews.llvm.org/D97263

Cherry-pick from 16ede09.
nstester pushed a commit to nstester/gcc that referenced this pull request May 7, 2021
gcc/ada/

	* raise-gcc.c (__gnat_others_value): Remove const qualifier.
	(__gnat_all_others_value): Likewise.
	(__gnat_unhandled_others_value): Likewise.
	(GNAT_OTHERS): Cast to Exception_Id instead of _Unwind_Ptr.
	(GNAT_ALL_OTHERS): Likewise.
	(GNAT_UNHANDLED_OTHERS): Likewise.
	(Is_Handled_By_Others): Change parameter type to Exception_Id.
	(Language_For): Likewise.
	(Foreign_Data_For): Likewise.
	(is_handled_by): Likewise.  Adjust throughout, remove redundant
	line and fix indentation.
	* libgnat/a-exexpr.adb (Is_Handled_By_Others): Remove pragma and
	useless qualification from parameter type.
	(Foreign_Data_For): Likewise.
	(Language_For): Likewise.
nstester pushed a commit to nstester/gcc that referenced this pull request Jun 14, 2021
The fixed error is:

==21166==ERROR: AddressSanitizer: alloc-dealloc-mismatch (operator new [] vs operator delete) on 0x60300000d900
    #0 0x7367d7 in operator delete(void*, unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:172
    #1 0x3b82e6e in pointer_equiv_analyzer::~pointer_equiv_analyzer() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:161
    #2 0x3b83387 in hybrid_folder::~hybrid_folder() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:517
    #3 0x3b83387 in execute_early_vrp /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:686
    #4 0x1790611 in execute_one_pass(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2567
    gcc-mirror#5 0x1792003 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2656
    gcc-mirror#6 0x1792029 in execute_pass_list_1 /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2657
    gcc-mirror#7 0x179209f in execute_pass_list(function*, opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:2667
    gcc-mirror#8 0x178a5f3 in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:1773
    gcc-mirror#9 0x1792fac in do_per_function_toporder(void (*)(function*, void*), void*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/plugin.h:191
    gcc-mirror#10 0x1792fac in execute_ipa_pass_list(opt_pass*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/passes.c:3001
    gcc-mirror#11 0xc525fc in ipa_passes /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2154
    gcc-mirror#12 0xc525fc in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2289
    gcc-mirror#13 0xc5a096 in symbol_table::compile() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2269
    gcc-mirror#14 0xc5a096 in symbol_table::finalize_compilation_unit() /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/cgraphunit.c:2537
    gcc-mirror#15 0x1a7a17c in compile_file /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:482
    gcc-mirror#16 0x69c758 in do_compile /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2210
    gcc-mirror#17 0x69c758 in toplev::main(int, char**) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/toplev.c:2349
    gcc-mirror#18 0x6a932a in main /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/main.c:39
    gcc-mirror#19 0x7ffff7820b34 in __libc_start_main ../csu/libc-start.c:332
    gcc-mirror#20 0x6aa5fd in _start (/home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/objdir/gcc/cc1+0x6aa5fd)

0x60300000d900 is located 0 bytes inside of 32-byte region [0x60300000d900,0x60300000d920)
allocated by thread T0 here:
    #0 0x735ab7 in operator new[](unsigned long) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/libsanitizer/asan/asan_new_delete.cpp:102
    #1 0x3b82dac in pointer_equiv_analyzer::pointer_equiv_analyzer(gimple_ranger*) /home/marxin/BIG/buildbot/buildworker/marxinbox-gcc-asan/build/gcc/gimple-ssa-evrp.c:156

gcc/ChangeLog:

	* gimple-ssa-evrp.c (pointer_equiv_analyzer::~pointer_equiv_analyzer): Use delete[].
nstester pushed a commit to nstester/gcc that referenced this pull request Jul 8, 2021
…atch.pd

All of the optimizations/transformations mentioned in bugzilla for
PR tree-optimization/40210 are already implemented in mainline GCC,
with one exception.  In comment gcc-mirror#5, there's a suggestion that
(bswap64(x)>>56)&0xff can be implemented without the bswap as
(unsigned char)x, or equivalently x&0xff.

This patch implements the above optimization, and closely related
variants.  For any single bit, (bswap(X)>>C1)&1 can be simplified
to (X>>C2)&1, where bit position C2 is the appropriate permutation
of C1.  Similarly, the bswap can eliminated if the desired set of
bits all lie within the same byte, hence (bswap(x)>>8)&255 can
always be optimized, as can (bswap(x)>>8)&123.

Previously,
int foo(long long x) {
  return (__builtin_bswap64(x) >> 56) & 0xff;
}

compiled with -O2 to
foo:	movq    %rdi, %rax
        bswap   %rax
        shrq    $56, %rax
        ret

with this patch, it now compiles to
foo:	movzbl  %dil, %eax
        ret

2021-07-08  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	PR tree-optimization/40210
	* match.pd (bswap optimizations): Simplify (bswap(x)>>C1)&C2 as
	(x>>C3)&C2 when possible.  Simplify bswap(x)>>C1 as ((T)x)>>C2
	when possible.  Simplify bswap(x)&C1 as (x>>C2)&C1 when 0<=C1<=255.

gcc/testsuite/ChangeLog
	PR tree-optimization/40210
	* gcc.dg/builtin-bswap-13.c: New test.
	* gcc.dg/builtin-bswap-14.c: New test.
nstester pushed a commit to nstester/gcc that referenced this pull request Nov 26, 2021
Fixes:

==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000666ca5c at pc 0x000000ef094b bp 0x7fffffff8180 sp 0x7fffffff8178
READ of size 4 at 0x00000666ca5c thread T0
    #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855
    #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916
    #2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887
    #3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829
    #4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427
    gcc-mirror#5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346
    gcc-mirror#6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967
    gcc-mirror#7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808
    gcc-mirror#8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146

for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d.

gcc/d/ChangeLog:

	* d-attribs.cc (parse_optimize_options): Check index before
	accessing cl_options.
kraj pushed a commit to kraj/gcc that referenced this pull request Nov 26, 2021
Fixes:

==129444==ERROR: AddressSanitizer: global-buffer-overflow on address 0x00000666ca5c at pc 0x000000ef094b bp 0x7fffffff8180 sp 0x7fffffff8178
READ of size 4 at 0x00000666ca5c thread T0
    #0 0xef094a in parse_optimize_options ../../gcc/d/d-attribs.cc:855
    #1 0xef0d36 in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:916
    gcc-mirror#2 0xef107e in d_handle_optimize_attribute ../../gcc/d/d-attribs.cc:887
    gcc-mirror#3 0xff85b1 in decl_attributes(tree_node**, tree_node*, int, tree_node*) ../../gcc/attribs.c:829
    gcc-mirror#4 0xef2a91 in apply_user_attributes(Dsymbol*, tree_node*) ../../gcc/d/d-attribs.cc:427
    gcc-mirror#5 0xf7b7f3 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:1346
    gcc-mirror#6 0xf87bc7 in get_symbol_decl(Declaration*) ../../gcc/d/decl.cc:967
    gcc-mirror#7 0xf87bc7 in DeclVisitor::visit(FuncDeclaration*) ../../gcc/d/decl.cc:808
    gcc-mirror#8 0xf83db5 in DeclVisitor::build_dsymbol(Dsymbol*) ../../gcc/d/decl.cc:146

for the following test-case: gcc/testsuite/gdc.dg/attr_optimize1.d.

gcc/d/ChangeLog:

	* d-attribs.cc (parse_optimize_options): Check index before
	accessing cl_options.
nstester pushed a commit to nstester/gcc that referenced this pull request Dec 30, 2021
…imize or target pragmas [PR103012]

The following testcases ICE when an optimize or target pragma
is followed by a long line (4096+ chars).
This is because on such long lines we can't use columns anymore,
but the cpp_define calls performed by c_cpp_builtins_optimize_pragma
or from the backend hooks for target pragma are done on temporary
buffers and expect to get columns from whatever line they appear on
(which happens to be the long line after optimize/target pragma),
and we run into:
 #0  fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986
 #1  0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502
 #2  0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827
 #3  0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898
 #4  0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592
 gcc-mirror#5  0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394
 gcc-mirror#6  0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601
 gcc-mirror#7  0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639
 gcc-mirror#8  0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589
 gcc-mirror#9  0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513
 gcc-mirror#10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522
 gcc-mirror#11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>)
     at ../../gcc/c-family/c-cppbuiltin.c:600
assertion that LC_RENAME doesn't happen first.

I think the right fix is emit those predefined macros upon
optimize/target pragmas with BUILTINS_LOCATION, like we already do
for those macros at the start of the TU, they don't appear in columns
of the next line after it.  Another possibility would be to force them
at the location of the pragma.

2021-12-30  Jakub Jelinek  <jakub@redhat.com>

	PR c++/103012
gcc/
	* config/i386/i386-c.c (ix86_pragma_target_parse): Perform
	cpp_define/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
	* config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
	* config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise.
	* config/s390/s390-c.c (s390_pragma_target_parse): Likewise.
gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform
	cpp_define_unused/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
gcc/testsuite/
	PR c++/103012
	* g++.dg/cpp/pr103012.C: New test.
	* g++.target/i386/pr103012.C: New test.
kraj pushed a commit to kraj/gcc that referenced this pull request Jan 24, 2022
…imize or target pragmas [PR103012]

The following testcases ICE when an optimize or target pragma
is followed by a long line (4096+ chars).
This is because on such long lines we can't use columns anymore,
but the cpp_define calls performed by c_cpp_builtins_optimize_pragma
or from the backend hooks for target pragma are done on temporary
buffers and expect to get columns from whatever line they appear on
(which happens to be the long line after optimize/target pragma),
and we run into:
 #0  fancy_abort (file=0x3abec67 "../../libcpp/line-map.c", line=502, function=0x3abecfc "linemap_add") at ../../gcc/diagnostic.c:1986
 #1  0x0000000002e7c335 in linemap_add (set=0x7ffff7fca000, reason=LC_RENAME, sysp=0, to_file=0x41287a0 "pr103012.i", to_line=3) at ../../libcpp/line-map.c:502
 gcc-mirror#2  0x0000000002e7cc24 in linemap_line_start (set=0x7ffff7fca000, to_line=3, max_column_hint=128) at ../../libcpp/line-map.c:827
 gcc-mirror#3  0x0000000002e7ce2b in linemap_position_for_column (set=0x7ffff7fca000, to_column=1) at ../../libcpp/line-map.c:898
 gcc-mirror#4  0x0000000002e771f9 in _cpp_lex_direct (pfile=0x40c3b60) at ../../libcpp/lex.c:3592
 gcc-mirror#5  0x0000000002e76c3e in _cpp_lex_token (pfile=0x40c3b60) at ../../libcpp/lex.c:3394
 gcc-mirror#6  0x0000000002e610ef in lex_macro_node (pfile=0x40c3b60, is_def_or_undef=true) at ../../libcpp/directives.c:601
 gcc-mirror#7  0x0000000002e61226 in do_define (pfile=0x40c3b60) at ../../libcpp/directives.c:639
 gcc-mirror#8  0x0000000002e610b2 in run_directive (pfile=0x40c3b60, dir_no=0, buf=0x7fffffffd430 "__OPTIMIZE__ 1\n", count=14) at ../../libcpp/directives.c:589
 gcc-mirror#9  0x0000000002e650c1 in cpp_define (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2513
 gcc-mirror#10 0x0000000002e65100 in cpp_define_unused (pfile=0x40c3b60, str=0x2f784d1 "__OPTIMIZE__") at ../../libcpp/directives.c:2522
 gcc-mirror#11 0x0000000000f50685 in c_cpp_builtins_optimize_pragma (pfile=0x40c3b60, prev_tree=<optimization_node 0x7fffea042000>, cur_tree=<optimization_node 0x7fffea042020>)
     at ../../gcc/c-family/c-cppbuiltin.c:600
assertion that LC_RENAME doesn't happen first.

I think the right fix is emit those predefined macros upon
optimize/target pragmas with BUILTINS_LOCATION, like we already do
for those macros at the start of the TU, they don't appear in columns
of the next line after it.  Another possibility would be to force them
at the location of the pragma.

2021-12-30  Jakub Jelinek  <jakub@redhat.com>

	PR c++/103012
gcc/
	* config/i386/i386-c.c (ix86_pragma_target_parse): Perform
	cpp_define/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
	* config/arm/arm-c.c (arm_pragma_target_parse): Likewise.
	* config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Likewise.
	* config/s390/s390-c.c (s390_pragma_target_parse): Likewise.
gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins_optimize_pragma): Perform
	cpp_define_unused/cpp_undef calls with forced token locations
	BUILTINS_LOCATION.
gcc/testsuite/
	PR c++/103012
	* g++.dg/cpp/pr103012.C: New test.
	* g++.target/i386/pr103012.C: New test.

(cherry picked from commit 1dbe26b)
vathpela pushed a commit to vathpela/gcc that referenced this pull request Feb 22, 2022
…04617]

On
 #define A(n) int foo1##n(void) { return 1##n; }
 #define B(n) A(n##0) A(n#gcc-mirror#1) A(n#gcc-mirror#2) A(n#gcc-mirror#3) A(n#gcc-mirror#4) A(n#gcc-mirror#5) A(n#gcc-mirror#6) A(n#gcc-mirror#7) A(n#gcc-mirror#8) A(n#gcc-mirror#9)
 #define C(n) B(n##0) B(n#gcc-mirror#1) B(n#gcc-mirror#2) B(n#gcc-mirror#3) B(n#gcc-mirror#4) B(n#gcc-mirror#5) B(n#gcc-mirror#6) B(n#gcc-mirror#7) B(n#gcc-mirror#8) B(n#gcc-mirror#9)
 #define D(n) C(n##0) C(n#gcc-mirror#1) C(n#gcc-mirror#2) C(n#gcc-mirror#3) C(n#gcc-mirror#4) C(n#gcc-mirror#5) C(n#gcc-mirror#6) C(n#gcc-mirror#7) C(n#gcc-mirror#8) C(n#gcc-mirror#9)
 #define E(n) D(n##0) D(n#gcc-mirror#1) D(n#gcc-mirror#2) D(n#gcc-mirror#3) D(n#gcc-mirror#4) D(n#gcc-mirror#5) D(n#gcc-mirror#6) D(n#gcc-mirror#7) D(n#gcc-mirror#8) D(n#gcc-mirror#9)
 E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
 B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive.  Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
used instead and .symtab_shndx section should contain the real section
index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
SHN_LORESERVE value is needed it should put those into
Shdr[0].sh_{size,link}.  But, sh_{link,info} are 32-bit fields which can
contain any section index.

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, we'd need to detect the case also in
that routine and take it into account in the remapping etc.  I think
it is not worth it given that it is over 10 years, if somebody needs
63.75K or more sections, better use more recent binutils.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

	PR lto/104617
	* simple-object-elf.c (simple_object_elf_match): Fix up URL
	in comment.
	(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
	sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
	range (inclusive).
xionghul pushed a commit to xionghul/gcc that referenced this pull request Mar 24, 2022
Here we weren't respecting SFINAE when evaluating a call to a consteval
function, which caused us to reject the new testcase below.  This patch
fixes this by making build_over_call use the SFINAE-friendly version of
cxx_constant_value.

This change causes us to no longer diagnose ahead of time a couple of
non-constant non-dependent consteval calls in consteval-if2.C with
-fchecking=2.  These errors were apparently coming from the call to
fold_non_dependent_expr in build_non_dependent_expr (for the RHS of the +=)
despite complain=tf_none being passed.  Now that build_over_call respects
the value of complain during constant evaluation of a consteval call,
the errors are gone.

That the errors are also gone without -fchecking=2 is a regression caused
by r12-7264-gc19f317a78c0e4 and is the subject of PR104620.  As described
in comment gcc-mirror#5, I think it's basically an accident that we were diagnosing
these two calls correctly before r12-7264, so perhaps we can live without
these errors for GCC 12.  Thus this patch just XFAILs the two tests.

	PR c++/104620

gcc/cp/ChangeLog:

	* call.cc (build_over_call): Use cxx_constant_value_sfinae
	instead of cxx_constant_value to evaluate a consteval call.
	* constexpr.cc (cxx_constant_value_sfinae): Add decl parameter
	and pass it to cxx_eval_outermost_constant_expr.
	* cp-tree.h (cxx_constant_value_sfinae): Add decl parameter.
	* pt.cc (fold_targs_r): Pass NULL_TREE as decl parameter to
	cxx_constant_value_sfinae.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp23/consteval-if2.C: XFAIL two dg-error tests where
	the argument to the non-constant non-dependent consteval call is
	wrapped by NON_DEPENDENT_EXPR.
	* g++.dg/cpp2a/consteval30.C: New test.
kraj pushed a commit to kraj/gcc that referenced this pull request May 10, 2022
…04617]

On
 #define A(n) int foo1##n(void) { return 1##n; }
 #define B(n) A(n##0) A(n##1) A(n#gcc-mirror#2) A(n#gcc-mirror#3) A(n#gcc-mirror#4) A(n#gcc-mirror#5) A(n#gcc-mirror#6) A(n#gcc-mirror#7) A(n#gcc-mirror#8) A(n#gcc-mirror#9)
 #define C(n) B(n##0) B(n##1) B(n#gcc-mirror#2) B(n#gcc-mirror#3) B(n#gcc-mirror#4) B(n#gcc-mirror#5) B(n#gcc-mirror#6) B(n#gcc-mirror#7) B(n#gcc-mirror#8) B(n#gcc-mirror#9)
 #define D(n) C(n##0) C(n##1) C(n#gcc-mirror#2) C(n#gcc-mirror#3) C(n#gcc-mirror#4) C(n#gcc-mirror#5) C(n#gcc-mirror#6) C(n#gcc-mirror#7) C(n#gcc-mirror#8) C(n#gcc-mirror#9)
 #define E(n) D(n##0) D(n##1) D(n#gcc-mirror#2) D(n#gcc-mirror#3) D(n#gcc-mirror#4) D(n#gcc-mirror#5) D(n#gcc-mirror#6) D(n#gcc-mirror#7) D(n#gcc-mirror#8) D(n#gcc-mirror#9)
 E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
 B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive.  Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
used instead and .symtab_shndx section should contain the real section
index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
SHN_LORESERVE value is needed it should put those into
Shdr[0].sh_{size,link}.  But, sh_{link,info} are 32-bit fields which can
contain any section index.

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, we'd need to detect the case also in
that routine and take it into account in the remapping etc.  I think
it is not worth it given that it is over 10 years, if somebody needs
63.75K or more sections, better use more recent binutils.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

	PR lto/104617
	* simple-object-elf.c (simple_object_elf_match): Fix up URL
	in comment.
	(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
	sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
	range (inclusive).

(cherry picked from commit 2f59f06)
kraj pushed a commit to kraj/gcc that referenced this pull request May 11, 2022
…04617]

On
 #define A(n) int foo1##n(void) { return 1##n; }
 #define B(n) A(n##0) A(n##1) A(n#gcc-mirror#2) A(n#gcc-mirror#3) A(n#gcc-mirror#4) A(n#gcc-mirror#5) A(n#gcc-mirror#6) A(n#gcc-mirror#7) A(n#gcc-mirror#8) A(n#gcc-mirror#9)
 #define C(n) B(n##0) B(n##1) B(n#gcc-mirror#2) B(n#gcc-mirror#3) B(n#gcc-mirror#4) B(n#gcc-mirror#5) B(n#gcc-mirror#6) B(n#gcc-mirror#7) B(n#gcc-mirror#8) B(n#gcc-mirror#9)
 #define D(n) C(n##0) C(n##1) C(n#gcc-mirror#2) C(n#gcc-mirror#3) C(n#gcc-mirror#4) C(n#gcc-mirror#5) C(n#gcc-mirror#6) C(n#gcc-mirror#7) C(n#gcc-mirror#8) C(n#gcc-mirror#9)
 #define E(n) D(n##0) D(n##1) D(n#gcc-mirror#2) D(n#gcc-mirror#3) D(n#gcc-mirror#4) D(n#gcc-mirror#5) D(n#gcc-mirror#6) D(n#gcc-mirror#7) D(n#gcc-mirror#8) D(n#gcc-mirror#9)
 E(0) E(1) E(2) D(30) D(31) C(320) C(321) C(322) C(323) C(324) C(325)
 B(3260) B(3261) B(3262) B(3263) A(32640) A(32641) A(32642)
testcase with
./xgcc -B ./ -c -g -fpic -ffat-lto-objects -flto  -O0 -o foo1.o foo1.c -ffunction-sections
./xgcc -B ./ -shared -g -fpic -flto -O0 -o foo1.so foo1.o
/tmp/ccTW8mBm.debug.temp.o: file not recognized: file format not recognized
(testcase too slow to be included into testsuite).
The problem is clearly reported by readelf:
readelf: foo1.o.debug.temp.o: Warning: Section 2 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 5 has an out of range sh_link value of 65321
readelf: foo1.o.debug.temp.o: Warning: Section 10 has an out of range sh_link value of 65323
readelf: foo1.o.debug.temp.o: Warning: [ 2]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [ 5]: Link field (65321) should index a symtab section.
readelf: foo1.o.debug.temp.o: Warning: [10]: Link field (65323) should index a string section.
because simple_object_elf_copy_lto_debug_sections doesn't adjust sh_info and
sh_link fields in ElfNN_Shdr if they are in between SHN_{LO,HI}RESERVE
inclusive.  Not adjusting those is incorrect though, SHN_{LO,HI}RESERVE
range is only relevant to the 16-bit fields, mainly st_shndx in ElfNN_Sym
where if one needs >= SHN_LORESERVE section number, SHN_XINDEX should be
used instead and .symtab_shndx section should contain the real section
index, and in ElfNN_Ehdr e_shnum and e_shstrndx fields, where if >=
SHN_LORESERVE value is needed it should put those into
Shdr[0].sh_{size,link}.  But, sh_{link,info} are 32-bit fields which can
contain any section index.

Note, as simple-object-elf.c mentions, binutils from 2.12 to 2.18 (so before
2011) used to mishandle the > 63.75K sections case and assumed there is a
hole in between the sections, but what
simple_object_elf_copy_lto_debug_sections does wouldn't help in that case
for the debug temp object creation, we'd need to detect the case also in
that routine and take it into account in the remapping etc.  I think
it is not worth it given that it is over 10 years, if somebody needs
63.75K or more sections, better use more recent binutils.

2022-02-22  Jakub Jelinek  <jakub@redhat.com>

	PR lto/104617
	* simple-object-elf.c (simple_object_elf_match): Fix up URL
	in comment.
	(simple_object_elf_copy_lto_debug_sections): Remap sh_info and
	sh_link even if they are in the SHN_LORESERVE .. SHN_HIRESERVE
	range (inclusive).

(cherry picked from commit 2f59f06)
nstester pushed a commit to nstester/gcc that referenced this pull request May 2, 2023
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type.  Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop.  It's unlikely to make a difference on any real code, but the
simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.

  #define NC(X)					\
    template <class U> struct X##1;		\
    template <class U> struct X##2;		\
    template <class U> struct X##3;		\
    template <class U> struct X##4;		\
    template <class U> struct X#gcc-mirror#5;		\
    template <class U> struct X#gcc-mirror#6;
  #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
  #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
  template <int I> struct A
  {
    NC3(am)
  };
  template <class...Ts> void sink(Ts...);
  template <int...Is> void g()
  {
    sink(A<Is>()...);
  }
  template <int I> void f()
  {
    g<__integer_pack(I)...>();
  }
  int main()
  {
    f<1000>();
  }

gcc/cp/ChangeLog:

	* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
	of a class template.
	(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
Liaoshihua pushed a commit to Liaoshihua/gcc that referenced this pull request Mar 19, 2024
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type.  Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop.  It's unlikely to make a difference on any real code, but the
simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.

  #define NC(X)					\
    template <class U> struct X#gcc-mirror#1;		\
    template <class U> struct X#gcc-mirror#2;		\
    template <class U> struct X#gcc-mirror#3;		\
    template <class U> struct X#gcc-mirror#4;		\
    template <class U> struct X#gcc-mirror#5;		\
    template <class U> struct X#gcc-mirror#6;
  #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
  #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
  template <int I> struct A
  {
    NC3(am)
  };
  template <class...Ts> void sink(Ts...);
  template <int...Is> void g()
  {
    sink(A<Is>()...);
  }
  template <int I> void f()
  {
    g<__integer_pack(I)...>();
  }
  int main()
  {
    f<1000>();
  }

gcc/cp/ChangeLog:

	* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
	of a class template.
	(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
hubot pushed a commit that referenced this pull request Mar 21, 2024
Consider

  constexpr int VAL = 1;
  struct foo {
      template <int B>
      void bar(typename std::conditional<B==VAL, int, float>::type arg) { }
  };
  template void foo::bar<1>(int arg);

where we since r11-291 fail to emit the code for the explicit
instantiation.  That's because cp_walk_subtrees/TYPENAME_TYPE now
walks TYPE_CONTEXT ('conditional' here) as well, and in a template
finds the B==VAL template argument.  VAL is constexpr, which implies const,
which in the global scope implies static.  constrain_visibility_for_template
then makes "struct conditional<(B == VAL), int, float>" non-TREE_PUBLIC.
Then symtab_node::needed_p checks TREE_PUBLIC, sees it's 0, and we don't
emit any code.

I thought the fix would be some ODR-esque check to not consider
constexpr variables/fns that are used just for their value.  But
it turned out to be tricky.  For instance, we can't skip
determine_visibility in a template; we can't even skip it for value-dep
expressions.  For example, no-linkage-expr1.C has

  using P = struct {}*;
  template <int N>
  void f(int(*)[((P)0, N)]) {}

where ((P)0, N) is value-dep, but N is not relevant here: we have to
ferret out the anonymous type.  When instantiating, it's already gone.

This patch uses decl_constant_var_p.  This is to implement (an
approximation) [basic.def.odr]#14.5.1 and [basic.def.odr]#5.2.

	PR c++/110323

gcc/cp/ChangeLog:

	* decl2.cc (min_vis_expr_r) <case VAR_DECL>: Do nothing for
	decl_constant_var_p VAR_DECLs.

gcc/testsuite/ChangeLog:

	* g++.dg/template/explicit-instantiation6.C: New test.
	* g++.dg/template/explicit-instantiation7.C: New test.
NinaRanns referenced this pull request in NinaRanns/gcc Jun 11, 2024
hubot pushed a commit that referenced this pull request Sep 7, 2024
…o_debug_section [PR116614]

cat abc.C
  #define A(n) struct T##n {} t##n;
  #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
  #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
  #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
  #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
  E(1) E(2) E(3)
  int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

The following patch fixes that.  Most of the 64K+ section support for
reading and writing was already there years ago (and especially reading used
quite often already) and a further bug fixed in it in the PR104617 fix.

Yet, the fix isn't solely about removing the
  if (new_i - 1 >= SHN_LORESERVE)
    {
      *err = ENOTSUP;
      return "Too many copied sections";
    }
5 lines, the missing part was that the function only handled reading of
the .symtab_shndx section but not copying/updating of it.
If the result has less than 64K-epsilon sections, that actually wasn't
needed, but e.g. with -fdebug-types-section one can exceed that pretty
easily (reported to us on WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section, if one
doesn't need to use SHN_XINDEX in there, the section should (or should be
updated to) contain SHN_UNDEF entry, otherwise needs to have whatever would
be overwise stored but couldn't fit.  But repeating due to that all the
symtab decisions what to discard and how to rewrite it would be ugly.

So, the patch instead emits the .symtab_shndx section (or sections) last
and prepares the content during the .symtab processing and in a second
pass when going just through .symtab_shndx sections just uses the saved
content.

2024-09-07  Jakub Jelinek  <jakub@redhat.com>

	PR lto/116614
	* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
	comments.
	(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
	consistency.
	(simple_object_elf_find_sections): Formatting fixes.
	(simple_object_elf_fetch_attributes): Likewise.
	(simple_object_elf_attributes_merge): Likewise.
	(simple_object_elf_start_write): Likewise.
	(simple_object_elf_write_ehdr): Likewise.
	(simple_object_elf_write_shdr): Likewise.
	(simple_object_elf_write_to_file): Likewise.
	(simple_object_elf_copy_lto_debug_section): Likewise.  Don't fail for
	new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
	over .symtab_shndx sections, though emit those last and compute their
	section content when processing associated .symtab sections.  Handle
	simple_object_internal_read failure even in the .symtab_shndx reading
	case.
mikpe added a commit to mikpe/gcc that referenced this pull request Sep 8, 2024
hubot pushed a commit that referenced this pull request Sep 12, 2024
…o_debug_section [PR116614]

cat abc.C
  #define A(n) struct T##n {} t##n;
  #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
  #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
  #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
  #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
  E(1) E(2) E(3)
  int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

The following patch fixes that.  Most of the 64K+ section support for
reading and writing was already there years ago (and especially reading used
quite often already) and a further bug fixed in it in the PR104617 fix.

Yet, the fix isn't solely about removing the
  if (new_i - 1 >= SHN_LORESERVE)
    {
      *err = ENOTSUP;
      return "Too many copied sections";
    }
5 lines, the missing part was that the function only handled reading of
the .symtab_shndx section but not copying/updating of it.
If the result has less than 64K-epsilon sections, that actually wasn't
needed, but e.g. with -fdebug-types-section one can exceed that pretty
easily (reported to us on WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section, if one
doesn't need to use SHN_XINDEX in there, the section should (or should be
updated to) contain SHN_UNDEF entry, otherwise needs to have whatever would
be overwise stored but couldn't fit.  But repeating due to that all the
symtab decisions what to discard and how to rewrite it would be ugly.

So, the patch instead emits the .symtab_shndx section (or sections) last
and prepares the content during the .symtab processing and in a second
pass when going just through .symtab_shndx sections just uses the saved
content.

2024-09-07  Jakub Jelinek  <jakub@redhat.com>

	PR lto/116614
	* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
	comments.
	(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
	consistency.
	(simple_object_elf_find_sections): Formatting fixes.
	(simple_object_elf_fetch_attributes): Likewise.
	(simple_object_elf_attributes_merge): Likewise.
	(simple_object_elf_start_write): Likewise.
	(simple_object_elf_write_ehdr): Likewise.
	(simple_object_elf_write_shdr): Likewise.
	(simple_object_elf_write_to_file): Likewise.
	(simple_object_elf_copy_lto_debug_section): Likewise.  Don't fail for
	new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
	over .symtab_shndx sections, though emit those last and compute their
	section content when processing associated .symtab sections.  Handle
	simple_object_internal_read failure even in the .symtab_shndx reading
	case.

(cherry picked from commit bb8dd09)
hubot pushed a commit that referenced this pull request Sep 13, 2024
…o_debug_section [PR116614]

cat abc.C
  #define A(n) struct T##n {} t##n;
  #define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
  #define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
  #define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
  #define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
  E(1) E(2) E(3)
  int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status

The following patch fixes that.  Most of the 64K+ section support for
reading and writing was already there years ago (and especially reading used
quite often already) and a further bug fixed in it in the PR104617 fix.

Yet, the fix isn't solely about removing the
  if (new_i - 1 >= SHN_LORESERVE)
    {
      *err = ENOTSUP;
      return "Too many copied sections";
    }
5 lines, the missing part was that the function only handled reading of
the .symtab_shndx section but not copying/updating of it.
If the result has less than 64K-epsilon sections, that actually wasn't
needed, but e.g. with -fdebug-types-section one can exceed that pretty
easily (reported to us on WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section, if one
doesn't need to use SHN_XINDEX in there, the section should (or should be
updated to) contain SHN_UNDEF entry, otherwise needs to have whatever would
be overwise stored but couldn't fit.  But repeating due to that all the
symtab decisions what to discard and how to rewrite it would be ugly.

So, the patch instead emits the .symtab_shndx section (or sections) last
and prepares the content during the .symtab processing and in a second
pass when going just through .symtab_shndx sections just uses the saved
content.

2024-09-07  Jakub Jelinek  <jakub@redhat.com>

	PR lto/116614
	* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
	comments.
	(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
	consistency.
	(simple_object_elf_find_sections): Formatting fixes.
	(simple_object_elf_fetch_attributes): Likewise.
	(simple_object_elf_attributes_merge): Likewise.
	(simple_object_elf_start_write): Likewise.
	(simple_object_elf_write_ehdr): Likewise.
	(simple_object_elf_write_shdr): Likewise.
	(simple_object_elf_write_to_file): Likewise.
	(simple_object_elf_copy_lto_debug_section): Likewise.  Don't fail for
	new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
	over .symtab_shndx sections, though emit those last and compute their
	section content when processing associated .symtab sections.  Handle
	simple_object_internal_read failure even in the .symtab_shndx reading
	case.

(cherry picked from commit bb8dd09)
hubot pushed a commit that referenced this pull request Oct 24, 2024
This patch folds svindex with constant arguments into a vector series.
We implemented this in svindex_impl::fold using the function build_vec_series.
For example,
svuint64_t f1 ()
{
  return svindex_u642 (10, 3);
}
compiled with -O2 -march=armv8.2-a+sve, is folded to {10, 13, 16, ...}
in the gimple pass lower.
This optimization benefits cases where svindex is used in combination with
other gimple-level optimizations.
For example,
svuint64_t f2 ()
{
    return svmul_x (svptrue_b64 (), svindex_u64 (10, 3), 5);
}
has previously been compiled to
f2:
        index   z0.d, #10, #3
        mul     z0.d, z0.d, #5
        ret
Now, it is compiled to
f2:
        mov     x0, 50
        index   z0.d, x0, #15
        ret

We added test cases checking
- the application of the transform during gimple for constant arguments,
- the interaction with another gimple-level optimization.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>

gcc/
	* config/aarch64/aarch64-sve-builtins-base.cc
	(svindex_impl::fold): Add constant folding.

gcc/testsuite/
	* gcc.target/aarch64/sve/index_const_fold.c: New test.
hubot pushed a commit that referenced this pull request Nov 4, 2024
We can make use of the integrated rotate step of the XAR instruction
to implement most vector integer rotates, as long we zero out one
of the input registers for it.  This allows for a lower-latency sequence
than the fallback SHL+USRA, especially when we can hoist the zeroing operation
away from loops and hot parts.  This should be safe to do for 64-bit vectors
as well even though the XAR instructions operate on 128-bit values, as the
bottom 64-bit results is later accessed through the right subregs.

This strategy is used whenever we have XAR instructions, the logic
in aarch64_emit_opt_vec_rotate is adjusted to resort to
expand_rotate_as_vec_perm only when it's expected to generate a single REV*
instruction or when XAR instructions are not present.

With this patch we can gerate for the input:
v4si
G1 (v4si r)
{
    return (r >> 23) | (r << 9);
}

v8qi
G2 (v8qi r)
{
  return (r << 3) | (r >> 5);
}
the assembly for +sve2:
G1:
        movi    v31.4s, 0
        xar     z0.s, z0.s, z31.s, #23
        ret

G2:
        movi    v31.4s, 0
        xar     z0.b, z0.b, z31.b, #5
        ret

instead of the current:
G1:
        shl     v31.4s, v0.4s, 9
        usra    v31.4s, v0.4s, 23
        mov     v0.16b, v31.16b
        ret
G2:
        shl     v31.8b, v0.8b, 3
        usra    v31.8b, v0.8b, 5
        mov     v0.8b, v31.8b
        ret

Bootstrapped and tested on aarch64-none-linux-gnu.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

gcc/

	* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Add
	generation of XAR sequences when possible.

gcc/testsuite/

	* gcc.target/aarch64/rotate_xar_1.c: New test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants