Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Q: how to apply mforce-l32 patch #4

Closed
susisstrolch opened this issue Jun 12, 2015 · 4 comments
Closed

Q: how to apply mforce-l32 patch #4

susisstrolch opened this issue Jun 12, 2015 · 4 comments

Comments

@susisstrolch
Copy link

can you give an advice how to apply this particular patch to crosstool-NG build environment?

@jcmvbkbc
Copy link
Owner

Download the patch ( https://github.com/jcmvbkbc/gcc-xtensa/commit/6b0c9f92fb8e11c6be098febb4f502f6af37cd35.patch ) to the crosstool-NG/local-patches/gcc/5.1.0 and rebuild the toolchain (by removing .build/src and running ct-ng build)

@susisstrolch
Copy link
Author

Thanks - "rm -rf .build/src" was the magic powder I was missing...

@pfalcon
Copy link

pfalcon commented Jun 12, 2015

@jcmvbkbc : Now I'm missing something - since when your crosstool-NG switched to gcc 5.1.0 and in which branch? I see that both lx106 and lx106-g++ branches have 4.8.2 specified in crosstool.config, while lx106-g++-1.21.0 doesn't seem to have version set, is that it?

@jcmvbkbc
Copy link
Owner

Yes, it is. Published that branch last week, posted update to the gcc thread on esp8266.com.
The patch applies to 4.8.2/4.9.2 as well though.

jcmvbkbc pushed a commit that referenced this issue Nov 22, 2016
	Backport from mainline.
	PR fortran/50221
	PR fortran/68216
	PR fortran/63932
	PR fortran/66408
	* trans_array.c (gfc_conv_scalarized_array_ref): Pass the
	symbol decl for deferred character length array references.
	* trans-stmt.c (gfc_trans_allocate): Keep the string lengths
	to update deferred length character string lengths.
	* trans-types.c (gfc_get_dtype_rank_type); Use the string
	length of deferred character types for the dtype size.
	* trans.c (gfc_build_array_ref): For references to deferred
	character arrays, use the domain max value, if it is a variable
	to set the 'span' and use pointer arithmetic for acces to the
	element.
	(trans_code): Set gfc_current_locus for diagnostic purposes.

	Backport from mainline.
	PR fortran/67674
	* trans-expr.c (gfc_conv_procedure_call): Do not fix deferred
	string lengths of components.

	Backport from mainline.
	PR fortran/49954
	* resolve.c (deferred_op_assign): New function.
	(gfc_resolve_code): Call it.
	* trans-array.c (concat_str_length): New function.
	(gfc_alloc_allocatable_for_assignment): Jump directly to alloc/
	realloc blocks for deferred character length arrays because the
	string length might change, even if the shape is the same. Call
	concat_str_length to obtain the string length for concatenation
	since it is needed to compute the lhs string length.
	Set the descriptor dtype appropriately for the new string
	length.
	* trans-expr.c (gfc_trans_assignment_1): Fix the rse string
	length for all characters, other than deferred types. For
	concatenation operators, push the rse.pre block to the inner
	most loop so that the temporary pointer and the assignments
	are properly placed.

	Backport from mainline.
	PR fortran/67779
	* trans_array.c (gfc_conv_scalarized_array_ref): Add missing
	se->use_offset from condition for calculation of 'base'.

2015-01-10  Paul Thomas  <pault@gcc.gnu.org>

	Backport from mainline.
	PR fortran/50221
	* gfortran.dg/deferred_character_1.f90: New test.
	* gfortran.dg/deferred_character_4.f90: New test for comment
	#4 of the PR.

	Backport from mainline.
	PR fortran/68216
	* gfortran.dg/deferred_character_2.f90: New test.

	Backport from mainline.
	PR fortran/67674
	* gfortran.dg/deferred_character_3.f90: New test.

	Backport from mainline.
	PR fortran/63932
	* gfortran.dg/deferred_character_5.f90: New test.

	Backport from mainline.
	PR fortran/66408
	* gfortran.dg/deferred_character_6.f90: New test.

	Backport from mainline.
	PR fortran/49954
	* gfortran.dg/deferred_character_7.f90: New test.

	Backport from mainline.
	PR fortran/67779
	* gfortran.dg/actual_array_offset_1: New test.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-5-branch@232203 138bc75d-0d04-0410-961f-82ee72b054a4
jcmvbkbc pushed a commit that referenced this issue May 30, 2017
	* ptree.c (cxx_print_xnode): Show internal OVERLOAD structure.
	* tree.c (ovl_insert, ovl_iterator_remove_node): Fix copying
	assert.

	PR c++/80891 (#4)
	* g++.dg/lookup/pr80891-4.C: New.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@248576 138bc75d-0d04-0410-961f-82ee72b054a4
jcmvbkbc pushed a commit that referenced this issue Jun 18, 2018
When -fcf-protection -mcet is used, I got

FAIL: g++.dg/eh/sighandle.C

(gdb) bt
 #0  _Unwind_RaiseException (exc=exc@entry=0x416ed0)
    at /export/gnu/import/git/sources/gcc/libgcc/unwind.inc:140
 #1  0x00007ffff7d9936b in __cxxabiv1::__cxa_throw (obj=<optimized out>,
    tinfo=0x403dd0 <typeinfo for int@@CXXABI_1.3>, dest=0x0)
    at /export/gnu/import/git/sources/gcc/libstdc++-v3/libsupc++/eh_throw.cc:90
 #2  0x0000000000401255 in sighandler (signo=11, si=0x7fffffffd6f8,
    uc=0x7fffffffd5c0)
    at /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/eh/sighandle.C:9
 #3  <signal handler called> <<<< Signal frame which isn't on shadow stack
 #4  dosegv ()
    at /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/eh/sighandle.C:14
 #5  0x00000000004012e3 in main ()
    at /export/gnu/import/git/sources/gcc/gcc/testsuite/g++.dg/eh/sighandle.C:30
(gdb) p frames
$6 = 5
(gdb)

frame count should be 4, not 5.  This patch skips signal frames when
unwinding shadow stack.

gcc/testsuite/

	PR libgcc/85334
	* g++.dg/torture/pr85334.C: New test.

libgcc/

	PR libgcc/85334
	* unwind-generic.h (_Unwind_Frames_Increment): New.
	* config/i386/shadow-stack-unwind.h (_Unwind_Frames_Increment):
	Likewise.
	* unwind.inc (_Unwind_RaiseException_Phase2): Increment frame
	count with _Unwind_Frames_Increment.
	(_Unwind_ForcedUnwind_Phase2): Likewise.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@259502 138bc75d-0d04-0410-961f-82ee72b054a4
jcmvbkbc pushed a commit that referenced this issue Dec 27, 2022
With many thanks to H.J. for doing all the hard work, this patch resolves
two P1 regressions; PR target/106933 and PR target/106959.

Although superficially similar, the i386 backend's two scalar-to-vector
(STV) passes perform their transformations in importantly different ways.
The original pass converting SImode and DImode operations to V4SImode
or V2DImode operations is "soft", allowing values to be maintained in
both integer and vector hard registers.  The newer pass converting TImode
operations to V1TImode is "hard" (all or nothing) that converts all uses
of a pseudo to vector form.  To implement this it invokes powerful ju-ju
calling SET_MODE on a reg_rtx, which due to RTL sharing, often updates
this pseudo's mode everywhere in the RTL chain.  Hence, TImode STV can only
be performed when all uses of a pseudo are convertible to V1TImode form.
To ensure this the STV passes currently use data-flow analysis to inspect
all DEFs and USEs in a chain.  This works fine for chains that are in
the usual single assignment form, but the occurrence of uninitialized
variables, or multiple assignments that split a pseudo's usage into
several independent chains (lifetimes) can lead to situations where
some but not all of a pseudo's occurrences need to be updated.  This is
safe for the SImode/DImode pass, but leads to the above bugs during
the TImode pass.

My one minor tweak to HJ's patch from comment #4 of bugzilla PR106959
is to only perform the new single_def_chain_p check for TImode STV; it
turns out that STV of SImode/DImode min/max operates safely on multiple-def
chains, and prohibiting this leads to testsuite regressions.  We don't
(yet) support V1TImode min/max, so this idiom isn't an issue during the
TImode STV pass.

For the record, the two alternate possible fixes are (i) make the TImode
STV pass "soft", by eliminating use of SET_MODE, instead using replace_rtx
with a new pseudo, or (ii) merging "chains" so that multiple DFA
chains/lifetimes are considered a single STV chain.

2022-12-23  H.J. Lu  <hjl.tools@gmail.com>
	    Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	PR target/106933
	PR target/106959
	* config/i386/i386-features.cc (single_def_chain_p): New predicate
	function to check that a pseudo's use-def chain is in SSA form.
	(timode_scalar_to_vector_candidate_p): Check that TImode regs that
	are SET_DEST or SET_SRC of an insn match/are single_def_chain_p.

gcc/testsuite/ChangeLog
	PR target/106933
	PR target/106959
	* gcc.target/i386/pr106933-1.c: New test case.
	* gcc.target/i386/pr106933-2.c: Likewise.
	* gcc.target/i386/pr106959-1.c: Likewise.
	* gcc.target/i386/pr106959-2.c: Likewise.
	* gcc.target/i386/pr106959-3.c: Likewise.
jcmvbkbc pushed a commit that referenced this issue Feb 13, 2023
The aarch64 ISA specification allows a left shift amount to be applied
after extension in the range of 0 to 4 (encoded in the imm3 field).

This is true for at least the following instructions:

 * ADD (extend register)
 * ADDS (extended register)
 * SUB (extended register)

The result of this patch can be seen, when compiling the following code:

uint64_t myadd(uint64_t a, uint64_t b)
{
    return a+(((uint8_t)b)<<4);
}

Without the patch the following sequence will be generated:

0000000000000000 <myadd>:
   0:	d37c1c21 	ubfiz	x1, x1, #4, #8
   4:	8b000020 	add	x0, x1, x0
   8:	d65f03c0 	ret

With the patch the ubfiz will be merged into the add instruction:

0000000000000000 <myadd>:
   0:	8b211000 	add	x0, x0, w1, uxtb #4
   4:	d65f03c0 	ret

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_uxt_size): fix an
	off-by-one in checking the permissible shift-amount.
jcmvbkbc pushed a commit that referenced this issue May 8, 2023
This patch adds support for xstormy16's swap nibbles instruction (swpn).
For the test case:

short foo(short x) {
  return (x&0xff00) | ((x<<4)&0xf0) | ((x>>4)&0x0f);
}

GCC with -O2 currently generates the nine instruction sequence:
foo:    mov r7,r2
        asr r2,#4
        and r2,#15
        mov.w r6,#-256
        and r6,r7
        or r2,r6
        shl r7,#4
        and r7,#255
        or r2,r7
        ret

with this patch, we now generate:
foo:	swpn r2
	ret

To achieve this using combine's four instruction "combinations" requires
a little wizardry.  Firstly, define_insn_and_split are introduced to
treat logical shifts followed by bitwise-AND as macro instructions that
are split after reload.  This is sufficient to recognize a QImode
nibble swap, which can be implemented by swpn followed by either a
zero-extension or a sign-extension from QImode to HImode.  Then finally,
in the correct context, a QImode swap-nibbles pattern can be combined to
preserve the high-byte of a HImode word, matching the xstormy16's swpn
semantics.  The naming of the new code iterators is taken from i386.md.

2023-04-29  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/stormy16/stormy16.md (any_lshift): New code iterator.
	(any_or_plus): Likewise.
	(any_rotate): Likewise.
	(*<any_lshift>_and_internal): New define_insn_and_split to
	recognize a logical shift followed by an AND, and split it
	again after reload.
	(*swpn): New define_insn matching xstormy16's swpn.
	(*swpn_zext): New define_insn recognizing swpn followed by
	zero_extendqihi2, i.e. with the high byte set to zero.
	(*swpn_sext): Likewise, for swpn followed by cbw.
	(*swpn_sext_2): Likewise, for an alternate RTL form.
	(*swpn_zext_ior): A pre-reload splitter so that an swpn+zext+ior
	sequence is split in the correct place to recognize the *swpn_zext
	followed by any_or_plus (ior, xor or plus) instruction.

gcc/testsuite/ChangeLog
	* gcc.target/xstormy16/swpn-1.c: New QImode test case.
	* gcc.target/xstormy16/swpn-2.c: New zero_extend test case.
	* gcc.target/xstormy16/swpn-3.c: New sign_extend test case.
	* gcc.target/xstormy16/swpn-4.c: New HImode test case.
jcmvbkbc pushed a commit that referenced this issue May 8, 2023
I noticed that for member class templates of a class template we were
unnecessarily substituting both the template and its type.  Avoiding that
duplication speeds compilation of this silly testcase from ~12s to ~9s on my
laptop.  It's unlikely to make a difference on any real code, but the
simplification is also nice.

We still need to clear CLASSTYPE_USE_TEMPLATE on the partial instantiation
of the template class, but it makes more sense to do that in
tsubst_template_decl anyway.

  #define NC(X)					\
    template <class U> struct X##1;		\
    template <class U> struct X##2;		\
    template <class U> struct X##3;		\
    template <class U> struct X##4;		\
    template <class U> struct X##5;		\
    template <class U> struct X##6;
  #define NC2(X) NC(X##a) NC(X##b) NC(X##c) NC(X##d) NC(X##e) NC(X##f)
  #define NC3(X) NC2(X##A) NC2(X##B) NC2(X##C) NC2(X##D) NC2(X##E)
  template <int I> struct A
  {
    NC3(am)
  };
  template <class...Ts> void sink(Ts...);
  template <int...Is> void g()
  {
    sink(A<Is>()...);
  }
  template <int I> void f()
  {
    g<__integer_pack(I)...>();
  }
  int main()
  {
    f<1000>();
  }

gcc/cp/ChangeLog:

	* pt.cc (instantiate_class_template): Skip the RECORD_TYPE
	of a class template.
	(tsubst_template_decl): Clear CLASSTYPE_USE_TEMPLATE.
jcmvbkbc pushed a commit that referenced this issue Nov 11, 2023
This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a ZERO_EXTEND
to a single ZERO_EXTEND, but as shown in this PR it is possible for
combine's make_compound_operation to unintentionally generate a
non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is unlikely to be
matched by the backend.

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo:	AND     #0xff, R12
        RLAM.A #4, R12 { RRAM.A #4, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

With this patch, we now see:

Trying 2 -> 7:
    2: r25:HI=zero_extend(R12:QI)
      REG_DEAD R12:QI
    7: r28:PSI=sign_extend(r25:HI)#0
      REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
    (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ])))
allowing combination of insns 2 and 7
original costs 4 + 8 = 12
replacement cost 8

foo:	MOV.B   R12, R12
        RLAM.A  #1, R12
        MOVX.W  table(R12), R12
        RETA

2023-10-26  Roger Sayle  <roger@nextmovesoftware.com>
	    Richard Biener  <rguenther@suse.de>

gcc/ChangeLog
	PR rtl-optimization/91865
	* combine.cc (make_compound_operation): Avoid creating a
	ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
	PR rtl-optimization/91865
	* gcc.target/msp430/pr91865.c: New test case.
jcmvbkbc pushed a commit that referenced this issue Mar 14, 2024
Here we have

  template<class T>
  auto is_throwable(T t) -> decltype(throw t, true) { ... }

where we didn't properly mark 't' as IMPLICIT_RVALUE_P, which caused
the wrong overload to have been chosen.  Jason figured out it's because
we don't correctly implement [expr.prim.id.unqual]#4.2, which post-P2266
says that an id-expression is move-eligible if

"the id-expression (possibly parenthesized) is the operand of
a throw-expression, and names an implicitly movable entity that belongs
to a scope that does not contain the compound-statement of the innermost
lambda-expression, try-block, or function-try-block (if any) whose
compound-statement or ctor-initializer contains the throw-expression."

I worked out that it's trying to say that given

  struct X {
    X();
    X(const X&);
    X(X&&) = delete;
  };

the following should fail: the scope of the throw is an sk_try, and it's
also x's scope S, and S "does not contain the compound-statement of the
*try-block" so x is move-eligible, so we move, so we fail.

  void f ()
  try {
    X x;
    throw x;  // use of deleted function
  } catch (...) {
  }

Whereas here:

  void g (X x)
  try {
    throw x;
  } catch (...) {
  }

the throw is again in an sk_try, but x's scope is an sk_function_parms
which *does* contain the {} of the *try-block, so x is not move-eligible,
so we don't move, so we use X(const X&), and the code is fine.

The current code also doesn't seem to handle

  void h (X x) {
    void z (decltype(throw x, true));
  }

where there's no enclosing lambda or sk_try so we should move.

I'm not doing anything about lambdas because we shouldn't reach the
code at the end of the function: the DECL_HAS_VALUE_EXPR_P check
shouldn't let us go further.

	PR c++/113789
	PR c++/113853

gcc/cp/ChangeLog:

	* typeck.cc (treat_lvalue_as_rvalue_p): Update code to better
	reflect [expr.prim.id.unqual]#4.2.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/sfinae69.C: Remove dg-bogus.
	* g++.dg/cpp0x/sfinae70.C: New test.
	* g++.dg/cpp0x/sfinae71.C: New test.
	* g++.dg/cpp0x/sfinae72.C: New test.
	* g++.dg/cpp2a/implicit-move4.C: New test.
jcmvbkbc pushed a commit that referenced this issue Jun 17, 2024
Here during overload resolution we have two strictly viable ambiguous
candidates #1 and #2, and two non-strictly viable candidates #3 and #4
which we hold on to ever since r14-6522.  These latter candidates have
an empty second arg conversion since the first arg conversion was deemed
bad, and this trips up joust when called on #3 and #4 which assumes all
arg conversions are there.

We can fix this by making joust robust to empty arg conversions, but in
this situation we shouldn't need to compare #3 and #4 at all given that
we have a strictly viable candidate.  To that end, this patch makes
tourney shortcut considering non-strictly viable candidates upon
encountering ambiguity between two strictly viable candidates (taking
advantage of the fact that the candidates list is sorted according to
viability via splice_viable).

	PR c++/115239

gcc/cp/ChangeLog:

	* call.cc (tourney): Don't consider a non-strictly viable
	candidate as the champ if there was ambiguity between two
	strictly viable candidates.

gcc/testsuite/ChangeLog:

	* g++.dg/overload/error7.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
jcmvbkbc pushed a commit that referenced this issue Jul 29, 2024
These tests used to generate:

        bl      swap
        ldr     r2, [sp, #4]
        mov     r0, r2  @ __fp16

but g:9d20529d94b23275885f380d155fe8671ab5353a means that we can
load directly into r0:

        bl      swap
        ldrh    r0, [sp, #4]    @ __fp16

This patch updates the tests to "defend" this change.

While there, the scans include:

mov\tr1, r[03]}

But if the spill of r2 occurs first, there's no real reason why
r2 couldn't be used as the temporary, instead r3.

The patch tries to update the scans while preserving the spirit
of the originals.

gcc/testsuite/
	* gcc.target/arm/fp16-aapcs-2.c: Expect the return value to be
	loaded directly from the stack.  Test that the swap generates
	two moves out of r0/r1 and two moves in.
	* gcc.target/arm/fp16-aapcs-4.c: Likewise.
jcmvbkbc pushed a commit that referenced this issue Nov 10, 2024
Update test case for armv8.1-m.main that supports conditional
arithmetic.

armv7-m:
        push    {r4, lr}
        ldr     r4, .L6
        ldr     r4, [r4]
        lsls    r4, r4, #29
        it      mi
        addmi   r2, r2, #1
        bl      bar
        movs    r0, #0
        pop     {r4, pc}

armv8.1-m.main:
        push    {r3, r4, r5, lr}
        ldr     r4, .L5
        ldr     r5, [r4]
        tst     r5, #4
        csinc   r2, r2, r2, eq
        bl      bar
        movs    r0, #0
        pop     {r3, r4, r5, pc}

gcc/testsuite/ChangeLog:

	* gcc.target/arm/epilog-1.c: Use check-function-bodies.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants