runtime: AIX build broken with SIGSEGV #70483

Closed
ayappanec opened this issue Nov 21, 2024 · 13 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FixPending Issues that have a fix which has not yet been reviewed or submitted. help wanted NeedsFix The path to resolution is known, but the work has not been done. OS-AIX

@ayappanec

Go version

go version devel go1.24-a2a4f00783

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='ppc64'
GOBIN=''
GOCACHE='/home/buildusr/.cache/go-build'
GOENV='/home/buildusr/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='ppc64'
GOHOSTOS='aix'
GOINSECURE=''
GOMODCACHE='/home/buildusr/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='aix'
GOPATH='/home/buildusr/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/freeware/lib/golang'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/freeware/lib/golang/pkg/tool/aix_ppc64'
GOVCS=''
GOVERSION='go1.22.6'
GCCGO='gccgo'
GOPPC64='power8'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -maix64 -pthread -mcmodel=large -fmessage-length=0 -ffile-prefix-map=/tmp/go-build4155363056=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Trigger go build using make.bash

What did you see happen?

Building Go bootstrap cmd/go (go_bootstrap) using Go toolchain1.
Building Go toolchain2 using go_bootstrap and Go toolchain1.
Building Go toolchain3 using go_bootstrap and Go toolchain2.
# runtime
SIGSEGV: segmentation violation
PC=0x100075e74 m=8 sigcode=50 addr=0x20113f960

goroutine 0 gp=0xa00010000003500 m=8 mp=0xa00010000580008 [idle]:
runtime.getGCMaskOnDemand(0x100a0e960)
	runtime/type.go:108 +0x24 fp=0x110cd6570 sp=0x110cd6510 pc=0x100075e74
runtime.getGCMask(...)
	runtime/type.go:86
runtime.(*mspan).typePointersOfUnchecked(0xa00010000382cc0?, 0xa00010000382cc0?)
	runtime/mbitmap.go:200 +0x94 fp=0x110cd65b0 sp=0x110cd6570 pc=0x100012f84
runtime.scanobject(0xa00010000180000, 0xa0001000004c150)
	runtime/mgcmark.go:1426 +0x24c fp=0x110cd6648 sp=0x110cd65b0 pc=0x10002263c
runtime.gcDrain(0xa0001000004c150, 0x7)
	runtime/mgcmark.go:1228 +0x294 fp=0x110cd66c0 sp=0x110cd6648 pc=0x100021d24
runtime.gcDrainMarkWorkerIdle(...)
	runtime/mgcmark.go:1100
runtime.gcBgMarkWorker.func2()
	runtime/mgc.go:1519 +0x74 fp=0x110cd6720 sp=0x110cd66c0 pc=0x10001d614
runtime.systemstack(0x0)
	runtime/asm_ppc64x.s:256 +0x68 fp=0x110cd6740 sp=0x110cd6720 pc=0x100083038

goroutine 203 gp=0xa00010000923880 m=8 mp=0xa00010000580008 [GC worker (active)]:
runtime.systemstack_switch()
	runtime/asm_ppc64x.s:213 +0x10 fp=0xa000100005e3ee0 sp=0xa000100005e3ec0 pc=0x100082fb0
runtime.gcBgMarkWorker(0xa00010002c68380)
	runtime/mgc.go:1483 +0x270 fp=0xa000100005e3f98 sp=0xa000100005e3ee0 pc=0x10001d2f0
runtime.gcBgMarkStartWorkers.gowrap1()
	runtime/mgc.go:1339 +0x4c fp=0xa000100005e3fc0 sp=0xa000100005e3f98 pc=0x10001d06c
runtime.goexit({})
	runtime/asm_ppc64x.s:1022 +0x4 fp=0xa000100005e3fc0 sp=0xa000100005e3fc0 pc=0x100085a14
created by runtime.gcBgMarkStartWorkers in goroutine 1
	runtime/mgc.go:1339 +0x198

https://build.golang.org/log/3ddc19274de093367f9925b03a9135a64c5412e3

What did you expect to see?

The build break started after this change --> https://go-review.googlesource.com/c/go/+/616255

@ayappanec
Author

@randall77

@dmitshur
Contributor

CC @golang/aix.

@dmitshur dmitshur added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-AIX labels Nov 21, 2024
@dmitshur dmitshur added this to the Go1.24 milestone Nov 21, 2024
@dmitshur dmitshur changed the title AIX build broken with SIGSEGV runtime: AIX build broken with SIGSEGV Nov 21, 2024
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 21, 2024
@randall77
Contributor

It is failing when trying to do an atomic operation. The address looks strange; it probably isn't mapped.
I suspect what is happening is that aix/ppc64 doesn't do relocations from the rodata section to the data section correctly. (I think I remember something like that from long ago, but I'm not sure.)

If an aix and/or ppc64 person can confirm, I can think about how we might fix it. It may be tricky.

@golang/ppc64

@randall77
Contributor

My memory is of #58857

@randall77
Contributor

We might need to move all the types from SRODATA to SNOPTRDATA on aix.

@jkrishmys
Contributor

Hi @randall77, let me have a look and confirm. Thanks

@jkrishmys
Contributor

Hello @randall77. Here, during GC marking, register r6 holds 0x20114ac60, which is the faulting address; r3 and r7 both point to 0x100a0eb60.

# runtime [cmd/compile]
SIGSEGV: segmentation violation
PC=0x100075cf4 m=19 sigcode=50 addr=0x20114ac60.

goroutine 481 gp=0xa000100046c76c0 m=nil [GC worker (idle)]:
runtime.gopark(0x110166ee0?, 0x1?, 0x0?, 0x0?, 0xfffffffffffdfd2?)
/home/jkrishna/golang/golangdevel/goroot/src/runtime/proc.go:435 +0x144 fp=0xa000100046c8ee0 sp=0xa000100046c8eb0 pc=0x10007e054
runtime.gcBgMarkWorker(0xa00010003596620)
/home/jkrishna/golang/golangdevel/goroot/src/runtime/mgc.go:1423 +0x11c fp=0xa000100046c8f98 sp=0xa000100046c8ee0 pc=0x10001d11c
runtime.gcBgMarkStartWorkers.gowrap1()
/home/jkrishna/golang/golangdevel/goroot/src/runtime/mgc.go:1339 +0x4c fp=0xa000100046c8fc0 sp=0xa000100046c8f98 pc=0x10001cfec
runtime.goexit({})
/home/jkrishna/golang/golangdevel/goroot/src/runtime/asm_ppc64x.s:1022 +0x4 fp=0xa000100046c8fc0 sp=0xa000100046c8fc0 pc=0x100085874
created by runtime.gcBgMarkStartWorkers in goroutine 1
/home/jkrishna/golang/golangdevel/goroot/src/runtime/mgc.go:1339 +0x198

r0 0x0 r1 0x11151b510
r2 0x1100f8810 r3 0x100a0eb60
r4 0xa00010000300000 r5 0x10
r6 0x20114ac60 r7 0x100a0eb60
r8 0x1ea000 r9 0xa00010000300000
r10 0xa000100046d6818 r11 0x1101295e0
r12 0x100 r13 0x111523800
r14 0x1 r15 0x110129728
r16 0x6 r17 0xb9
r18 0xa0001000004cf08 r19 0xa0001000004e1a8
r20 0x20 r21 0x16
r22 0x11141be30 r23 0x11151b630
r24 0x0 r25 0x79afde02
r26 0xffffffffdca2bd19 r27 0xfffffffffa3cee9e
r28 0xfffffffffea0920e r29 0xa0001000004e198
r30 0xa000100006ae540 r31 0x100012f04
pc 0x100075cf4 ctr 0x9000000005e2700
link 0x100012f04 xer 0x20000000
ccr 0x42448228 trap 0x0
go tool dist: FAILED: /home/jkrishna/golang/golangdevel/goroot/pkg/tool/aix_ppc64/go_bootstrap install -a cmd/asm cmd/cgo cmd/compile cmd/link cmd/preprofile: exit status 1

As in the previous issue #58857, there may be an AIX relocation issue with rodata.

@randall77
Contributor

I poked at this and didn't make much progress. Without ssh access to a builder I can't really tell what is going on, or what fix might be productive. An AIX person needs to take the lead on this one.

Possible ideas:

  1. Hack compiler to put types in the data section. I'm not sure this will work, as other things in rodata point to types (like itabs), so it's just pushing the problem elsewhere. But maybe we could find all the cases where this happens, or ...
  2. Just get rid of rodata on aix, and just put everything that would have been in rodata into data (noptrdata?) instead.
  3. Fix the aix loader so it handles rodata->data relocations correctly.

@jkrishmys
Contributor

Hello @randall77, thanks for the lead. We will work on it.

@jkrishmys
Contributor

jkrishmys commented Dec 10, 2024

I am experimenting with the second idea, getting rid of rodata on AIX, though I am relatively new to Go linking. In symtab.go, I tried changing where the garbage collection symbols are collected, from rodata to noptr and data, and I changed the rodata sections to noptr in data.go. I am now getting phase errors:
runtime.gcbss: phase error: addr=0x20085a35e but val=0x200859ea0 sym=runtime.gcbss type=SDATA sect=.data sect.addr=0x200859ea0 prev=runtime.gcdata

@gopherbot
Contributor

Change https://go.dev/cl/638016 mentions this issue: cmd/link, runtime: apply a delta to RODATA->DATA relocations

@cherrymui
Member

cherrymui commented Dec 21, 2024

It is perhaps possible to put type descriptors in NOPTRDATA, or to move all RODATA to NOPTRDATA. But that is tricky: the linker has assumptions about where these symbols go in multiple places. We can look into that in the next release. For now, CL 638016 applies a more targeted fix, by applying the offset at run time. It also has the advantage of keeping things read-only.

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. FixPending Issues that have a fix which has not yet been reviewed or submitted. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Dec 21, 2024
@dmitshur dmitshur moved this from Todo to All-But-Submitted in Go Compiler / Runtime Dec 21, 2024
@github-project-automation github-project-automation bot moved this from All-But-Submitted to Done in Go Compiler / Runtime Dec 23, 2024
wyf9661 pushed a commit to wyf9661/go that referenced this issue Jan 21, 2025
On AIX, an R_ADDR relocation from an RODATA symbol to a DATA
symbol does not work, as the dynamic loader can change the address
of the data section, and it is not possible to apply a dynamic
relocation to RODATA. In order to get the correct address, we
apply the delta between unrelocated and relocated data section
addresses at run time. The linker saves both the unrelocated and
the relocated addresses, so we can compute the delta.

This is possible because RODATA symbols are generated by the
compiler, so we have full control over them. On AIX, the only case
is the on-demand GC pointer masks from the type descriptors, for
very large types.

Perhaps there is a better way.

Fixes golang#70483.

Change-Id: I2664c0a813b38f7b146794cb1e73ccf5e238ca65
Reviewed-on: https://go-review.googlesource.com/c/go/+/638016
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
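
The delta adjustment described in the commit message above can be sketched as follows. This is an illustration only, not the code from CL 638016: `relocDelta` and `fixupRODataPointer` are hypothetical names, and the two data-section base addresses are made-up values chosen for the example. Only the stored address 0x20113f960 comes from the original crash report.

```go
package main

import "fmt"

// relocDelta returns the offset the loader applied to the data section:
// its run-time base minus the base the linker assumed. Unsigned
// wraparound makes this work even when the section moved downward.
func relocDelta(linkTimeBase, runTimeBase uint64) uint64 {
	return runTimeBase - linkTimeBase
}

// fixupRODataPointer adjusts an address that was written into read-only
// data at link time, and so never received a dynamic relocation, so that
// it points at the data section's actual run-time location.
func fixupRODataPointer(stored, linkTimeBase, runTimeBase uint64) uint64 {
	return stored + relocDelta(linkTimeBase, runTimeBase)
}

func main() {
	const (
		linkTimeBase = 0x200000000 // assumed: where the linker placed .data
		runTimeBase  = 0x110000000 // assumed: where the loader mapped .data
		stored       = 0x20113f960 // unrelocated address (faulting addr in the report)
	)
	fmt.Printf("%#x\n", fixupRODataPointer(stored, linkTimeBase, runTimeBase))
	// prints 0x11113f960
}
```

Because the adjustment is applied when the pointer is read, the read-only data never has to be written, which matches the remark in the thread that the fix keeps things read-only.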