Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: enable multicore-sdr by default; allow env var to disable #182

Merged
merged 3 commits into from
Jun 14, 2021

Conversation

cryptonemo
Copy link
Contributor

No description provided.

@cryptonemo
Copy link
Contributor Author

cryptonemo commented Jun 11, 2021

Note, this is a bad title as it still requires FIL_PROOFS_USE_MULTICORE_SDR=1 to be set at runtime. It just means that you have the option to use this by default.

Copy link
Contributor

@magik6k magik6k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow this makes the Go linker not link to hwloc, using prebuilt binaries I'm getting

go build -ldflags="-X=github.com/filecoin-project/lotus/build.CurrentCommit=+git.3b681c84d" -o lotus ./cmd/lotus
# github.com/filecoin-project/lotus/cmd/lotus
/usr/lib/go/pkg/tool/linux_amd64/link: running gcc failed: exit status 1
/bin/ld: /home/magik6k/github.com/filecoin-project/go-lotus/extern/filecoin-ffi/libfilcrypto.a(hwloc-1ab47d5704e4f2bf.hwloc.dkhqznxr-cgu.0.rcgu.o): in function `core::ptr::drop_in_place<hwloc::bitmap::Bitmap>':
hwloc.dkhqznxr-cgu.0:(.text._ZN4core3ptr42drop_in_place$LT$hwloc..bitmap..Bitmap$GT$17h50092dae1f42cf1fE+0xb): undefined reference to `hwloc_bitmap_free'
/bin/ld: /home/magik6k/github.com/filecoin-project/go-lotus/extern/filecoin-ffi/libfilcrypto.a(hwloc-1ab47d5704e4f2bf.hwloc.dkhqznxr-cgu.0.rcgu.o): in function `hwloc::Topology::new':
hwloc.dkhqznxr-cgu.0:(.text._ZN5hwloc8Topology3new17h1b1054795ca46179E+0x11): undefined reference to `hwloc_topology_init'
/bin/ld: hwloc.dkhqznxr-cgu.0:(.text._ZN5hwloc8Topology3new17h1b1054795ca46179E+0x1b): undefined reference to `hwloc_topology_load'
/bin/ld: hwloc.dkhqznxr-cgu.0:(.text._ZN5hwloc8Topology3new17h1b1054795ca46179E+0x25): undefined reference to `hwloc_topology_get_support'
/bin/ld: /home/magik6k/github.com/filecoin-project/go-lotus/extern/filecoin-ffi/libfilcrypto.a(hwloc-1ab47d5704e4f2bf.hwloc.dkhqznxr-cgu.0.rcgu.o): in function `hwloc::Topology::with_flags':
hwloc.dkhqznxr-cgu.0:(.text._ZN5hwloc8Topology10with_flags17h7d088a384b896636E+0xb5): undefined reference to `hwloc_topology_init'
/bin/ld: hwloc.dkhqznxr-cgu.0:(.text._ZN5hwloc8Topology10with_flags17h7d088a384b896636E+0xc2): undefined reference to `hwloc_topology_set_flags'
/bin/ld: hwloc.dkhqznxr-cgu.0:(.text._ZN5hwloc8Topology10with_flags17h7d088a384b896636E+0xcc): undefined reference to `hwloc_topology_load'
/bin/ld: hwloc.dkhqznxr-cgu.0:(.text._ZN5hwloc8Topology10with_flags17h7d088a384b896636E+0xd6): undefined reference to `hwloc_topology_get_support'

...

@magik6k
Copy link
Contributor

magik6k commented Jun 11, 2021

(Same happens when built with FFI_BUILD_FROM_SOURCE=1)

@vmx
Copy link
Contributor

vmx commented Jun 11, 2021

That could be issue then. As (at least the rust-fil-proofs README says) multicore won't work with hwlock. I'll have a closer look.

@vmx
Copy link
Contributor

vmx commented Jun 11, 2021

@magik6k it seems to be related to your environment. It passed on CI, it compiles on my local machine, a remote machine and also for some miners.

If anyone wants to try to reproduce. Checkout https://github.com/filecoin-project/lotus/ and then:

git checkout release/v1.10.0
cd extern/filecoin-ffi
git checkout multicore-sdr-fix
cd ../..
git commit "wip" extern/filecoin-ffi
FFI_BUILD_FROM_SOURCE=1 RUSTFLAGS='-C target-cpu=native' FFI_USE_BLST=1 make clean deps lotus

@vmx
Copy link
Contributor

vmx commented Jun 11, 2021

It's also reproducible for @magik6k with:

git clone https://github.com/vmx/lotus.git --branch multicore-sdr-fix
cd lotus
FFI_BUILD_FROM_SOURCE=1 RUSTFLAGS='-C target-cpu=native' FFI_USE_BLST=1 make clean deps lotus

@vmx
Copy link
Contributor

vmx commented Jun 11, 2021

@magik6k found out that there is a difference in linking. Master statically links, this PR dynamically links hwloc.

Update: that's actually not true. master is missing hwloc because it's not linked at all. When I try v1.10-rc1, I get the same ldd output as for this PR:

Lotus v1.10-rc1:

$ ldd lotus
	linux-vdso.so.1 (0x00007fffc61c4000)
	libhwloc.so.15 => /usr/lib/libhwloc.so.15 (0x00007f3b07c57000)
	libOpenCL.so.1 => /opt/cuda/lib64/libOpenCL.so.1 (0x00007f3b07a50000)
	libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f3b07a36000)
	libutil.so.1 => /usr/lib/libutil.so.1 (0x00007f3b07a31000)
	librt.so.1 => /usr/lib/librt.so.1 (0x00007f3b07a26000)
	libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f3b07a05000)
	libm.so.6 => /usr/lib/libm.so.6 (0x00007f3b078be000)
	libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f3b078b7000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007f3b076ea000)
	/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f3b07ccc000)
	libudev.so.1 => /usr/lib/libudev.so.1 (0x00007f3b076c2000)

Lotus with this PR applied:

$ ldd lotus	linux-vdso.so.1 (0x00007ffca59a1000)
	libhwloc.so.15 => /usr/lib/libhwloc.so.15 (0x00007fdb0199e000)
	libOpenCL.so.1 => /opt/cuda/lib64/libOpenCL.so.1 (0x00007fdb01797000)
	libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007fdb0177d000)
	libutil.so.1 => /usr/lib/libutil.so.1 (0x00007fdb01778000)
	librt.so.1 => /usr/lib/librt.so.1 (0x00007fdb0176d000)
	libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007fdb0174c000)
	libm.so.6 => /usr/lib/libm.so.6 (0x00007fdb01605000)
	libdl.so.2 => /usr/lib/libdl.so.2 (0x00007fdb015fe000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007fdb01431000)
	/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fdb01a13000)
	libudev.so.1 => /usr/lib/libudev.so.1 (0x00007fdb01409000)

@vmx
Copy link
Contributor

vmx commented Jun 11, 2021

I was able to reproduce once, but it was in a checkout I constantly switched to other branches. I cannot reproduce it on clean checkouts.

Hence I'd like to ask for help. I've created this one-liner. Please run it and see if it fails with the error message above.

git clone https://github.com/filecoin-project/lotus --recursive && cd lotus/extern/filecoin-ffi/ && git checkout origin/multicore-sdr-fix && cd ../../ && git commit -m 'wip' extern/filecoin-ffi/ && FFI_BUILD_FROM_SOURCE=1 RUSTFLAGS='-C target-cpu=native' FFI_USE_BLST=1 make clean deps lotus

If it works and ldd ./lotus should also show hwloc as dynamically linked.

@vmx
Copy link
Contributor

vmx commented Jun 12, 2021

@magik6k: I can reproduce it. It happens if you build current master first and try this PR:

git clone https://github.com/filecoin-project/lotus lotus_make_first --recursive && cd lotus_make_first && FFI_BUILD_FROM_SOURCE=1 RUSTFLAGS='-C target-cpu=native' FFI_USE_BLST=1 make clean deps lotus && cd extern/filecoin-ffi/ && git checkout origin/multicore-sdr-fix && cd ../../ && git commit -m 'wip' extern/filecoin-ffi/ && FFI_BUILD_FROM_SOURCE=1 RUSTFLAGS='-C target-cpu=native' FFI_USE_BLST=1 make clean deps lotus

This means that this PR is good on a fresh checkout. It also means that the clean target doesn't clean up everything properly. But that's a separate issue then. I propose to merge this PR as soon as possible.

@vmx
Copy link
Contributor

vmx commented Jun 12, 2021

I looked if there's a difference after doing a git clean -xfdxfd in both cases (building master first and not building anything first). I can't find a difference. Could it be that there's a difference in some directory Go writes into, which is not the lotus checkout directory?

I think investigating in why make clean doesn't lead to a clean state as a fresh checkout is something people that know Go need to figure out.

@Kubuxu
Copy link

Kubuxu commented Jun 14, 2021

I think it is caused by golang/go#24355
The fix may be to run go clean -cache -testcache . inside the filecoin-ffi on clean/on download of new ffi.

@Kubuxu
Copy link

Kubuxu commented Jun 14, 2021

In essence, if headers or archive change but the .go files do not, caches might not dropped, it probably is non-issue with archive as Go links the updated archive but it might not re-look for dependencies.

I'm testing this hypothesis right now.

Possibly works around golang/go#24355

Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>
@Kubuxu
Copy link

Kubuxu commented Jun 14, 2021

The reproduction that @vmx produced does not reproduce anymore with 225142b

@magik6k
Copy link
Contributor

magik6k commented Jun 14, 2021

Can confirm this works for me now

Copy link
Contributor

@magik6k magik6k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified that:

  • It builds for me
  • Multicore SDR works in pond

@magik6k magik6k merged commit 57a91e8 into master Jun 14, 2021
@cryptonemo cryptonemo deleted the multicore-sdr-fix branch June 14, 2021 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants