Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.30 -> 1.31 dylib late-binding regression with GNU binutils 2.28 or older. #61539

Closed
eddyb opened this issue Jun 5, 2019 · 20 comments
Closed
Assignees
Labels
O-linux Operating system: Linux P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@eddyb
Copy link
Member

eddyb commented Jun 5, 2019

Originally reported as #60593, and appears to be caused by #54592, this is my reduction:

  • common.rs
pub static STATIC: u8 = 0;
pub fn get() -> *const u8 { &STATIC }
  • plugin.rs
extern crate common;
#[no_mangle]
pub fn run() {
    println!("plugin: {:p}", common::get as *const ());
    assert_eq!(common::get(), &common::STATIC as *const _);
}
  • driver.rs
extern crate common;
fn main() {
    println!("driver: {:p}", common::get as *const ());
    // You can comment this line to make it work.
    assert_eq!(common::get(), &common::STATIC as *const _);
    unsafe {
        extern "C" {
            fn dlopen(filename: *const u8, flag: i32) -> *mut u8;
            fn dlsym(handle: *mut u8, symbol: *const u8) -> *const u8;
        }
        std::mem::transmute::<*const u8, fn()>(dlsym(
            dlopen(b"./libplugin.so\0".as_ptr(), 1),
            b"run\0".as_ptr(),
        ))()
    }
}

How to run it:

RELEASE=nightly-2018-10-11
rustup toolchain add $RELEASE

echo 'System toolchain versions:'
ldd --version | head -n1 | sed 's/^ldd //'
ld.bfd --version | head -n1
ld.gold --version | head -n1
echo

rustc +$RELEASE common.rs --crate-type=lib
# You can change this from dylib to cdylib to make it work.
rustc +$RELEASE plugin.rs --crate-type=dylib -L.
# You can uncomment -Z plt=yes to make it work.
rustc +$RELEASE driver.rs -L. # -Z plt=yes
./driver

I was able to try this out with NixOS 18.03 and 18.09 releases, via:

NIX_PATH=nixpkgs=https://github.com/NixOS/nixpkgs-channels/archive/nixos-18.03.tar.gz \
    nix-shell -p stdenv --run ./test.sh
NIX_PATH=nixpkgs=https://github.com/NixOS/nixpkgs-channels/archive/nixos-18.09.tar.gz \
    nix-shell -p stdenv --run ./test.sh

Example of failure (using a NixOS 18.03 toolchain):

info: syncing channel updates for 'nightly-2018-10-12-x86_64-unknown-linux-gnu'

  nightly-2018-10-12-x86_64-unknown-linux-gnu unchanged - rustc 1.31.0-nightly (77af31408 2018-10-11)

System toolchain versions:
(GNU libc) 2.26
GNU ld (GNU Binutils) 2.28.1
GNU gold (GNU Binutils 2.28.1) 1.14

driver: 0x55d244a55820
plugin: 0x55d244a55820
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `0x55d244a9e7ac`,
 right: `0x7f87a69b3f19`', plugin.rs:5:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Example of success (by switching to NixOS 18.09):

info: syncing channel updates for 'nightly-2018-10-12-x86_64-unknown-linux-gnu'

  nightly-2018-10-12-x86_64-unknown-linux-gnu unchanged - rustc 1.31.0-nightly (77af31408 2018-10-11)

System toolchain versions:
(GNU libc) 2.27
GNU ld (GNU Binutils) 2.30
GNU gold (GNU Binutils 2.30) 1.15

driver: 0x5649e7dbf4f0
plugin: 0x7fca6df40930

Other ways to get it to succeed:

  • use a Rust version before Support for disabling PLT for better function call performance #54592 (e.g. RELEASE=nightly-2018-10-11)
  • pass -Z plt=yes when compiling driver
  • change the crate type of plugin to cdylib (instead of dylib)
    • presumably common::get is private in that case?
    • not a general fix AFAIK, but I'm not sure how to repro with cdylib
  • don't call common::get in driver
    • common::get won't get looked up early, which means driver's copy won't override plugin's (within the context of affected toolchains)
  • set RTLD_DEEPBIND when calling dlopen
    • common::get from driver is ignored when looking up the use from plugin

cc @alexcrichton @rust-lang/compiler

@eddyb eddyb added I-nominated regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 5, 2019
@eddyb eddyb changed the title 1.30 -> 1.31 regression with less recent Linux distro toolchains. 1.30 -> 1.31 dylib late-binding regression with less recent Linux distro toolchains. Jun 5, 2019
@eddyb
Copy link
Member Author

eddyb commented Jun 5, 2019

Looks like relro support was added in GNU binutils 2.28, so maybe it had some bugs?
There isn't any 2.29 binutils build on the official NixOS build servers so I can't easily test that version.

@nagisa
Copy link
Member

nagisa commented Jun 5, 2019

How confident are we that PLT entirely negates this issue other than just accidentally preventing it from triggering in this particular case?

To me it seems that exactly what happens with PLT here is prevention and the core issue still remains regardless of whether PLT is enabled or not.

Not opposed to flipping back the default for now, though.

@eddyb
Copy link
Member Author

eddyb commented Jun 6, 2019

I agree, I have no idea what the actual bug is yet, just what affects this specific reduction.
Note that in all cases that work, common::get is not deduplicated, as opposed to both common::get and common::STATIC being deduplicated.

What I suspect is happening with -Z plt=yes is bypassing a buggy linker feature.
But I'd have to bisect binutils to prove that. Which I might still do, I just hope it's easier than glibc.

@pnkfelix pnkfelix added the O-linux Operating system: Linux label Jun 6, 2019
@nagisa
Copy link
Member

nagisa commented Jun 11, 2019

@eddyb an easy way to (dis-)prove linker fault is to attempt linking with lld.

@pnkfelix
Copy link
Member

triage: we need to discuss this at today's meeting before I'll be comfortable assigning a priority to it.

@pnkfelix
Copy link
Member

discussed in triage meeting. P-high. Removing nomination tag. Assigning to @nagisa and myself.

@pnkfelix pnkfelix added P-high High priority and removed I-nominated labels Jun 13, 2019
@nagisa
Copy link
Member

nagisa commented Jun 30, 2019

Linking with lld works. Using clang instead of gcc as a ld driver also fails. This does indeed point all the fingers to ld. I’ll try to close down on the ld version now.

@nagisa
Copy link
Member

nagisa commented Jun 30, 2019

This issue is no longer present with binutils 2.29.0.

@nagisa
Copy link
Member

nagisa commented Jun 30, 2019

@eddyb eddyb changed the title 1.30 -> 1.31 dylib late-binding regression with less recent Linux distro toolchains. 1.30 -> 1.31 dylib late-binding regression with GNU binutils 2.28. Jul 1, 2019
@eddyb
Copy link
Member Author

eddyb commented Jul 1, 2019

@nagisa @alexcrichton Is there precedent for checking the version of the linker to work around bugs?
I also wonder what happens with GNU binutils 2.27 and earlier, since I don't think it supports relro, do we check for that somehow, and fallback, or are we asking for relro in a way that gets ignored?

@nagisa
Copy link
Member

nagisa commented Jul 1, 2019

At least 2.27 also fails so not relro related. Waiting for 2.26 to build right now.

@nagisa
Copy link
Member

nagisa commented Jul 1, 2019

2.26 and 2.23.1 also fail (although 2.23.1 only prints the addresses and exits with exit code 0 before calling into the plugin), so it is probably safe to say that this issue is present in all relevant versions of the GNU toolchain before 2.29.

@eddyb
Copy link
Member Author

eddyb commented Jul 1, 2019

@nagisa Oh, right, I forgot, the bug was fixed after 2.28 but I never tested earlier versions.

@eddyb eddyb changed the title 1.30 -> 1.31 dylib late-binding regression with GNU binutils 2.28. 1.30 -> 1.31 dylib late-binding regression with GNU binutils 2.28 or older. Jul 1, 2019
@alexcrichton
Copy link
Member

We have only one rudimentary check for support of a c compiler flag (-pie I think) but other than that we don't have any precedence for c compiler or linker version checking in rustc itself

@cuviper
Copy link
Member

cuviper commented Jul 1, 2019

I also wonder what happens with GNU binutils 2.27 and earlier, since I don't think it supports relro,

I think 2.27 only added a way to configure the default:

* Add a configure option --enable-relro to decide whether -z relro should
  be enabled in ELF linker by default.  Default to yes for all Linux
  targets except FRV, HPPA, IA64 and MIPS.  Note that -z relro can increase
  disk and memory size.

But -z relro has existed much longer -- I see references way back in ChangeLog-2004.

@pnkfelix
Copy link
Member

pre-triage stray thought: While I worry about trying to change compiler behavior based on a dynamically detected linker version, I would have fewer objections to use such detection just to drive the emission of a diagnostic warning. Where said diagnostic tells people to take one of the aforementioned steps to work around the bug.

nominating for discussion at T-compiler meeting.

@nagisa
Copy link
Member

nagisa commented Jul 18, 2019

This has been discussed briefly during the meeting last week, but no real conclusion was arrived at.

@pnkfelix
Copy link
Member

Some points made during aforementioned meeting that I thought were worth transcribing here:

  • nagisa: "the warning message based on just linker version would lead to false positives more often than not, but I don’t know of any other solution that’s not reenabling plt by default."
  • pnkfelix: " is there particular pattern of codegen we could associate the diagnostic with, to try to lower the false positive rate?"
    • nagisa: "Maybe? Ultimately to trigger that particular bug a dynamic library needs to be loaded at runtime (via e.g. dlopen)"
  • nagisa: "Most relevant case where this almost always happen are compiler plugins"

@pnkfelix
Copy link
Member

I haven't taken any time to look into this since I last commented here (about two months ago). Its tempting to downgrade this to P-medium with justification "this is a bug in ld that is fixed in a later version." But I don't want to do that without at least surveying the current popular GNU-based distributions.

@pnkfelix
Copy link
Member

pnkfelix commented Jan 16, 2020

I tried landing a lint to detect this case (PR #66839) but no one seems terribly enthused about that approach, at least not at this time. (Maybe it would have gotten more traction sooner after @eddyb found the bug.)

In that PR, we did get a survey of what version of ld is used in popular GNU distributions, taken from https://repology.org/project/binutils/versions

So, I will just close this issue as a bug in GNU ld that is fixed in later versions and thus not something we are going to workaround.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-linux Operating system: Linux P-high High priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants