-
Notifications
You must be signed in to change notification settings - Fork 418
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
electrs exits with code 135 #226
Comments
Oh and my Docker image is built from the included Dockerfile in this repo at the current tip of master |
Confirmed the issue definitely isn't root by setting correct file permissions and running electrs as non-root. I still get the same issue.
I also noticed this
|
I believe 135 means killed by a signal (didn't find the code in electrs source). I guess this comes from rocksdb (crashes right after log mentioning opening it). I think there were some issues with rocksdb on RPi which you seem to be using.
|
Thanks @Kixunil! I'll try and get a backtrace for you. |
Doesn't seem to be much there:
|
Anyway, I just realized it's likely that it won't contain enough information for debugging because of release mode. You need to remove At least we have confirmation it's a signal. :) |
My bad it was a typo, I missed the "c" in
I'll rebuild the Docker image wit those flag changes and try again. Do I need to create a new coredump with the new image or can I just |
You will definitely need a new coredump. |
Hmmn, this is very odd. With the following change: diff --git a/Dockerfile b/Dockerfile
index 228b4f1..b3798be 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -10,8 +10,8 @@ USER user
WORKDIR /home/user
COPY ./ /home/user
-RUN cargo build --release
-RUN cargo install --path .
+RUN cargo build
+RUN cargo install --debug --path .
# Electrum RPC
EXPOSE 50001 I can build the Docker image and attempt to start the sync with the exact same command as before and it works without crashing. I'll let it sync and then try rebuilding without those flag changes and see if it's still stable. 🤷♂️ |
Might be a bug in compiler miscompiling with optimizations. Would be really helpful to figure out which (Rust or C) is it. |
I'll post back with the results after the initial index has finished, it's at about 40% atm.
By this do you mean whether it's electrs or RocksDB? |
Yes. |
I can confirm that the issue doesn't occur in non-debug mode. The full sync has completed in debug mode. Starting the release binary and pointing to the db dir indexed by the debug binary doesn't work, it fails with the exact same error I reported in the initial comment when trying to start a fresh sync. What else can I do to help you figure out if the error is happening in Rust or C? |
Damn, that sucks. It's also possible to have release build with debug symbols. You need to add I also noticed there's Other ideas:
Maybe I should make my idea clear: if any options above help, it is probably a bug in Note that this is all an optimistic heuristic: unfortunately if there's UB anywhere, it could appear as an entirely different problem somewhere else and it could be a nightmare to find. (I have experienced such issue in some C code a few years ago.) |
Ok, testing diff --git a/Cargo.toml b/Cargo.toml
index a5f2ca5..c18ea1c 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -17,6 +17,7 @@ spec = "config_spec.toml"
[profile.release]
lto = true
+debug = true
[features]
latest_rust = [] # use latest Rust features (otherwise, support Rust 1.34) |
If all else fails, maybe we could try sanitizers. |
I can get some useful info out of
|
Testing again with: diff --git a/Cargo.toml b/Cargo.toml
index a5f2ca5..f570efb 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -16,7 +16,8 @@ build = "build.rs"
spec = "config_spec.toml"
[profile.release]
-lto = true
+lto = false
+debug = true
[features]
latest_rust = [] # use latest Rust features (otherwise, support Rust 1.34) to see if |
Should I add these options in addition to or instead of |
Which version of Raspberry Pi do you use? If it's RPi 3, then I think I know what's happening: The line it crashes on is attempting to initialize 64-bit integer, which on ARM requires to sit on an address that's multiple of 8 bytes (8B == 64b), but the allocator returns pointers aligned to 4B (32b). Access with invalid alignment produces In ideal world, the allocator would be capable of returning 64-bit aligned pointers even on 32-bit but I don't know if we can do anything about it. So the next best thing seems to be adding I think the next approach should be trying to patch the I need to do some other stuff right now, will look at patching later. In the meantime, please let me know if it's really RPi 3 (or lower; RPi 4 would be really strange) and if you have the time, familiarize yourself with how to replace cargo dependencies (Google, I know it's possible, jut don't remember how exactly), because you will need to do it in order to test my patch of |
It's a Pi 4 🤔
I'm only really experienced with high level languages. I wish I could help more with this but I really wouldn't know where to start digging into some Rust/C stuff to fix low level compilation/architecture type bugs. I'm sure I can manage to swap out the Cargo dep though. |
Ah, according to a Hackaday article Raspbian is 32 bit even on Pi 4. That makes even more sense. 32 bit CPU requiring 64-bit-aligned access sounded a bit strange to me. So let's proceed with a patch to This should be a very high-level patch - just adding a command line option in the above mentioned scenario. BTW, I think Rust is pretty nice even for high level people (modern features like sum types, iterators, async/await), if you ever want to give it a try. :) |
There's an issue with compiling C(++) code on 64-bit CPUs running 32-bit OSes, which have 32-bit-aligned allocators. The resulting binary will use aligned accesses in release mode. See romanz/electrs#226 for an example of such issue. This change adds default flags setting alignment to whatever the pointer alignment is. I realize the limitations of this change such as: * Works in Cargo build scripts only (the major use case for `cc` crate) * Is unnecessary if allocator actually supports 64 bit alingment * Is incorrect if allocator supports smaller alignment than what's pointer width. However, resolving this issue requires either matching on all possible triples, which I don't have the time to implement or somehow fetching the configuration for given triple, which I don't know how to do. This is also an experiment to see if it even helps at all.
So I wrote the patch, please note the Also, maybe a good idea to run |
I don't think it is. This is |
Interestingly, I just quickly tried updating the base Rust image and that seems to be working. diff --git a/Dockerfile b/Dockerfile
index 228b4f1..3518a80 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -1,4 +1,4 @@
-FROM rust:1.34.0-slim
+FROM rust:1.42.0-slim
RUN apt-get update
RUN apt-get install -y clang cmake Building the Docker image with just that that change applied to master is working without crashes. |
🤔 that might mean my analysis was incorrect or new version of Rust somehow affects the allocator or something. Will have to think about it. Would you be still willing to try my patch, so we can understand the situation better? |
Yeah, of course! Would it make more sense to checkout the I tried just checking out
I ran
Note I haven't even pointed it at your Is this expected or is this something that needs fixing before proceeding to test your |
You also need I understand now, you didn't just upgrade Rust, you also upgraded the whole underlying OS from stretch to buster, which probably fixed the bug somewhere in the C++ toolchain. Fixing the bug in |
Ahh, you're right!
That's very confusing. They offer both stretch and buster variants on Docker hub but if you don't specify it defaults to different Debian versions when you specify different Rust versions. If I submit a PR to update the base Rust Docker image which seems to resolve this issue, do you want me to pin the Debian version too so this is fixed going forwards? We can pin the docker image to Of course I'm still happy to continue testing |
AFAIK @romanz had a good reason to keep Rust 1.34 - due to it being the latest supported Rust version in Debian Buster. But maybe if we could change the underlying OS to Buster while keeping 1.34 that might work. |
I installed those other packages:
But still got the same error as before:
Strange because stdarg.h does exist:
|
Damn, I swear I saw this problem before and it was something trivial like missing package. |
Ah, I think it was |
Oh also:
There is no official Buster image for 1.34, only stretch. https://hub.docker.com/_/rust?tab=tags&page=1&name=1.34 So if we need to stick to Rust 1.34 then we'll need to fix the |
Thanks, late here now but I'll try |
WTF. Could we perhaps depend on pure Debian buster (without Rust) and just do |
|
Yeah definitely I'll see if that works. |
Manually adding |
Ah ok I got it! Just This successfully builds and run tests against a fresh Rust Docker image:
I moved this over to my workstation because it was taking forever but I'll try this on the Pi now and see if it fails and if dropping in your |
I believe you only need to add: [patch.crates-io]
cc = { git = "https://github.com/Kixunil/cc-rs", branch = "fix-alignment" } to |
When updating the lockfile I was getting:
So I had to update the version number in the patch branch to 1.0.41 for the https://github.com/lukechilds/cc-rs/tree/fix-alignment That successfully patches the
|
🤔 going to check what's the problem with gcc, could you try setting |
With
|
🤦 this is terrible. The compilers document those arguments, yet don't support them. Maybe because they are too old? You could check the versions... But honestly, I feel quite demotivated right now. Looks like the most robust solution is to upgrade the OS anyway. I'd suggest:
If you're interested in knowing more precisely who to blame, you might try Thanks a lot for your willingness to go beyond the necessary steps, helping me understand the situation better! I suggest closing this after Dockerfile and/or documentation are updated. |
Many thanks for reporting and debugging this issue! |
@romanz oops, just seen you already updated the issue after opening my PR and seeing there's a merge conflict. I made the commit late last night and didn't have time to open a PR. @Kixunil indicated there was a reason you wanted to keep Rust 1.34.0:
Is that the case? If so I made a Dockerfile that is based on Debian Buster and then installs Rust 1.34.0 with an easy way to change the Rust version. #234 If you'd rather update to 1.42.0 than obviously just ignore that PR. However it might be worth locking to a specific Debian version, e.g |
Resolved via #235 - thanks for the help :) |
This issue should be closed, right? |
Yeah. |
I have the following Docker Compose setup:
Changed to run as root user because
lukechilds/bitcoind
currently runs as root soelectrs-app
needs to be root to read.bitcoin/.cookie
. I know I should fix that but it's fine for just playing around.Anyway, I don't think it should be root that's causing issues, I'm sure (I think?) this exact config has worked before, but for some reason when I try to run it now I get:
Is there any way to get any more debug to see why it's failing? Does code 135 mean something specific?
Thanks!
The text was updated successfully, but these errors were encountered: