mesa store path is haunted #199
Copied my report over from there: I ran into this as well, without running Lix anywhere (yet).
I cannot confirm 100% that it was indeed built on this machine or on another aarch64 machine, as I have a bunch of remote builders configured, but it definitely doesn't seem Lix-specific. |
Here are two more manifestations with CppNix (not 100% sure which version) from a few months ago: #156 (along with one corrupted file). Is the bad region similar? I try to bang on Mesa a little after each release but have not managed to replicate the problem personally on my 64GB M1 Max with ext4. I have had one or two non-deterministic builds, but I didn't capture a good and a bad build to compare because I didn't realize it was necessary at the time. Help tracking/correlating this would be greatly appreciated. I don't think I've heard of it happening with Mesa versions labeled 23, but it has evidently persisted across a couple of Asahi patch releases. |
Here's my cursed store path: v8z97d2vgyc1zn5bh5mwmywk5dvsarzs-mesa-24.1.0-drivers.tar.gz. That was built with 4fac534 and nixpkgs 2ec060b94ebd81598603bb5ea49455e255928f9c. Build log: |
fyi our one bad copy has the zeroed region aligned with a section (but not aligned to a page), if that's helpful. you can throw yours in Ghidra and check if it's the same, perhaps? repeating what I said on the other thread: this will likely be possible to reproduce consistently if we have a build directory from a bad copy of mesa with a verbose build log. you can repro with `nix build --rebuild` in a loop vs a good copy with added load with |
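The rebuild-in-a-loop approach can be scripted. A minimal Python sketch, assuming `nix build --rebuild` exits with a non-zero status when the rebuilt output differs from the existing store path; `count_failures` is a hypothetical helper, and generating the "added load" is left out.

```python
import subprocess
import sys
from typing import Sequence


def count_failures(cmd: Sequence[str], runs: int) -> int:
    """Run `cmd` `runs` times and count how many runs exit non-zero."""
    failures = 0
    for attempt in range(1, runs + 1):
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            failures += 1
            # Log which attempt failed so its build directory/log can be matched up.
            print(f"attempt {attempt}: exit status {result.returncode}", file=sys.stderr)
    return failures
```

For example, `count_failures(["nix", "build", "--rebuild", ".#mesa"], 20)`, where `.#mesa` is a placeholder for whichever mesa attribute is affected.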
I was hoping someone would do the correlation with the other data for me 😄 Not sure I'll get to debug this much more in the next few days. |
fwiw, 00590000 through 00682000 are zeroed, and |
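To check whether two reports share the same zeroed region, a minimal Python sketch can list byte ranges that are entirely zero in a bad copy but contain data at the same offsets in a good copy. It assumes both copies have the same size and layout so offsets line up; `suspicious_ranges`, `format_range` and the 4096-byte minimum run length are illustrative choices, not anything from this thread. Ranges are printed half-open (`start-end`, end exclusive).

```python
import re
import sys
from typing import List, Tuple


def suspicious_ranges(good: bytes, bad: bytes, min_len: int = 4096) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges that are all zero in `bad`
    but hold non-zero data at the same offsets in `good`."""
    pattern = re.compile(rb"\x00{%d,}" % min_len)  # maximal runs of NUL bytes
    ranges = []
    for match in pattern.finditer(bad):
        start, end = match.span()
        # Runs that are also zero in the good copy are ordinary padding, not damage.
        if any(good[start:end]):
            ranges.append((start, end))
    return ranges


def format_range(start: int, end: int) -> str:
    return f"{start:08x}-{end:08x}"


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python3 zeroed.py GOOD_FILE BAD_FILE
    with open(sys.argv[1], "rb") as f:
        good_data = f.read()
    with open(sys.argv[2], "rb") as f:
        bad_data = f.read()
    for start, end in suspicious_ranges(good_data, bad_data):
        print(format_range(start, end))
```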
I think I'm having this same problem on my musl system. In that case, Weston just segfaults; presumably musl's dynamic linker is less resilient to this than glibc's? Working mesa:
Bad mesa:
More info: Nix 2.22. I'm building on NixOS on Apple Silicon, but this is a VM image: it's running in QEMU, and the VM system doesn't use nixos-apple-silicon at all. I'm also not using the standard Asahi kernel on the host; I'm running kvm-arm64/nv-6.8-nv2-only from https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/. The store path is too big to upload to GitHub. |
hi @marcan, this may interest you by this point, given we have multiple people hitting it. |
@alyssais Just to be clear, the VM system is running on a machine which uses NixOS on Apple Silicon, but with a custom kernel? Are you building the Asahi Mesa or the standard nixpkgs one? The VM uses the standard NixOS kernel? |
Host machine uses NixOS on Apple Silicon with the aforementioned custom kernel, which is based on an older version of the Asahi kernel. VM uses the standard NixOS kernel with some config modifications, and standard mesa. Asahi mesa is not involved in the system at all — I just use simpledrm on the host. Edit: I'm also using a fairly standard NixOS kernel config on the host, as opposed to the custom one from nixos-apple-silicon. |
puck thinks this might be a patchelf bug, which would explain why it only hits mesa, and only on NixOS. |
so, my current working theory is that this is not patchelf, but a repeat of a previous issue; tho i'm not entirely sure why it doesn't affect x86_64, as it should have the same bug. Since NixOS/nixpkgs#207101, This issue is exacerbated by the fact that I believe that moving the symlink deduping logic from |
I've been pointed at https://github.com/void-linux/void-packages/blob/master/srcpkgs/mesa/patches/megadriver-symlinks.patch - which is likely to be a better solution here; just patching the megadriver installer to symlink, rather than hardlink. |
It looks like Mesa uses hard links for a reason — to avoid installing the megadriver under its original non-driver-specific name, and I think it makes sense not to change that, so based on my current understanding I'd prefer moving our symlinkification to preFixup rather than applying Void's patch. @dcbaker do you have any thoughts here, ooc? |
While we're waiting for agreement on how to fix this, does anyone have instructions for working around it if they run into it? I believe I ran into this last night, and my nix-fu is not strong enough to know how to force a rebuild of mesa... (but I'm eager to learn!) |
I think the fix here is to do something (this is evil btw) like:
|
I cannot recommend doing anything with `--ignore-liveness`; it is too easy to fuck up your entire system with that. Instead, I suggest adding an overlay that makes a meaningless change to the derivation (but changes the derivation hash):
|
Wish I'd tried that approach first lol
I was attempting a few things with that flag and totally borked my install! Luckily, booting a rescue image is easy with NixOS, and it is fixed now. I wiped my whole Nix store and started fresh, and mesa built properly this time.
On Thu, May 23, 2024, at 6:08 PM, Yureka wrote:
I can not recommend doing anything with `--ignore-liveness`, it is too easy to fuck up your entire system with that.
Instead, I suggest adding an overlay that makes a meaningless change to the derivation (but changes the derivation hash):
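```nix
# Assumes a NixOS module where `lib` is in scope as a module argument.
nixpkgs.overlays = [
  (final: prev: {
    mesa-asahi-edge = prev.mesa-asahi-edge.overrideAttrs (oldAttrs: {
      # A meaningless change to src that still changes the derivation hash
      src = lib.cleanSource oldAttrs.src;
    });
  })
];
```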
|
Has an upstream nixpkgs fix to rearrange the mesa derivation been filed? Can the strip hook be taught to ignore symlinks and only access unique inodes? |
Some late-night bash(1)ing to solve the latter: NixOS/nixpkgs#314175. Will actually test over the weekend that it works with nixpkgs; I still haven't managed to replicate the haunting on my machine. |
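The idea in the question above (ignore symlinks, touch each unique inode only once) can be illustrated with a short Python sketch. This is not the bash change in NixOS/nixpkgs#314175; `unique_regular_files` is a hypothetical helper whose result a strip step could iterate over instead of every file name, so a megadriver hardlinked under many names is processed once.

```python
import os
import stat
from typing import List


def unique_regular_files(root: str) -> List[str]:
    """Collect one path per regular-file inode under `root`, skipping symlinks."""
    seen = set()
    picked = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat does not follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue  # symlinks and special files are skipped
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hardlink to an inode already picked
            seen.add(key)
            picked.append(path)
    return picked
```

Keying on `(st_dev, st_ino)` rather than the inode alone keeps the check correct if the tree spans more than one filesystem.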
@lf- Any 'repair'-related commands only do something if the hash recorded in the Nix store DB does not match the path contents on disk. But in this case the path is not 'corrupted' in the nix-store sense, as the disk contents match the Nix store DB. |
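A simplified Python model of why a repair does nothing here: if the damage happened before the path was registered, the recorded hash was computed from the damaged contents, so it still matches. Real Nix hashes the NAR serialization of the whole store path; a plain SHA-256 of one byte string stands in for it, and `repair_would_act` is a hypothetical name.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def repair_would_act(recorded_hash: str, on_disk: bytes) -> bool:
    # Simplified model: a repair only rewrites a path whose current contents
    # no longer match the hash recorded when the path was registered.
    return sha256_hex(on_disk) != recorded_hash
```

With a build that was already zeroed when registered, `repair_would_act(sha256_hex(bad), bad)` is `False`; only a change after registration would make it `True`.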
A fix for the underlying issue has been merged into nixpkgs and will take a week or two to make it to a release here. I am encouraged to wait for that instead of attempting to modify our Mesa build as the rate of occurrence does not seem too high. I will keep track and close this issue once that release happens. Thanks very much to all again for debugging. |
Huh, I could have sworn it remade the path unconditionally, but apparently not. Sorry for the noise. |
I was having this issue on a fresh install with the latest ISO. I tried the older 2024-04-20 release and all is well (at least after configuring |
The fixes have been merged into nixpkgs, but it'll take a while for them to land. Check https://nixpk.gs/pr-tracker.html?pr=314175 and https://nixpk.gs/pr-tracker.html?pr=314541 for when the fix reaches the unstable and 24.05 channels respectively. |
unstable is there already |
This fix is in unstable and also the latest release. Stable will be another week or two, but I'm considering this fixed. |
@alyssais sorry for missing this, I was on a GitHub hiatus. The hardlinks are basically about space savings, and back in the day there was a thought that a distro might want to update a single driver if that meant it wouldn't have to update the entire mesa package (something that in reality only Debian could do). |
@dcbaker so is there any reason for them not to be symlinks to a megadriver.so? |
This is now the third person reporting an issue where, after a rebuild, graphics acceleration does not work. After debugging, it turned out the mesa driver had some regions zeroed out.
This was originally reported in the Lix issue tracker since the first two known cases were with Lix, however a third case has appeared which was running CppNix.
A hypothesis is that high load/contention on the builder increases the likelihood of this issue manifesting. When an equivalent mesa derivation was rebuilt afterwards, it always turned out fine. nixos-apple-silicon users will usually build the kernel and mesa at the same time. The fault has not been isolated yet: it could be in the kernel, in the toolchain, in the mesa build system...
https://git.lix.systems/lix-project/lix/issues/248