modprobe splat crashes with error: killed #699
Comments
Is this running on the physical hardware or is it in a virtual machine? |
It's the physical hardware, tried memtest86+ for a couple of hours |
It had been working OK before, except for spontaneous crashes now and then, which is why I tried to switch ZFS versions. |
Upgraded to Ubuntu 11.10 with 3.0.0-17-generic, and everything worked out great then. |
ZFSOnLinux still supports 2.6.32, so this is probably a bug that should be fixed. |
@dajhorn ashnohoe on freenode reported that this is a regression in Ubuntu mountall package. @zOOge Would you mind reopening this? Only you and @behlendorf have the authority to reopen this issue. |
How did this computer get 12GB of memory? These motherboards have problems with unmatched timings after 8GB. The first and second channel should be balanced. Double-check that the memory clock is set correctly according to the product manual because some motherboards in this family also require downclocking for stability after 8GB. Upgrade the BIOS. The |
The reason it's 12G is that I happened to have 2 extra modules of the same type in a drawer. |
@dajhorn My Ubuntu knowledge is minimal, so I would not be a good proxy for dialogue on this. I suggest connecting to freenode and talking to ashnohoe in #zfsonlinux. This occurs on his system, so it would be best to talk to him. |
@zOOge if you want to pursue a bug report for the First, isolate for known hardware bugs:
Second, edit the Third, attach the entire |
I'm also affected by this "bug" since upgrading from linux-2.6.32-40 to linux-2.6.32-41 and/or zfsonlinux from 0.6.054 to 0.6.0.56, though the older 2.6.32-40 kernel still works with 0.6.0.56. -Processor- -Memory- -BIOS- -Version- Current BIOS: On boot, the boot process stops because modprobe crashes on zfs. Booting 2.6.32-40 works, but it tells me that I should modprobe zfs manually instead. So I guess it could be an issue with mountall (zfs) 0.6.0.56 or something in the newer kernel. Therefore I doubt that the K8 errata #93 workaround is failing or that the BIOS is the culprit. dpkg.log http://paste.ubuntu.com/952878/ [ 486.934742] ******* Your BIOS seems to not contain a fix for K8 errata #93 |
Installed Ubuntu 10.04 LTS and ran: |
@SADESA The Jetway NF99FL-525 is an Intel Atom D525 motherboard. My understanding is that the stock Ubuntu kernel does not have a code path to produce the given dmesg on the purported hardware. This is not a ZoL bug, although ZoL may provoke the bug. If the panic persists after upgrading the BIOS, then remove all memory DIMMs from the secondary memory channel and try again with exactly identically matched DIMMs installed only on the primary memory channel. Or revert and use the older kernel that doesn't try to access the borked memory address. |
This is not a kernel panic but a modprobe zfs crash: the computer still works normally afterwards, but the boot stops if the error occurs at boot time. I'm also curious why three different hardware setups all end up with Code: Bad RIP value. RIP [] 0xffffffffabcddcba, with exactly the same values. I think this only happens with linux-2.6.32-41 on Ubuntu 10.04 LTS. |
Experiencing the same issues with 2.6.32-41 on 10.04 (AMD X2-555 proc in an ASUS M4A88T MB, 16GB ecc). No apparent problems with 2.6.32-40. Sorry for lack of trace info, may have time this weekend. |
Correction, 12GB. Might as well mention this: |
@brsinc: Thanks for isolating the hardware. Two things:
Does this bug happen for anybody using a more recent kernel or a non-Asus motherboard? The LP ticket for the problematic kernel release is: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/888042 This patch was backported into the Ubuntu 2.6.32-41 kernel: http://git.kernel.org/linus/3c4aa91f21f65b7b40bdfb015eacbcb8453ccae2 It removes a safety mechanism for the Asus M3 motherboards that could have been preventing the crash. Perhaps a similar patch was added that affects Asus M4 motherboards. (The AMD 800 and AMD 900 chipsets are mostly the same silicon, except that AMD licensed SLI from Nvidia for the AM3+.) At this point, you are probably stuck because:
I suggest that you revert the kernel or try the |
I am having the same issue on an IBM and an HP, both of which are non-Asus MBs, I believe. I am working on a new install of 10.04 LTS on the HP right now. Here are the contents of DMIDECODE:
Handle 0x0003, DMI type 3, 21 bytes
Handle 0x0004, DMI type 4, 42 bytes
Handle 0x0005, DMI type 7, 19 bytes
Handle 0x0006, DMI type 7, 19 bytes
Handle 0x0007, DMI type 7, 19 bytes
Handle 0x0008, DMI type 8, 9 bytes
Handle 0x0009, DMI type 8, 9 bytes
Handle 0x000A, DMI type 8, 9 bytes
Handle 0x000B, DMI type 8, 9 bytes
Handle 0x000C, DMI type 8, 9 bytes
Handle 0x000D, DMI type 8, 9 bytes
Handle 0x000E, DMI type 8, 9 bytes
Handle 0x000F, DMI type 8, 9 bytes
Handle 0x0010, DMI type 8, 9 bytes
Handle 0x0011, DMI type 8, 9 bytes
Handle 0x0012, DMI type 8, 9 bytes
Handle 0x0013, DMI type 9, 17 bytes
Handle 0x0014, DMI type 9, 17 bytes
Handle 0x0015, DMI type 9, 17 bytes
Handle 0x0016, DMI type 9, 17 bytes
Handle 0x0017, DMI type 11, 5 bytes
Handle 0x0018, DMI type 12, 5 bytes
Handle 0x0019, DMI type 15, 29 bytes
Handle 0x001A, DMI type 16, 15 bytes
Handle 0x001B, DMI type 17, 28 bytes
Handle 0x001C, DMI type 17, 28 bytes
Handle 0x001D, DMI type 17, 28 bytes
Handle 0x001E, DMI type 17, 28 bytes
Handle 0x001F, DMI type 19, 15 bytes
Handle 0x0020, DMI type 20, 19 bytes
Handle 0x0021, DMI type 20, 19 bytes
Handle 0x0022, DMI type 20, 19 bytes
Handle 0x0023, DMI type 20, 19 bytes
Handle 0x0024, DMI type 23, 13 bytes
Handle 0x0025, DMI type 24, 5 bytes
Handle 0x0026, DMI type 25, 9 bytes
Handle 0x0027, DMI type 26, 20 bytes
Handle 0x0028, DMI type 27, 12 bytes
Handle 0x0029, DMI type 28, 20 bytes
Handle 0x002A, DMI type 29, 20 bytes
Handle 0x002B, DMI type 30, 6 bytes
Handle 0x002C, DMI type 32, 20 bytes
Handle 0x002D, DMI type 38, 18 bytes
Handle 0x002E, DMI type 126, 4 bytes
Handle 0x002F, DMI type 208, 5 bytes
Handle 0x0030, DMI type 209, 12 bytes
Handle 0x0031, DMI type 224, 5 bytes
Handle 0x0032, DMI type 225, 12 bytes
Handle 0x0033, DMI type 127, 4 bytes
And the last few pages of DMESG: |
I just installed 12.04 on the same hardware and it seems to be working, so far... Hope this helps a bit. |
@n9sla: Yes, it does. Now it looks like a dud kernel was published for Ubuntu 10.04 Lucid Lynx. Thanks for posting a better transcript. |
A new kernel has come in. I'll test tomorrow or Friday whether that kernel is affected too. Changes for the versions: Version 2.6.32-41.89: [Herton R. Krzesinski]
|
zfs 0.6.0.56 and kernel 2.6.32-41.89 with Ubuntu 10.04.4 LTS show slightly different behavior: modprobe freezes when trying to load zfs.ko. It can be interrupted, but the module does not load. Loading the dependency modules by hand shows that the freeze occurs in spl.ko. Either way, ZoL is unusable in that state. With kernel 2.6.32-40.87 everything works. |
Closing issue, it's my understanding from reading this history that this was caused by a particular dud kernel. |
Could you please reopen the bug? I'm having the same problem with the up-to-date Ubuntu 10.04.4 kernel (2.6.32-45-server). The first time, modprobe gave me the same stack trace; after that it just hung. |
Ubuntu Server sources are using preempt, which doesn't work with kernel mode zfs. On 16.12.2012 at 17:27, Aleksandr Chuklin notifications@github.com wrote:
|
These days preempt kernels work fine. Can you post the full stack from the panic? |
|
Reopening for now. |
Due to I/O buffering the helper may return successfully before the proc handler has a chance to execute. To catch this case wait up to 1 second to verify spl_kallsyms_lookup_name_fn was updated to a non SYMBOL_POISON value. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Issue openzfs/zfs#699 Issue openzfs/zfs#859
Due to I/O buffering the helper may return successfully before the proc handler has a chance to execute. To catch this case wait up to 1 second to verify spl_kallsyms_lookup_name_fn was updated to a non SYMBOL_POISON value. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs/zfs#699 Closes openzfs/zfs#859
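The race described in the commit message can be illustrated outside the kernel. The following is a minimal Python sketch, not the SPL code: `SYMBOL_POISON`, `SymbolSlot`, `wait_for_update`, and `run_demo` are hypothetical stand-ins, and a thread plays the role of the proc handler that runs only after the helper has already returned. The caller polls for up to 1 second until the slot no longer holds the poison value, instead of using it immediately.

```python
import threading
import time

# Hypothetical stand-in for the sentinel named in the commit message.
SYMBOL_POISON = object()


class SymbolSlot:
    """Holds a value that starts out poisoned until a handler fills it in."""

    def __init__(self):
        self.value = SYMBOL_POISON


def wait_for_update(slot, timeout=1.0, interval=0.01):
    """Poll until slot.value is no longer the poison sentinel.

    Returns True if the slot was updated within `timeout` seconds,
    False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if slot.value is not SYMBOL_POISON:
            return True
        time.sleep(interval)
    # One last check in case the update landed during the final sleep.
    return slot.value is not SYMBOL_POISON


def run_demo():
    """Simulate a helper that returns before the handler has executed."""
    slot = SymbolSlot()

    def delayed_handler():
        time.sleep(0.05)      # the handler runs "late"
        slot.value = 0x1234   # arbitrary non-poison value for the demo

    worker = threading.Thread(target=delayed_handler)
    worker.start()
    # At this point the "helper" has returned, but the slot may still
    # be poisoned, so wait instead of using it right away.
    updated = wait_for_update(slot, timeout=1.0)
    worker.join()
    return updated


if __name__ == "__main__":
    print("slot updated in time:", run_demo())
```

Using the slot without the wait would be the analogue of calling through a still-poisoned pointer, which is consistent with the identical faulting address seen in the reports in this thread.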
@varepsilon Alright, we got it for real this time. I was able to reproduce the issue under Lucid; it's fixed by commit openzfs/spl@034f1b3, which was just merged into master. |
The newest Fedora packaging rules print warnings for scripts using the /usr/bin/python shebang: *** WARNING: mangling shebang in /usr/src/spl-0.7.0/cmd/splslab/splslab.py from #!/usr/bin/python to #!/usr/bin/python2. This will become an ERROR, fix it manually! Fedora wants all cross compatible scripts to pick python3. Since we don't want our users to have to pick a specific version of python, we exclude our scripts from the RPM build check. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes: openzfs#699 Closes: openzfs#700
I am running Ubuntu 10.04 LTS, 2.6.32 kernel and can't install zfs.
I have previously used the version from https://github.com/zfs-linux/ but figured I would go for packages.
But when I try to load the modules, it just spits out "killed" and returns me to the prompt.
The compilation works fine, it builds debs as it should.
Have also tried installing from the PPA, with the same result.
I have installed the latest BIOS and disabled USB Legacy, with no change in the error message.
The error from dmesg is below.
I am using an Asus M4A88TD-V EVO/USB3 motherboard with an AMD Phenom II X6 1055T CPU and 12G of memory.
[ 73.457106] ******* Your BIOS seems to not contain a fix for K8 errata #93
[ 73.457108] ******* Working around it, but it may cause SEGVs or burn power.
[ 73.457109] ******* Please consider a BIOS update.
[ 73.457109] ******* Disabling USB legacy in the BIOS may also help.
[ 73.457396] BUG: unable to handle kernel paging request at ffffffffabcddcba
[ 73.457481] IP: [] 0xffffffffabcddcba
[ 73.457546] PGD 1003067 PUD 1007063 PMD 0
[ 73.457608] Oops: 0010 [#1] SMP
[ 73.457657] last sysfs file: /sys/devices/pci0000:00/0000:00:15.1/0000:06:00.0/irq
[ 73.457750] CPU 1
[ 73.457778] Modules linked in: spl(+) zlib_deflate binfmt_misc ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables kvm_amd kvm nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc snd_hda_codec_atihdmi fbcon tileblit font bitblit softcursor vga16fb vgastate snd_hda_codec_realtek snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_hda_intel snd_seq_midi_event snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss radeon snd_seq bridge stp ttm snd_pcm drm_kms_helper ftdi_sio snd_seq_device usbserial drm i2c_algo_bit snd_timer asus_atk0110 snd edac_core edac_mce_amd soundcore snd_page_alloc i2c_piix4 shpchp xhci lp parport ohci1394 ieee1394 pata_atiixp r8169 mii ahci pata_via
[ 73.458891] Pid: 2753, comm: modprobe Not tainted 2.6.32-41-generic #88-Ubuntu System Product Name
[ 73.458989] RIP: 0010:[] [] 0xffffffffabcddcba
[ 73.459078] RSP: 0018:ffff8802f139de70 EFLAGS: 00010246
[ 73.459144] RAX: 0000000000000000 RBX: ffff8802fe95ec90 RCX: 0000000000000000
[ 73.459221] RDX: ffffffff817c8a48 RSI: 0000000000000286 RDI: ffffffffa05224e4
[ 73.459298] RBP: ffff8802f139de78 R08: 0000000000000000 R09: 0000000000000000
[ 73.459399] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffffa0538000
[ 73.459476] R13: 00000000018e6c50 R14: 0000000000000000 R15: 00000000018e6c88
[ 73.459558] FS: 00007f0d19455700(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
[ 73.459647] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 73.459710] CR2: ffffffffabcddcba CR3: 00000002f507d000 CR4: 00000000000006e0
[ 73.459791] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 73.459868] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 73.459954] Process modprobe (pid: 2753, threadinfo ffff8802f139c000, task ffff8802f12a0000)
[ 73.460044] Stack:
[ 73.460068] ffffffffa0512e86 ffff8802f139df18 ffffffffa0538655 00000000fffffffb
[ 73.460161] <0> 0000000000000000 ffff8802f139dea8 ffffffff810cea9d ffff8802f139dee8
[ 73.460279] <0> ffffffff8154b016 ffffffffa052279f ffffffffa05227a6 ffffffffa0523cb8
[ 73.460387] Call Trace:
[ 73.460427] [] ? spl_kmem_init_kallsyms_lookup+0x16/0x180 [spl]
[ 73.460516] [] spl_init+0x655/0x87b [spl]
[ 73.460581] [] ? tracepoint_module_notify+0x2d/0x40
[ 73.460656] [] ? notifier_call_chain+0x56/0x80
[ 73.460724] [] do_one_initcall+0x3c/0x1a0
[ 73.460788] [] sys_init_module+0xdf/0x260
[ 73.460852] [] system_call_fastpath+0x16/0x1b
[ 73.460918] Code: Bad RIP value.
[ 73.460973] RIP [] 0xffffffffabcddcba
[ 73.464187] RSP
[ 73.464188] CR2: ffffffffabcddcba
[ 73.464189] ---[ end trace 3c5079637320dcf8 ]---
root@stratos:~#
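For comparing reports from different machines, the fields of interest can be pulled out of an oops like the one above with a small script. This is a sketch under assumptions: the regular expressions only cover the line shapes seen in this paste, the names `parse_oops` and `jumped_to_poison` are made up for illustration, and treating 0xffffffffabcddcba as a poison sentinel is an assumption based on the identical address showing up in every report in this thread.

```python
import re

# Assumption: the faulting address is a poison/sentinel value rather
# than a real code address. It is taken verbatim from the oops above.
ASSUMED_POISON = 0xFFFFFFFFABCDDCBA

SAMPLE = """\
[ 73.457396] BUG: unable to handle kernel paging request at ffffffffabcddcba
[ 73.458891] Pid: 2753, comm: modprobe Not tainted 2.6.32-41-generic #88-Ubuntu System Product Name
[ 73.458989] RIP: 0010:[] [] 0xffffffffabcddcba
[ 73.459710] CR2: ffffffffabcddcba CR3: 00000002f507d000 CR4: 00000000000006e0
"""


def parse_oops(text):
    """Extract a few fields from an x86-64 oops dump.

    Only the line formats that appear in SAMPLE are handled; fields
    that cannot be found are simply left out of the result.
    """
    fields = {}
    m = re.search(r"kernel paging request at ([0-9a-f]{16})", text)
    if m:
        fields["fault_addr"] = int(m.group(1), 16)
    m = re.search(r"RIP[^\n]*0x([0-9a-f]{16})", text)
    if m:
        fields["rip"] = int(m.group(1), 16)
    m = re.search(r"CR2: ([0-9a-f]{16})", text)
    if m:
        fields["cr2"] = int(m.group(1), 16)
    m = re.search(r"comm: (\S+)", text)
    if m:
        fields["comm"] = m.group(1)
    return fields


def jumped_to_poison(fields, poison=ASSUMED_POISON):
    """True if both the instruction pointer and CR2 equal the sentinel,
    i.e. the CPU tried to execute code at the poison address."""
    return fields.get("rip") == poison and fields.get("cr2") == poison


if __name__ == "__main__":
    info = parse_oops(SAMPLE)
    print(info.get("comm"), hex(info.get("rip", 0)), jumped_to_poison(info))
```

An oops where RIP and CR2 are both equal to the same fixed value on unrelated hardware points at a software-defined address rather than a memory or BIOS fault, which matches how this issue was eventually resolved.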