VERIFY3(sa.sa_magic == SA_MAGIC) failed - ZFS 2.1.0/2.1.3, CentOS 8.5 #13144
Comments
We saw this issue, but we do not have encryption enabled on this dataset. Here is the output of
The other thing I'd suggest is to report this issue to wherever Lustre tracks bugs and reference this one. Since Lustre reaches into OpenZFS code and calls it directly, there's a nonzero chance this particular case can't be hit without Lustre. The question then becomes whether Lustre is violating some expectation of those calls, or whether the implementation makes an additional assumption that holds without Lustre's usage but is false there, and the behavior should be fixed wherever it makes the most sense. You could try setting ZFS_DEBUG_MODIFY (0x10, or 16) in zfs_flags, which makes ZFS check the ARC buffers for modification in between changes whenever something "legitimately" goes to change them, and see if it notices something breaking earlier. (You could also set 0x2/0x4, ZFS_DEBUG_DBUF_VERIFY / ZFS_DEBUG_DNODE_VERIFY, but those require compiling with --enable-debug to do anything useful.)
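For reference, a minimal sketch of flipping that flag at runtime via the kernel module parameter (the bit values are the ones given above; exact values for your build can be confirmed in the zfs module documentation):

```sh
# Show the current debug flags
cat /sys/module/zfs/parameters/zfs_flags

# OR in ZFS_DEBUG_MODIFY (0x10) on top of whatever is already set
current=$(cat /sys/module/zfs/parameters/zfs_flags)
echo $(( current | 0x10 )) > /sys/module/zfs/parameters/zfs_flags

# Or make the setting persistent across module loads
echo "options zfs zfs_flags=0x10" >> /etc/modprobe.d/zfs.conf
```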
@rincebrain, thank you for the suggestion that it could be a Lustre bug. We've already opened an issue in the Lustre tracker: https://jira.whamcloud.com/browse/LU-15586 and we'll try to debug the ARC buffers as well.
If Lustre doesn't use the ZPL, is it weird that we're going off into zpl_get_file_info? @behlendorf
@PaulZ-98 since Lustre writes its data out in a format which is compatible with the ZPL on-disk format, it does leverage a few of the
What is strange here is that it looks like we're either getting a damaged data buffer passed in, or potentially one which wasn't re-initialized. As @rincebrain mentioned, setting
Can you expand on this a bit? Are you able to reproduce this issue with a new pool?
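For anyone wanting to attempt this on a throwaway pool, a hedged sketch using a file-backed vdev and properties similar to the affected dataset (names and sizes are illustrative only, not the reporter's actual layout):

```sh
# Create a small file-backed test pool and a dataset with dnodesize=auto / xattr=sa
truncate -s 10G /var/tmp/testpool.img
zpool create testpool /var/tmp/testpool.img
zfs create -o dnodesize=auto -o xattr=sa testpool/mdt0
```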
@behlendorf we finally ran the stack, but we had some difficulties using a ZFS debug build with Lustre. We cannot load
We made an update of the stack hoping this would fix the problem with the module: we bumped ZFS to the official

diff --git a/module/zfs/refcount.c b/module/zfs/refcount.c
index 354e021d9..92d641b68 100644
--- a/module/zfs/refcount.c
+++ b/module/zfs/refcount.c
@@ -38,6 +38,14 @@ int reference_history = 3; /* tunable */
static kmem_cache_t *reference_cache;
static kmem_cache_t *reference_history_cache;
+#if defined(_KERNEL)
+EXPORT_SYMBOL(zfs_refcount_add);
+EXPORT_SYMBOL(zfs_refcount_create);
+#endif
+
+
+
+
void
zfs_refcount_init(void)
{
We also set
The panic is the same (except for the object id) whether it's the osd or mdt mount. We also tried to mount the pool natively, using
Unfortunately, the stack trace here seems to indicate Lustre isn't quite calling the ZFS interfaces correctly. In particular, this call path appears to be missing a hold. That's something we'll need to look into, but it's unlikely to be related to the issue you hit.
Sorry about the build issue. This was fixed in the master branch but never backported to the 2.1.x releases. I've gone ahead and opened a new PR to make sure it's in the next point release.
@behlendorf, thank you for the information; we have two questions:
We are planning the next procedures and we are not sure which option will be the best one:
I'd expect you to see this with the 2.0.x releases as well. The issue here is that Lustre is running afoul of some long-standing correctness checks which ZFS performs only when debugging is enabled. The upstream Lustre+ZFS testing must (reasonably) be using a production build instead of a debug build of ZFS in their CI, which would explain why both the panic and the debug build issue went unnoticed.
Another option to consider would be to disable this specific debugging check in ZFS. As I mentioned, it's always disabled in production builds, so there's no harm in skipping it, and it would allow you to run with debugging. You can disable the problematic check with a relatively small change like this; my expectation is that this should allow you to use a debug build of ZFS with Lustre as originally intended.

diff --git a/include/sys/dmu_tx.h b/include/sys/dmu_tx.h
index 71a9ac7ca..a5e781aea 100644
--- a/include/sys/dmu_tx.h
+++ b/include/sys/dmu_tx.h
@@ -162,7 +162,7 @@ void dmu_tx_add_new_object(dmu_tx_t *tx, dnode_t *dn);
void dmu_tx_dirty_buf(dmu_tx_t *tx, struct dmu_buf_impl *db);
void dmu_tx_hold_space(dmu_tx_t *tx, uint64_t space);
-#ifdef ZFS_DEBUG
+#ifdef ZFS_DEBUG_DX_TX_DIRTY_BUF
#define DMU_TX_DIRTY_BUF(tx, db) dmu_tx_dirty_buf(tx, db)
#else
#define DMU_TX_DIRTY_BUF(tx, db)
diff --git a/module/zfs/dmu_tx.c b/module/zfs/dmu_tx.c
index fe9860066..3e73f3623 100644
--- a/module/zfs/dmu_tx.c
+++ b/module/zfs/dmu_tx.c
@@ -570,7 +570,7 @@ dmu_tx_hold_space(dmu_tx_t *tx, uint64_t space)
}
}
-#ifdef ZFS_DEBUG
+#ifdef ZFS_DEBUG_DX_TX_DIRTY_BUF
void
dmu_tx_dirty_buf(dmu_tx_t *tx, dmu_buf_impl_t *db)
{
If you're then still able to reproduce the issue, that would be very helpful. Just for reference, we're running a stack very close to this one (RHEL 8.5, Lustre 2.14, and zfs-2.1 with dRAID) and have not observed this issue. It seems reasonable that it may somehow be related to your metadata-intensive workload, but it's hard to say exactly how without more information or a consistent reproducer.
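For reference, a hedged sketch of producing a debug build of ZFS and rebuilding Lustre against it (paths are placeholders; --enable-debug is the ZFS configure switch mentioned above, and --with-zfs is Lustre's usual option for pointing at an out-of-tree ZFS source directory):

```sh
# Build ZFS with assertions and debug checks enabled
cd zfs
sh autogen.sh
./configure --enable-debug
make -j"$(nproc)" && sudo make install

# Rebuild Lustre against that ZFS tree
cd ../lustre-release
sh autogen.sh
./configure --with-zfs=/path/to/zfs
make -j"$(nproc)" && sudo make install
```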
@behlendorf we built and ran ZFS with Lustre successfully (thanks a lot!) and after running ADF we hit a brand new panic:
From the client perspective we observed that ADF internally writes a bunch of data (between
If you still suspect metadata issues, we can also add our collectd lustre metadata plugin and graphs if it helps you diagnose the problem.
@behlendorf, do you need more information or tests from our side?
We decided to close both issues and reformat the filesystem to the supported stack.
@doma2203 thanks for the update and the debug information above.
Describe the problem you're observing
We have an additional issue during heavy metadata-intensive I/O on the same pool as in #13143 (single pool, single dataset, LUN from an all-flash NVMe disk array):
which causes the ZFS threads to hang. Consequently, the ZFS pool does not perform any read/write operations and the system load increases. This pool acts as the Lustre MDT, so the `mdt` threads also hang and the whole cluster becomes unresponsive. The ZFS pool works again after a reboot of the node, but with each reboot the read/write problems appear at an earlier stage, until even the simplest `tar` causes the whole filesystem to crash.
Up to this point we tried to:
- run `zpool scrub` on the pool - the scrub does not detect any errors
- `zfs mount` - the pool seems to have no problem accessing the ZFS data, and the ZFS quotas seem to be set correctly
- change `dnodesize` and `xattr` from the defaults (`legacy`, `on`) to `dnodesize=auto` and `xattr=sa` - no effect
- `zfs send | zfs receive` - it had a good effect, but only for the first 10 minutes of heavy load on the pool. After this time, the problem started to reappear (a sketch of this step is shown after this list).
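For context, a hedged sketch of that send/receive rewrite step; the pool and dataset names below are placeholders, not the reporter's actual layout:

```sh
# Snapshot the affected dataset and rewrite it into a fresh dataset,
# then swap names so the MDT uses the rewritten copy.
zfs snapshot mdtpool/mdt0@migrate
zfs send -R mdtpool/mdt0@migrate | zfs receive mdtpool/mdt0-new
zfs rename mdtpool/mdt0 mdtpool/mdt0-old
zfs rename mdtpool/mdt0-new mdtpool/mdt0
```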
Describe how to reproduce the problem
The problem is originally triggered by ADF (a quantum chemistry HPC code) running against the Lustre MDT backed by ZFS. It can also be recreated by running `tar` on the kernel sources in a loop (a sketch of such a loop is shown below).
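A minimal, hypothetical sketch of such a reproducer loop, assuming a Lustre client mount at /mnt/lustre and a locally cached kernel tarball (both paths and the kernel version are illustrative only):

```sh
# Repeatedly unpack and delete a kernel source tree to generate
# metadata-heavy load on the MDT.
cd /mnt/lustre/repro
while true; do
    tar xf /tmp/linux-5.16.tar.xz
    rm -rf linux-5.16
done
```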
Include any warning/errors/backtraces from the system logs