-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't call dump_spill or arc_buf_destroy on unallocated arc_buf #9438
Conversation
Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
return (SET_ERROR(EIO)); | ||
err = dump_spill(dscp, bp, zb.zb_object, abuf->b_data); | ||
arc_buf_destroy(abuf, &abuf); | ||
return (err); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you comment on the return here? The existing code will always return (err)
regardless of dso_dryrun
. The new code will continue through the rest of the case if dso_dryrun
isn't set. It's not immediately obvious which behavior is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing code will try to dump and then destroy a random piece of memory. The err is the return value from dump_spill which clearly shouldn't be called. It's possible that the right solution is:
@@ -939,12 +939,15 @@ do_dump(dmu_send_cookie_t *dscp, struct send_range *range)
ASSERT3U(srdp->datablksz, ==, BP_GET_LSIZE(bp));
ASSERT3U(range->start_blkid + 1, ==, range->end_blkid);
if (BP_GET_TYPE(bp) == DMU_OT_SA) {
arc_flags_t aflags = ARC_FLAG_WAIT;
enum zio_flag zioflags = ZIO_FLAG_CANFAIL;
+ if (dscp->dsc_dso->dso_dryrun)
+ return (0);
+
if (dscp->dsc_featureflags & DMU_BACKUP_FEATURE_RAW) {
ASSERT(BP_IS_PROTECTED(bp));
zioflags |= ZIO_FLAG_RAW;
}
arc_buf_t *abuf;
@@ -952,17 +955,15 @@ do_dump(dmu_send_cookie_t *dscp, struct send_range *range)
ASSERT3U(range->start_blkid, ==, DMU_SPILL_BLKID);
zb.zb_objset = dmu_objset_id(dscp->dsc_os);
zb.zb_object = range->object;
zb.zb_level = 0;
zb.zb_blkid = range->start_blkid;
- if (!dscp->dsc_dso->dso_dryrun && arc_read(NULL, spa,
- bp, arc_getbuf_func, &abuf, ZIO_PRIORITY_ASYNC_READ,
- zioflags, &aflags, &zb) != 0)
+ if (arc_read(NULL, spa, bp, arc_getbuf_func, &abuf,
+ ZIO_PRIORITY_ASYNC_READ, zioflags, &aflags, &zb) != 0)
return (SET_ERROR(EIO));
-
err = dump_spill(dscp, bp, zb.zb_object, abuf->b_data);
arc_buf_destroy(abuf, &abuf);
return (err);
}
if (send_do_embed(dscp, bp)) {
err = dump_write_embedded(dscp, range->object,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, there's no question the existing code is invalid.
There is some evidence that the intention is to call dump_spill()
even in the event of dso_dryrun
being set (i.e. in contrast to your suggestion above). Note the comment with the reference to (misspelt!) dso_dryrun
in dump_record()
, reached from dump_spill()
:
static int
do_dump(dmu_send_cookie_t *dscp, struct send_range *range)
{
...
case DATA: {
...
err = dump_spill(dscp, bp, zb.zb_object, abuf->b_data);
...
}
dump_spill(dmu_send_cookie_t *dscp, const blkptr_t *bp, uint64_t object,
void *data)
{
...
if (dump_record(dscp, data, payload_size) != 0)
return (SET_ERROR(EINTR));
...
}
dump_record(dmu_send_cookie_t *dscp, void *payload, int payload_len)
{
...
if (payload_len != 0) {
*dscp->dsc_off += payload_len;
/*
* payload is null when dso->ryrun == B_TRUE (i.e. when we're
* doing a send size calculation)
*/
if (payload != NULL) {
(void) fletcher_4_incremental_native(
payload, payload_len, &dscp->dsc_zc);
}
...
}
It's possible the intention was something like:
if (dscp->dsc_dso->dso_dryrun) {
err = dump_spill(dscp, bp, zb.zb_object, NULL);
}
else {
if (arc_read(NULL, spa, bp, arc_getbuf_func, &abuf,
ZIO_PRIORITY_ASYNC_READ, zioflags, &aflags, &zb) != 0)
return (SET_ERROR(EIO));
err = dump_spill(dscp, bp, zb.zb_object, abuf->b_data);
arc_buf_destroy(abuf, &abuf);
}
return (err);
But I haven't done enough analysis to work out what is supposed to be happening here - I only have the foggiest notion of what this function is doing in the first place!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anywhere in the dump_spill
call chain where it checks if the data pointer is NULL.
dump_spill
-> dump_record
(which can be passed NULL if payload_size == 0) -> dump_bytes
-> vn_rdwr
. If what you're saying is correct, `dump_spill would need to be amended to check the pointer and set payload_size => 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Sorry, I can't see where vn_rdwr
comes into this.)
The if (payload != NULL) {
in dump_record()
is where it's checking if the data pointer is null.
I don't think it's necessary to have payload_size == 0
for payload == NULL
. To me it looks like when dso_dryrun
is set we're wanting to record sizes (i.e. payload_size
), whilst doing as little actual work (reading data into payload
, calculating checksums over the data etc.) as possible.
payload_size
is already being set in dump_spill()
:
uint64_t blksz = BP_GET_LSIZE(bp);
uint64_t payload_size = blksz;
I.e. we have a valid payload size, just the payload itself hasn't been fetched (because we skipped the arc_read()
).
But I'm really waving my hands around here from a simple code inspection, I have no idea what's actually supposed to be happening in any of this code.
My intention here was to simply raise a warning that your original proposed fix didn't look "obviously correct" to me. If you know what is supposed to be happening, and that's what your proposed solutions implement, then great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see what you're pointing to with dump_bytes-> vn_rdwr
. In dump_record()
, a little after the check for if (payload != NULL) {
, it calls:
dscp->dsc_err = dso->dso_outfunc(dscp->dsc_os, payload,
payload_len, dso->dso_arg);
...and dso_outfunc()
can possibly be dump_bytes()
or send_space_sum()
, per several places in zfs_ioctl.c
:
$ grep -r 'dso_outfunc =' .
module/zfs/zfs_ioctl.c: out.dso_outfunc = dump_bytes;
module/zfs/zfs_ioctl.c: out.dso_outfunc = dump_bytes;
module/zfs/zfs_ioctl.c: out.dso_outfunc = send_space_sum;
...and dump_bytes()
then calls vn_rdwr()
(via dump_bytes_cb()
) which indeed looks like it may possibly barf if it's passed dbi_buf == payload == NULL
.
However closer inspection shows:
zfs_ioc_send(zfs_cmd_t *zc)
{
...
out.dso_outfunc = dump_bytes;
out.dso_arg = fp->f_vnode;
out.dso_dryrun = B_FALSE;
...
}
zfs_ioc_send_new(const char *snapname, nvlist_t *innvl, nvlist_t *outnvl)
{
...
out.dso_outfunc = dump_bytes;
out.dso_arg = fp->f_vnode;
out.dso_dryrun = B_FALSE;
...
}
zfs_ioc_send_space(const char *snapname, nvlist_t *innvl, nvlist_t *outnvl)
{
...
out.dso_outfunc = send_space_sum;
out.dso_arg = &space;
out.dso_dryrun = B_TRUE;
...
}
I.e. dso_outfunc = dump_bytes
is only ever set when dso_dryrun == B_FALSE
. When dso_dryrun == B_TRUE
, dso_outfunc = send_space_sum
, and that is safe with payload == NULL
:
send_space_sum(objset_t *os, void *buf, int len, void *arg)
{
uint64_t *size = arg;
*size += len;
return (0);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I.e. dso_outfunc = dump_bytes is only ever set when dso_dryrun == B_FALSE. When dso_dryrun == B_TRUE, dso_outfunc = send_space_sum, and that is safe with payload == NULL:
Ah. Then your version is likely the correct one. I just need arc_buf_destroy
to not get called on stack garbage. I'm wading through too many diffs to acquaint myself much of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chrisrd should I just close this PR and file an issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think close this PR: I've already opened an alternate PR (#9453) with a solution per discussion above, referencing back to this PR.
Codecov Report
@@ Coverage Diff @@
## master #9438 +/- ##
==========================================
+ Coverage 79.15% 79.22% +0.07%
==========================================
Files 412 412
Lines 123602 123603 +1
==========================================
+ Hits 97831 97930 +99
+ Misses 25771 25673 -98
Continue to review full report at Codecov.
|
I don't recall when / how I initially hit this issue but it's self evident from the code that
in the
dso_dryrun != 0
case that we will calldump_spill
andarc_buf_destroy
on a pointer that contains whatever value happens to be at that stack location.Signed-off-by: Matt Macy mmacy@FreeBSD.org
Motivation and Context
Description
How Has This Been Tested?
Types of changes
Checklist:
Signed-off-by
.