Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't call dump_spill or arc_buf_destroy on unallocated arc_buf #9438

Closed
wants to merge 1 commit into from

Conversation

mattmacy
Copy link
Contributor

@mattmacy mattmacy commented Oct 9, 2019

I don't recall when / how I initially hit this issue but it's self evident from the code that
in the dso_dryrun != 0 case that we will call dump_spill and arc_buf_destroy on a pointer that contains whatever value happens to be at that stack location.

Signed-off-by: Matt Macy mmacy@FreeBSD.org

Motivation and Context

Description

How Has This Been Tested?

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

Signed-off-by: Matt Macy <mmacy@FreeBSD.org>
@ahrens ahrens requested review from pcd1193182 and ahrens October 9, 2019 22:42
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Oct 10, 2019
return (SET_ERROR(EIO));
err = dump_spill(dscp, bp, zb.zb_object, abuf->b_data);
arc_buf_destroy(abuf, &abuf);
return (err);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment on the return here? The existing code will always return (err) regardless of dso_dryrun. The new code will continue through the rest of the case if dso_dryrun isn't set. It's not immediately obvious which behavior is correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing code will try to dump and then destroy a random piece of memory. The err is the return value from dump_spill which clearly shouldn't be called. It's possible that the right solution is:

@@ -939,12 +939,15 @@ do_dump(dmu_send_cookie_t *dscp, struct send_range *range)
                ASSERT3U(srdp->datablksz, ==, BP_GET_LSIZE(bp));
                ASSERT3U(range->start_blkid + 1, ==, range->end_blkid);
                if (BP_GET_TYPE(bp) == DMU_OT_SA) {
                        arc_flags_t aflags = ARC_FLAG_WAIT;
                        enum zio_flag zioflags = ZIO_FLAG_CANFAIL;
 
+                       if (dscp->dsc_dso->dso_dryrun)
+                               return (0);
+
                        if (dscp->dsc_featureflags & DMU_BACKUP_FEATURE_RAW) {
                                ASSERT(BP_IS_PROTECTED(bp));
                                zioflags |= ZIO_FLAG_RAW;
                        }
 
                        arc_buf_t *abuf;
@@ -952,17 +955,15 @@ do_dump(dmu_send_cookie_t *dscp, struct send_range *range)
                        ASSERT3U(range->start_blkid, ==, DMU_SPILL_BLKID);
                        zb.zb_objset = dmu_objset_id(dscp->dsc_os);
                        zb.zb_object = range->object;
                        zb.zb_level = 0;
                        zb.zb_blkid = range->start_blkid;
 
-                       if (!dscp->dsc_dso->dso_dryrun && arc_read(NULL, spa,
-                           bp, arc_getbuf_func, &abuf, ZIO_PRIORITY_ASYNC_READ,
-                           zioflags, &aflags, &zb) != 0)
+                       if (arc_read(NULL, spa, bp, arc_getbuf_func, &abuf,
+                           ZIO_PRIORITY_ASYNC_READ, zioflags, &aflags, &zb) != 0)
                                return (SET_ERROR(EIO));
-
                        err = dump_spill(dscp, bp, zb.zb_object, abuf->b_data);
                        arc_buf_destroy(abuf, &abuf);
                        return (err);
                }
                if (send_do_embed(dscp, bp)) {
                        err = dump_write_embedded(dscp, range->object,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, there's no question the existing code is invalid.

There is some evidence that the intention is to call dump_spill() even in the event of dso_dryrun being set (i.e. in contrast to your suggestion above). Note the comment with the reference to (misspelt!) dso_dryrun in dump_record(), reached from dump_spill():

static int
do_dump(dmu_send_cookie_t *dscp, struct send_range *range)
{
...
        case DATA: {
...
                        err = dump_spill(dscp, bp, zb.zb_object, abuf->b_data);
...
}

dump_spill(dmu_send_cookie_t *dscp, const blkptr_t *bp, uint64_t object,
    void *data)
{   
...
        if (dump_record(dscp, data, payload_size) != 0)
                return (SET_ERROR(EINTR));
...
}

dump_record(dmu_send_cookie_t *dscp, void *payload, int payload_len)
{
...
        if (payload_len != 0) {
                *dscp->dsc_off += payload_len;
                /*
                 * payload is null when dso->ryrun == B_TRUE (i.e. when we're
                 * doing a send size calculation)
                 */
                if (payload != NULL) {
                        (void) fletcher_4_incremental_native(
                            payload, payload_len, &dscp->dsc_zc);
                }
...
}

It's possible the intention was something like:

        if (dscp->dsc_dso->dso_dryrun) {
                err = dump_spill(dscp, bp, zb.zb_object, NULL);
        }
        else {
                if (arc_read(NULL, spa, bp, arc_getbuf_func, &abuf,
                    ZIO_PRIORITY_ASYNC_READ, zioflags, &aflags, &zb) != 0)
                        return (SET_ERROR(EIO));
                err = dump_spill(dscp, bp, zb.zb_object, abuf->b_data);
                arc_buf_destroy(abuf, &abuf); 
        }

        return (err);

But I haven't done enough analysis to work out what is supposed to be happening here - I only have the foggiest notion of what this function is doing in the first place!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anywhere in the dump_spill call chain where it checks if the data pointer is NULL.
dump_spill -> dump_record (which can be passed NULL if payload_size == 0) -> dump_bytes-> vn_rdwr. If what you're saying is correct, `dump_spill would need to be amended to check the pointer and set payload_size => 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Sorry, I can't see where vn_rdwr comes into this.)

The if (payload != NULL) { in dump_record() is where it's checking if the data pointer is null.

I don't think it's necessary to have payload_size == 0 for payload == NULL. To me it looks like when dso_dryrun is set we're wanting to record sizes (i.e. payload_size), whilst doing as little actual work (reading data into payload, calculating checksums over the data etc.) as possible.

payload_size is already being set in dump_spill():

        uint64_t blksz = BP_GET_LSIZE(bp);
        uint64_t payload_size = blksz;

I.e. we have a valid payload size, just the payload itself hasn't been fetched (because we skipped the arc_read()).

But I'm really waving my hands around here from a simple code inspection, I have no idea what's actually supposed to be happening in any of this code.

My intention here was to simply raise a warning that your original proposed fix didn't look "obviously correct" to me. If you know what is supposed to be happening, and that's what your proposed solutions implement, then great!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see what you're pointing to with dump_bytes-> vn_rdwr. In dump_record(), a little after the check for if (payload != NULL) {, it calls:

                dscp->dsc_err = dso->dso_outfunc(dscp->dsc_os, payload,
                    payload_len, dso->dso_arg);

...and dso_outfunc() can possibly be dump_bytes() or send_space_sum(), per several places in zfs_ioctl.c:

$ grep -r 'dso_outfunc =' .
module/zfs/zfs_ioctl.c:         out.dso_outfunc = dump_bytes;
module/zfs/zfs_ioctl.c: out.dso_outfunc = dump_bytes;
module/zfs/zfs_ioctl.c:         out.dso_outfunc = send_space_sum;

...and dump_bytes() then calls vn_rdwr() (via dump_bytes_cb()) which indeed looks like it may possibly barf if it's passed dbi_buf == payload == NULL.

However closer inspection shows:

zfs_ioc_send(zfs_cmd_t *zc)
{
...
                out.dso_outfunc = dump_bytes;
                out.dso_arg = fp->f_vnode;
                out.dso_dryrun = B_FALSE;
...
}

zfs_ioc_send_new(const char *snapname, nvlist_t *innvl, nvlist_t *outnvl)
{
...
        out.dso_outfunc = dump_bytes;
        out.dso_arg = fp->f_vnode;
        out.dso_dryrun = B_FALSE;
...
}

zfs_ioc_send_space(const char *snapname, nvlist_t *innvl, nvlist_t *outnvl)
{
...
                out.dso_outfunc = send_space_sum;
                out.dso_arg = &space;
                out.dso_dryrun = B_TRUE;
...
}

I.e. dso_outfunc = dump_bytes is only ever set when dso_dryrun == B_FALSE. When dso_dryrun == B_TRUE, dso_outfunc = send_space_sum, and that is safe with payload == NULL:

send_space_sum(objset_t *os, void *buf, int len, void *arg)
{
        uint64_t *size = arg;
        *size += len;
        return (0);
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. dso_outfunc = dump_bytes is only ever set when dso_dryrun == B_FALSE. When dso_dryrun == B_TRUE, dso_outfunc = send_space_sum, and that is safe with payload == NULL:

Ah. Then your version is likely the correct one. I just need arc_buf_destroy to not get called on stack garbage. I'm wading through too many diffs to acquaint myself much of the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chrisrd should I just close this PR and file an issue?

Copy link
Contributor

@chrisrd chrisrd Oct 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think close this PR: I've already opened an alternate PR (#9453) with a solution per discussion above, referencing back to this PR.

@codecov
Copy link

codecov bot commented Oct 10, 2019

Codecov Report

Merging #9438 into master will increase coverage by 0.07%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9438      +/-   ##
==========================================
+ Coverage   79.15%   79.22%   +0.07%     
==========================================
  Files         412      412              
  Lines      123602   123603       +1     
==========================================
+ Hits        97831    97930      +99     
+ Misses      25771    25673      -98
Flag Coverage Δ
#kernel 79.74% <100%> (+0.05%) ⬆️
#user 67.21% <0%> (+0.05%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 381d91d...2ce207a. Read the comment docs.

@mattmacy mattmacy closed this Oct 14, 2019
@mattmacy mattmacy deleted the projects/dmu_send_fix branch December 19, 2019 22:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Code Review Needed Ready for review and testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants