
zfs destroy hangs other zfs commands #3104

Closed
stevenburgess opened this issue Feb 13, 2015 · 10 comments
Labels
Status: Inactive (not being actively updated) · Status: Stale (no recent activity for issue) · Type: Performance (performance improvement or performance problem)

Comments

@stevenburgess

We are having a problem where any zfs destroy that takes a while to return blocks all new zfs commands from starting until the destroy has returned.

I am testing on ubuntu 14.04, ZoL 0.6.3, pool version 5000.

My testing shows that a zfs destroy call covering many snapshots takes a long time to return. Destroying an 8GB FS with one snapshot returns almost instantly and causes no downtime. Destroying an 8GB FS with 4000 snapshots can take 10 minutes. If I break the destroy of those 4000 snapshots into batches of 100 (zfs destroy -d fs@1%100), I get 10-second periods of inactivity instead.

I have a few scripts that I used to demonstrate this problem:

1. Create a file system with a ton of snapshots (or use one you have):
   https://gist.github.com/stevenburgess/793f6c8e8c593b6b30f6
   (a rough sketch of an equivalent script is below)
2. Watch new calls to zfs succeed with a little scroll bar:
   while true; do zfs get name tank > /dev/null 2>&1; printf '|'; sleep 0.1; done
3. Destroy the FS in question:
   zfs destroy -r tank/constant
4. Watch the scroll bar stop a few seconds in and not move until the command returns.
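
For anyone who doesn't want to pull the gist, something along these lines produces an equivalent test dataset (a sketch, not the gist verbatim; it assumes a pool named tank mounted at /tank and writes ~2MB of fresh data before each snapshot):

zfs create tank/constant
for i in $(seq 1 4000); do
    # dirty a little data so every snapshot references its own blocks
    dd if=/dev/urandom of=/tank/constant/file bs=1M count=2 2>/dev/null
    zfs snapshot tank/constant@snap$i
done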

I use zfs get to show the slowdown, but it is all zfs commands that halt. This means that if one command destroys an FS with 4000 snapshots, then for a 10-minute period no one can:

- zfs get any stats
- zfs snapshot anything
- zfs clone a new FS
- zfs recv new data
- zfs send to start a send stream

That means downtime for a bunch of processes. Our only workaround currently is to ensure that zfs destroy commands destroy a single snapshot at a time; each of those causes only 0.2 to 0.4 seconds of inactivity.
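
A sketch of that workaround, assuming the snapshots live under tank/constant as in the example above (each destroy covers a single snapshot, so the work is spread across many small TXGs):

zfs list -H -t snapshot -o name -r tank/constant | while read snap; do
    zfs destroy -d "$snap"   # one snapshot per command
done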

I was able to reproduce it on a few machines around here, on 0.6.3 and git master.

@behlendorf behlendorf added the Type: Performance Performance improvement or performance problem label Feb 13, 2015
@stevenburgess
Author

I was able to reproduce this same behavior on FreeBSD, so this may be an upstream problem. I am seeing that while the freeze is occurring, no new TXGs are created, so that explains why things like zfs snapshot and zfs recv freeze. Here is the last TXG that was active for the destroy:

txg      birth            state ndirty       nread        nwritten     reads    writes   otime        qtime        wtime        stime       
8210363  951316291575     C     213843456    3914752      61543936     644      4269     635023385257 2444         25492        8245082139 

otime works out to roughly 10 minutes ((635023385257 / 1000000000) / 60 ≈ 10.6), but that time was not spent in the q, w, or s states.
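
For reference, that table looks like the per-pool TXG history kstat. A sketch of how to watch it live, assuming the pool is named tank and that the zfs_txg_history module parameter is set to something non-zero:

echo 100 > /sys/module/zfs/parameters/zfs_txg_history   # keep the last 100 TXGs
watch -n 1 "tail -n 5 /proc/spl/kstat/zfs/tank/txgs"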

@behlendorf
Contributor

@stevenburgess I meant to comment on this earlier but it slipped my mind. I haven't confirmed this against the source, but I strongly suspect what's happening here is that the recursive destroy needs to be handled in a single TXG. Since destroying 4000 snapshots takes a fair bit of work, this TXG takes a long time to process, which prevents the TXGs from rolling forward every few seconds like they're supposed to. When you destroy the snapshots with multiple commands you spread the work over multiple TXGs, minimizing the impact. This is a problem all the implementations suffer from.

@stevenburgess
Author

That certainly squares with my observations. I find this behavior unexpected: I would probably not think twice about calling zfs destroy -r on an 8GB filesystem, but that call can have some pretty severe consequences, so I should.

What do you think is best in terms of this ticket? I don't want to load the ZoL GitHub with upstream issues. Since it is an OpenZFS issue, should I try to get it posted there? Or is this an acceptable place?

@behlendorf
Contributor

@stevenburgess I think it's definitely appropriate to open an issue in the upstream tracker. We can leave it open here as well, it might make it easier for people to find. I'm a little less optimistic about how best to address this. There are good reasons to want to handle this all in a single TXG.

@AceSlash

@behlendorf has an upstream ticket been opened since?

This issue is still present on ZoL 0.7.5. I have been bitten by it several times now when I forget and naively run a zfs destroy -r. With monitoring software that issues a lot of zfs get calls and an hourly backup script, the number of stuck zfs commands grows very quickly and the consequences can be very bad.

@behlendorf
Contributor

@stevenburgess @AceSlash we expect that ZFS Channel Programs (OpenZFS 7431), #6558, should improve performance for this case. The feature was recently merged to master, and if you're in a position to test the current master source it would be appreciated.
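
For reference, zfs-program(8) includes a recursive-destroy example; a minimal sketch along those lines, which destroys every snapshot of one dataset from inside a single channel program (the dataset and file names here are placeholders, and large destroys may need the program's memory/instruction limits raised, see the man page):

cat > destroy_snaps.zcp <<'EOF'
-- destroy every snapshot of the named dataset
-- (modeled on the recursive destroy example in zfs-program(8))
for snap in zfs.list.snapshots("tank/constant") do
    zfs.sync.destroy(snap)
end
EOF
zfs program tank destroy_snaps.zcp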

@AceSlash

AceSlash commented Apr 5, 2018

@behlendorf I'm sorry but I can't test the latest master branch. I really don't know what to do. Today I did it again: I forgot about this bug and killed all zfs operations on an important server.

No backup (zfs send...)
No monitoring (zfs get...)

Is this merged in 0.7.7 (I'm still on 0.7.6 on this system)? I mean, that's a super serious issue from my point of view, since it breaks servers.

I'm just sad that no solution was implemented and we still have to deal with this.

@tonyhutter
Contributor

I was able to reproduce this while destroying a ~400 snapshot pool and running zfs get name tank at the same time. Here's where zfs get hangs:

     3.643888 ioctl(3, _IOC(0, 0x5a, 0x5, 0), 0x7ffdd3136760) = 0  <<--- ZFS_IOC_POOL_STATS takes 3.6 seconds
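
That line is strace output from the hanging zfs get. Something along these lines reproduces it (the exact strace flags used above are a guess):

strace -r -T zfs get name tank

Here -r prints timestamps relative to the previous syscall and -T appends the time spent in each one.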

I also tried the sample "zfs list" channel program from #7281 (comment) while doing the destroy and saw the delay as well:

$ while [ 1 ] ; do sleep 0.1 && sudo ./cmd/zfs/zfs program tank zfs_rlist.zcp tank name &>/dev/null && date ; done
Wed Apr  4 22:51:55 CDT 2018
Wed Apr  4 22:51:55 CDT 2018
Wed Apr  4 22:51:56 CDT 2018
Wed Apr  4 22:51:56 CDT 2018
Wed Apr  4 22:51:56 CDT 2018
Wed Apr  4 22:51:57 CDT 2018
Wed Apr  4 22:51:57 CDT 2018
Wed Apr  4 22:51:57 CDT 2018
Wed Apr  4 22:52:03 CDT 2018 <<<--- 6 seconds later
Wed Apr  4 22:52:03 CDT 2018

Lastly, I tried running the zfs recursive snapshot destroy channel program from https://www.delphix.com/blog/delphix-engineering/zfs-channel-programs while running zpool get, and still saw the delay.

Note that I did notice that zpool status and zpool get do not hang while doing the destroy, so you may be able to use them for simple heartbeat monitoring. That doesn't solve your problem though...
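
For example, a minimal heartbeat along the lines of the scroll-bar loop above, but built on zpool get instead of zfs get (pool name tank assumed):

while true; do
    zpool get -H health tank > /dev/null 2>&1 && printf '|'
    sleep 1
done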

@stale

stale bot commented Aug 25, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Aug 25, 2020
@stale stale bot closed this as completed Nov 23, 2020
@tigerblue77

This issue still exists.
