It would be nice to have a 'safe mode' for zfs and zpool commands #4134

Open · dswartz opened this issue Dec 22, 2015 · 42 comments
Labels: good first issue · Type: Feature

@dswartz
Contributor

dswartz commented Dec 22, 2015

I really don't like that I can mistype something and vaporize datasets, and even entire pools. I would like to propose a property defined for datasets, zvols and pools that basically says 'if this is set to ON, any destructive operation will require use of the -f flag'. Thinking of destroying datasets, zvols and pools. Maybe snapshots too? This could really be a totally userland abstraction. What do y'all think?

@kernelOfTruth
Contributor

👍

@behlendorf added the Type: Feature label Dec 22, 2015
@dswartz
Contributor Author

dswartz commented Dec 22, 2015

I would be happy to take a shot at this...

@behlendorf
Contributor

I think that's a totally reasonable idea which would be a great feature. Can you flesh out in a little more detail how it should work? Also, I'd be cautious about a -f option, which is already heavily overloaded. It might be safer to force users to explicitly disable this 'safe mode' before allowing the operation. Or how about a locked=on|off property which could be set on the dataset to prevent administrative changes?

@dswartz
Contributor Author

dswartz commented Dec 22, 2015

Hmm, well, I am not thinking of locking the entity per se, just making it harder to screw yourself. Although, thinking about this more, I kinda like your idea. If I want to blow away 'tank/vsphere', where all my VMs live, it's not unreasonable for me to have to do 'zfs set locked=off tank/vsphere' first. So, yeah, that sounds good to me. Was there anything else?

@behlendorf
Contributor

I think you'll want to think through exactly which commands this should apply to.

@richardelling
Contributor

-f flags are bad UI design; you just end up training people to always type -f (see also kill -9).

In general, if you don't want to destroy a dataset, don't type "destroy" :-p

NB, there are UIs that mark datasets for later destruction, giving you an opportunity to change your mind. However, the most common use case for dataset destroy is to reclaim space, so the deferred destroy is not a general solution.

zpool destroy is reversible, no need for more UI complexity.

-- richard


@dswartz
Contributor Author

dswartz commented Dec 22, 2015

On 2015-12-22 13:40, Richard Elling wrote:

> -f flags are bad UI design; you just end up training people to always
> type -f (see also kill -9)
>
> In general, if you don't want to destroy a dataset, don't type
> "destroy" :-p

Yes, thank you. I'm familiar with the chainsaw with no safety guard :)

> NB, there are UIs that mark datasets for later destruction, giving
> you an opportunity to change your mind. However, the most common use
> case for dataset destroy is to reclaim space, so the deferred destroy
> is not a general solution.
>
> zpool destroy is reversible, no need for more UI complexity.

Fair enough, I'd forgotten about that. That said, I don't think it's
that complex to have any command that changes anything about a pool,
dataset, or zvol query that property. The small amount of heavy lifting
can be encapsulated in one fairly small procedure.

@richardelling
Contributor

On Dec 22, 2015, at 10:51 AM, dswartz notifications@github.com wrote:

> On 2015-12-22 13:40, Richard Elling wrote:
>
>> -f flags are bad UI design; you just end up training people to always
>> type -f (see also kill -9)
>>
>> In general, if you don't want to destroy a dataset, don't type
>> "destroy" :-p
>
> Yes, thank you. I'm familiar with the chainsaw with no safety guard :)

What other subcommand can you confuse with "destroy?" Answer: none.

So the problem you're trying to solve is one of naming datasets. Since there is
no enforced naming, you might try a less failure-prone convention. For example:
mypool/dataset1, mypool/dataset11, mypool/dataset2
is more failure-prone than:
mypool/Alice, mypool/Bob, mypool/Godzilla

Fully automated systems, when designed well, are not subject to this problem, as
they won't mistype.

>> NB, there are UIs that mark datasets for later destruction, giving
>> you an opportunity to change your mind. However, the most common use
>> case for dataset destroy is to reclaim space, so the deferred destroy
>> is not a general solution.
>>
>> zpool destroy is reversible, no need for more UI complexity.
>
> Fair enough, I'd forgotten about that. That said, I don't think it's
> that complex to have any command that changes anything about a pool,
> dataset, or zvol query that property. The small amount of heavy lifting
> can be encapsulated in one fairly small procedure.

Low effort for sure, but also not effective, which is why it doesn't exist.

For those listening at home, this was one of the first great debates when ZFS was first released.
-- richard

@nwf
Contributor

nwf commented Dec 25, 2015

Well, to pitch in my $.02: I am always terrified of needing to use "zfs destroy foo/bar@baz" to nuke snapshots, and consider the overloading of "destroy" here to be a little on the hazardous side of things. I'd much rather have an "unsnapshot" command, or something that errored if it was given a name without an @.

@CroneKorkN

> What other subcommand can you confuse with "destroy?"

As nwf said, 'destroy' is used to destroy both datasets and snapshots, so removing a snapshot always feels a bit risky: until you enter the '@', you have typed a valid command that would destroy your dataset.

@Fabian-Gruenbichler
Contributor

but destroy will not destroy a dataset with children unless passed "-r" as well?

so if you want to type zfs destroy mypool/mydataset@mysnapshot but accidentally hit enter after zfs destroy mypool/mydataset, you will get

cannot destroy 'mypool/mydataset': filesystem has children
use '-r' to destroy the following datasets:
mypool/mydataset@mysnapshot
mypool/mydataset@myothersnapshotwhichiwanttokeep

next step: train yourself to put "-r" AFTER the dataset name, not in front (to prevent the same problem of premature ending of a command when you actually want to recursively delete a dataset)
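A minimal illustration of that habit (note it relies on the CLI accepting options after arguments, which GNU getopt on Linux does):

zfs destroy mypool/mydataset@mysnapshot -r

If enter is hit early, what actually runs is zfs destroy mypool/mydataset, which fails with the "filesystem has children" error above instead of recursing.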

@CroneKorkN

But if there is no snapshot for some reason, the dataset will be destroyed anyway. I am using dummy datasets like tank/important-data/dont-delete-me to guarantee the existence of a child dataset. Separating the commands for handling snapshots from the ones handling datasets, or being able to lock a dataset, would really feel great.

Thanks for the recommendation to put the '-r' flag after the target name.

@yurelle

yurelle commented Apr 15, 2018

I would like to vote for adding a settable flag to pools, datasets, & snapshots, that is something like "protected", that can be set to on/off or yes/no. If the object is marked as destroyable, it behaves as normal: a call to zfs destroy myThing destroys it, no questions asked. However, if you manually set the flag to protect the object, it behaves differently. It is still writable/modifiable if the readOnly flag is off, but any call to zfs destroy myThing will abort and error out; something like: "Cannot destroy 'myThing'. Pool/Dataset/Snapshot is protected."

Also, the ability to do a recursive set=on would be useful, but a recursive set=off seems somewhat dangerous. Perhaps a confirmation message; but maybe I'm just paranoid.

[yurelle@host ~] zfs set -r protected=on myThing
[yurelle@host ~] zfs destroy myThing
Cannot destroy 'myThing'. Pool/Dataset/Snapshot is protected.
If you wish to destroy it, disable the "protected" flag.
[yurelle@host ~] zfs set -r protected=off myThing
This will remove protection from all objects under 'MyThing'. Are you sure? (yes/no) _

Where entering no, or Ctrl+C'ing, aborts the change.

@behlendorf An additional flag of "locked" to prevent all admin changes also sounds good. I imagine each being set independently, but if "locked" is on, maybe it forces "protected" to true?

-Yurelle

@DeHackEd
Contributor

Holds are only good for snapshots though.

While you could make a snapshot and put a hold on it to prevent the main dataset's destruction, that snapshot references data you might want to free up. It works, but it's not great.

@richardelling
Contributor

richardelling commented Apr 16, 2018 via email

@nwf
Contributor

nwf commented Apr 16, 2018

@richardelling That doesn't work for snapshots, since we can't set readonly: "this property can not be modified for snapshots". While I know that some sysadmins don't make mistakes, snapshots are, ostensibly, delegate-able to users, and we know that users like to typo commands.

@chris13524

chris13524 commented Jan 9, 2019

I would totally love a protected=true flag I could set on anything I want. Then if I try to destroy it, it would prevent me from doing so. I would have to explicitly remove the flag first.

On my desktop system, I'm often playing around with pools, creating and destroying them. It's easy for muscle memory to kick in and accidentally type or mistype the wrong dataset name, resulting in destroying the main system's data. Or perhaps you are writing a script that is slightly buggy?

A protected flag would help to alleviate the terror I have anytime I'm working with zfs destroy commands. (I stare at a destroy command for a good 20-30 seconds before executing it.)

@bunder2015
Contributor

Just throwing it out there, but zfs create and zfs destroy, as well as zpool create, all have the no-op (-n) flag.
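For example, a dry run reports what a destroy would do without doing it (output abbreviated):

$ zfs destroy -n -v mypool/mydataset@mysnapshot
would destroy mypool/mydataset@mysnapshot
would reclaim 144K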

@tcf909

tcf909 commented Jun 14, 2020

The difference between:

zfs destroy tank/precious

and

zfs destroy tank/precious@snap

Is the difference between my two-year-old running in and "practicing" his typing while I pause to sip my coffee, contemplating whether that is actually the command I want to run... Silly, but it absolutely happened.

(There was actually no snapshot on the precious dataset in this case)

Would love to see a protected flag available on the dataset.

@richardelling I'm mostly wanting to protect myself from myself vs an operational procedure. Thoughts?

@richardelling
Contributor

automate it away (DRY)

@stellarpower

stellarpower commented Jul 12, 2020

I agree with the comment about -f, as this particular flag often prevents people from thinking about what it is they actually asked the machine to do versus what they thought they were asking. I actually think we already have a more zfs/solaris-esque way of doing this, although I have not yet used it: hold, for snapshots. Something like zfs hold Pool/dataset might be very nice, and if we type:

# zfs destroy Pool/dataset
Cannot destroy Pool/dataset: dataset is marked as held against destructive operations
# zfs release Pool/dataset
# zfs destroy Pool/dataset

I like how something like this would be lean and clean, and how there's no flag to force it; we explicitly have to run a separate and unambiguous command to remove that protection again. By making the command separate, a pattern I have seen and liked in illumos, it means that at that point, when I hit enter, my mind is thinking about one thing and one thing only: unholding that dataset. Using a flag means that I could go back through my terminal history, edit a command, and accidentally force overwriting a different dataset that I didn't want to; or it could mean that we're still thinking about the original command, where we were about to damage a filesystem irreparably, and the specifics of that, rather than the simple action of "I'm now marking this dataset as okay to be destroyed; but then I must have marked it as not to be overwritten about 9 months ago and can't remember why... what was that reason?"

However the feature might be implemented, an optional comment on why the dataset is to be marked as not to be destroyed would be fantastic for admins IMO. Using a property on the dataset could also offer all the above features in a different format, I believe (e.g. zfs set nodestroy="read /notes.md because you forgot to copy off the hidden files in the trash when you emptied this back in October" Pool/dataset ; zfs set nodestroy=off Pool/dataset)

I feel an argument that we simply shouldn't run destroy if we're not sure isn't a particularly good one. I recently had a similar issue where I was hoping to make a read-only "copy" of /dev on Linux so I could open device files for reading but not writing when I needed to back up my partition tables. I asked about this on the chat for my university's computing society, and the only replies I had were that I should use a backup tool (not possible when you need access to raw bytes, of course), that dd is nicknamed "disk destroyer" for a reason, and that you should never run anything as root without reading it and checking. I know all of these things, but at 4 AM after 10 hours of sysadmin work, we users inevitably make mistakes in front of a keyboard, and I know myself well in this regard. No matter how much care you take, we all junk a disk from time to time by accident, somehow.

Our job is often to put in as many sensible measures as we see fit for the job at hand to protect our machine from ourselves. So, in my opinion, if a user would like to add another layer of protection because she knows she is likely to make mistakes or errors, perhaps on a system that has not been touched in a very long time, perhaps at a particularly unholy time of day, I think we should absolutely facilitate this if there is little to be compromised by doing so. We have a common policy of offering a program or user as little as they need to get the job done, and nothing more. Unix unfortunately throws this out on many occasions, where you are either a normal user called muggins who can do nothing, or become the one all-powerful user named root who cares not for typos or hitting enter too rapidly because the microwave pinged two minutes ago, and who can implicitly do everything, often without being asked if it's what you really intended to do. We then have to put in other measures to prevent accidents on a more granular level before we reach this binary decision of normal or everything-including-what-you-didn't-actually-want-this-time.

In my case, I have been writing a script for the last several hours and destroying and creating datasets over and over, necessarily as root, because I cannot mount them on Linux without root as I can on illumos. In an ideal world, yes, I would create a pool that I could junk later, but I wanted the job done as quickly as possible, having spent 12 hours non-stop on this already. It turns out I have both a filesystem called test and a filesystem called temp, and I am also running an SSH session on another machine doing something similar, with temp already containing data I wanted and put there a good while ago; test is, or perhaps must have been, what I created for the purposes of the script. Right now, I would like to be able to hold on temp so that I don't make any more mistakes.

@bghira

bghira commented Jul 13, 2020

zfs set org.user.name:nodestroy='this is why you did not want to destroy the dataset or snapshot' pool/volume@snap
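A destroy wrapper could then refuse to act while that property is set. A minimal sketch, assuming the property name above (unset user properties read back as "-"):

#!/bin/sh
# hypothetical wrapper: refuse to destroy while org.user.name:nodestroy is set
target="$1"
reason="$(zfs get -H -o value org.user.name:nodestroy "$target")"
if [ "$reason" != "-" ]; then
    echo "refusing to destroy $target: $reason" >&2
    exit 1
fi
exec zfs destroy "$target"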

@stellarpower

Polkit, Udev, etc. also came to mind - it might be quite nice to be able to write small functions in something that looks a bit like a scripting language that could be run using hooks before/after executing various commands and allow administrators to customise what happens before and after specific operations according to their own organisational policies. Perhaps a data controller is required by law to keep a copy of data for a minimum of say, 30 days. If it were possible to have a small piece of code to be triggered before deleting a snapshot, it could verify that the snapshot in its arguments is less than 30 days old, and spit out an error, before zfs itself goes and deletes the snapshot, if this isn't the case. In this case, we could easily add a property like the above, and prevent destroying any dataset unless the property is removed first.
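For illustration, a pre-destroy retention check along those lines could look like the following sketch (the wrapper itself is hypothetical, there is no such hook API today; zfs get -Hp prints the creation time in epoch seconds):

#!/bin/sh
# hypothetical pre-destroy hook: refuse to delete snapshots younger than 30 days
snap="$1"
created="$(zfs get -Hp -o value creation "$snap")"   # creation time, epoch seconds
age_days=$(( ($(date +%s) - created) / 86400 ))
if [ "$age_days" -lt 30 ]; then
    echo "retention policy: $snap is only $age_days days old, refusing" >&2
    exit 1
fi
exec zfs destroy "$snap"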

@gdevenyi
Contributor

Just thinking there's already precedent for ZFS environment variables to control how some commands work (see zpool script code).

What about a variable ZFS_SAFE_MODE which users can export which will make potentially destructive operations act as though "-n" is set?
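A minimal sketch of that idea as a wrapper script (ZFS_SAFE_MODE is hypothetical; -n/-v are the existing dry-run/verbose flags, and only destroy is handled here):

#!/bin/sh
# hypothetical wrapper: force a dry run of destroy while ZFS_SAFE_MODE is set
cmd="$1"; shift
if [ -n "$ZFS_SAFE_MODE" ] && [ "$cmd" = "destroy" ]; then
    set -- -n -v "$@"
fi
exec /sbin/zfs "$cmd" "$@"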

@bghira

bghira commented Jul 31, 2020

so you can use zed(lets) to micromanage snapshot deletion policy, as i understand it. i have used it for automatically cleaning up old snapshots when a new one is created. though i'm not sure if it can actually deny an operation or reverse one!

@gdevenyi actually a default deny policy for write operations would be kinda useful but i could see that being problematic if one has it set by default in one terminal and goes to another expecting the same restrictions. it is similar to why shell aliases are dangerous.

@richardelling
Contributor

Since the zpool/zfs commands are just open source programs, nothing prevents a user from providing their own that has different policies. For appliances, it is quite common to implement direct calls, like zpool does, from policy elements.

@one-github

I found this thread after accidentally running
zfs destroy -r source/photos/2000 instead of zfs destroy -r backup/photos/2000. The backup pool is a directly attached pool I regularly zfs send backups to, while source is the source. According to Murphy's Law (of course), I had also deleted snapshots from the backup copy, so I had to zfs send the filesystem from a separate backup pool.

How do I prevent shooting myself in the foot with stuff like this?

PS: In reality the pool name of the source is minitank while the backup pool is called microtank.

@poelzi

poelzi commented Feb 1, 2022

I think ZFS is the only FS that has such destructive commands with absolutely no warning or questions, even for deleting all the snapshots, etc. Sorry, but that's a big minus on ZFS currently, as is the fact that this has been an open bug for 6 years...

@yurelle

yurelle commented Feb 2, 2022

Especially because the commands to destroy datasets & snapshots are basically identical: only the path is different, and the command to destroy a small, narrow thing includes the command to destroy a broader thing. If you accidentally hit enter part way through typing a command (or copy & paste a command template and accidentally get a carriage return in the clipboard), you can end up deleting the entire dataset when you only meant to destroy a single snapshot.

zfs destroy myPool/myDataset
zfs destroy myPool/myDataset@mySnapshot

@gmelikov
Member

gmelikov commented Feb 2, 2022

> Especially because the commands to destroy datasets & snapshots are basically identical

But a dataset with children won't be deleted even now, so that case is covered:

root@foton:~# zfs create rpool/testds
root@foton:~# zfs snap rpool/testds@snap1
root@foton:~# zfs destroy rpool/testds
cannot destroy 'rpool/testds': filesystem has children
use '-r' to destroy the following datasets:
rpool/testds@snap1

And one more comment on an older one:

> I found this thread after accidentally running
> zfs destroy -r source/photos/2000 instead of zfs destroy -r backup/photos/2000

Unfortunately, even with a -imtotallyagreetodeletemydataset flag, a typo in the pool name can't be saved; that is independent of any command.

After some years I've come to agree with @richardelling here: a basic force argument won't help.

@poelzi

poelzi commented Feb 2, 2022

I think the easiest thing would be to simply add a "protected" setting: no fs or pool can be deleted while it is set, and you have to manually unset it before you can delete. The setting is not propagated to children.
This is also backward compatible, as it does not introduce a new command-line flag.
I like Ceph's pool delete:
https://docs.ceph.com/en/latest/rados/operations/pools/#delete-a-pool

I never ever accidentally deleted a ceph pool.
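For reference, Ceph makes you name the pool twice plus pass an explicit flag (and the monitor must additionally have mon_allow_pool_delete enabled):

ceph osd pool delete mypool mypool --yes-i-really-really-mean-it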

@gmelikov
Member

gmelikov commented Feb 2, 2022

> This is also backward compatible, as it does not introduce a new command-line flag.

But it's not, at least CLI interface-wise.

> Even deleting all the snapshots etc

You can't delete all snapshots if you didn't explicitly run the command with -r. It's the same "additional argument" as in Ceph, just with a different name.

@AbsurdlySuspicious

I'd vote for a protected=yes|no property for pools, datasets and snapshots, with no way to destroy them without setting the flag to no. To be backwards-compatible, the feature flag for this could return from active to enabled once everything in the pool is unprotected. The "automate it away" advice just masks the problem and doesn't solve it.

@haarp

haarp commented Apr 28, 2022

It's scary typing zfs destroy [-r] foo/bar. Until you type @, you run the risk of doing a lot of damage if you (or your cat) accidentally hits enter.

But a new fs flag is potential overkill. Making the zfs utility confirm the deletion would probably be sufficient. Some tools also require you to re-enter the exact thing you want to delete as confirmation, which avoids the "enter, y, enter" muscle memory screwing you over.

The default behavior could be set via an env variable (a rough sketch of the retype mode follows the list), e.g.:

  • ZFS_CONFIRM=retype zfs destroy foo -> user needs to type foo again to confirm
  • ZFS_CONFIRM=y zfs destroy foo -> user needs to press 'y' to confirm
  • ZFS_CONFIRM=n zfs destroy foo (or var empty/unset) -> no confirmation
  • ZFS_CONFIRM=y zfs destroy -y foo -> no confirmation
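A rough sketch of the retype mode as a shell wrapper (ZFS_CONFIRM and the wrapper are hypothetical; the real binary is assumed to live at /sbin/zfs):

#!/bin/sh
# hypothetical wrapper: require retyping the target when ZFS_CONFIRM=retype
if [ "$1" = "destroy" ] && [ "$ZFS_CONFIRM" = "retype" ]; then
    for last; do :; done               # last positional argument = target
    printf 'Type "%s" again to confirm: ' "$last"
    read -r answer
    [ "$answer" = "$last" ] || { echo "aborted" >&2; exit 1; }
fi
exec /sbin/zfs "$@"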

@iio7

iio7 commented Sep 19, 2022

ZFS is all about protecting data, and this is one issue that is actually really worrisome! I know, in the good sense of Unix, you can shoot yourself in the foot with rm -rf too, but the whole point of using ZFS is to get better filesystem protection. Let's extend this to the tired sysadmin making a mistake!

When I have:

$ zfs list -t all
NAME                         USED  AVAIL    REFER  MOUNTPOINT
pool1/test                  25.1M   236G    24.4M  /pool1/test
pool1/test@snap1             144K      -      984K  -
pool1/test@snap2             144K      -     1.07M  -
pool1/test@snap3             144K      -     3.49M  -
...

And I no longer need the snapshots until I make a new one. Doing this by mistake is REALLY SCARY (even with the -r option AFTER the name of the dataset):

$ doas zfs destroy pool1/test -r

All it takes is for my son to run into the room while I am typing, making me lose my concentration for a second, so that I forget to provide the name of the snapshot and type only the -r option.

Having a property like this is the best solution; it is backwards compatible and it can protect a dataset optimally:

$ doas zfs set locked=on pool1/myimportantdataset
$ doas zfs destroy pool1/myimportantdataset -r
cannot destroy 'pool1/myimportantdataset': filesystem is locked

Please add this feature!

UPDATE: It seems like Oracle has added this feature as well, if I read the doc correctly.

@hulla-bulla

> Well, to pitch in my $.02: I am always terrified of needing to use "zfs destroy foo/bar@baz" to nuke snapshots, and consider the overloading of "destroy" here to be a little on the hazardous side of things. I'd much rather have an "unsnapshot" command, or something that errored if it was given a name without an @.

This is my main concern handling my pools right now and writing more scripts. I don't trust myself that much, because I know I'll make a typo sooner or later that will destroy my 30TB of data...

@bigbellyburger

bigbellyburger commented Jan 28, 2023

When is this coming? ZFS should not make it this easy to lose your data. A protected flag is absolutely essential.

@one-github

one-github commented Jan 28, 2023

Come on - issue opened in Dec 2015, now it’s 2023. SEVEN f*ing years later we’re still debating. What’s the problem with solving this?

@GregorKopka
Contributor

# zfs create tank/test
# zfs snapshot tank/test@nodelete
# zfs hold "Do not delete" tank/test@nodelete
# zfs destroy -r tank/test
cannot destroy snapshot tank/test@nodelete: dataset is busy
# zfs holds tank/test@nodelete
NAME                TAG            TIMESTAMP
tank/test@nodelete  Do not delete  Wed Mar 15 09:31 2023

While we're back in bad-error-message country with that (#14538), it solves this issue.

@hilbix

hilbix commented Mar 17, 2023

I, too, would like to see such an option.

But, as nearly always, my view is a bit different. What I am missing is something which changes depending on where and what I am. Hence just a flag like protect=on vs. protect=off won't be enough, as it does not change depending on where I am.

The idea is to protect against a workflow: something you have done a billion times, but this time you happen to be on the wrong side of the cluster. The complete layout of both sides looks identical, so the same command sequence would work. Or does it need to be this way?

No! My proposal for how to protect ZFS is something like

cluster1a: zfs protect MONTY      zfs/expensive-data
cluster1b: zfs protect PYTHON     zfs/expensive-data
cluster2a: zfs protect SOMETHING  zfs/expensive-data
cluster2b: zfs protect COMPLETELY zfs/expensive-data
backup:    zfs protect DIFFERENT  zfs/expensive-data

with the inverse

cluster1a: zfs unprotect MONTY   zfs/expensive-data
cluster2b: zfs unprotect PYTHON  zfs/expensive-data

(Oh no, the second command fails. What happened?)

I'd also vote that pools, filesystems, snapshots and zvols can be independently protected/unprotected with more than just one word, such that different pieces of software (or different roles of people) can set different locks independently and do not need to form complex, error-prone cooperation meshes with scripts, which then easily introduce nasty race conditions and so on.

  • zpool protect A zfs; zpool protect B zfs; zpool unprotect A zfs; then zpool destroy zfs must fail as there still is protection B on it.
  • zfs protect A zfs zfs/vol1 zfs/vol2; zfs protect -r B zfs; zfs unprotect -r B zfs/vol1; then zpool destroy zfs/vol1 must fail, as there is still the protection A on it.

The important part is that this setting is not replicated with zfs send (therefore no zfs set or zpool set), such that zfs receive -F fails on purpose if something is kept protected. Hence the idea to not do this as a filesystem property but in some independent way (please store it in the pool, though, and not outside).

Why? Because zfs receive might replicate settings, too.

zfs receive -F is a major annoyance to me, because without this flag incremental receives quite often no longer work. The sequence zfs umount zfs/share; zfs rollback zfs/share@latest-snapshot most of the time does not restore the possibility to receive the incremental send without -F (is this a bug?). Now you have the choice:

  • Either be very-very-very-very careful never to touch something (read: mount -r) which receives incremental snapshots, such that the fs stays exactly at the last snapshot received
  • or live with the fact that -F not only rolls back the FS to the last (wanted) snapshot but also
    unconditionally removes all the snapshots which were deleted on the source and which you might want to keep alive on the backup!

Think about somebody accidentally deleting the source with all snapshots. Then after a receive -F your backup will be empty, too. Now think about some Blackhat doing this in your production and the automated backup process to the backup server manages to kill the complete archived backup, too, thanks to receive -F. Losing backups due to some attack is quite common. Think about HBGary.

But with protected snapshots the -F will fail, because it tries to delete a protected snapshot. Hence the backup stays safe (as long as the backup server stays safe, which is relatively easy to achieve if you put it outside of your production and pull the sends from production).

Also, it is always a PITA to find out which snapshot in the archive may be important while it is already destroyed on the source, BEFORE you apply -F in case ZFS otherwise refuses to receive the incremental snapshot (again: zfs rollback is sometimes a solution, but does not work in the general case).

But with some independent protection feature you no longer have to puzzle, as you can see the ticket ID on the protection without needing to keep additional informational hints. So you can easily spot important snapshots held in the archive and save the important parts before releasing the protection. And independent people can set independent locks this way, too, as you can set multiple protections.

Note that I am talking about production systems which have several TiB online with an incremental of several GiB per day, and some law which perhaps says you must keep the backup history for 30 years or so.

Hence backing up the archive to recreate it, at several 100 TiB, isn't very feasible. So neither of these is possible:

  • You cannot re-send production, as even with 100 MB/s this takes several days of downtime to take a full backup
  • You cannot backup the archive before receive -F, as this takes even longer at data rates of 1 GB/s and above.

99% of the time there will be no problem, as you can just receive -F and thereby kill the no-longer-necessary snapshots in the archive. But in the 1% of cases you want receive -F to fail, and these are usually the important cases for spotting some unanticipated problem.

To sum it up:

  • The protection feature shall be independent from zfs receive -F to protect against accidental dropping of important snapshots
  • The protection feature shall use words, such that you can protect against running the right procedure on the wrong system, which then makes it very easy to implement safeguards by yourself.
  • Multiple independent protections shall be possible on the same thing. (Just some word is enough. Please provide at least an entropy of 127^56 for each word and at least 64 different words per item - read: 56 bytes out of an Alphabet of 127 different characters with everything fitting into some 4K sector which leaves enough room for additional list management.)

It is right that you can always write scripts which are plainly too clever and hence automatically adapt to the safeguards. But then this is no longer a ZFS issue; it is purely on your own local side.

But without such protections, and with the critical need for zfs receive -F now and then, there is definitely some (IMHO bigger) issue on the ZFS side which I'd like to see fixed properly.

(Perhaps fix the zfs receive -F problem as well? I do not understand why it removes all snapshots which are not on the source, even those which do not affect the latest state at all. But perhaps there is some reason for it which I do not know.)

PS: For those who think zfs hold is enough: there is no similar command on zpool, and I need it for pools, too. There are automated scripts which create and destroy pools, and these scripts are triggered by some UI (like Proxmox) via some editable feature (like Ansible or cloud-init). The UI is not written by me (i.e. Proxmox) and the scripts are always done in a hurry (who gets the necessary time from their boss these days?). There should be some way to protect the system against accidental errors at the zpool level, too, and I am not convinced that there is a guarantee that a zfs hold paradigm can be safely and correctly extended to the incremental send/receive situation. (I did not do the tests, sorry, but that does not say anything; perhaps somebody finds a way to receive changes to holds. I am just not sure.)

@GregorKopka
Contributor

> (Perhaps fix the zfs receive -F problem as well? I do not understand why it removes all snapshots which are not on the source, even those which do not affect the latest state at all. But perhaps there is some reason for it which I do not know.)

To answer this: it only happens for streams created using zfs send -R|-p, see #5341.

@ridingtheflow

You can use the following method to protect datasets:

zfs create tank/expensive-data/nodelete
zfs snapshot tank/expensive-data/nodelete@nodelete
zfs hold safety tank/expensive-data/nodelete@nodelete

First you create a child dataset, which will be empty and occupy no space.
Then you make a snapshot on this dataset. Because its owning dataset is empty, the snapshot will never take any space either.
Then you hold the snapshot. This will stop any destroy commands on its parent datasets (expensive-data): it won't be possible to remove expensive-data even with -r, since it has a child, and this child can't be removed because it has a held snapshot.
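With that in place, a recursive destroy of the parent fails up front, with the same error as in the transcript above:

# zfs destroy -r tank/expensive-data
cannot destroy snapshot tank/expensive-data/nodelete@nodelete: dataset is busy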
