Replace ancient 64-btrfs.rules with newer systemd rules #1984

flukejones · 2018-11-01T10:01:53Z

This solves the failure to boot when the root fs is on a btrfs handled
raid set-up. The reason it fails is typically because one or both disks
failure are not initialised before the kernel wants to mount root.

Solves issues #1329 and #1980

phillxnet · 2018-11-01T19:34:30Z

@Luke-Nukem Well done on making your first pull request here.

@Luke-Nukem and @schakrava, with respect however I do not recommend that this is merged as it only serves at the CentOS systemd level. Rockstor code itself is still in needs of more work on this front (btrfs raid root). There is the assumption throughout the code that the system disk is a single device. And once that code is updated appropriately we will also need the appropriate unit tests to address this additional behaviour. There is also no evidence of testing existing behaviour and how this change might:

1: add no regressions
2: improve a supported behaviour.

Especially in the context of our data pools.

Plus I think openSUSE are also only currently supporting single disk system installs so that also does not bode well for these changes and our proposed plans.

I.e. it fixes a currently unsupported install arrangement with not other indicated advantages but there may well be a number of issue that it introduces. I know for instance that there is still disagreement between the btrfs developers and the systemd developers on how things should be done and the last few years of Rockstor's development has been done with this rule in place.

What we need here first if for our system disk associated code to first be moved over to using the more recent redirect role, as all but the system disk associated code already does / can. And then, once our upstream OS supports btrfs raid we begin to accommodate it ourselves. It would definitely be a key feature to have when the time is right.

@Luke-Nukem Sorry if this sounds a bit negative but we have only just reached a certain level of stability and I'm very weary of changing our base system in this way. For instance has this change been tested with anything other than a currently unsupported arrangement (how it affects the system disk when in btrfs raid). I.e how does it change our existing behaviour re data disk missing behaviour etc.

Just very weary of this level of change, especially as a first commit in what is actually a large and complicated project.

I propose that we leave this open and re-asses it's relevance once we move to openSUSE as then I suspect it will be redundant as a major reason for that move it to inherit their expertise on such things and presumably also their appropriately configured btrfs systemd rules, against which we can re-asses our own behaviours.

So essentially, if it isn't broken (i.e. only aids in an otherwise unsupported arrangement that in turn is also currently un-managed / unrecognised by our own code).

However much thanks for your efforts and well done on linking also to @greghensley
"Conflicting udev rules causing failure to boot with root on multi-device btrfs" #1329

Plus the final say is of course down to project lead @schakrava who may see this entirely differently.

I'm just a bit nervous at this late stage with btrfs systemd changes that don't address a currently supported install arrangement, even in openSUSE. But of course I may have the potential side effects of this rather low level change all out of context.

Could you maybe explain the exact nature of this change and it's potential ramifications on our current btrfs device detection etc, ie does it affect our behaviour booting when one device is missing from a data pool as we are currently able to change mount options and have that pool re-mounted in for example ro,degraded and then lead the user through a series of contextual messages on assisting them with the necessary repair (see #1700). Also keep in mind that at least my efforts are mainly focused on mending existing regressions or short falls in basic / expected behaviour and ensuring we move to a more modern OS base re btrrfs. Also could you reference the source of the file, if it is not of your own construction.

Thanks again on this and if anyone else wishes to attempt to ally my worries re this systemd change then do please chip in, I'm certainly no systemd expert and may just be overly cautions on adding such a low level change; especially when it seems to only address an unsupported arrangement.

Note that if this does not get merged directly I suggest that you leave it open as it may well serve those interested who can also in turn test it's potential ramifications, ie I see that it pertains to your most welcome recent form thread:
Linking for context.
https://forum.rockstor.com/t/booting-with-rockstor-installed-on-btrfs-raid/5381

@greghensley are you able to reviewing this pr? Given my above worries.

flukejones · 2018-11-01T20:17:37Z

1: add no regressions
2: improve a supported behaviour.

Especially in the context of our data pools.

Plus I think openSUSE are also only currently supporting single disk system installs so that also does not bode well for these changes and our proposed plans.

Okay, so yeah there are no regressions (tested using my NAS + in VM, single disk + multidisk). The old udev rule included in Rockstor is I think 7 years old or so, it's a remnant of the older script type init systems and is a bit of a hack.

This new rule comes directly from systemd, and fixes pretty much all issues I ever had. In fact I would prefer that this rule file be removed from the ./conf directory completely and new installs use the one provided by systemd.

The purpose of the script as far as I can see is to prevent systemd trying to mount root before btrfs has scanned and initialised any disks, once they are ready then systemd proceeds with mounting. This goes for single and multiple.

I'll restate - this fix isn't specific to multi-disk btrfs root, it's to fix btrfs multi-disk raid in general. And I'd much prefer to remove the rule from ./conf completely so that it always comes from the systemd packaging. This means that there will be no regressions in openSUSE either.

I really can't find, or think of any case where the old udev rule is helpful. systemd does a much better job of handling btrfs disk scans and initialising than that.

phillxnet · 2018-11-01T21:06:11Z

@Luke-Nukem Thanks for the follow up, much appreciated.

The following does seem like more the way to go actually, at least in the light of the openSUSE move where we will have to start afresh on assessing our behaviour with their given systemd / btrfs configs:

"In fact I would prefer that this rule file be removed from the ./conf directory completely and new installs use the one provided by systemd."

One of my concerns is in the following:

"The purpose of the script as far as I can see is to prevent systemd trying to mount root before btrfs has scanned and initialised any disks, once they are ready then systemd proceeds with mounting."

for a multi device redundant data pool, if a single device is missing will systemd wait indefinitely for all members to be present? This needs to be tested in a Rockstor scenario as unlike the system disk we manage all mounts of the data pools in our own code and cycle through the available disks and can then cope with a missing disk by advising the required steps to mount it (at least in stable / master branch). Hopefully systemd will leave our data pools alone given they have no fstab entries.

"I'll restate - this fix isn't specific to multi-disk btrfs root, it's to fix btrfs multi-disk raid in general. And I'd much prefer to remove the rule from ./conf completely so that it always comes from the systemd packaging. This means that there will be no regressions in openSUSE either."

Well put and I think we are converging on that approach at least.

Do you fancy reworking this pr so we take the more hands off approach as per apparently is our preference / convergence?

I'd also like to be sure that when we have a data disk missing from a pool that we are still able to function as we do currently. I.e. present the warning to the user and advise, at each step (in the pool's details page) accordingly, and once advice is enacted, things proceed as intended. I.e. a 3 drive btrfs raid1 data pool booted with a disk member detached. I.e. preserving the functionality added in #1700 (given the existing caveat of #1722) for example.

Thanks for working through this one with me by the way. I'm just concerned that we don't break things and a change at such a low level as this could bite us easily is my worry.

flukejones · 2018-11-01T22:30:52Z

for a multi device redundant data pool, if a single device is missing will systemd wait indefinitely for all members to be present?

No, this only tells systemd that the disks are ready or not. btrfs knows if a disk is missing and behaves accordingly.

Hopefully systemd will leave our data pools alone given they have no fstab entries.

It will as the discs are not in the fstab.

I'd also like to be sure that when we have a data disk missing from a pool that we are still able to function as we do currently

I imagine that it would end up functioning much better 😸

Yeah I'll re-commit with the rule removed completely.

This solves the failure to boot when the root fs is on a btrfs handled raid set-up. The reason it fails is typically because one or both disks failure are not initialised before the kernel wants to mount root. This generally affects only disks that are mounted during boot. Solves issues rockstor#1329 and rockstor#1980

phillxnet · 2019-01-07T17:43:30Z

Quick note pertaining to this pr / related issue.
We also have a post install script within our CentOS rpm's that pertains to this:

rpm -qp --scripts rockstor-3.9.2-43.x86_64.rpm

postinstall scriptlet (using /bin/sh):
#create /dev/btrfs-control if it doesn't exist already.
if [ ! -c /dev/btrfs-control ]; then
   /bin/mknod /dev/btrfs-control c 10 234
fi

#create the udev rule
cp /opt/rockstor/conf/64-btrfs.rules /etc/udev/rules.d/

mkdir -p /etc/systemd/system-preset
cp /opt/rockstor/conf/90-default.preset /etc/systemd/system-preset/

mkdir -p /opt/rockstor/var/log
mkdir -p /opt/rockstor/src/rockstor/logs
if [ "$1" = "2" ]; then

   cp /opt/rockstor/conf/django-hack /opt/rockstor/bin/django
   for f in `find /opt/rockstor/var/log -type f`;
   do echo "Package upgrade" >> $f; done

fi

cp /opt/rockstor/conf/rockstor-pre.service /etc/systemd/system/rockstor-pre.service
cp /opt/rockstor/conf/rockstor.service /etc/systemd/system/rockstor.service
cp /opt/rockstor/conf/rockstor-bootstrap.service /etc/systemd/system/rockstor-bootstrap.service
systemctl enable rockstor-pre rockstor rockstor-bootstrap.service
systemctl daemon-reload
echo "systemctl daemon-reload done"

exit 0
...

Obviously a candidate for change when we start releasing the openSUSE based rpms.

And something to watch if we end up removing the the file this rpm post install script expects to find.

phillxnet · 2020-05-20T11:07:31Z

On the rpm %post side the following was removed as from 1st April 2020:

#create the udev rule
cp /opt/rockstor/conf/64-btrfs.rules /etc/udev/rules.d/

Additionally, given our target distros can now be assumed to already have this, via there native udev rules, we can remove the following from the same section:

#Create /dev/btrfs-control if it doesn't exist already.
if [ ! -c /dev/btrfs-control ]; then
/bin/mknod /dev/btrfs-control c 10 234
fi

And if this device node doesn't exist then it's no longer our remit to establish it, as per the prior btrfs udev rule.

phillxnet · 2021-12-29T19:03:50Z

Re:

Additionally, given our target distros can now be assumed to already have this, via there native udev rules, we can remove the following from the same section:

As from 3.9.2-58, around a year ago, we no longer take on the responsibility of establishing the btrfs-control node.

phillxnet · 2021-12-29T19:11:02Z

Closing as we and our new openSUSE upstream, via our V4 "Built on openSUSE" endeavour do not yet support mult-dev boot setups. And the referenced file updated in this pull request is also no longer resourced by our post install rpm script as it was in our CentOS days (as from 1st April 2020 as per a prior comment). And we no longer release updates for CentOS.

We should have a fresh issue and pull request to simply remove the now unused file of conf/64-btrfs.rules.

Closing as part of house keeping as we approach our first v4 stable release.

flukejones force-pushed the master branch from 5ed7c5c to 0b17763 Compare November 1, 2018 22:32

phillxnet closed this Dec 29, 2021

Hooverdan96 mentioned this pull request Dec 16, 2023

Conflicting udev rules causing failure to boot with root on multi-device btrfs #1329

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace ancient 64-btrfs.rules with newer systemd rules #1984

Replace ancient 64-btrfs.rules with newer systemd rules #1984

flukejones commented Nov 1, 2018

phillxnet commented Nov 1, 2018

flukejones commented Nov 1, 2018 •

edited

Loading

phillxnet commented Nov 1, 2018

flukejones commented Nov 1, 2018

phillxnet commented Jan 7, 2019

phillxnet commented May 20, 2020

phillxnet commented Dec 29, 2021

phillxnet commented Dec 29, 2021

Replace ancient 64-btrfs.rules with newer systemd rules #1984

Replace ancient 64-btrfs.rules with newer systemd rules #1984

Conversation

flukejones commented Nov 1, 2018

phillxnet commented Nov 1, 2018

flukejones commented Nov 1, 2018 • edited Loading

phillxnet commented Nov 1, 2018

flukejones commented Nov 1, 2018

phillxnet commented Jan 7, 2019

phillxnet commented May 20, 2020

phillxnet commented Dec 29, 2021

phillxnet commented Dec 29, 2021

flukejones commented Nov 1, 2018 •

edited

Loading