Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scrub status "unknown" - Leap 15.6 OS base #2872

Closed
2 tasks done
phillxnet opened this issue Jul 15, 2024 · 9 comments
Closed
2 tasks done

Scrub status "unknown" - Leap 15.6 OS base #2872

phillxnet opened this issue Jul 15, 2024 · 9 comments
Assignees

Comments

@phillxnet
Copy link
Member

phillxnet commented Jul 15, 2024

Thanks to forum member Tex1954 for highlighting this issue. We currently report "unknown" for scrub status on a 15.6 base, but work as expected on a 15.5 base. Thanks to @FroggyFlox & @Hooverdan96 for helping to narrow down this issue.

N.B. @Hooverdan96 reports that a Stable Backport kernel does not exibity this error, this instance would also have a newer btrfs-progs userland as per: https://rockstor.com/docs/howtos/stable_kernel_backport.html

This issue is currently believed to be cosmetic in nature as the scrub itself is executed as expected. However cosmetic failure, in this context, translates to a Web-UI failure in a critical report: adding to our current Stable Milestone as a consequence.

  • Establish unit-test reproducer
  • Establish fix in the context of our existing tests and the new reproducer test.

Forum thread reference: https://forum.rockstor.com/t/5-0-11-scrub-not-working/9585

Seen last in 5.0.12-0

@phillxnet phillxnet added this to the 5.1.X-X Stable release milestone Jul 15, 2024
@phillxnet
Copy link
Member Author

@FroggyFlox has now established the likely area/origin of this issue: hence it's creation. It looks like our scrub parser is failing over to reporting "unknown" when trying to establish a scrub status; for an as yet unknown reason. But we suspect it is down to differences in kernel versions, or our failure to apply the correct parsing accordingly. We currently have a multi-path parsing capability re scrub that adapts to the expected scrub status output: based on kernel version (from memory).

@phillxnet phillxnet self-assigned this Jul 15, 2024
@phillxnet
Copy link
Member Author

In our view we call low-level scrub_status with the Pool and the output of btrfsprogs_legacy: our designation for the appropriate parsing path for scrub_status to take.

@transaction.atomic
def _scrub_status(self, pool):
try:
ps = PoolScrub.objects.filter(pool=pool).order_by("-id")[0]
except:
return Response()
if ps.status == "started" or ps.status == "running":
cur_status = scrub_status(pool, btrfsprogs_legacy())
if (
cur_status["status"] == "finished"
or cur_status["status"] == "halted"
or cur_status["status"] == "cancelled"
):
duration = int(cur_status["duration"])
cur_status["end_time"] = ps.start_time + timedelta(seconds=duration)
del cur_status["duration"]
PoolScrub.objects.filter(id=ps.id).update(**cur_status)
return ps

And our initial value for the pool scrub status (PoolScrub.status) is established when we create the associated model used to track the subprocess we initiate the scrub via; by way of the "started" default:

class PoolScrub(models.Model):
pool = models.ForeignKey(Pool, on_delete=models.CASCADE)
# with a max of 10 chars we use 'halted' to indicated 'interrupted'
status = models.CharField(max_length=10, default="started")
# pid is the process id of a scrub job
pid = models.IntegerField()

So the model is always initiated with scrub status assumed "started" and we then update this via a low level call to scrub_status(): but only if it is a new model (status == "started") or already established as "running".

@FroggyFlox
Copy link
Member

btrfsprogs_legacy

Thanks a lot for creating all of that, @phillxnet!
Your mention of btrfsprogs_legacy makes me wonder now whether we have an issue with parsing btrfs progs version... if we indeed fail to accurately detect that version, then our parsing of the scrub status command will indeed fail most likely. Your idea would fit very well in explaining what I've seen.

@phillxnet
Copy link
Member Author

@FroggyFlox Thanks for kicking this one off. I'm dabbling now and should hopefully have some more idea about where we are failing in a bit.

@phillxnet
Copy link
Member Author

So it seems we are getting to the nub of it here. Just got the following on a 15.6 OS base via some temporary logging:

[15/Jul/2024 14:25:13] INFO [fs.btrfs:1891] btrfsprogs_legacy() out=['btrfs-progs v6.5.1 ', ''], err=[''], rc=0, with btrfs_progs_version=['6', '5', '1']
[15/Jul/2024 14:25:13] INFO [fs.btrfs:1965] scrub_status_raw(mnt_pt=/mnt2/rock-pool, legacy=True) CALLED.

But legacy here is meant to be:

Returns True if "btrfs version" considered legacy: i.e. < "v5.1.2" (approximately).

With the reproducer of an "unknown" status: even after page refresh:
unknown-scrub-status-leap-15 6

@phillxnet
Copy link
Member Author

phillxnet commented Jul 15, 2024

Noteworthy here is that our current 'legacy' parsing pertains to btrfs-progs "... < "v5.1.2" ...", which from the following existing unit test:

def test_btrfsprogs_legacy(self):
"""
Test btrfsprogs_legacy for expected function, of boolean return on depricated
btrfs version.
"""
err = [""]
rc = 0
# btrfs-progs v4.12
outset = [["btrfs-progs v4.12 ", ""]] # e.g. Leap 15.2
is_legacy = [True]
outset.append(["btrfs-progs v4.19.1 ", ""]) # e.g. Leap 15.3
is_legacy.append(True)
outset.append(["btrfs-progs v5.14 ", ""]) # e.g. Leap 15.4
is_legacy.append(False)
outset.append(
["btrfs-progs v6.1.8 ", ""]
) # e.g. Leap 15.4 Stable Kernel Backports
is_legacy.append(False)

equates to Leap versions before 15.4 !! We are no longer publishing rpms for anything older.

As such, it may be we can now drop this legacy test entirely.

@phillxnet
Copy link
Member Author

As such, it may be we can now drop this legacy test entirely.

I'll begin by proving the findings here and working out why we label this newer version as legacy: most likely as we inadvertently roll-over to legacy in our now dated btrfsprogs_legacy() procedure.

@phillxnet
Copy link
Member Author

phillxnet commented Jul 15, 2024

Test extension re legacy btrfs parsing requirement:

Test btrfsprogs_legacy for expected function, of boolean return on deprecated ... test out=['btrfs-progs v4.12 ', ''], expected_result=True
test out=['btrfs-progs v4.19.1 ', ''], expected_result=True
test out=['btrfs-progs v5.14 ', ''], expected_result=False
test out=['btrfs-progs v6.1.8 ', ''], expected_result=False
test out=['btrfs-progs v6.5.1 ', ''], expected_result=False
FAIL
...
======================================================================
FAIL: test_btrfsprogs_legacy (rockstor.fs.tests.test_btrfs.BTRFSTests.test_btrfsprogs_legacy)
Test btrfsprogs_legacy for expected function, of boolean return on deprecated
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/fs/tests/test_btrfs.py", line 1027, in test_btrfsprogs_legacy
    self.assertEqual(
AssertionError: True != False : Un-expected boolean returned: btrfsprogs_legacy. Mock (['btrfs-progs v6.5.1 ', '']), result (True), expected (False)

----------------------------------------------------------------------
Ran 58 tests in 0.083s

FAILED (failures=1)

And as we have @Hooverdan96 report of current Stable Kernel backports functioning as expected (on our reproducer 15.6 base) I've added this to the above test additions:

Test btrfsprogs_legacy for expected function, of boolean return on deprecated ... test out=['btrfs-progs v4.12 ', ''], expected_result=True
test out=['btrfs-progs v4.19.1 ', ''], expected_result=True
test out=['btrfs-progs v5.14 ', ''], expected_result=False
test out=['btrfs-progs v6.1.8 ', ''], expected_result=False
test out=['btrfs-progs v6.9.2 ', ''], expected_result=False
test out=['btrfs-progs v6.5.1 ', ''], expected_result=False
FAIL
...
======================================================================
FAIL: test_btrfsprogs_legacy (rockstor.fs.tests.test_btrfs.BTRFSTests.test_btrfsprogs_legacy)
Test btrfsprogs_legacy for expected function, of boolean return on deprecated
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/rockstor/src/rockstor/fs/tests/test_btrfs.py", line 1030, in test_btrfsprogs_legacy
    self.assertEqual(
AssertionError: True != False : Un-expected boolean returned: btrfsprogs_legacy. Mock (['btrfs-progs v6.5.1 ', '']), result (True), expected (False)

----------------------------------------------------------------------
Ran 58 tests in 0.088s

FAILED (failures=1)

So using TW btrfs-progs version as a stand in for Stable Kernel backports, we successfully identify as non-legacy: we look to have a simple comparison bug here: exposed by the 6.5.1 version specifically. And our prior insufficient testing data.

phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Jul 15, 2024
Adopt packaging library as a direct dependency to enable
more sophisticated btrfs-progs version assessment. Previously
already an indirect dependency of gunicorn. Moving to explicit
dependency to avoid issues re gunicorn's deprecation in the
future.

Includes
- Additional test data pertaining to Leap 15.6 and newer,
re btrfs-progs version.
- Additional test data re exception handling (fail throught)
re non PEP 440 compliant 'btrfs-progs version' output.
phillxnet added a commit to phillxnet/rockstor-core that referenced this issue Jul 15, 2024
Adopt packaging library as a direct dependency to enable
more sophisticated btrfs-progs version assessment. Previously
already an indirect dependency of gunicorn. Moving to explicit
dependency to avoid issues re gunicorn's deprecation in the
future.

Includes
- Additional test data pertaining to Leap 15.6 and newer,
re btrfs-progs version.
- Additional test data re exception handling (fail throught)
re non PEP 440 compliant 'btrfs-progs version' output.
phillxnet added a commit that referenced this issue Jul 16, 2024
…eap-15.6-OS-base

Scrub status "unknown" - Leap 15.6 OS base #2872
@phillxnet
Copy link
Member Author

Closing as:
Fixed by #2873

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants