rbd: added RBD features support for krbd #2514

Merged
merged 1 commit into from
Dec 7, 2021

Conversation

Contributor

@k0ste k0ste commented Sep 16, 2021

Added support for object-map, fast-diff

Describe what this PR does

This PR adds support for all image features currently supported by krbd:

#define RBD_FEATURES_ALL	(RBD_FEATURE_LAYERING |		\
				 RBD_FEATURE_STRIPINGV2 |	\
				 RBD_FEATURE_EXCLUSIVE_LOCK |	\
				 RBD_FEATURE_OBJECT_MAP |	\
				 RBD_FEATURE_FAST_DIFF |	\
				 RBD_FEATURE_DEEP_FLATTEN |	\
				 RBD_FEATURE_DATA_POOL |	\
				 RBD_FEATURE_OPERATIONS)

/* Features supported by this (client software) implementation. */

#define RBD_FEATURES_SUPPORTED	(RBD_FEATURES_ALL)

The features are described in man rbd:

Specifies which RBD format 2 feature should be enabled when creating an image. Multiple features can be enabled by repeating this option multiple times. The following features are supported:

· layering: layering support
· striping: striping v2 support
· exclusive-lock: exclusive locking support
· object-map: object map support (requires exclusive-lock)
· fast-diff: fast diff calculations (requires object-map)
· deep-flatten: snapshot flatten support
· journaling: journaled IO support (requires exclusive-lock)
· data-pool: erasure coded pool support
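
For illustration, the "requires" notes in the man page excerpt above can be written as a small dependency table. This is a minimal Go sketch, not the code from this PR: `featureDeps` and `validateFeatures` are hypothetical names, and only the dependencies listed above are encoded.

```go
package main

import (
	"fmt"
	"strings"
)

// featureDeps is a hypothetical table that mirrors the "requires" notes
// from the rbd man page excerpt: feature name -> features it depends on.
var featureDeps = map[string][]string{
	"layering":       nil,
	"striping":       nil,
	"exclusive-lock": nil,
	"object-map":     {"exclusive-lock"},
	"fast-diff":      {"object-map"},
	"deep-flatten":   nil,
	"journaling":     {"exclusive-lock"},
	"data-pool":      nil,
}

// validateFeatures checks a comma-separated feature list: every name must
// be known, and every dependency of an enabled feature must be enabled too.
func validateFeatures(csv string) error {
	names := strings.Split(csv, ",")
	enabled := make(map[string]bool, len(names))
	for i, n := range names {
		names[i] = strings.TrimSpace(n)
		enabled[names[i]] = true
	}
	for _, n := range names {
		deps, known := featureDeps[n]
		if !known {
			return fmt.Errorf("unknown image feature %q", n)
		}
		for _, d := range deps {
			if !enabled[d] {
				return fmt.Errorf("feature %q requires %q", n, d)
			}
		}
	}
	return nil
}

func main() {
	for _, s := range []string{
		"layering,exclusive-lock,object-map,fast-diff",
		"layering,fast-diff",
	} {
		fmt.Printf("%s -> %v\n", s, validateFeatures(s))
	}
}
```

Because every enabled feature is checked, a chain such as fast-diff → object-map → exclusive-lock is covered without resolving dependencies transitively.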

Related issues

Mention any github issues relevant to this PR. Adding below line
will help to auto close the issue once the PR is merged.

Fixes: #2513

@mergify mergify bot added the component/rbd Issues related to RBD label Sep 16, 2021
@k0ste k0ste force-pushed the help branch 3 times, most recently from 07fdc0e to 54a5e6e Compare September 16, 2021 10:45
@k0ste k0ste changed the title rbd: added RBD features for support for krbd rbd: added RBD features support for krbd Sep 16, 2021
@pkalever pkalever left a comment

It would be great if we could add e2e tests for each of these features with both the krbd and rbd-nbd mounters.

Collaborator

@Madhu-1 Madhu-1 left a comment

@k0ste Thanks for the contribution. Please take care of the items below:

  • Update the existing documentation
  • Update storageclass in helm charts
  • Add E2E for CreateVolume, CreateSnapshot, PVC-PVC Clone, PVC restore, etc for new image features added

@k0ste k0ste force-pushed the help branch 2 times, most recently from fe67565 to 0a4a8dd Compare September 16, 2021 11:32
Contributor Author

k0ste commented Sep 16, 2021

  • Update the existing documentation

Updated

  • Update storageclass in helm charts

Updated

  • Add E2E for CreateVolume, CreateSnapshot, PVC-PVC Clone, PVC restore, etc for new image features added

Test added

@k0ste k0ste force-pushed the help branch 4 times, most recently from 2576ff0 to e7390ee Compare September 16, 2021 12:56
Collaborator

@Madhu-1 Madhu-1 left a comment

@k0ste CI is failing, can you please check?

tomiles commented Sep 30, 2021

@k0ste Thanks for this PR, looking forward to seeing it merged. We recently hit this issue too: features that the kernel already supports are not allowed by ceph-csi.

pkalever commented Oct 1, 2021

Honestly, I'm not quite convinced by the approach we are taking to use the various krbd features. The supported feature set depends on the kernel version (that is, on the rbd driver that is loaded).

We should check for krbd supported features first and if the feature is available use krbd else we should fall back to rbd-nbd.

tomiles commented Oct 1, 2021

I agree that detecting it would be even better. But without this PR we are just blocking features that are supported by the kernel versions of most current LTS distros, which isn't a good approach either.

@k0ste k0ste force-pushed the help branch 3 times, most recently from ce754bc to 9dfb83a Compare October 1, 2021 14:18
Contributor Author

k0ste commented Oct 1, 2021

We should check for krbd supported features first and if the feature is available use krbd else we should fall back to rbd-nbd

I think it is better to fail, because krbd and rbd-nbd performance differs dramatically. The admin should get an error and decide for themselves: use nbd or update the kernel.

pkalever commented Oct 3, 2021

I think it is better to fail, because krbd and rbd-nbd performance differs dramatically. The admin should get an error and decide for themselves: use nbd or update the kernel.

Do you have any benchmarking results comparing both?
Right, architecture-wise it makes sense that nbd performance would be somewhat lower, but I don't think it should be too bad or incomparable with krbd.

pkalever commented Oct 5, 2021

FYI: for detecting the runtime krbd features supported, here is the PR: #2556

Collaborator

humblec commented Oct 13, 2021

We should check for krbd supported features first and if the feature is available use krbd else we should fall back to rbd-nbd

I think it is better to fail, because krbd and rbd-nbd performance differs dramatically. The admin should get an error and decide for themselves: use nbd or update the kernel.

@k0ste Agreed. However, if the admin is flexible about falling back, maybe the SC option (fallbackNbd) as in #2556 would help? Just checking what the normal practice or preference is among admins in different setups for this kind of scenario.
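
The two positions in this thread (fail hard versus fall back to rbd-nbd) can be compared in a short Go sketch. Everything here is an assumption for illustration: `chooseMounter` is a hypothetical function, the feature sets are modelled as plain bitmasks, and the `fallbackNbd` flag only borrows its name from the option mentioned above; its real semantics are defined in #2556, not here.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	mounterKrbd = "krbd"
	mounterNbd  = "rbd-nbd"
)

// errUnsupported is returned when krbd lacks a requested feature and
// falling back to rbd-nbd has not been allowed.
var errUnsupported = errors.New("krbd does not support the requested image features")

// chooseMounter is a hypothetical policy function. requested and
// krbdSupported are feature bitmasks; fallbackNbd opts in to rbd-nbd when
// krbd cannot map the image.
func chooseMounter(requested, krbdSupported uint64, fallbackNbd bool) (string, error) {
	missing := requested &^ krbdSupported // bits requested but not supported
	if missing == 0 {
		return mounterKrbd, nil
	}
	if fallbackNbd {
		return mounterNbd, nil
	}
	return "", fmt.Errorf("%w: missing mask 0x%x", errUnsupported, missing)
}

func main() {
	const (
		requested = 0x1d // layering | exclusive-lock | object-map | fast-diff
		supported = 0x05 // layering | exclusive-lock (assumed older kernel)
	)
	for _, fallback := range []bool{false, true} {
		m, err := chooseMounter(requested, supported, fallback)
		fmt.Printf("fallbackNbd=%v mounter=%q err=%v\n", fallback, m, err)
	}
}
```

Keeping the fallback opt-in preserves the fail-by-default behaviour argued for above, while still letting an admin accept the rbd-nbd trade-off explicitly.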

Contributor

mergify bot commented Nov 1, 2021

This pull request now has conflicts with the target branch. Could you please resolve conflicts and force push the corrected changes? 🙏

@mergify mergify bot added the ci/skip/e2e skip running e2e CI jobs label Nov 23, 2021
Collaborator

Madhu-1 commented Dec 6, 2021

@Mergifyio rebase

Contributor

mergify bot commented Dec 6, 2021

rebase

✅ Branch has been successfully rebased

Collaborator

Madhu-1 commented Dec 6, 2021

@k0ste Do you need any help moving this forward? If you are busy, I can cherry-pick your commit and take this to completion this week. Please let me know what you think.

Collaborator

@Madhu-1 Madhu-1 left a comment

Need to adjust the go-tests and fix the Go linter errors.

needRbdNbd: false,
dependsOn: []string{librbd.FeatureNameObjectMap},
},
librbd.FeatureNameDeepFlatten: {
Collaborator

@k0ste Any specific reason for adding the deep-flatten image feature? Currently, deep flatten is handled internally by cephcsi to make clones and snapshots independent of each other.

Collaborator

Madhu-1 commented Dec 6, 2021

I have tested this on my local minikube machine. As expected, the mapping fails because minikube uses an older kernel version (4.19.x).

rbd info replicapool/csi-vol-c3f1eb8a-5650-11ec-af4c-ca30c5bfe2cb
rbd image 'csi-vol-c3f1eb8a-5650-11ec-af4c-ca30c5bfe2cb':
	size 1 GiB in 256 objects
	order 22 (4 MiB objects)
	snapshot_count: 0
	id: 11cbfe59d66d
	block_name_prefix: rbd_data.11cbfe59d66d
	format: 2
	features: layering, exclusive-lock, object-map, fast-diff
	op_features: 
	flags: 
	create_timestamp: Mon Dec  6 04:55:42 2021
	access_timestamp: Mon Dec  6 04:55:42 2021
	modify_timestamp: Mon Dec  6 04:55:42 2021

  • Pod describe output
 Warning  FailedMount             0s (x4 over 4s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-47d81c8c-76de-420e-8c63-a130641dbec4" : rpc error: code = Internal desc = rbd: map failed with error an error (exit status 6) occurred while running rbd args: [--id csi-rbd-node -m 10.104.3.155:6789 --keyfile=***stripped*** map replicapool/csi-vol-c3f1eb8a-5650-11ec-af4c-ca30c5bfe2cb --device-type krbd --options noudev], rbd error output: rbd: sysfs write failed
rbd: map failed: (6) No such device or address
  • dmesg logs from the minikube node
$ uname -r
4.19.202

$ dmesg
...
[  892.675164] libceph: mon0 10.104.3.155:6789 session established
[  892.675246] libceph: mon0 10.104.3.155:6789 socket closed (con state OPEN)
[  892.675262] libceph: mon0 10.104.3.155:6789 session lost, hunting for new mon
[  892.675594] libceph: mon0 10.104.3.155:6789 session established
[  892.675693] libceph: client4642 fsid e65ffd0f-8425-443f-97e8-276410d951cb
[  892.683899] rbd: image csi-vol-c3f1eb8a-5650-11ec-af4c-ca30c5bfe2cb: image uses unsupported features: 0x18
[  924.899178] libceph: mon0 10.104.3.155:6789 session established
[  924.899249] libceph: mon0 10.104.3.155:6789 socket closed (con state OPEN)
[  924.899265] libceph: mon0 10.104.3.155:6789 session lost, hunting for new mon
[  924.899601] libceph: mon0 10.104.3.155:6789 session established
[  924.899705] libceph: client4657 fsid e65ffd0f-8425-443f-97e8-276410d951cb
[  924.905023] rbd: image csi-vol-c3f1eb8a-5650-11ec-af4c-ca30c5bfe2cb: image uses unsupported features: 0x18
$ 
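
The mask `0x18` reported in the dmesg output can be decoded by hand. The Go sketch below assumes the RBD feature bit positions from Ceph's feature definitions (layering at bit 0 through data-pool at bit 7); `decodeFeatures` is a hypothetical helper and not part of ceph-csi.

```go
package main

import (
	"fmt"
	"strings"
)

// featureBit pairs a feature mask with its rbd feature name.
type featureBit struct {
	mask uint64
	name string
}

// featureBits lists the bit values assumed here for the RBD image features.
var featureBits = []featureBit{
	{1 << 0, "layering"},
	{1 << 1, "striping"},
	{1 << 2, "exclusive-lock"},
	{1 << 3, "object-map"},
	{1 << 4, "fast-diff"},
	{1 << 5, "deep-flatten"},
	{1 << 6, "journaling"},
	{1 << 7, "data-pool"},
}

// decodeFeatures turns a feature bitmask into a readable list of names.
// Bits that are not in the table are reported as a single unknown mask.
func decodeFeatures(mask uint64) string {
	var names []string
	for _, f := range featureBits {
		if mask&f.mask != 0 {
			names = append(names, f.name)
			mask &^= f.mask
		}
	}
	if mask != 0 {
		names = append(names, fmt.Sprintf("unknown(0x%x)", mask))
	}
	return strings.Join(names, ", ")
}

func main() {
	fmt.Println(decodeFeatures(0x18)) // prints "object-map, fast-diff"
}
```

Under these assumptions `0x18` is bit 3 plus bit 4, which matches the two features this PR enables and the `features:` line in the `rbd info` output.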

@k0ste If you can remove the e2e changes, we can continue reviewing the other changes.

Contributor Author

k0ste commented Dec 6, 2021

@k0ste If you can remove the e2e changes, we can continue reviewing the other changes.

Which e2e changes exactly? I included every suggestion that your team left 🤔

Collaborator

Madhu-1 commented Dec 6, 2021

@k0ste If you can remove the e2e changes, we can continue reviewing the other changes.

Which e2e changes exactly? I included every suggestion that your team left 🤔

The newly added e2e test will fail in our e2e environment, as we use minikube to create the Kubernetes cluster. Please check #2514 (comment).

@k0ste k0ste force-pushed the help branch 2 times, most recently from 6209d06 to 6d462d8 Compare December 6, 2021 08:43
Contributor Author

k0ste commented Dec 6, 2021

deep-flatten was removed as requested

Collaborator

Madhu-1 commented Dec 6, 2021

If you see any go-test or go-lint failures, please run `CONTAINER_CMD=docker make containerized-test TARGET=go-test` and `CONTAINER_CMD=docker make containerized-test TARGET=go-lint` locally to see what the issue is.

Contributor Author

k0ste commented Dec 6, 2021

Some CI tests failed due to:

kernel 4.19.202 does not support required features

Collaborator

Madhu-1 commented Dec 6, 2021

Some CI tests failed due to:

kernel 4.19.202 does not support required features

CI is not going to pass, as minikube comes with kernel version 4.19.202, which does not support the fast-diff, object-map, ... image features. I tested this and mentioned the same at #2514 (comment). You also need to remove the e2e changes you are adding, as I mentioned at #2514 (comment).

Added support for `object-map, fast-diff`

Signed-off-by: Konstantin Shalygin <k0ste@k0ste.ru>
Contributor Author

k0ste commented Dec 6, 2021

Some CI tests failed due to:

kernel 4.19.202 does not support required features

CI is not going to pass, as minikube comes with kernel version 4.19.202, which does not support the fast-diff, object-map, ... image features. I tested this and mentioned the same at #2514 (comment). You also need to remove the e2e changes you are adding, as I mentioned at #2514 (comment).

The e2e changes were removed.

@pkalever pkalever left a comment

IMO, we shouldn't have removed the e2e tests; instead we should have guarded them with util.CheckKernelSupport for the minimum required version, so that the CI does not bail out.

@k0ste Can this be reverted, with the guard check added?

cc: @Madhu-1

{
"layering,journaling",
&rbdVolume{
Mounter: rbdNbdMounter,
Mounter: rbdDefaultMounter,

Why do we need this change?

Collaborator

no harm in this one.

Collaborator

Madhu-1 commented Dec 7, 2021

IMO, we shouldn't have removed the e2e tests; instead we should have guarded them with util.CheckKernelSupport for the minimum required version, so that the CI does not bail out.

Yes, but I don't think it's useful: it will not get executed any time soon, as we are still on 4.19 and what we need is 5.3. Moreover, we would need to rework that check to verify the kernel version on all nodes and schedule the pod on a suitable node.
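
To make the guard discussed here concrete, the following is a minimal Go sketch of a kernel version check against the 5.3 minimum mentioned above. It is not the implementation of util.CheckKernelSupport, whose signature is not shown in this thread; `parseKernel` and `kernelAtLeast` are hypothetical helpers. A plain version comparison is also only an approximation, since distribution kernels may backport features.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKernel extracts the major and minor numbers from a `uname -r`
// style string such as "4.19.202" or "5.4.0-91-generic".
func parseKernel(release string) (major, minor int, err error) {
	parts := strings.SplitN(release, ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	// Drop any non-digit suffix from the minor part (for example "3-rc1").
	minorStr := parts[1]
	notDigit := func(r rune) bool { return r < '0' || r > '9' }
	if i := strings.IndexFunc(minorStr, notDigit); i >= 0 {
		minorStr = minorStr[:i]
	}
	if minor, err = strconv.Atoi(minorStr); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// kernelAtLeast reports whether release is at least wantMajor.wantMinor.
func kernelAtLeast(release string, wantMajor, wantMinor int) (bool, error) {
	major, minor, err := parseKernel(release)
	if err != nil {
		return false, err
	}
	return major > wantMajor || (major == wantMajor && minor >= wantMinor), nil
}

func main() {
	for _, rel := range []string{"4.19.202", "5.4.0-91-generic"} {
		ok, err := kernelAtLeast(rel, 5, 3)
		fmt.Printf("%s: >= 5.3? %v (err: %v)\n", rel, ok, err)
	}
}
```

An e2e test could skip itself when such a check returns false, which keeps the test in the tree without failing CI on the 4.19.202 minikube kernel.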

Collaborator

@Madhu-1 Madhu-1 left a comment

LGTM

Collaborator

Madhu-1 commented Dec 7, 2021

@Mergifyio rebase

Contributor

mergify bot commented Dec 7, 2021

rebase

☑️ Nothing to do

  • -closed [:pushpin: rebase requirement]
  • #commits-behind>0 [:pushpin: rebase requirement]

pkalever commented Dec 7, 2021

Yes, but I don't think it's useful: it will not get executed any time soon, as we are still on 4.19 and what we need is 5.3. Moreover, we would need to rework that check to verify the kernel version on all nodes and schedule the pod on a suitable node.

minikube will have 5.x soon; the minikube community is working on it. I just wanted to avoid reworking it later.

@k0ste I don't insist on having the e2e tests, but please create an issue for this so that we don't miss the e2e coverage later.

@pkalever pkalever left a comment

Well, the changes LGTM. My only concern is on the e2e tests. Assuming @k0ste will open an issue for e2e tests later. Approving this now.

Successfully merging this pull request may close these issues.

Support for all Ceph RBD features
6 participants