Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-27163] Add label for the session job to help list the session … #215

Closed
wants to merge 2 commits into from

Conversation

Aitozi
Copy link

@Aitozi Aitozi commented May 14, 2022

…jobs in the same session cluster

This PR is meant to add label for the session job to help to list session jobs under the same session.

image

@Aitozi Aitozi changed the title [FLINK-26784] Add label for the session job to help list the session … [FLINK-27163] Add label for the session job to help list the session … May 14, 2022
@morhidi
Copy link
Contributor

morhidi commented May 14, 2022

LGTM overall. nit: Shouldn't we use a shorter label, e.g:

k get sessionjob -l target=basic-session-cluster

It's easier to remember :)

@morhidi
Copy link
Contributor

morhidi commented May 14, 2022

On second thought we could use something like this for filtering, aren't we?

k get sessionjob --field-selector spec.deploymentName=basic-session-cluster

@@ -76,6 +78,24 @@ public void reconcile(FlinkSessionJob flinkSessionJob, Context context) throws E
OperatorUtils.getSecondaryResource(
flinkSessionJob, context, configManager.getOperatorConfiguration());

var labels = flinkSessionJob.getMetadata().getLabels();
Copy link
Contributor

@morhidi morhidi May 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to extract adding labels into a utility method to keep the reconcile method concise

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to a dedicated method.

.equals(labels.get(LABEL_SESSION_CLUSTER))) {
labels.put(LABEL_SESSION_CLUSTER, flinkSessionJob.getSpec().getDeploymentName());
flinkSessionJob.getMetadata().setLabels(labels);
kubernetesClient
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we use an update control for this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reconciler do not return UpdateControl now, If we want to use the update control for it, we should adjust the interface first to return the update control :/ . We have just decide not to do this now. I'm not sure whether we need to break it for this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, but this is the first use case that we modify something else then the status. We might need to reconsider this again. Updating the CR deep in the stack sounds worse to me.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agree with it. I'm ok to rework the interface to work out it. Also cc @gyfora @wangyang0918

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We hit some race conditions anyway with the status updates, so it maybe good as is with the java-operator-sdk v2 and we can change it later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I know we have manually patch the status after #199 . So this change do not bring the resourceVersion conflicts, I think. Do you mean keep it as now ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's see what @gyfora and @wangyang0918 think/suggest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you don't explicitly set the resourceversion to null before patching you might still get an error.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, but I think this operation is retriable

@Aitozi
Copy link
Author

Aitozi commented May 15, 2022

k get sessionjob --field-selector spec.deploymentName=basic-session-cluster

I tried this way before, but the custom resource seems do not support field selector

kubectl get sessionjob --field-selector spec.deploymentName=basic-session-cluster
Error from server (BadRequest): Unable to find "flink.apache.org/v1beta1, Resource=flinksessionjobs" that match label selector "", field selector "spec.deploymentName=basic-session-cluster": field label not supported: spec.deploymentName

kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.24.0
Kustomize Version: v4.5.4
Server Version: v1.23.3

After search the related issues, I think the custom resource can only support field selector by metadata.namespace and metadata.name

kubernetes/kubernetes#53345

@Aitozi
Copy link
Author

Aitozi commented May 15, 2022

LGTM overall. nit: Shouldn't we use a shorter label, e.g:

k get sessionjob -l target=basic-session-cluster

It's easier to remember :)

Good idea, but the target is a bit too general, I lean to rename it to target.session

@morhidi
Copy link
Contributor

morhidi commented May 15, 2022

k get sessionjob --field-selector spec.deploymentName=basic-session-cluster

I tried this way before, but the custom resource seems do not support field selector

kubectl get sessionjob --field-selector spec.deploymentName=basic-session-cluster
Error from server (BadRequest): Unable to find "flink.apache.org/v1beta1, Resource=flinksessionjobs" that match label selector "", field selector "spec.deploymentName=basic-session-cluster": field label not supported: spec.deploymentName

kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.24.0
Kustomize Version: v4.5.4
Server Version: v1.23.3

After search the related issues, I think the custom resource can only support field selector by metadata.namespace and metadata.name

kubernetes/kubernetes#53345

k get sessionjob --field-selector spec.deploymentName=basic-session-cluster

I tried this way before, but the custom resource seems do not support field selector

kubectl get sessionjob --field-selector spec.deploymentName=basic-session-cluster
Error from server (BadRequest): Unable to find "flink.apache.org/v1beta1, Resource=flinksessionjobs" that match label selector "", field selector "spec.deploymentName=basic-session-cluster": field label not supported: spec.deploymentName

kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.24.0
Kustomize Version: v4.5.4
Server Version: v1.23.3

After search the related issues, I think the custom resource can only support field selector by metadata.namespace and metadata.name

kubernetes/kubernetes#53345

Yeah, I was almost sure you've tried it, but I thought it was worth asking :)

@morhidi
Copy link
Contributor

morhidi commented May 15, 2022

+1
For the record we need this functionality due to: Enable arbitrary CRD field selectors

@wangyang0918
Copy link
Contributor

TBH, I do not like to do the CR .metadata or .spec updates in the reconciler. Especially, this is only for easier filter, not for operator internal use case. In the production environment, I believe the users could do this manually or covered by external job management tools.

I also have another feeling this is the responsibility of mutating admission webhook.

@morhidi
Copy link
Contributor

morhidi commented May 15, 2022

TBH, I do not like to do the CR .metadata or .spec updates in the reconciler. Especially, this is only for easier filter, not for operator internal use case. In the production environment, I believe the users could do this manually or covered by external job management tools.

I also have another feeling this is the responsibility of mutating admission webhook.

I've been also thinking about putting this into the webhook or maybe using the label exclusively for linking the two CRs.

@Aitozi
Copy link
Author

Aitozi commented May 15, 2022

@wangyang0918 My thought on why we need this: although we can manage the job with external job management tools, but we also may need to operate in command line. In the case to operate on all the jobs of a session cluster, eg: delete all the session jobs before shutdown the cluster, it will be useful I think.

I think the mutating admission webhook is a better way. But one shortcoming is it's optional. If disabled, it will not work


But I think we can leave the choice to user, since it's a functionality for convenience it will not affect the core ability of the operator. So I'm +1 to move this to the mutating admission webhook.

@gyfora
Copy link
Contributor

gyfora commented May 15, 2022

I am also leaning toward -1 on the approach in this PR as so far we have explicitly avoided making changes to the metadata/spec for other features.

@gyfora
Copy link
Contributor

gyfora commented May 15, 2022

I think mutating webhooks provide a good alternative solution but I would prefer to revisit this issue after the release.

@Aitozi
Copy link
Author

Aitozi commented May 16, 2022

Thanks for your good suggestion @morhidi @wangyang0918 @gyfora . I will try to implement it with mutating webhook. It's ok to me to leave it for next release

@gyfora
Copy link
Contributor

gyfora commented May 30, 2022

Should we close this PR for now?

@Aitozi
Copy link
Author

Aitozi commented May 30, 2022

closing it now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants