Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX katib-db-manager secrets and service to connect to mysql #2981

Closed

Conversation

pedrovgp
Copy link

@pedrovgp pedrovgp commented Feb 6, 2025

This PR was originated by issue 2980

Investigating the issue, we discovered two problems:

  1. katib-db-manager was using the secret katib-mysql-secret to connect to the database. But in this install (1) the database used by katib is mysql (not katib-mysql, which is not even deployed) and (2) the secrets (username and password) are different
  2. katib-db-manager points to the Service katib-mysql (instead of pointing to the mysql service), which is fine, since the service should point to the same Deployment as the mysql Service. But the selector was not working properly, and it was not pointing to the right Deployment.

This PR fixes both issues by adding patches to the "katib-with-kubeflow" install mode.

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Feb 6, 2025

But lets keep this PR open for our integration tests

@juliusvonkohout
Copy link
Member

CC @andreyvelich

@juliusvonkohout juliusvonkohout requested review from andreyvelich and Electronic-Waste and removed request for kimwnasptd February 6, 2025 16:35
@pedrovgp
Copy link
Author

pedrovgp commented Feb 6, 2025

But i am wondering because our cicd works well https://github.com/kubeflow/manifests/blob/master/.github/workflows/katib_test.yaml as you can see here https://github.com/kubeflow/manifests/actions/runs/13119220501/job/36600993054

Indeed, that's weird. You mentioned it is the same as in Kubeflow 1.9. Katib in 1.9 was also not working for us.

I don't think it is anything EKS related because the secrets and app mappings were wrong within the cluster, it had nothing to do with any external AWS resources.

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Feb 6, 2025

@kubeflow/release-team definitely something we need to consider for Kubeflow 1.10

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Feb 6, 2025

@pedrovgp

also

# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'vars' is deprecated. Please use 'replacements' instead. [EXPERIMENTAL] Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'patchesStrategicMerge' is deprecated. Please use 'patches' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
# Warning: 'vars' is deprecated. Please use 'replacements' instead. [EXPERIMENTAL] Run 'kustomize edit fix' to update your Kustomization automatically.

needs to be fixed in the upstream manifests

@juliusvonkohout
Copy link
Member

and the tests must be fixed to fail on such problems. Maybe wee need to extend https://github.com/kubeflow/manifests/blob/588e49e267f070578ad0e7a717d9ba79d652371d/.github/workflows/katib_test.yaml to also check whether stuff is persisted to the database.

@juliusvonkohout juliusvonkohout self-assigned this Feb 6, 2025
@juliusvonkohout juliusvonkohout added this to the 1.10 milestone Feb 6, 2025
@pedrovgp
Copy link
Author

pedrovgp commented Feb 6, 2025

Hi @juliusvonkohout , I made the PR in the Katib repo.

Should I now wait for it to merge to master and after that run https://github.com/kubeflow/manifests/blob/master/hack/synchronize-katib-manifests.sh. and make the PR here?

@juliusvonkohout
Copy link
Member

juliusvonkohout commented Feb 7, 2025

Hi @juliusvonkohout , I made the PR in the Katib repo.

Should I now wait for it to merge to master and after that run https://github.com/kubeflow/manifests/blob/master/hack/synchronize-katib-manifests.sh. and make the PR here?

Well as a first step you should extend the tests mentioned above to fail with the current setup, see #2981 (comment). Such problems must be caught in our CI/CD. But otherwise yes, you can work in parallel on both repositories and i can do the synchronization later on.

@google-oss-prow google-oss-prow bot added size/L and removed size/M labels Feb 7, 2025
@pedrovgp pedrovgp force-pushed the issue-2980-katib-db-manager branch from 8ea8b03 to da72660 Compare February 7, 2025 14:02
@juliusvonkohout
Copy link
Member

/ok-to-test

@pedrovgp please make sure to also have the changes int the upstream katib master branch. In the best case we keep them in synchronization for now.

Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot removed the size/L label Feb 7, 2025
@juliusvonkohout
Copy link
Member

juliusvonkohout commented Feb 7, 2025

@pedrovgp why did you close the PR ? We need it to run the integration tests here as well.

@juliusvonkohout
Copy link
Member

ah ok, continued in #2983

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants