LoRA controller doesn't clean up the stale endpoint IP #600

Closed
Jeffwan opened this issue Jan 24, 2025 · 2 comments

Collaborator

Jeffwan commented Jan 24, 2025

🐛 Describe the bug

Image

Image

I noticed that the EndpointSlice is not updated: I have only one pod, but the slice lists two IPs. Since we were doing upgrades at the time, I think the model adapter controller doesn't handle this case well, which is what causes the problem.

Image
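
The mismatch described above (one pod, two IPs) can be checked mechanically. The following is a minimal sketch, not the controller's code: `staleIPs` is a hypothetical helper, and the addresses are made-up values that only mirror the shape of the report.

```go
package main

import "fmt"

// staleIPs returns the addresses listed in the EndpointSlice that no
// currently running pod owns. Both inputs are plain strings here; a real
// controller would read them from EndpointSlice and Pod objects.
func staleIPs(sliceIPs, podIPs []string) []string {
	live := make(map[string]struct{}, len(podIPs))
	for _, ip := range podIPs {
		live[ip] = struct{}{}
	}
	var stale []string
	for _, ip := range sliceIPs {
		if _, ok := live[ip]; !ok {
			stale = append(stale, ip)
		}
	}
	return stale
}

func main() {
	// Hypothetical values: the slice still holds the old pod's IP.
	sliceIPs := []string{"10.0.0.1", "10.0.0.2"}
	podIPs := []string{"10.0.0.2"}
	fmt.Println(staleIPs(sliceIPs, podIPs))
}
```

Any non-empty result means the slice still advertises an address that no longer belongs to a pod.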

Steps to Reproduce

No response

Expected behavior

No response

Environment

No response

Jeffwan added the area/gateway, area/lora, and kind/bug (Something isn't working) labels Jan 24, 2025
Jeffwan self-assigned this Jan 24, 2025
Jeffwan added this to the v0.2.0 milestone Jan 24, 2025
Collaborator Author

Jeffwan commented Jan 25, 2025

This can be reproduced whenever a rolling upgrade is in progress.

Image

I decided to update the list this way instead of using append. Once we start supporting multiple replicas, we will need to refactor this anyway.

https://github.com/aibrix/aibrix/blob/3f6ce3ded76b9fed8d3837a60786800cf64d1d71/pkg/controller/modeladapter/modeladapter_controller.go#L862-L865
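
To illustrate the difference between the two update styles, here is a minimal sketch. It assumes the fix amounts to rebuilding the endpoint list from the current pod IP rather than appending to the existing list; the actual change lives in the linked lines and is not reproduced here. The `Endpoint` type and both functions are hypothetical simplifications.

```go
package main

import "fmt"

// Endpoint is a hypothetical, simplified stand-in for one EndpointSlice entry.
type Endpoint struct {
	Addresses []string
}

// appendEndpoint models the append-style update: existing entries are kept,
// so the IP of a pod replaced during a rolling upgrade lingers in the list.
func appendEndpoint(existing []Endpoint, podIP string) []Endpoint {
	return append(existing, Endpoint{Addresses: []string{podIP}})
}

// replaceEndpoints models the overwrite-style update: the list is rebuilt
// from the current pod IP only, so no earlier entry can survive.
func replaceEndpoints(podIP string) []Endpoint {
	return []Endpoint{{Addresses: []string{podIP}}}
}

func main() {
	// State left behind by the old pod before the rolling upgrade.
	old := []Endpoint{{Addresses: []string{"10.0.0.1"}}}

	fmt.Println(len(appendEndpoint(old, "10.0.0.2"))) // old and new entry
	fmt.Println(len(replaceEndpoints("10.0.0.2")))    // new entry only
}
```

Overwriting only works while an adapter maps to a single pod, which matches the note above that multi-replica support will need a refactor.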

Collaborator Author

Jeffwan commented Jan 25, 2025

There's a potential improvement: the scheduler should pick the new pod rather than the old one. Otherwise the adapter will go through another round of migration.

Image
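
One way the suggested preference could look is sketched below. It assumes "new pod" means the ready pod with the latest creation time, which is my reading and not something the thread specifies. The `Pod` type and `pickNewestReady` are hypothetical and do not reflect the scheduler's real API.

```go
package main

import "fmt"

// Pod is a hypothetical, trimmed-down view of a Kubernetes pod.
type Pod struct {
	Name        string
	Ready       bool
	CreatedUnix int64 // creation time as Unix seconds
}

// pickNewestReady returns the most recently created ready pod.
// The boolean is false when no pod is ready.
func pickNewestReady(pods []Pod) (Pod, bool) {
	var best Pod
	found := false
	for _, p := range pods {
		if !p.Ready {
			continue
		}
		if !found || p.CreatedUnix > best.CreatedUnix {
			best, found = p, true
		}
	}
	return best, found
}

func main() {
	// During a rolling upgrade both pods can be ready for a short time.
	pods := []Pod{
		{Name: "old", Ready: true, CreatedUnix: 1000},
		{Name: "new", Ready: true, CreatedUnix: 2000},
	}
	if p, ok := pickNewestReady(pods); ok {
		fmt.Println(p.Name)
	}
}
```

Choosing the newer pod avoids loading the adapter onto a pod that is about to be terminated, which would force the second migration mentioned above.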
