Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v0.3] Add ContainerFilesystems to ImageFsInfoResponse #407

Merged
merged 2 commits into from
Oct 31, 2024

Conversation

kinarashah
Copy link

@kinarashah kinarashah commented Oct 25, 2024

Problem

Kubelet complaining about missing image stats in logs:

E1025 00:48:49.497486    6244 eviction_manager.go:212] "Eviction manager: failed to synchronize" err="eviction manager: failed to get HasDedicatedImageFs: missing image stats: &ImageFsInfoResponse{ImageFilesystems:[]*FilesystemUsage{&FilesystemUsage{Timestamp:1729817329497103360,FsId:&FilesystemIdentifier{Mountpoint:/var/lib/docker,},UsedBytes:&UInt64Value{Value:4951261692,},InodesUsed:&UInt64Value{Value:255210,},},},ContainerFilesystems:[]*FilesystemUsage{},}"

Issue

#408

Solution

In version v1.31, the feature gate KubeletSeparateDiskGC is enabled, and the ImageFsInfoResponse now includes container filesystem information. kubernetes/kubernetes#120914

Leveraging the current rwLayerSize for each container to determine the total writable layer size.

Note: I attempted to bump Kubernetes up to v1.31, but that requires significant changes for k8s.io/cri-api/v1alpha2. So I decided to stick with v1.29 instead which is the minimum version of k8s.io/cri-api that introduces ContainerFilesystems. (Looks like this part is already being covered here: #333)

@kinarashah
Copy link
Author

@xinfengliu Could you take a look at this?

@@ -63,6 +63,15 @@ func (ds *dockerService) imageFsInfo() (*runtimeapi.ImageFsInfoResponse, error)
}
logrus.Debugf("Total used bytes by docker images: %v", totalImageSize)

var totalRWLayerSize uint64
for cid, _ := range ds.containerStatsCache.stats {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a RW lock for accessing containerStatsCache, maybe a better way is to add a new method for containerStatsCache in core/stats.go?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, will update

@xinfengliu
Copy link
Contributor

@kinarashah Thank you for the pull request.

Could you confirm if the PR will break older versions of kubernetes?

Also, We generally create PR against the master branch first, then back-port it to other release branches.

@kinarashah
Copy link
Author

@xinfengliu Thank you for the quick review and context, I'll test it out with older versions and open a PR against master branch today.

@kinarashah kinarashah changed the title Add ContainerFilesystems to ImageFsInfoResponse [v0.3] Add ContainerFilesystems to ImageFsInfoResponse Oct 29, 2024
@kinarashah
Copy link
Author

@xinfengliu Updated the PR. Here's the PR for the master branch - #409.

Also, confirmed that this doesn't break earlier kubernetes versions because the feature flag is disabled by default. Feature flag is enabled by default for the first time with v1.31 and is still in beta.

Copy link
Contributor

@xinfengliu xinfengliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@xinfengliu
Copy link
Contributor

Thank you @kinarashah . I have approved both PRs.
I'm not the maintainer of this repo, so you may also need other folks such as @nwneisen to review and merge them.

Copy link
Collaborator

@nwneisen nwneisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kinarashah Please address the failing unit tests

@kinarashah
Copy link
Author

kinarashah commented Oct 30, 2024

@nwneisen Unit test fixed and make test is now passing locally. Also updated the PR for master branch.

@kinarashah
Copy link
Author

@nwneisen Thanks for restarting the workflow. I noticed two actions are failing, but they don’t appear to be related to this PR. Could you check if any additional changes are needed from my side?

@nwneisen
Copy link
Collaborator

@kinarashah They do not look like they are related failures. I will merge with them failing

@nwneisen nwneisen merged commit a3f6344 into Mirantis:release/0.3 Oct 31, 2024
9 of 11 checks passed
@xinfengliu
Copy link
Contributor

@kinarashah and @nwneisen
Shall we get the corresponding PR #409 in master branch merged too?

@kinarashah
Copy link
Author

@xinfengliu sorry for the delay, I'll update the PR for master branch today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants