Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/oci registry collector #2185

Merged
merged 1 commit into from
Nov 4, 2024

Conversation

robert-cronin
Copy link
Collaborator

@robert-cronin robert-cronin commented Oct 10, 2024

Description of the PR

This is my attempt at progressing #1279 cc @ridhoq
I think this feature is still one we want to add long-term

Partial fix for #298, this should be merged first and then #2241 will fix the issue.

A couple of thoughts I've had along the way:

Any guidance on the testing or concurrency would be appreciated!

PR Checklist

  • All commits have a Developer Certificate of Origin (DCO) -- they are generated using -s flag to git commit.
  • All new changes are covered by tests
  • If GraphQL schema is changed, make generate has been run
  • If GraphQL schema is changed, GraphQL client updates/additions have been made
  • If OpenAPI spec is changed, make generate has been run
  • If ent schema is changed, make generate has been run
  • If collectsub protobuf has been changed, make proto has been run
  • All CI checks are passing (tests and formatting)
  • All dependent PRs have already been merged

@ridhoq ridhoq mentioned this pull request Oct 10, 2024
10 tasks
@robert-cronin robert-cronin force-pushed the feat/oci-registry-collector branch from 2bffbec to 80631d0 Compare October 14, 2024 05:21
@robert-cronin
Copy link
Collaborator Author

robert-cronin commented Oct 15, 2024

This is still a WIP, but thought I should push what I have so far. A couple more thoughts:

  • I think mocking the registry endpoints is probably the most feasible way to get reliable unit testing. To this end, I have adapted a httptest handler found in the regclient lib that returns mock data for the catalog, manifests and tags endpoints.
  • To address the concern of potentially having all repos in a registry collected serially, there are two mechanisms that might be suitable:
    • processing the tags of an image in a worker queue which I have attempted in the latest commit; and
    • passing each new repo into CollectSub via gRPC which I have yet to attempt.

Any feedback or suggestions would be greatly appreciated!

cc @ridhoq

@robert-cronin robert-cronin force-pushed the feat/oci-registry-collector branch 5 times, most recently from 8f87c71 to 3bac6ae Compare October 18, 2024 05:23
@robert-cronin robert-cronin marked this pull request as ready for review October 18, 2024 05:23
@robert-cronin
Copy link
Collaborator Author

robert-cronin commented Oct 18, 2024

I've created a basic test suite which is working for the registry collector. It is an attempt to model the Pull Workflow that starts from a call to the catalog endpoint. Below are a couple of points that might be of interest to reviewers:

  • OCIRegistryCollector is essentially a wrapper over OCICollector so the collected documents will have the source labelled as OCICollector and not OCIRegistryCollector. Not sure how best to fix this without duplicating the OCICollector code but maybe it is okay.
  • There is probably an easier way to test the registry collector; it was useful for my learning and I figured it might also be useful for other test suites in the future, but I am happy to find some other way to test the feature as well.
  • I've added a new method: DeregisterDocumentCollector to the collector in order to facilitate testing. Without it, the original OCI test suite fails when run in tandem with the new test suite because the registry collector is still registered.

@robert-cronin robert-cronin force-pushed the feat/oci-registry-collector branch 5 times, most recently from 45009e2 to 871dc46 Compare October 21, 2024 01:15
@pxp928
Copy link
Collaborator

pxp928 commented Oct 21, 2024

@robert-cronin is this ready for review?

@robert-cronin
Copy link
Collaborator Author

robert-cronin commented Oct 21, 2024

@robert-cronin is this ready for review?

yes, all ready for review 👍 thank you!

Sorry, I've just noticed some TODO statements that need to be resolved. I guess at this stage any comments on the overall approach and whether I'm missing any key elements would be appreciated. I am still learning my way around the source code.

@robert-cronin robert-cronin force-pushed the feat/oci-registry-collector branch from 871dc46 to 0a415eb Compare October 22, 2024 05:43
@robert-cronin
Copy link
Collaborator Author

I've now addressed the TODO comments and also tried to add the registry collector to the collect sub although I'm not confident I haven't missed anything there; I think I am still confused about how exactly the CollectSources get picked up by the respective collectors.

@robert-cronin
Copy link
Collaborator Author

just to add a bit of extra context; in my local testing I have run the collector against a local ACR registry with a mock image + referrers and that seems to be working (assuming the 404's are expected to be there, I guess they are a part of the OCI collector logging?)
Screenshot below:
Screenshot 2024-10-22 173457
could probably have added a known manifest type but I think it demonstrates that the registry collector is successfully indexing and passing the images onto the oci collector.

@pxp928
Copy link
Collaborator

pxp928 commented Oct 23, 2024

Thanks @robert-cronin! Let me take a look first thing in the morning and I will review your PR.

@robert-cronin
Copy link
Collaborator Author

Thanks @robert-cronin! Let me take a look first thing in the morning and I will review your PR.

@pxp928 no problems, feel free to take your time. Thank you!

Copy link
Contributor

@lumjjb lumjjb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome PR @robert-cronin , cool implementation with the recursive collectors and great testing! I have a couple comments suggesting some changes, but mostly LGTM. Can i also request we split this into the collector implementation and the CLI changes in another PR? Thanks!

@robert-cronin robert-cronin force-pushed the feat/oci-registry-collector branch from 0a415eb to 9272eb5 Compare October 28, 2024 05:04
@lumjjb lumjjb added the needs-review Needs writer LGTM label Oct 28, 2024
@pxp928 pxp928 self-requested a review October 28, 2024 15:08
@robert-cronin robert-cronin force-pushed the feat/oci-registry-collector branch from 9272eb5 to 4f629c2 Compare October 29, 2024 03:35
@robert-cronin robert-cronin requested a review from lumjjb October 29, 2024 03:39
@robert-cronin
Copy link
Collaborator Author

Awesome PR @robert-cronin , cool implementation with the recursive collectors and great testing! I have a couple comments suggesting some changes, but mostly LGTM. Can i also request we split this into the collector implementation and the CLI changes in another PR? Thanks!

I agree, this PR could do with some splitting in two. I've added the CLI-relevant changes to a new PR: #2241. This PR should be merged before that one. Thanks for the review @lumjjb ! back over to you for round 2 👍

@hown3d
Copy link
Contributor

hown3d commented Oct 29, 2024

Really nice!
I could see having a filter for the repositories being useful. For example we have a registry with around 3000+ repos where not all of them are interesting for ingestion into GUAC.

Copy link
Collaborator

@pxp928 pxp928 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Just a very minor nit otherwise it looks great! LGTM

@robert-cronin
Copy link
Collaborator Author

robert-cronin commented Oct 29, 2024

Really nice! I could see having a filter for the repositories being useful. For example we have a registry with around 3000+ repos where not all of them are interesting for ingestion into GUAC.

I can see how that would be helpful! I'm wondering would it make sense then to have an option on the collector CLI that takes a regex pattern? e.g. guacone collect registry example.azurecr.io --exclude ".*-deprecated$". Perhaps we can add an issue and tackle it after this PR is merged

@robert-cronin
Copy link
Collaborator Author

This is great! Just a very minor nit otherwise it looks great! LGTM

Thank you @pxp928!

@robert-cronin robert-cronin force-pushed the feat/oci-registry-collector branch from 4f629c2 to 38a060b Compare October 29, 2024 22:59
Signed-off-by: robert-cronin <robert.owen.cronin@gmail.com>
@robert-cronin robert-cronin force-pushed the feat/oci-registry-collector branch from 38a060b to 20bdd4f Compare October 31, 2024 02:32
@kodiakhq kodiakhq bot merged commit bc034e3 into guacsec:main Nov 4, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-review Needs writer LGTM size/XXL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants