Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move analyzer before detector #197

Merged
merged 15 commits into from
Apr 29, 2021
Merged

Conversation

jabrown85
Copy link
Contributor

First take at this rfc

platform.md Outdated
[-cache-image <cache-image>]
[-cache-dir <cache-dir>] \
[-cache-image <cache-image>] \
[-daemon] \ # sets <daemon>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this gets confusing in conjunction with our registry only cache image. Maybe we can use the analysis metadata to record whether the previous image is a daemon or registry image?

Also, this might be a really good opportunity to make a long desired change and move from have a single reference key in analyzed.toml that stores either an imageID or digest reference, to having two explicit keys (image_id, registry_digest)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies I thought I updated this comment with an edit. I realized existing analyzer already has this issue (if you want to call it an issue) and this suggestion should probably be out of scope for now. But I wanted to leave these thoughts somewhere, in case we want to incorporate them in future changes.

@hone hone marked this pull request as ready for review March 3, 2021 19:13
@hone hone requested a review from a team as a code owner March 3, 2021 19:13
@jabrown85
Copy link
Contributor Author

@jabrown85 - Add missing items from RFC to support analyzer features not yet represented

First take at [this rfc](https://github.com/buildpacks/rfcs/blob/main/text/0075-move-analyze-phase.md)

Signed-off-by: Jesse Brown <jabrown85@gmail.com>
Signed-off-by: Jesse Brown <jabrown85@gmail.com>
Signed-off-by: Jesse Brown <jabrown85@gmail.com>
Signed-off-by: Jesse Brown <jabrown85@gmail.com>
@jkutner jkutner requested a review from a team March 10, 2021 15:31
@ekcasey ekcasey linked an issue Mar 11, 2021 that may be closed by this pull request
Signed-off-by: Jesse Brown <jabrown85@gmail.com>
platform.md Outdated

- The lifecycle MUST write [analysis metadata](#analyzedtoml-toml) to `<analyzed>` if `<image>` is accessible.
- **If** `<skip-layers>` is `true` the lifecycle MUST NOT perform layer analysis.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- **If** `<skip-layers>` is `true` the lifecycle MUST NOT perform layer analysis.
- For each buildpack in `<group>`, if persistent metadata for that buildpack exists in the analysis metadata, lifecycle MUST write a toml representation of the persistent metadata to `<layers>/<buildpack-id>/store.toml`
- **If** `<skip-layers>` is `true` the lifecycle MUST NOT perform layer analysis.

We also need to modify the data format for io.buildpacks.lifecycle.metadata[https://github.com/buildpacks/spec/blob/main/platform.md#iobuildpackslifecyclemetadata-json] to included persistent metadata in the format currently expected by the lifecycle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What has changed here? Sorry, I don't see what needs to be modified.

jabrown85 and others added 2 commits March 12, 2021 16:20
Got all of the ones I could do from GitHub

Signed-off-by: Jesse Brown <jabrown85@gmail.com>

Co-authored-by: Emily Casey <emilykimballcasey@gmail.com>
Signed-off-by: Jesse Brown <jabrown85@gmail.com>
platform.md Outdated
[-analyzed <analyzed>] \
[-daemon] \ # sets <daemon>
[-gid <gid>] \
[-group <group>] \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the lifecycle PR: it looks like group is only defined for earlier platform apis, is that intended?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I'm remembering how all of this works... we can't have group if analyze goes first...

Suggested change
[-group <group>] \

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have group, but not order

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't have group right? There is no group until detection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the future we will support stack-group in order to validate mixins, but not group?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I confused myself with the names. We need <order>, not <group>. Fixed in 6480d2e

@nebhale nebhale requested a review from a team March 17, 2021 18:09
@nebhale nebhale added this to the Platform 0.7 milestone Mar 17, 2021
Signed-off-by: Jesse Brown <jabrown85@gmail.com>
Removed cache-dir/cache-image from analyzer
Detail what image compatibility means for previous-image and run-image
Updated stack.toml with build-image's stack and mixin information
Updated analyzed.toml with run and build image mixin information
Updated analzyed.toml with run image reference

Signed-off-by: Jesse Brown <jabrown85@gmail.com>
@jabrown85
Copy link
Contributor Author

I updated the spec after discussions last week that were loosely documented here.

cc @ekcasey @jkutner

@jabrown85 jabrown85 requested review from jkutner and hone March 30, 2021 16:10
@nebhale nebhale requested a review from a team April 7, 2021 18:10
@ekcasey ekcasey changed the base branch from main to platform/0.7 April 9, 2021 13:42
platform.md Outdated
[-analyzed <analyzed>] \
[-daemon] \ # sets <daemon>
[-gid <gid>] \
[-group <group>] \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't have group right? There is no group until detection.

platform.md Outdated

- **If** `<daemon>` is `false`, `<image>` MUST be a valid image reference
- **If** `<daemon>` is `true`, `<image>` MUST be either a valid image reference or an imageID
- **If** `<run-image>` is not provided by the platform the value will be resolved from the contents of stack
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- **If** `<run-image>` is not provided by the platform the value will be resolved from the contents of stack
- **If** `<run-image>` is not provided by the platform the lifecycle MUST [resolve](#run-image-resolution) the run image from the contents of `stack` or fail if `stack` does not contain a valid run image.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 6480d2e

platform.md Outdated
| `<analyzed>` | `CNB_ANALYZED_PATH` | `<layers>/analyzed.toml` | Path to output analysis metadata (see [`analyzed.toml`](#analyzedtoml-toml)
| `<daemon>` | `CNB_USE_DAEMON` | `false` | Analyze image from docker daemon
| `<gid>` | `CNB_GROUP_ID` | | Primary GID of the stack `User`
| `<group>` | `CNB_GROUP_PATH` | `<layers>/group.toml` | Path to group definition (see [`group.toml`](#grouptoml-toml))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
| `<group>` | `CNB_GROUP_PATH` | `<layers>/group.toml` | Path to group definition (see [`group.toml`](#grouptoml-toml))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 6480d2e

platform.md Outdated
- Stack IDs must match
- Platform/Architecture must be the same
- `<run-image>` `mixins` must be a superset of `<previous-image>` mixins
- The lifecycle MUST ensure registry read/write access to `<run-image>`, `<previous-image>`, `<tag>`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The lifecycle MUST ensure registry read/write access to `<run-image>`, `<previous-image>`, `<tag>`
- The lifecycle MUST ensure registry write access to `<image>` and any provided `<tag>`s.

We don't need write access to the previous image or the run image.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in 6480d2e

platform.md Outdated
Comment on lines 889 to 890
[build-image]
mixins = ["jq", "libgc", "libpq"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[build-image]
mixins = ["jq", "libgc", "libpq"]
[build-image]
stack-id = "<string>"
mixins = ["jq", "libgc", "libpq"]

I want to add this meanly for symmetry with stack.toml. It could be useful eventually if we wanted to ensure that the detector and builder are in fact running on stack provided to analyzer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we bubble this up so stack-id is shared between build and run? When I tried implementing this in pack, it was a little awkward because we don't usually scope a stack ID to only build or run.

Copy link
Member

@ekcasey ekcasey Apr 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkutner I could get behind that idea but I would insist that we make the change in stack.toml as well. We should still check that the run-image actually has the declared stack ID though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added stack-id in 6480d2e

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with [build-image] stack-id, in this PR and I'll open a follow-up PR to change (then we can debate that separately from this PR).

platform.md Outdated
##### Layer analysis
When analyzing a given layer the lifecycle SHALL:
##### Layer Restoration
When restoring a given layer the lifecycle SHALL:
- **If** `build=true`, `cache=false`:
- Do nothing
- **Else if** `launch=true`:
- Write layer metadata read from the analyzed image to `<layers>/<buildpack-id>/<layer-name>.toml`
- Write the layer diffID from the analyzed image to `<layers>/<buildpack-id>/<layer-name>.sha`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Write the layer diffID from the analyzed image to `<layers>/<buildpack-id>/<layer-name>.sha`

We don't need the .sha files anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 6480d2e

platform.md Outdated
Comment on lines 443 to 451
- **If** `build=true`, `cache=false`:
- Do nothing
- **Else if** `launch=true`:
- Write layer metadata read from the analyzed image to `<layers>/<buildpack-id>/<layer-name>.toml`
- Write the layer diffID from the analyzed image to `<layers>/<buildpack-id>/<layer-name>.sha`
- **Else if** `cache=true` AND the cache DOES NOT contain a layer with matching diffID
- Must not write layer metadata
- **Else if** `cache=true`:
- Write layer metadata read from the cache to `<layers>/<buildpack-id>/<layer-name>.toml`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we are missing a description of when to restore cached layer contents? In the interest of normalized independent API maybe we should just refer out to https://github.com/buildpacks/spec/blob/main/buildpack.md#layer-types for a description of what to restore when. Then, any future changes could be a buildpack API concern. We merely need to say that the lifecycle must use the provided cache-dir or cache-image to retrieve layer contents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a stab at removing all the detail here, but I might've removed too much. Let me know what you think. Fixed in 6480d2e

| [exit status] | (see Exit Code table below for values)
| `/dev/stdout` | Logs (info)
| `/dev/stderr` | Logs (warnings, errors)
| `<analyzed>` | Analysis metadata (see [`analyzed.toml`](#analyzedtoml-toml)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we are writing run-image metadata in this output. analyzed.toml needs to be an input to exporter and stack.toml can be removed as an exporter input.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<analyzed> is already an input, I removed <stack> in 6480d2e

platform.md Outdated
@@ -1003,6 +1018,9 @@ Where:
[run-image]
image = "<image>"
mirrors = ["<mirror>", "<mirror>"]
[build-image]
stack-id = "<string>"
mixins = ["jq", "libgc", "libpq"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mixins = ["jq", "libgc", "libpq"]
mixins = ["<mixin name>"]

jq etc. are very example-y and we generally don't do that in the data formats. See https://github.com/buildpacks/spec/blob/main/buildpack.md#buildpacktoml-toml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 6480d2e

| `<layers>/<buidpack-id>/<layer>.toml` || | Files containing the layer content metadata of each analyzed layer (see data format in [Buildpack Interface Specification](buildpack.md))
- For each buildpack in `<group>`, if persistent metadata for that buildpack exists in the analysis metadata, lifecycle MUST write a toml representation of the persistent metadata to `<layers>/<buildpack-id>/store.toml`
- **If** `<skip-layers>` is `true` the lifecycle MUST NOT perform layer restoration.
- **Else** the lifecycle MUST perform [layer restoration](#layer-restoration) for any app image layers or cached layers created by any buildpack present in the provided `<group>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this apply to app image layers? #layer-restoration only talks about the cache.

jabrown85 added a commit to buildpacks/lifecycle that referenced this pull request Apr 15, 2021
This is a re-do of #505, porting to the recently updated main branch.

[RFC](buildpacks/rfcs#135)
[Spec PR](buildpacks/spec#197)

Signed-off-by: Jesse Brown <jabrown85@gmail.com>
@ekcasey ekcasey requested a review from sclevine April 21, 2021 18:11
@jkutner
Copy link
Member

jkutner commented Apr 26, 2021

@jabrown85 dco

jabrown85 and others added 3 commits April 26, 2021 11:04
Signed-off-by: Jesse Brown <jabrown85@gmail.com>
Signed-off-by: Jesse Brown <jabrown85@gmail.com>

Co-authored-by: Emily Casey <emilykimballcasey@gmail.com>
For mixing validation, detector will need the mixins written by analyzer and the buildpacks on the builder image.

Signed-off-by: Jesse Brown <jabrown85@gmail.com>
Signed-off-by: Jesse Brown <jabrown85@gmail.com>

Co-authored-by: Emily Casey <emilykimballcasey@gmail.com>
@jabrown85
Copy link
Contributor Author

@sclevine can you take a look at this? We have the approvals but the change is large enough it wouldn't hurt to have more 👀 on it prior to merge.

Thanks!

@jkutner jkutner merged commit c82dd28 into buildpacks:platform/0.7 Apr 29, 2021
jabrown85 added a commit to buildpacks/lifecycle that referenced this pull request Apr 30, 2021
Run `analyzer` before `detector`

[Related RFC](buildpacks/rfcs#135)
[Spec PR](buildpacks/spec#197)

Signed-off-by: Jesse Brown <jabrown85@gmail.com>
Co-authored-by: Natalie Arellano <narellano@vmware.com>
Co-authored-by: Yael Harel <yharel@vmware.com>
jabrown85 added a commit to jabrown85/spec that referenced this pull request May 13, 2021
See [here](buildpacks#197 (comment)). It was intended to remove `<image>` in favor of `-previous-image` and `-tag` but `<image>` got left in by accidently.
jabrown85 added a commit to jabrown85/spec that referenced this pull request May 13, 2021
See [here](buildpacks#197 (comment)). It was intended to remove `<image>` in favor of `-previous-image` and `-tag` but `<image>` got left in by accidently.

Signed-off-by: Jesse Brown <jabrown85@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[RFC 0075] Move analyze phase before detect
7 participants