Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move analyze phase before detect #135

Merged
merged 12 commits into from
Feb 17, 2021
111 changes: 111 additions & 0 deletions text/0000-move-analyze-phase.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,111 @@
# Meta
[meta]: #meta
- Name: Move analyze phase
- Start Date: 2021-01-13
- Author(s): [@jkutner](github.com/jkutner/) [@jabrown85](github.com/jabrown85)
- RFC Pull Request: (leave blank)
- CNB Pull Request: [buildpacks/spec#172](https://github.com/buildpacks/spec/pull/172)
- CNB Issue: (leave blank)
- Supersedes: N/A

# Summary
[summary]: #summary

This is a proposal to re-order and adjust Lifecycle phases. Specifically, moving "analyze" before "detect".

# Definitions
[definitions]: #definitions

* __project descriptor__ - the [`project.toml`](https://github.com/buildpacks/spec/blob/main/extensions/project-descriptor.md) extension specification

# Motivation
[motivation]: #motivation

Doing this would support the following features and capabilities:
* [Stack buildpacks](https://github.com/buildpacks/rfcs/pull/111), which require a phase to read run-image mixins validation prior to detection
* [Inline buildpacks](https://github.com/buildpacks/rfcs/blob/main/text/0048-inline-buildpack.md), which require parsing of the `project.toml` in the lifecycle
* [Lifecycle configuration](https://github.com/buildpacks/rfcs/pull/128)

# What it is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth adding in what Platform API we're going to introduce these changes.

[what-it-is]: #what-it-is

The analyze phase will now run before the detect phase. The analyze phase will have access to secrets and credentials used to access registries and other services, as it does today. Analyze will no longer require a [`group.toml`](https://github.com/buildpacks/spec/blob/main/platform.md#grouptoml-toml). It will do this by splitting off some of the responsibilites into the restore phase.

## Responsibilities
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth calling out that these are new responsibilities, vs. what the analyzer does currently.


* Stack validation, to ensure that a new run-image is campatible with the previous app image
* Retrieve identifier (imageID or digest), stack ID, and mixins, which will be used by subsequent phases
* Validation of registry credentials, to avoid a long build that fails during export phase
* Parsing the project descriptor and performance various operations based on its contents, include:
- downloading buildpacks
- creating ephemeral buildpacks
- applying include and exclude rules
- adding environment variables to <platform>/env
- producing an [`order.toml`](https://github.com/buildpacks/spec/blob/main/platform.md#ordertoml-toml) to be consumed by later phases

## Inputs

* Log level
* Run-image
* Stack ID
* Project descriptor (optional)
* App source code
* Previous Image
* Destination tag(s)
* gid
* uid
* log-level

## Output

* Exit status
* Info-level logs to `stdout`
* Error-level logs to `stderr`
* Analysis metadata [`analyzed.toml`](https://github.com/buildpacks/spec/blob/main/platform.md#analyzedtoml-toml), including run-image information.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does analyzer need to care about the cache at this stage?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think analyzer will need the cache now, actually. It is already omitted from the inputs here, but only by chance :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well hmm.. maybe analyzer does want the cache-image. But maybe not the cache-dir. So it can validate the registry keychain earlier.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is tangential to this RFC, but maybe it's worth noting that this RFC exists: https://github.com/buildpacks/rfcs/pull/21/files#diff-588fc1758ea351208954c917064a1907620d2e7a5984cd07d433c55bcd683e9b. I don't think anything in this RFC contradicts the other one, just keeping context.

* Buildpacks (derived from inline buildpacks in project descriptor, or buildpacks in project descriptor that are not present in the builder)
* Buildpacks order [`order.toml`](https://github.com/buildpacks/spec/blob/main/platform.md#ordertoml-toml)
* Lifecycle configuration (derived from configuration in project descriptor)
* Mutated app source code (applying include and exclude rules in project descriptor)

# How it Works
[how-it-works]: #how-it-works

A platform MUST execute the analyze phase either by invoking the `/cnb/lifecycle/detector` binary or by executing `/cnb/lifecycle/creator`.

The `analyzer` binary will have access to the [`Keychain`](https://github.com/buildpacks/lifecycle/blob/main/auth/env_keychain.go), and MUST NOT execute arbitrary code provided by either the buildpack user or buildpack author.

The [logic in the `analyzer` phase that reads image metadata and outputs an `analyzed.toml`](https://github.com/buildpacks/lifecycle/blob/main/analyzer.go#L34-L40) would be remain.

The [logic in `pack` that parses a `project.toml`](https://github.com/buildpacks/pack/blob/main/project/project.go) would be copied or moved into the `analyzer`.

The [logic in the `analyzer` phase that analyzes layers](hhttps://github.com/buildpacks/lifecycle/blob/main/analyzer.go#L54-L116) would be moved to the `restorer`. `restorer` already takes in `group.toml` as a flag.

The app source code (which may be provided to the prepare either as a directory, volume, or tarball) would be mutated (either by copying it to a new location, or making changes directly). The `analyzer` may delete files to apply the include and exclude rules from `project.toml`.

# Drawbacks
[drawbacks]: #drawbacks

* Platform maintainers will need to update the order of their container execution and also update flags for `analyzer`, `detector`, and `restorer`.
* Lifecycle will now take on the responsibility of processing `project.toml`

# Alternatives
[alternatives]: #alternatives

- [Introduce Prepare Phase](https://github.com/buildpacks/rfcs/blob/4547fe1ce602877db24f09e5b08bc9713c979be0/text/0000-prepare-phase.md) (this same rfc, previous version)

# Prior Art
[prior-art]: #prior-art

- [Tekton prepare step](https://github.com/tektoncd/catalog/blob/11a17cfe87779099b0b61be3f1e496dfa79646b3/task/buildpacks-phases/0.1/buildpacks-phases.yaml#L61-L78)

# Unresolved Questions
[unresolved-questions]: #unresolved-questions

- Does `pack` still need to parse `project.toml`, or is there value in reading it early on (before lifecycle runs)?
- Should we create a shared library for `project.toml` parsing?
- How should `analyzed.toml` be changed to include run-image information (mixins)

# Spec. Changes
[spec-changes]: #spec-changes

See [buildpacks/spec PR #172](https://github.com/buildpacks/spec/pull/172)