connector base image: declare the base image package and implement #30303

Merged
merged 8 commits into from
Sep 22, 2023
Changes from 5 commits
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -14,7 +14,7 @@ repos:
       - id: black
         args: ["--config", "pyproject.toml"]
   - repo: https://github.com/timothycrosley/isort
-    rev: 5.10.1
+    rev: 5.12.0
     hooks:
       - id: isort
         args:
80 changes: 80 additions & 0 deletions airbyte-ci/connectors/base_images/README.md
@@ -0,0 +1,80 @@
# airbyte-connectors-base-images

This Python package contains the base images used by Airbyte connectors.
It is intended to be used as a Python library.
Our connector build pipeline ([`airbyte-ci`](https://github.com/airbytehq/airbyte/blob/master/airbyte-ci/connectors/pipelines/README.md#L1)) **will** use this library to build the connector images.
Our base images are declared in code, using the [Dagger Python SDK](https://dagger-io.readthedocs.io/en/sdk-python-v0.6.4/).



## Where are the Dockerfiles?
Our base images are not declared using Dockerfiles.
They are declared in code using the [Dagger Python SDK](https://dagger-io.readthedocs.io/en/sdk-python-v0.6.4/).
We prefer this approach because it lets us treat base image containers as code: we can use Python to declare the base images and use the full power of the language to build and test them.
However, we do generate Dockerfiles artificially for debugging and documentation purposes.



### Example for `airbyte/python-connector-base`:
```dockerfile
FROM docker.io/python:3.9.18-slim-bookworm@sha256:44b7f161ed03f85e96d423b9916cdc8cb0509fb970fd643bdbc9896d49e1cad0
RUN ln -snf /usr/share/zoneinfo/Etc/UTC /etc/localtime
RUN pip install --upgrade pip==23.2.1
ENV POETRY_VIRTUALENVS_CREATE=false
ENV POETRY_VIRTUALENVS_IN_PROJECT=false
ENV POETRY_NO_INTERACTION=1
RUN pip install poetry==1.6.1
```
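
As a rough illustration of what "declared in code" can mean, the sketch below models build steps as plain Python data and renders them as Dockerfile text. It deliberately avoids Dagger so that it runs standalone. `Step` and `render_dockerfile` are hypothetical names, the image digest is omitted for brevity, and this is not how the package itself produces its Dockerfiles.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One build instruction, e.g. Step("RUN", "pip install poetry==1.6.1")."""

    instruction: str
    argument: str


def render_dockerfile(root_image: str, steps: List[Step]) -> str:
    """Render a root image and an ordered list of steps as Dockerfile text."""
    lines = [f"FROM {root_image}"]
    lines.extend(f"{step.instruction} {step.argument}" for step in steps)
    return "\n".join(lines)


# Hypothetical declaration mirroring the example Dockerfile above.
python_base_steps = [
    Step("RUN", "ln -snf /usr/share/zoneinfo/Etc/UTC /etc/localtime"),
    Step("RUN", "pip install --upgrade pip==23.2.1"),
    Step("ENV", "POETRY_VIRTUALENVS_CREATE=false"),
    Step("ENV", "POETRY_VIRTUALENVS_IN_PROJECT=false"),
    Step("ENV", "POETRY_NO_INTERACTION=1"),
    Step("RUN", "pip install poetry==1.6.1"),
]

dockerfile = render_dockerfile("docker.io/python:3.9.18-slim-bookworm", python_base_steps)
print(dockerfile)
```

Because the steps are ordinary Python objects, they can be filtered, reordered, or checked in tests before any image is built.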



## Base images


### `airbyte/python-connector-base`

| Version | Published | Docker Image Address | Changelog |
|---------|-----------|--------------|-----------|
| 1.0.0 | ✅| docker.io/airbyte/python-connector-base:1.0.0@sha256:dd17e347fbda94f7c3abff539be298a65af2d7fc27a307d89297df1081a45c27 | Initial release: based on Python 3.9.18, on slim-bookworm system, with pip==23.2.1 and poetry==1.6.1 |
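
Each address in the table pins a version by both tag and digest. A minimal sketch of splitting such an address into its parts is shown below. `ImageAddress` and `parse_image_address` are hypothetical helpers, not part of this package, and the parser assumes the registry host is always the first path segment.

```python
from typing import NamedTuple, Optional


class ImageAddress(NamedTuple):
    registry: str
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image_address(address: str) -> ImageAddress:
    """Split '<registry>/<repository>[:<tag>][@<digest>]' into its parts.

    Assumes the registry host is present as the first path segment,
    which holds for the address listed in the table above.
    """
    name, _, digest = address.partition("@")
    registry, _, remainder = name.partition("/")
    repository, _, tag = remainder.partition(":")
    return ImageAddress(registry, repository, tag or None, digest or None)


parsed = parse_image_address(
    "docker.io/airbyte/python-connector-base:1.0.0"
    "@sha256:dd17e347fbda94f7c3abff539be298a65af2d7fc27a307d89297df1081a45c27"
)
print(parsed.repository, parsed.tag)
```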


## How to release a new base image version (example for Python)

### Requirements
* [Docker](https://docs.docker.com/get-docker/)
* [Poetry](https://python-poetry.org/docs/#installation)
* Dockerhub logins

### Steps
1. `poetry install`
2. Open `base_images/python/bases.py`.
3. Make changes to the `AirbytePythonConnectorBaseImage` class. You will most likely edit its `get_container` method to change the base image.
4. Implement the `get_container` method, which must return a `dagger.Container` object.
5. **Recommended**: Add new sanity checks to `run_sanity_checks` to confirm that the new version works as expected.
6. Cut a new base image version by running `poetry run generate-release`. You'll need your DockerHub credentials.

   It will:
   - Prompt you to pick which base image you'd like to publish.
   - Prompt you for a major/minor/patch/pre-release version bump.
   - Prompt you for a changelog message.
   - Run the sanity checks on the new version.
   - Optionally publish the new version to DockerHub.
   - Regenerate the docs and the registry JSON file.
7. Commit and push your changes.
8. Create a PR and ask for a review from the Connector Operations team.
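
To make the version bump choices concrete, here is a minimal sketch of major/minor/patch bumping in plain Python. The package itself relies on the `semver` library for this. The `Version` class below is a hypothetical stand-in that only handles `X.Y.Z` strings and leaves pre-release bumps out.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        """Parse a plain 'X.Y.Z' string (no pre-release or build metadata)."""
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)

    def bump(self, kind: str) -> "Version":
        """Return a new version; lower-order components reset to zero."""
        if kind == "major":
            return Version(self.major + 1, 0, 0)
        if kind == "minor":
            return Version(self.major, self.minor + 1, 0)
        if kind == "patch":
            return Version(self.major, self.minor, self.patch + 1)
        raise ValueError(f"Unknown bump kind: {kind}")

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


latest = Version.parse("1.0.0")
candidates = {kind: str(latest.bump(kind)) for kind in ("patch", "minor", "major")}
print(candidates)
```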

**Note: if you don't publish your image while cutting the new version, you can publish it later with `poetry run publish <repository> <version>`.**
No connector will use the new base image version until its metadata is updated to use it.
If you're not fully confident in the new base image version:
- publish it as a pre-release version
- try out the new version on a couple of connectors
- cut a new version with a major/minor/patch bump and publish it

These steps can happen in different PRs.


## Running tests locally
```bash
poetry run pytest
# Static typing checks
poetry run mypy base_images --check-untyped-defs
```

Review comment (Contributor): Nice!

Reply (Contributor Author): This is really helpful to catch the kind of errors a compiler would catch in Java land.

7 changes: 7 additions & 0 deletions airbyte-ci/connectors/base_images/base_images/__init__.py
@@ -0,0 +1,7 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

from rich.console import Console

console = Console()

Review comment (Contributor): ❓ why this?

Reply (Contributor Author): This is to have a global console object to log with rich, which has nice output by default. I think it's currently only used in commands.py. If so, I'll move it there.

Reply (Contributor): Perfect. If it's used in multiple places, can we move it to its own file?

# console.py

from rich.console import Console

global_console = Console()

Reply (Contributor Author, @alafanechere, Sep 21, 2023): I tend to declare global things in __init__.py. Conventionally in Python the logger is instantiated in __init__.py.

Reply (Contributor): Fair! I'll chalk this up to still learning the Python way :)

101 changes: 101 additions & 0 deletions airbyte-ci/connectors/base_images/base_images/bases.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,101 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#

"""This module declares common (abstract) classes and methods used by all base images."""
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import final

import dagger
import semver

from .published_image import PublishedImage


class AirbyteConnectorBaseImage(ABC):
    """An abstract class that represents an Airbyte base image.
    Please do not declare any Dagger with_exec instruction in this class as in the abstract class context we have no guarantee about the underlying system used in the base image.
    """

    @final
    def __init__(self, dagger_client: dagger.Client, version: semver.VersionInfo):
        """Initializes the Airbyte base image.

        Args:
            dagger_client (dagger.Client): The dagger client used to build the base image.
            version (semver.VersionInfo): The version of the base image.
        """
        self.dagger_client = dagger_client
        self.version = version

    # INSTANCE PROPERTIES:

    @property
    def name_with_tag(self) -> str:
        """Returns the full name of the Airbyte base image, with its tag.

        Returns:
            str: The full name of the Airbyte base image, with its tag.
        """
        return f"{self.repository}:{self.version}"

    # MANDATORY SUBCLASSES ATTRIBUTES / PROPERTIES:

    @property
    @abstractmethod
    def root_image(self) -> PublishedImage:
        """Returns the base image used to build the Airbyte base image.

        Raises:
            NotImplementedError: Raised if a subclass does not define a 'root_image' attribute.

        Returns:
            PublishedImage: The base image used to build the Airbyte base image.
        """
        raise NotImplementedError("Subclasses must define a 'root_image' attribute.")

    @property
    @abstractmethod
    def repository(self) -> str:
        """This is the name of the repository where the image will be hosted.
        e.g: airbyte/python-connector-base

        Raises:
            NotImplementedError: Raised if a subclass does not define a 'repository' attribute.

        Returns:
            str: The repository name where the image will be hosted.
        """
        raise NotImplementedError("Subclasses must define a 'repository' attribute.")

    # MANDATORY SUBCLASSES METHODS:

    @abstractmethod
    def get_container(self, platform: dagger.Platform) -> dagger.Container:
        """Returns the container of the Airbyte connector base image."""
        raise NotImplementedError("Subclasses must define a 'get_container' method.")

    @abstractmethod
    async def run_sanity_checks(self, platform: dagger.Platform):
        """Runs sanity checks on the base image container.
        This method is called before image publication.

        Args:
            platform (dagger.Platform): The platform of the base image container on which the sanity checks should run.

        Raises:
            SanityCheckError: Raised if a sanity check fails.
        """
        raise NotImplementedError("Subclasses must define a 'run_sanity_checks' method.")

    # INSTANCE METHODS:
    @final
    def get_base_container(self, platform: dagger.Platform) -> dagger.Container:
        """Returns a container using the base image. This container is used to build the Airbyte base image.

        Returns:
            dagger.Container: The container using the base python image.
        """
        return self.dagger_client.pipeline(self.name_with_tag).container(platform=platform).from_(self.root_image.address)
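
The abstract contract above can be illustrated with a small standalone sketch. It uses no Dagger, and `BaseImage`, `PythonBaseImage`, and `IncompleteBaseImage` are hypothetical stand-ins. It shows one possible way a subclass can satisfy an abstract property with a plain class attribute (which would also make `repository` readable on the class itself), and what happens when a subclass forgets to define it.

```python
from abc import ABC, abstractmethod


class BaseImage(ABC):
    """Stand-in for an abstract base image; no Dagger involved."""

    def __init__(self, version: str):
        self.version = version

    @property
    @abstractmethod
    def repository(self) -> str:
        """Name of the repository hosting the image."""

    @property
    def name_with_tag(self) -> str:
        return f"{self.repository}:{self.version}"


class PythonBaseImage(BaseImage):
    # A plain class attribute is enough to satisfy the abstract property.
    repository = "airbyte/python-connector-base"


class IncompleteBaseImage(BaseImage):
    """Forgets to define `repository`, so it stays abstract."""


image = PythonBaseImage("1.0.0")
print(image.name_with_tag)

try:
    IncompleteBaseImage("1.0.0")
    instantiation_error = None
except TypeError as error:
    # ABCs refuse to instantiate classes with unimplemented abstract members.
    instantiation_error = error
print(instantiation_error)
```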
189 changes: 189 additions & 0 deletions airbyte-ci/connectors/base_images/base_images/commands.py
@@ -0,0 +1,189 @@
#
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.
#
import argparse
import sys
from typing import Callable, Type

import anyio
import dagger
import inquirer # type: ignore
import semver
from base_images import bases, console, consts, errors, hacks, publish, utils, version_registry
from jinja2 import Environment, FileSystemLoader


async def _generate_docs(dagger_client: dagger.Client):
    """This function will generate the README.md file from the templates/README.md.j2 template.
    It will first load all the registries to render the template with up to date information.
    """
    # Review comment (Contributor): 😍 Love the doc string!
    docker_credentials = utils.docker.get_credentials()
    env = Environment(loader=FileSystemLoader("base_images/templates"))
    template = env.get_template("README.md.j2")
    rendered_template = template.render({"registries": await version_registry.get_all_registries(dagger_client, docker_credentials)})
    with open("README.md", "w") as readme:
        readme.write(rendered_template)
    console.log("README.md generated successfully.")


async def _generate_release(dagger_client: dagger.Client):
    """This function will cut a new version on top of the previous one. It will prompt the user for release details: version bump, changelog entry.
    The user can optionally publish the new version to our remote registry.
    If the version is not published its changelog entry is still persisted.
    It can later be published by running the publish command.
    In the future we might only allow publishing new pre-release versions from this flow.
    """
    docker_credentials = utils.docker.get_credentials()
    select_base_image_class_answers = inquirer.prompt(
        [
            inquirer.List(
                "BaseImageClass",
                message="Which base image would you like to release a new version for?",
                choices=[(BaseImageClass.repository, BaseImageClass) for BaseImageClass in version_registry.MANAGED_BASE_IMAGES],
            )
        ]
    )
    BaseImageClass = select_base_image_class_answers["BaseImageClass"]
    registry = await version_registry.VersionRegistry.load(BaseImageClass, dagger_client, docker_credentials)
    latest_entry = registry.latest_entry

    # If there is no latest entry, it means we have no version yet: the registry is empty.
    # The new version will be cut on top of 0.0.0, so 0.0.0 itself will never be published.
    seed_version = semver.VersionInfo.parse("0.0.0")
    if latest_entry is None:
        latest_version = seed_version
    else:
        latest_version = latest_entry.version

    if latest_version != seed_version and not latest_entry.published:  # type: ignore
        console.log(
            f"The latest version of {BaseImageClass.repository} ({latest_version}) has not been published yet. Please publish it first before cutting a new version."
        )
        sys.exit(1)

    new_version_answers = inquirer.prompt(
        [
            inquirer.List(
                "new_version",
                message=f"Which kind of new version would you like to cut? (latest version is {latest_version})",
                choices=[
                    ("prerelease", latest_version.bump_prerelease()),
                    ("patch", latest_version.bump_patch()),
                    ("minor", latest_version.bump_minor()),
                    ("major", latest_version.bump_major()),
                ],
            ),
            inquirer.Text("changelog_entry", message="What should the changelog entry be?", validate=lambda _, entry: len(entry) > 0),
            inquirer.Confirm("publish_now", message="Would you like to publish it to our remote registry now?"),
        ]
    )
    new_version, changelog_entry, publish_now = (
        new_version_answers["new_version"],
        new_version_answers["changelog_entry"],
        new_version_answers["publish_now"],
    )

    base_image_version = BaseImageClass(dagger_client, new_version)

    try:
        await publish.run_sanity_checks(base_image_version)
        console.log("Sanity checks passed.")
    except errors.SanityCheckError as e:
        console.log(f"Sanity checks failed: {e}")
        console.log("Aborting.")
        sys.exit(1)
    dockerfile_example = hacks.get_container_dockerfile(base_image_version.get_container(consts.PLATFORMS_WE_PUBLISH_FOR[0]))

    # At this step we can create a changelog entry: the image built successfully and the sanity checks passed.
    changelog_entry = version_registry.ChangelogEntry(new_version, changelog_entry, dockerfile_example)
    if publish_now:
        published_docker_image = await publish.publish_to_remote_registry(base_image_version)
        console.log(f"Published {published_docker_image.address} successfully.")
    else:
        published_docker_image = None
        console.log(
            f"Skipping publication. You can publish it later by running `poetry run publish {base_image_version.repository} {new_version}`."
        )

    new_registry_entry = version_registry.VersionRegistryEntry(published_docker_image, changelog_entry, new_version)
    registry.add_entry(new_registry_entry)
    console.log(f"Added {new_version} to the registry.")
    await _generate_docs(dagger_client)
    console.log("Generated docs successfully.")
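
The "latest version must be published first" guard in the release flow can be isolated into a small pure function, which makes it easy to test without prompts or a registry. The sketch below is a simplified standalone version under stated assumptions: `RegistryEntry` and `resolve_latest_version` are hypothetical names, versions are plain strings, and an exception replaces `sys.exit(1)`.

```python
from dataclasses import dataclass
from typing import Optional

SEED_VERSION = "0.0.0"


@dataclass
class RegistryEntry:
    """Hypothetical, reduced view of a registry entry."""

    version: str
    published: bool


def resolve_latest_version(latest_entry: Optional[RegistryEntry]) -> str:
    """Return the version to bump from, or raise if the latest one is unpublished."""
    if latest_entry is None:
        # Empty registry: bump from the seed, which itself is never published.
        return SEED_VERSION
    if not latest_entry.published:
        raise RuntimeError(f"Version {latest_entry.version} must be published before cutting a new one.")
    return latest_entry.version


print(resolve_latest_version(None))
print(resolve_latest_version(RegistryEntry("1.0.0", published=True)))
```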


async def _publish(
    dagger_client: dagger.Client, BaseImageClassToPublish: Type[bases.AirbyteConnectorBaseImage], version: semver.VersionInfo
):
    """This function will publish a specific version of a base image to our remote registry.
    Users are prompted for confirmation before overwriting an existing version.
    If the version does not exist in the registry, the flow is aborted and the user is advised to cut a new version first.
    """
    docker_credentials = utils.docker.get_credentials()
    registry = await version_registry.VersionRegistry.load(BaseImageClassToPublish, dagger_client, docker_credentials)
    registry_entry = registry.get_entry_for_version(version)
    if not registry_entry:
        console.log(f"No entry found for version {version} in the registry. Please cut a new version first: `poetry run generate-release`")
        sys.exit(1)
    if registry_entry.published:
        force_answers = inquirer.prompt(
            [
                inquirer.Confirm(
                    "force", message="This version has already been published to our remote registry. Would you like to overwrite it?"
                ),
            ]
        )
        if not force_answers["force"]:
            console.log("Not overwriting the already existing image.")
            sys.exit(0)

    base_image_version = BaseImageClassToPublish(dagger_client, version)
    published_docker_image = await publish.publish_to_remote_registry(base_image_version)
    console.log(f"Published {published_docker_image.address} successfully.")
    await _generate_docs(dagger_client)
    console.log("Generated docs successfully.")


async def execute_async_command(command_fn: Callable, *args, **kwargs):
    """This is a helper function that will execute a command function in an async context, required by the use of Dagger."""
    async with dagger.Connection(dagger.Config(log_output=sys.stderr)) as dagger_client:
        await command_fn(dagger_client, *args, **kwargs)
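
The pattern here is a synchronous entry point that opens an async connection and hands the client to a command coroutine as its first argument. The sketch below reproduces that shape with the standard library only: `asyncio` is swapped in for `anyio`, and `FakeClient` / `fake_connection` are hypothetical stand-ins for the Dagger client and connection. Unlike the helper above, this version also returns the command's result.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Awaitable, Callable


class FakeClient:
    """Stand-in for a client that is only valid inside an async context."""

    def __init__(self) -> None:
        self.connected = False


@asynccontextmanager
async def fake_connection() -> AsyncIterator[FakeClient]:
    client = FakeClient()
    client.connected = True
    try:
        yield client
    finally:
        client.connected = False


async def run_with_client(command_fn: Callable[..., Awaitable[Any]], *args: Any, **kwargs: Any) -> Any:
    """Open the connection, then pass the client as the first argument of the command."""
    async with fake_connection() as client:
        return await command_fn(client, *args, **kwargs)


async def _describe(client: FakeClient, name: str) -> str:
    return f"connected={client.connected} name={name}"


def describe(name: str) -> str:
    """Synchronous entry point, as a console script would expose it."""
    return asyncio.run(run_with_client(_describe, name))


result = describe("python-connector-base")
print(result)
```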


def generate_docs():
    """This command will generate the README.md file from the templates/README.md.j2 template.
    It will first load all the registries to render the template with up to date information.
    """
    anyio.run(execute_async_command, _generate_docs)


def generate_release():
    """This command will cut a new version on top of the previous one. It will prompt the user for release details: version bump, changelog entry.
    The user can optionally publish the new version to our remote registry.
    If the version is not published its changelog entry is still persisted.
    It can later be published by running the publish command.
    In the future we might only allow publishing new pre-release versions from this flow.
    """
    anyio.run(execute_async_command, _generate_release)


def publish_existing_version():
    """This command is intended to be used when:
    - We have a changelog entry for a new version but it's not published yet (for future publish on merge flows).
    - We have a good reason to overwrite an existing version in the remote registry.
    """
    parser = argparse.ArgumentParser(description="Publish a specific version of a base image to our remote registry.")
    parser.add_argument("repository", help="The base image repository name")
    parser.add_argument("version", help="The version to publish")
    args = parser.parse_args()

    version = semver.VersionInfo.parse(args.version)
    BaseImageClassToPublish = None
    for BaseImageClass in version_registry.MANAGED_BASE_IMAGES:
        if BaseImageClass.repository == args.repository:
            BaseImageClassToPublish = BaseImageClass
    if BaseImageClassToPublish is None:
        console.log(f"Unknown base image name: {args.repository}")
        sys.exit(1)

    anyio.run(execute_async_command, _publish, BaseImageClassToPublish, version)
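
The argument parsing and class lookup in this command can be exercised on their own. The sketch below is a standalone approximation: `PythonBase`, `OtherBase`, `MANAGED`, `find_base_image_class`, and `parse_publish_args` are hypothetical names, and `example/other-base` is a made-up repository. It uses `next` to return the first match, which should be equivalent to the loop above as long as repository names are unique.

```python
import argparse
from typing import List, Optional, Sequence, Type


class PythonBase:
    repository = "airbyte/python-connector-base"


class OtherBase:  # made-up second image, for illustration only
    repository = "example/other-base"


MANAGED: List[Type] = [PythonBase, OtherBase]


def find_base_image_class(repository: str, managed: Sequence[Type] = MANAGED) -> Optional[Type]:
    """Return the first managed class whose repository matches, else None."""
    return next((cls for cls in managed if cls.repository == repository), None)


def parse_publish_args(argv: Sequence[str]) -> argparse.Namespace:
    """Parse the two positional arguments from an explicit argv list."""
    parser = argparse.ArgumentParser(description="Publish a specific version of a base image.")
    parser.add_argument("repository", help="The base image repository name")
    parser.add_argument("version", help="The version to publish")
    return parser.parse_args(argv)


args = parse_publish_args(["airbyte/python-connector-base", "1.0.0"])
selected = find_base_image_class(args.repository)
print(selected.__name__ if selected else f"Unknown base image name: {args.repository}")
```

Passing `argv` explicitly keeps the parser testable without touching `sys.argv`.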