add build push docker image image in the publish ci #311

Merged
merged 22 commits into from
Nov 30, 2024

Conversation

mo-dkrz
Contributor

@mo-dkrz mo-dkrz commented Nov 14, 2024

Related Issue(s):

Description:
Since I needed Docker images of Elasticsearch and OpenSearch, I added a new step to publish.yml that builds and publishes them.
ping @jonhealy1
PR Checklist:

  • Code is formatted and linted (run pre-commit run --all-files)
  • Tests pass (run make test)
  • Documentation has been updated to reflect changes, if applicable
  • Changes are added to the changelog

@pedro-cf pedro-cf requested a review from jonhealy1 November 14, 2024 12:48
@mo-dkrz
Contributor Author

mo-dkrz commented Nov 14, 2024

PR is not ready yet ... I will ping you when it's ready

@mo-dkrz mo-dkrz changed the title add build push docker image image in the publish ci draft: add build push docker image image in the publish ci Nov 14, 2024
@mo-dkrz
Contributor Author

mo-dkrz commented Nov 14, 2024

it's done from my side
green pipeline:
https://github.com/mo-dkrz/stac-fastapi-elasticsearch-opensearch/actions/runs/11840542263/job/32994615762
stac-fastapi-es image:
https://github.com/mo-dkrz/stac-fastapi-elasticsearch-opensearch/pkgs/container/stac-fastapi-es
stac-fastapi-os image:
https://github.com/mo-dkrz/stac-fastapi-elasticsearch-opensearch/pkgs/container/stac-fastapi-os

ping @jonhealy1 and @jamesfisher-geo: it's ready for code review. Also, if possible, please test the produced images to make sure the workflow works correctly. Thanks

@mo-dkrz mo-dkrz changed the title draft: add build push docker image image in the publish ci add build push docker image image in the publish ci Nov 14, 2024
jonhealy1
jonhealy1 previously approved these changes Nov 15, 2024
Collaborator

@jonhealy1 jonhealy1 left a comment

Really nice work here! If @jamesfisher-geo @rhysrevans3 @pedro-cf and/or @StijnCaerts have the time to take a look as well it would be really helpful. Thank you.

@jamesfisher-geo
Collaborator

Hey @mo-dkrz great work and thanks for the contribution. I have a couple questions/comments

Could you make it clear in the readme which .env parameters are required and which are optional?

There is an issue where default values are being set to empty strings if they are not defined in the .env file. For example,

If I run without WEB_CONCURRENCY set in my .env file I get:

File "/usr/local/lib/python3.11/dist-packages/uvicorn/config.py", line 365, in __init__
    self.workers = int(os.environ["WEB_CONCURRENCY"])
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: ''

If I run without setting RELOAD in my .env file:

 File "/usr/local/lib/python3.11/dist-packages/stac_fastapi/elasticsearch/app.py", line 5, in <module>
    from stac_fastapi.api.app import StacApi
  File "/usr/local/lib/python3.11/dist-packages/stac_fastapi/api/app.py", line 31, in <module>
    from stac_fastapi.types.core import AsyncBaseCoreClient, BaseCoreClient
  File "/usr/local/lib/python3.11/dist-packages/stac_fastapi/types/core.py", line 37, in <module>
    api_settings = ApiSettings()
                   ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pydantic_settings/main.py", line 167, in __init__
    super().__init__(
  File "/usr/local/lib/python3.11/dist-packages/pydantic/main.py", line 212, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for ApiSettings
reload
  Input should be a valid boolean, unable to interpret input [type=bool_parsing, input_value='', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/bool_parsing
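
Both tracebacks come down to the same distinction: a variable that is present but empty is not the same as a variable that is absent. A minimal Python sketch of that distinction follows. `env_or_default` is a hypothetical helper, not part of uvicorn or stac-fastapi, and the fallback of `"1"` for WEB_CONCURRENCY is an assumption chosen only for illustration.

```python
def env_or_default(env, name, default):
    """Return env[name], falling back to default when it is missing or empty.

    Hypothetical helper for illustration; `env` is any dict-like mapping
    such as os.environ.
    """
    value = env.get(name, "")
    return value if value.strip() else default


# An empty string still counts as "set" for a plain mapping lookup,
# so int() receives '' and raises, as in the uvicorn traceback.
env = {"WEB_CONCURRENCY": ""}
try:
    int(env["WEB_CONCURRENCY"])
    empty_string_failed = False
except ValueError:
    empty_string_failed = True

# Treating an empty value as unset lets the (assumed) default apply.
workers = int(env_or_default(env, "WEB_CONCURRENCY", "1"))
```

The same idea applies on the Dockerfile side: a value that is never exported at all lets the application's own default take over, whereas exporting an empty value overrides it.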

@jamesfisher-geo
Collaborator

jamesfisher-geo commented Nov 16, 2024

Generally I am curious if it is best practice to separate the backend (elasticsearch or opensearch) containers from the stac-fastapi-es and stac-fastapi-os. So the published containers are the STAC API service only @jonhealy1 @StijnCaerts @pedro-cf @rhysrevans3

@@ -0,0 +1,75 @@
FROM debian:bookworm-slim AS base

ARG STAC_FASTAPI_TITLE
Collaborator

This is setting all values not specified in the .env file to empty strings. Could you modify to handle default values?

README.md Outdated

You need to provide a `.env` file to configure the environment variables. Here's a list of variables you can configure:

- `STAC_FASTAPI_TITLE`: Title of the API shown in the documentation (default: `stac-fastapi-elasticsearch` or `stac-fastapi-opensearch`)
Collaborator

Could you make it clear which parameters are required?

@mo-dkrz
Contributor Author

mo-dkrz commented Nov 16, 2024

Hey @mo-dkrz great work and thanks for the contribution. I have a couple questions/comments

Could you make it clear in the readme which .env parameters are required and which are optional?

There is an issue where default values are being set to empty strings if they are not defined in the .env file. For example,

If I run without WEB_CONCURRENCY set in my .env file I get:

File "/usr/local/lib/python3.11/dist-packages/uvicorn/config.py", line 365, in __init__
    self.workers = int(os.environ["WEB_CONCURRENCY"])
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: ''

If I run without setting RELOAD in my .env file:

 File "/usr/local/lib/python3.11/dist-packages/stac_fastapi/elasticsearch/app.py", line 5, in <module>
    from stac_fastapi.api.app import StacApi
  File "/usr/local/lib/python3.11/dist-packages/stac_fastapi/api/app.py", line 31, in <module>
    from stac_fastapi.types.core import AsyncBaseCoreClient, BaseCoreClient
  File "/usr/local/lib/python3.11/dist-packages/stac_fastapi/types/core.py", line 37, in <module>
    api_settings = ApiSettings()
                   ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/pydantic_settings/main.py", line 167, in __init__
    super().__init__(
  File "/usr/local/lib/python3.11/dist-packages/pydantic/main.py", line 212, in __init__
    validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 1 validation error for ApiSettings
reload
  Input should be a valid boolean, unable to interpret input [type=bool_parsing, input_value='', input_type=str]
    For further information visit https://errors.pydantic.dev/2.9/v/bool_parsing

Good point! Let’s tidy up the environment variables. It seems that STAC_FASTAPI_TITLE, STAC_FASTAPI_DESCRIPTION, and STAC_FASTAPI_VERSION already have default values. Similarly, APP_HOST, APP_PORT, and RELOAD are already defined with default values in the base model.
I don’t see any use case for ENVIRONMENT, so it can be removed. As for BACKEND, it’s only used in unit tests, so including it in the Dockerfiles or the README isn’t necessary since it defaults automatically.
Also, ES_VERIFY_CERTS and ES_USE_SSL are set to true by default, but since we want them to default to false, we can keep them as they were. In the end, none of these variables is strictly required anymore. I set the default values in the Dockerfiles and marked the env section in the README as optional.

Done! The pipeline is green, and I’ve updated the Dockerfiles and README accordingly. @jamesfisher-geo could you please run a smoke test with and without env vars defined, and review the README again? Thanks

@mo-dkrz
Contributor Author

mo-dkrz commented Nov 16, 2024

It’s also a good idea to have all search engines in the same environment, allowing users to enable or disable a specific search engine via environment variables such as IS_ES or IS_OS. However, users are unlikely to be able to enable both simultaneously. The main issue isn’t running all search systems at the same time in the image; it’s how STAC FastAPI manages ports. Currently, a single STAC FastAPI instance is not configured to work with both Elasticsearch and OpenSearch at the same time. This means that if both search engines are configured and running, one of them will sit idle. To address this, we could exit the running image with a warning, prompting the user to select only one search engine. Or we could pick one search engine to run by default.
If you want to run two STAC FastAPI instances from one image, this would first require core-level development in STAC FastAPI and, second, would make the configuration overly complex for users. Practically speaking, the best approach would be to run one STAC FastAPI instance per search engine in an image, with the flexibility to choose the desired search system. If I'm wrong, please correct me!
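
The "exit with a warning if both are enabled" option could look roughly like the sketch below. It is only an illustration of the proposal: `select_backend` is a hypothetical function, the set of accepted truthy strings is an assumption, and returning `None` when neither flag is set is one possible choice rather than anything the PR implements.

```python
def select_backend(env):
    """Pick one backend from the proposed IS_ES / IS_OS flags.

    Hypothetical sketch: returns "elasticsearch", "opensearch", or None
    when neither flag is enabled, and refuses to continue when both are.
    """
    truthy = {"1", "true", "yes"}  # assumed spelling of "enabled"
    use_es = env.get("IS_ES", "").strip().lower() in truthy
    use_os = env.get("IS_OS", "").strip().lower() in truthy

    if use_es and use_os:
        # Both engines requested: stop instead of leaving one idle.
        raise ValueError(
            "IS_ES and IS_OS are both enabled; select only one search engine"
        )
    if use_es:
        return "elasticsearch"
    if use_os:
        return "opensearch"
    return None
```

An entrypoint script could call this once at startup with `os.environ` and exit with the error message when it raises.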

@jonhealy1
Collaborator

Generally I am curious if it is best practice to separate the backend (elasticsearch or opensearch) containers from the stac-fastapi-es and stac-fastapi-os. So the published containers are the STAC API service only @jonhealy1 @StijnCaerts @pedro-cf @rhysrevans3

@jamesfisher-geo @mo-dkrz I think James is right and the database service should be separated from the api. The stac-fastapi-elasticsearch instance for example should be able to connect to an elasticsearch instance/ cluster running anywhere. Many people will be running the api container and the database on different specialised services.
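
Keeping the API and the database separate means the API container only needs to know where the backend lives. A rough Python sketch of that idea, assuming the address comes from environment variables: ES_USE_SSL is mentioned in this thread, while `ES_HOST`, `ES_PORT`, the fallback values, and the `backend_url` helper itself are assumptions made for illustration.

```python
def backend_url(env):
    """Build the backend base URL from environment-style settings.

    Hypothetical helper: ES_HOST and ES_PORT and all fallback values
    here are assumed names/values, not confirmed by this PR.
    """
    use_ssl = env.get("ES_USE_SSL", "false").strip().lower() == "true"
    scheme = "https" if use_ssl else "http"
    host = env.get("ES_HOST", "localhost")
    port = env.get("ES_PORT", "9200")
    return f"{scheme}://{host}:{port}"


# The same API image can point at a local container or a managed cluster
# simply by changing the environment it is started with.
local = backend_url({})
managed = backend_url(
    {"ES_HOST": "search.example.org", "ES_PORT": "443", "ES_USE_SSL": "true"}
)
```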

@mo-dkrz
Contributor Author

mo-dkrz commented Nov 17, 2024

Generally I am curious if it is best practice to separate the backend (elasticsearch or opensearch) containers from the stac-fastapi-es and stac-fastapi-os. So the published containers are the STAC API service only @jonhealy1 @StijnCaerts @pedro-cf @rhysrevans3

@jamesfisher-geo @mo-dkrz I think James is right and the database service should be separated from the api. The stac-fastapi-elasticsearch instance for example should be able to connect to an elasticsearch instance/ cluster running anywhere. Many people will be running the api container and the database on different specialised services.

Um, based on my experience, I think providing backends in the container as an optional tool offers users more flexibility and makes the image more appealing, since it is easier to configure and maintain. For instance, with IS_ES and IS_OS defined, users can disable both, run only the API, and rely on an external backend container hosted elsewhere by defining env vars. Alternatively, they can run a search system and the API within one container by activating one of IS_ES and IS_OS. This approach gives users better control and more choices. What are your thoughts? If you agree with having optional backends in the containers, please let me know and I will configure it. Otherwise I will remove the backends from the images. @jamesfisher-geo @jonhealy1

@mo-dkrz
Contributor Author

mo-dkrz commented Nov 17, 2024

I think HAS_ES and HAS_OS are better naming!

@jonhealy1
Collaborator

@mo-dkrz having the option to turn the database in the container on or off is interesting. It would have to be well documented

@jonhealy1 jonhealy1 self-requested a review November 17, 2024 09:36
@jonhealy1
Collaborator

Using docker compose is useful for development I think. The ghcr containers are useful for deployment but not for reflecting local changes to the api during development, unless I'm missing something here?

@mo-dkrz
Contributor Author

mo-dkrz commented Nov 17, 2024

@jonhealy1 I tried to document it better in README. could you please review it again?
https://github.com/mo-dkrz/stac-fastapi-elasticsearch-opensearch/tree/add-image?tab=readme-ov-file#installation-and-running

And also docker-compose is still working as the first method ...

@jamesfisher-geo
Collaborator

Thanks for the quick responses @mo-dkrz

I would like to suggest a change in the approach here.

For developing the API locally, the best practice is to clone the repo and run

docker compose up app-opensearch

or

docker compose up app-elasticsearch

Either of these will create two local Docker images, the respective backend and the STAC API, which you can use to build new features.

Published packages are generally for production use-cases. So users would pull the images of a release and deploy that to their infrastructure, pointing to their instance of elasticsearch/opensearch. It is not useful to include elasticsearch/opensearch in the same published image as the STAC API, at least not in the published images for our main branch here because:

  1. Generally users connect the API to a managed or external instance of elasticsearch/opensearch in production.
  2. Bundling the STAC API and backend in the same image will make updating the image and backend challenging.
  3. Including elasticsearch/opensearch in the image increased the size of the image from ~570MB to ~3GB. That would not be appealing for users that are not using the internal instance of elasticsearch/opensearch.

The publishing steps you have are a great contribution. Would you be open to contributing the work to publish the stac-fastapi-es and stac-fastapi-os images without the backends?

@mo-dkrz
Contributor Author

mo-dkrz commented Nov 20, 2024

Thanks @jamesfisher-geo for the review; on the weekend I will be back to dismantle the backend from image

@jamesfisher-geo
Collaborator

Thanks @jamesfisher-geo for the review; on the weekend I will be back to dismantle the backend from image

Thanks so much!

@mo-dkrz
Contributor Author

mo-dkrz commented Nov 24, 2024

I think it's done; I removed the backend from the images and reflected the changes in the README and the action as well.
Could you please review it again? @jamesfisher-geo @jonhealy1 sorry again for making the PR so long. If any changes are needed, please let me know. Thanks

By the way, I put the stac-utils image links in the docker-compose file. So if you want to test it, you can replace them with
ghcr.io/mo-dkrz/stac-fastapi-os:v2.2.12 for OpenSearch and ghcr.io/mo-dkrz/stac-fastapi-es:v2.2.12 for Elasticsearch

Collaborator

@jamesfisher-geo jamesfisher-geo left a comment

Sorry @mo-dkrz , a few more changes needed here. Thanks

###### Key variables to configure:

| Variable | Description | Default | Required |
|------------------------------|--------------------------------------------------------------------------------------|--------------------------|---------------------------------------------------------------------------------------------|
Collaborator

This is great

@@ -3,11 +3,8 @@ version: '3.9'
 services:
   app-elasticsearch:
     container_name: stac-fastapi-es
-    image: stac-utils/stac-fastapi-es
+    image: ghcr.io/stac-utils/stac-fastapi-es:latest
Collaborator

Can you revert this to pulling from the local code like it was doing before (stac-utils/stac-fastapi-es)? The docker-compose file is used for local development, so we want it to build an image from the local code.

restart: always
build:
Collaborator

Revert this back to use the local Dockerfile.dev.es

@@ -35,11 +32,8 @@ services:

app-opensearch:
container_name: stac-fastapi-os
image: stac-utils/stac-fastapi-os
Collaborator

Can you revert this to pulling from the local code like it was doing before (stac-utils/stac-fastapi-os)? The docker-compose file is used for local development, so we want it to build an image from the local code.

restart: always
build:
Collaborator

Revert this back to use the local Dockerfile.dev.os

@@ -0,0 +1,34 @@
FROM python:3.12-slim

ENV APP_HOST="0.0.0.0"
Collaborator

Sorry, but could you remove the ENV lines here? The environment variables should be set when you run the Docker image, either from an .env file or as -e flags in the docker run command. They don't need to be included in the Dockerfile.

@@ -0,0 +1,34 @@
FROM python:3.12-slim

ENV STAC_FASTAPI_TITLE="stac-fastapi-opensearch"
Collaborator

Sorry, but could you remove the ENV lines here? The environment variables should be set when you run the Docker image, either from an .env file or as -e flags in the docker run command. They don't need to be included in the Dockerfile.
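
To make the two runtime options concrete, here is a small Python sketch that assembles a `docker run` command line using `--env-file` and `-e`. The `docker_run_args` helper is hypothetical, the `--rm` flag and the ES_USE_SSL value are illustrative choices, and the image tag is the one that appears in the docker-compose diff in this PR.

```python
def docker_run_args(image, env=None, env_file=None):
    """Assemble a `docker run` argument list (hypothetical helper).

    Variables come from an env file, from individual -e flags, or both;
    nothing is baked into the image itself.
    """
    args = ["docker", "run", "--rm"]
    if env_file:
        args += ["--env-file", env_file]
    for name, value in (env or {}).items():
        args += ["-e", f"{name}={value}"]
    args.append(image)
    return args


cmd = docker_run_args(
    "ghcr.io/stac-utils/stac-fastapi-es:latest",
    env={"ES_USE_SSL": "false"},
    env_file=".env",
)
```

The resulting list can be passed to `subprocess.run(cmd)` or joined with spaces to paste into a shell.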

@mo-dkrz
Contributor Author

mo-dkrz commented Nov 27, 2024

@jamesfisher-geo I've addressed the requested changes. Could you please review it again? Thanks

Collaborator

@jamesfisher-geo jamesfisher-geo left a comment

Looks good to me. Thanks @mo-dkrz!

@jonhealy1
Collaborator

Thanks for the hard work here @mo-dkrz @jamesfisher-geo. The only thing I think we may be missing is some documentation explaining how to use these new docker images.

@mo-dkrz
Contributor Author

mo-dkrz commented Nov 28, 2024

Thanks for the hard work here @mo-dkrz @jamesfisher-geo. The only thing I think we may be missing is some documentation explaining how to use these new docker images.

I added some docs to the README about installing and running via the pre-built Docker images, but I think it's unusual to document images in the README, since they are listed in the GitHub container registry and users pull them from there when they need them. Could you have a look and review it again? @jamesfisher-geo @jonhealy1
Thanks

jonhealy1
jonhealy1 previously approved these changes Nov 30, 2024
Collaborator

@jonhealy1 jonhealy1 left a comment

Nice work @mo-dkrz thank you

@jonhealy1
Collaborator

Quite a few errors with the tests

ERROR stac_fastapi/tests/resources/test_mgmt.py::test_ping_no_param - TypeError: __init__() got an unexpected keyword argument 'app'
ERROR stac_fastapi/tests/route_dependencies/test_route_dependencies.py::test_not_authenticated - TypeError: __init__() got an unexpected keyword argument 'app'
ERROR stac_fastapi/tests/route_dependencies/test_route_dependencies.py::test_authenticated - TypeError: __init__() got an unexpected keyword argument 'app'
============ 27 passed, 6 skipped, 4 warnings, 175 errors in 16.36s ============

@jonhealy1
Collaborator

same errors in main branch now

@jonhealy1 jonhealy1 self-requested a review November 30, 2024 05:09
Collaborator

@jonhealy1 jonhealy1 left a comment

The tests are failing because httpx used in tests just released a new version.

@jonhealy1 jonhealy1 merged commit 2fceed1 into stac-utils:main Nov 30, 2024
0 of 15 checks passed