Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: Update Airflow Airbyte Provider to use the new Airbyte API #37244

Closed
wants to merge 5 commits into from

Conversation

mitcheccles
Copy link

closes: #35011

Updated the Airbyte provider to use the new Airbyte API. This update ensures continued functionality as Airbyte intends to discontinue the Configuration API, which this provider perviously used to interact with Airbyte.

The discussion in issue #35011 mentioned two possible considerations when implementing a fix:

  1. Use Airbyte's official Python SDK
  2. Continue using the Configuration API too, for those running Airbyte v0.4x releases.

[Airbyte's API Python SDK]: The changes in this PR do not use the official Python SDK. This SDK is still in beta. Furthermore, I wanted to keep the changes lightweight and not add another dependency, especially considering the Airbyte Airflow provider only uses 4 of Airbyte's API endpoints.

[Dual API Support]:The change removes support for the Configuration API. It would have been cumbersome to support both API's. Those who are running an older version of Airbyte can install an older version of this provider when setting up Airflow.

Also worth noting, I have tested the changes in this PR on my production Airflow deployment to verify the code works.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Copy link

boring-cyborg bot commented Feb 8, 2024

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

@eladkal
Copy link
Contributor

eladkal commented Feb 8, 2024

The changes in this PR do not use the official Python SDK. This SDK is still in beta.

Is there official statement of Airbyte saying that? I couldn't find such statement in the SDK GitHub nor in PyPi

Copy link
Contributor

@Taragolis Taragolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduce breaking changes

Note

Existed users might use pretty old OSS (last year release) version of Airbyte which only support Configuration API

Option 1
If we would like to introduce Breaking Changes then we need to create notes into the Change log and potentially simple migration guide, e.g. minimal version of Airbyte which support Official API

Option 2
In the other way round if we would like to keep both support for a while we need to create separate Hooks, Operators and etc (simple copy-paste) which communicate with Airbyte API only. In addition we might add UserWarning / FutureWarning into the old Hook, which raise a warning that end users use old API, which discontinued soon.

According to the recent changes into the Airbyte provider: #36780 the users still use Configuration API so if I were you I would choose Option 2

@mitcheccles
Copy link
Author

The changes in this PR do not use the official Python SDK. This SDK is still in beta.

Is there official statement of Airbyte saying that? I couldn't find such statement in the SDK GitHub nor in PyPi
@eladkal
https://github.com/airbytehq/airbyte-api-python-sdk?tab=readme-ov-file#maturity

@mitcheccles
Copy link
Author

Option 1
If we would like to introduce Breaking Changes then we need to create notes into the Change log and potentially simple migration guide, e.g. minimal version of Airbyte which support Official API

@Taragolis - thanks for your comment. What do you think about asking people to downgrade their Airflow Airbyte provider in the case that someone couldn't upgrade their Airbyte deployment? - if we went for option 1 that is. It'd keep this code base clean; without copy/pasted code.

I'm happy if this PR is left open for a while - perhaps get a few more opinions.

@Taragolis
Copy link
Contributor

Sorry for delay response.

What do you think about asking people to downgrade their Airflow Airbyte

In general it is an option, however always unknown how many users read changelogs and not blindly install latest version of provider, or just use version bundled into the Official Apache Airflow Docker image

The problem here that currently provider supports only Configurator API and that is mean that users of this provider only use it as result it might affect almost 100% current users. I'm not sure how many users would be affected, because we do not have any statistic rather than package how often package downloaded: PyPI Stats, PePy

We always tried to deprecate something first, so end users have a chance to smooth transition to the new API.

In addition we have no idea how many users extend AirbyteHook as well as related operators, so it might take a time to them to migrate.

@potiuk
Copy link
Member

potiuk commented Feb 9, 2024

If it was a simple breaking change in parameters that users could correct in their DAGs. I'd be for immediate breaking change. But the fact that users might be using an older version of airbyte OSS makes a big difference, because that might require way more effort to upgrade.

As I see it - this change is not breaking for DAG Authoring but breaking for - potentially - deployment.

Do we know how old version of the OSS Airbyte does not support the new API ? How old it is? Any stats of usage that Airbyte might have?

I think we should assess it based on likelihood of breaking multiple users workflows. If the OSS version of Airbyte that is incompatible is like 2 years old, for example, the likelihood is small. Then we could capture the error that happens in this case and explicitly ask users to downgrade. To @Taragolis comment - as long as we raise a very explicitt error and clearly communicate the user "Your Airbyte OSS version does not support the new API this provider supports, please downgrade porovider to version x.y.z" - or something like that - then option 1 is acceptable. But if someone's OSS fresh version installed 4 months ago will stop working then option 2 is better.

@Taragolis
Copy link
Contributor

Just for the record, I've checked latest Airbyte 0.50.48 locally and both APIs available. I'm not sure but last time I've found that presumably v0.50.13 (released 5 month ago) it is the first version where Airbyte API available for OSS users

Configuration API

curl -X POST "http://127.0.0.1:8000/api/v1/workspaces/list" \
     --header 'accept: application/json' \
     --user airbyte:password

{"workspaces":[{"workspaceId":"c81b0f84-e935-4ca4-974f-b1dbc8e4af58","customerId":"3aa7d861-9af9-4708-86ea-cd089605ef6e","email":"example@org.org","name":"c81b0f84-e935-4ca4-974f-b1dbc8e4af58","slug":"c81b0f84-e935-4ca4-974f-b1dbc8e4af58","initialSetupComplete":true,"displaySetupWizard":false,"anonymousDataCollection":false,"notifications":[],"notificationSettings":{},"defaultGeography":"auto","organizationId":"00000000-0000-0000-0000-000000000000"}]}

Airbyte API

curl -X GET "http://127.0.0.1:8006/v1/workspaces?includeDeleted=false&limit=20&offset=0" \
     --header 'accept: application/json' \
     --user airbyte:password

{"data":[{"workspaceId":"c81b0f84-e935-4ca4-974f-b1dbc8e4af58","name":"c81b0f84-e935-4ca4-974f-b1dbc8e4af58","dataResidency":"auto"}]}

@potiuk
Copy link
Member

potiuk commented Feb 9, 2024

so it's ~ the threshold I had in mind. I'd say it's a bit too early to remove it then and supporting both should be preferred

Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Mar 26, 2024
@github-actions github-actions bot closed this Mar 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:airbyte stale Stale PRs per the .github/workflows/stale.yml policy file
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update Airflow Airbyte Provider to use the new Airbyte API
4 participants