forked from apache/airflow
-
Notifications
You must be signed in to change notification settings - Fork 6
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add pre-commits preventing accidental API changes in common.sql
We had recently some mishaps when it comes to changes to the common.sql provider that accidentally changed API of the common.sql providers and they passed both tests and reviews. Common.sql is pretty special provider because it used by other providers and we have to make sure the API it epxoses is backwards compatible, otherwise upgrading common.sql provider might break those providers. This had already happened: * The apache#26761 refactored common.sql provided and removed some methods that were actually used by Google Provider * This caused a problem described in apache#27838 and forced us ot yank common.sql 1.3.0 and release 1.3.1 with the methods added back in apache#27843 This change introduces tools and process to flag common.sql API changes that are likely backwards compatible. They will require a deliberate action from the contributor who introduces a breaking API to common.sql, it will also help to flag to reviewer when such a change is made. The idea is based on how Android OS handles flagging API changes for their public APIs.
- Loading branch information
Showing
10 changed files
with
743 additions
and
164 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
# Keeping the API of common.sql provider backwards compatible | ||
|
||
The API of the `common.sql` provider should be kept backwards compatible with previously released versions. | ||
The reason is - there are already released providers that use `common.sql` provider and rely on the API and | ||
behaviour of the `common.sql` provider, and any updates in the API or behaviour of it, might potentially | ||
break those released providers. | ||
|
||
Therefore, we should keep an extra care when changing the APIs. | ||
|
||
The approach we take is similar to one that has been applied by Android OS team to keep their API in check, | ||
and it is based on storing the current varsion of API and flagging changes that are potentially breaking. | ||
This is done by comparing the previous API (store in [api.txt](api.txt) and the upcoming API from the PR. | ||
The upcoming API is automatically extracted from `common.sql` Python files using `update-common-sql-api` | ||
pre-commit. If the comparison determines that the change is potentially breaking, the contributor is asked | ||
to review the changes and manually regenerate the [api.txt](api.txt). | ||
|
||
The details of the workflow are as follows: | ||
|
||
1) The previous API is stored in the (committed to repository) [api.txt](api.txt) file in this folder. | ||
2) Every time when common.sql Python files are modified the `update-common-sql-api` pre-commit regenerates | ||
the API file content and looks for potentially breaking changes are detected (removals or updates of | ||
the existing classes/methods. | ||
3) If the check reveals there are no changes to the API, nothing happens. | ||
4) If there are only additions, the pre-commit automatically updates the [api.txt](api.txt) file, | ||
asks the contributor to commit resulting update to the [api.txt](api.txt) and fails the pre-commit. This | ||
is very similar to other static checks that automatically modify/fix source code. | ||
5) If the pre-commit detects potentially breaking changes, the process is a bit more involved for the | ||
contributor. The pre-commit flags such changes to the contributor by failing the pre-commit and | ||
asks the contributor to review the change looking specifically for breaking compatibility with previous | ||
providers (and fix any backwards compatibility). Once this is completed, the contributor is asked to | ||
manually and explicitly regenerate and commit the new version of the API by running the pre-commit | ||
with manually added environment variable: `UPDATE_COMMON_SQL_API=1 pre-commit run update-common-sql-api`. | ||
|
||
# Verifying other providers to use only public API of the `common.sql` provider | ||
|
||
The file which holds the current API is used by a second pre-commit - `check-common-sql-api-usage`. | ||
|
||
This pre-commit does not do any update - it only checks that any of the providers uses only the | ||
public API stored in the [api.txt](api.txt) when accessing `common.sql` provider. The reason is that some | ||
methods or classes in the common.sql are not supposed to be part of the API - there are helper methods | ||
and classes. Those methods and classes are usually starting with `_` (protected methods) and they are | ||
excluded. | ||
|
||
However - historically - some protected methods have been used, so until we determine it is | ||
safe to remove them, the pre-commit will continue to manually select them as part of the API and they | ||
are protected by this mechanism. This way we can be more certain that the community providers that use | ||
`common.sql` provider only use the APIs that are intended to be used as public. | ||
|
||
Of course, this is never 100% fool-proof, due to the dynamic Python behaviours and due to possible internal | ||
changes in behaviours of the common.sql APIs, but at the very least any of "interface" changes will be | ||
flagged in any change to contributor making a change and also reviewers will be notified whenever any | ||
API change happens. | ||
|
||
# External usage, tracking API changes | ||
|
||
The `api.txt` file might be used by any external provider developer to make sure that only official | ||
APIs are used, which will also help those external providers to use the `common.sql` provider | ||
functionality without the fear of using methods and classes that are deemed non-public. | ||
|
||
Since this file is stored in the repository, we can automatically generate the API differences between | ||
future released version of `common.sql` provider, and we will be able to easily reason about changes of the | ||
API between versions. | ||
|
||
# Possible future work | ||
|
||
This mechanism providing the "official" API of the `common.sql` provider is just the first attempt to | ||
standardise APIs provided by various Airflow's components. In the future, we might decide to extend | ||
after experimenting and testing its behaviour for the `common.sql`. | ||
|
||
Keeping a formal, automatically generated and maintained list of official classes and methods that | ||
providers can use. Once we get used to it and confirm that it is a good mechanism, we will be able | ||
to extend this approach to more cases: | ||
|
||
* for other cross-provider-dependencies, we will be able to flag potentially breaking changes between them | ||
* we should be able to use very similar mechanism for exposing and flagging the public API of Airflow's core | ||
and flag any misuse of methods in the Community providers | ||
* as a follow-up of those two - we should be able to produce more "formal" API that providers written | ||
externally could use and write a "compatibility tool" that will be able to verify if the provider | ||
written externally is compatible (on API level) with certain version of Airflow, or in case of the | ||
cross-provider dependencies whether it is compatible (on API level) with certain versions of the released | ||
providers. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# This is a common.sql provider API dump | ||
# | ||
# This file is generated automatically by the update-common-sql-api pre-commit | ||
# and it shows the current API of the common.sql provider | ||
# | ||
# Any, potentially breaking change in the API will require deliberate manual action from the contributor | ||
# making a change to the common.sql provider. This file is also used in checking if only public API of | ||
# the common.sql provider is used by all the other providers. | ||
# | ||
# You can read more in the README_API.md file | ||
# | ||
airflow.providers.common.sql.hooks.sql.ConnectorProtocol | ||
airflow.providers.common.sql.hooks.sql.ConnectorProtocol.connect() -> Any | ||
airflow.providers.common.sql.hooks.sql.DbApiHook | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.__init__(schema<KW-ONLY>: str | None = None, log_sql<KW-ONLY>: bool = True, *args, **kwargs) | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.bulk_dump() | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.bulk_load() | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.get_autocommit() -> bool | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.get_conn() | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.get_cursor() | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.get_first(self = None) -> Any | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.get_pandas_df(self = None, **kwargs) | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.get_pandas_df_by_chunks(self = None, chunksize<KW-ONLY>, **kwargs) | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.get_records(self = None) -> Any | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.get_sqlalchemy_engine(self = None) | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.get_uri() -> str | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.insert_rows(self = None, table = 1000, rows = False, **kwargs) | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.last_description() -> Sequence | None | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.run(self = False, sql: str | Iterable = None, autocommit: bool = None, parameters: Iterable | Mapping | None = False, handler: Callable | None = True) -> Any | list | None | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.set_autocommit() | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.split_sql_string() -> list | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.strip_sql_string() -> str | ||
airflow.providers.common.sql.hooks.sql.DbApiHook.test_connection() | ||
airflow.providers.common.sql.hooks.sql.fetch_all_handler() -> list | None | ||
airflow.providers.common.sql.hooks.sql.fetch_one_handler() -> list | None | ||
airflow.providers.common.sql.hooks.sql.return_single_query_results() | ||
airflow.providers.common.sql.operators.sql.BaseSQLOperator | ||
airflow.providers.common.sql.operators.sql.BaseSQLOperator.__init__(conn_id<KW-ONLY>: str | None = None, database<KW-ONLY>: str | None = None, hook_params<KW-ONLY>: dict | None = None, retry_on_failure<KW-ONLY>: bool = True, **kwargs) | ||
airflow.providers.common.sql.operators.sql.BaseSQLOperator.get_db_hook() -> DbApiHook | ||
airflow.providers.common.sql.operators.sql.BranchSQLOperator | ||
airflow.providers.common.sql.operators.sql.BranchSQLOperator.__init__(sql<KW-ONLY>: str, follow_task_ids_if_true<KW-ONLY>: list, follow_task_ids_if_false<KW-ONLY>: list, conn_id<KW-ONLY>: str = default_conn_id, database<KW-ONLY>: str | None = None, parameters<KW-ONLY>: Iterable | Mapping | None = None, **kwargs) -> None | ||
airflow.providers.common.sql.operators.sql.BranchSQLOperator.execute() | ||
airflow.providers.common.sql.operators.sql.SQLCheckOperator | ||
airflow.providers.common.sql.operators.sql.SQLCheckOperator.__init__(sql<KW-ONLY>: str, conn_id<KW-ONLY>: str | None = None, database<KW-ONLY>: str | None = None, parameters<KW-ONLY>: Iterable | Mapping | None = None, **kwargs) -> None | ||
airflow.providers.common.sql.operators.sql.SQLCheckOperator.execute() | ||
airflow.providers.common.sql.operators.sql.SQLColumnCheckOperator | ||
airflow.providers.common.sql.operators.sql.SQLColumnCheckOperator.__init__(table<KW-ONLY>: str, column_mapping<KW-ONLY>: dict, partition_clause<KW-ONLY>: str | None = None, conn_id<KW-ONLY>: str | None = None, database<KW-ONLY>: str | None = None, accept_none<KW-ONLY>: bool = True, **kwargs) | ||
airflow.providers.common.sql.operators.sql.SQLColumnCheckOperator.execute() | ||
airflow.providers.common.sql.operators.sql.SQLExecuteQueryOperator | ||
airflow.providers.common.sql.operators.sql.SQLExecuteQueryOperator.__init__(sql<KW-ONLY>: str | list, autocommit<KW-ONLY>: bool = False, parameters<KW-ONLY>: Mapping | Iterable | None = None, handler<KW-ONLY>: Callable = fetch_all_handler, split_statements<KW-ONLY>: bool = False, return_last<KW-ONLY>: bool = True, **kwargs) -> None | ||
airflow.providers.common.sql.operators.sql.SQLExecuteQueryOperator.execute() | ||
airflow.providers.common.sql.operators.sql.SQLExecuteQueryOperator.prepare_template() -> None | ||
airflow.providers.common.sql.operators.sql.SQLIntervalCheckOperator | ||
airflow.providers.common.sql.operators.sql.SQLIntervalCheckOperator.__init__(table<KW-ONLY>: str, metrics_thresholds<KW-ONLY>: dict, date_filter_column<KW-ONLY>: str | None = ds, days_back<KW-ONLY>: SupportsAbs = -7, ratio_formula<KW-ONLY>: str | None = max_over_min, ignore_zero<KW-ONLY>: bool = True, conn_id<KW-ONLY>: str | None = None, database<KW-ONLY>: str | None = None, **kwargs) | ||
airflow.providers.common.sql.operators.sql.SQLIntervalCheckOperator.execute() | ||
airflow.providers.common.sql.operators.sql.SQLTableCheckOperator | ||
airflow.providers.common.sql.operators.sql.SQLTableCheckOperator.__init__(table<KW-ONLY>: str, checks<KW-ONLY>: dict, partition_clause<KW-ONLY>: str | None = None, conn_id<KW-ONLY>: str | None = None, database<KW-ONLY>: str | None = None, **kwargs) | ||
airflow.providers.common.sql.operators.sql.SQLTableCheckOperator.execute() | ||
airflow.providers.common.sql.operators.sql.SQLThresholdCheckOperator | ||
airflow.providers.common.sql.operators.sql.SQLThresholdCheckOperator.__init__(sql<KW-ONLY>: str, min_threshold<KW-ONLY>: Any, max_threshold<KW-ONLY>: Any, conn_id<KW-ONLY>: str | None = None, database<KW-ONLY>: str | None = None, **kwargs) | ||
airflow.providers.common.sql.operators.sql.SQLThresholdCheckOperator.execute() | ||
airflow.providers.common.sql.operators.sql.SQLThresholdCheckOperator.push() | ||
airflow.providers.common.sql.operators.sql.SQLValueCheckOperator | ||
airflow.providers.common.sql.operators.sql.SQLValueCheckOperator.__init__(sql<KW-ONLY>: str, pass_value<KW-ONLY>: Any, tolerance<KW-ONLY>: Any = None, conn_id<KW-ONLY>: str | None = None, database<KW-ONLY>: str | None = None, **kwargs) | ||
airflow.providers.common.sql.operators.sql.SQLValueCheckOperator.execute() | ||
airflow.providers.common.sql.operators.sql._parse_boolean() -> str | bool | ||
airflow.providers.common.sql.sensors.sql.SqlSensor | ||
airflow.providers.common.sql.sensors.sql.SqlSensor.__init__(conn_id<KW-ONLY>, sql<KW-ONLY>, parameters<KW-ONLY> = None, success<KW-ONLY> = None, failure<KW-ONLY> = None, fail_on_empty<KW-ONLY> = False, hook_params<KW-ONLY> = None, **kwargs) | ||
airflow.providers.common.sql.sensors.sql.SqlSensor.poke() |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.