airbyte-ci: refactor GradleTask to run in subprocess #52033
Conversation
CDK_MAVEN_METADATA_URL = (
    "https://airbyte.mycloudrepo.io/public/repositories/airbyte-public-jars/io/airbyte/cdk/airbyte-cdk-core/maven-metadata.xml"
)
STATIC_GRADLE_OPTIONS = ("--build-cache", "--scan")
Which options shall we keep compared to the containerized implementation?
we should keep --no-daemon and --no-watch-fs since this is a single build, so we don't need to incur the overhead of starting the daemon / watching the filesystem, which is mostly useful for long-running systems.
I don't know about --no-daemon, aren't we going to call gradle multiple times in the same VM now?
good question - I'm assuming we only run 1 connector test per CI runner, and most devs will not invoke airbyte-ci locally and will instead run gradle directly, so we won't invoke the gradle command multiple times in a single VM using the airbyte-ci command. Augustine, does that sound right? Or is there a possibility we run different gradle commands within a single CI runner?
> I'm assuming we only run 1 connector test per CI runner

Nope, the default airbyte-ci implementation is to run all modified connectors' tests in the same runner: concurrently for Python connectors, sequentially for Java connectors, as concurrent access to the gradle cache is not possible. I can try to revisit this logic later. Colocating execution on the same runner has benefits for Python connectors thanks to dagger caching; following this change, colocation for Java connectors makes less sense and hurts DX.
In any case we do run multiple gradle commands in a single connector pipeline execution: distTar, unitTests, integrationTests.
TL;DR: I don't think we should add --no-daemon back. I'll re-add --no-watch-fs. Thanks for your inputs.
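For reference, here is a minimal, purely illustrative sketch of how the remaining static options could be composed into a host-side gradle invocation; the function name, option tuple and task strings are assumptions for this thread, not the actual gradle.py code:

```python
import subprocess

# Per the discussion above: --no-daemon is left out because several gradle commands run
# within one pipeline execution, while --no-watch-fs is kept since filesystem watching
# only pays off for long-lived daemons. --build-cache and --scan stay from STATIC_GRADLE_OPTIONS.
GRADLE_OPTIONS = ("--no-watch-fs", "--build-cache", "--scan")


def run_gradle_task(repo_root: str, task: str) -> subprocess.CompletedProcess:
    """Run a single gradle task as a subprocess on the host (illustrative only)."""
    command = ["./gradlew", *GRADLE_OPTIONS, task]
    return subprocess.run(command, cwd=repo_root, capture_output=True, text=True, check=False)


# The same pipeline execution would call this several times, e.g. for the distTar,
# unitTests and integrationTests tasks of a given connector module.
```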
@@ -115,7 +115,9 @@ runs:
    ls -la
    ls -la airbyte-python-cdk || echo "No airbyte-python-cdk directory"
    ls -laL ../airbyte-python-cdk || echo "No airbyte-python-cdk symlink"

- name: Install Java Environment
For a followup: it would be cool to spare python-connector-only PRs from installing the Java env. We could rely on the `changes` output to check whether any Java connector was modified.
@@ -60,7 +60,8 @@ async def run_connector_build(context: ConnectorContext) -> StepResult:
    build_connector_tar_result = await BuildConnectorDistributionTar(context).run()
    if build_connector_tar_result.status is not StepStatus.SUCCESS:
        return build_connector_tar_result
    dist_dir = await build_connector_tar_result.output.directory(dist_tar_directory_path(context))

    dist_dir = await build_connector_tar_result.output.directory("build/distributions")
For understanding: why are we using "build/distributions" directly? It feels unrelated to the other changes; maybe I'm missing something.
It changes because the original path was an absolute path within the airbyte repo. The build_connector_tar step now returns a directory which is the connector directory.
# When running locally, this dependency update is slower and less useful than within a CI runner. Skip it.
warm_dependency_cache_args = ["--dry-run"]
return stdout, stderr, process.returncode
finally:
If the started process hangs, how does this finally block trigger? This is probably my lack of Python knowledge.
If it hangs it's never reached. Good point. I can add a timeout.
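A minimal sketch of that timeout idea, assuming the step drives the gradle command through subprocess.Popen; the function name and structure are illustrative, not the final gradle.py implementation:

```python
import subprocess


def run_with_timeout(command: list[str], timeout_seconds: float) -> tuple[str, str, int]:
    """Run a command but bound its runtime, so the cleanup path is always reached."""
    process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    try:
        stdout, stderr = process.communicate(timeout=timeout_seconds)
        return stdout, stderr, process.returncode
    except subprocess.TimeoutExpired:
        process.terminate()  # Ask the hung process to exit.
        try:
            process.wait(timeout=5)  # Give it 5 seconds to terminate gracefully.
        except subprocess.TimeoutExpired:
            process.kill()  # Force kill if still running.
        raise
```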
@alafanechere this looks sane to me. I'm not really qualified to review the Dagger code or Python best practices, so I mostly focused on understanding those pieces. Everything else seems good.
One comment on the python process handling, but non-blocking.
I see CI works. Did you manage to test this locally?
👍
I'll address that tomorrow.
I did, but it was pretty slow - my machine was dying. I'm going to try again tomorrow with a less busy env.
I realised I didn't answer this. What is the overhead of doing so? I think the use case for this is being able to see logs as tests run in CI. I imagine it's useful when trying to debug in CI, however I'm less sure of the value since I'm not sure what the test output will look like once this change goes in. If the DX is the same as the current dagger tests, I think the value is low, since the output only appears at the end when everything has run. Value is high if we are able to see things live in the GHA UI, though lower if the perf overhead is large. I think we can poll various Java devs and fast follow in another PR since this PR by itself has lots of value.
We can show the output live in the GHA console, but when I tried locally I had the impression that the output streaming had overhead. I can definitely experiment with this in a follow-up PR.
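For that follow-up, a rough sketch of what line-by-line streaming could look like, with the caveat that the overhead mentioned above would still need to be measured; everything here is an assumption rather than the current implementation:

```python
import subprocess
import sys


def stream_gradle_output(command: list[str]) -> int:
    """Echo each gradle output line to the console as soon as it is produced."""
    process = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # Interleave stderr with stdout to keep ordering.
        text=True,
        bufsize=1,  # Line-buffered, so lines show up in the GHA console as they arrive.
    )
    assert process.stdout is not None
    for line in process.stdout:
        sys.stdout.write(line)
    return process.wait()
```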
process.terminate()
process.wait(timeout=5)  # Give it 5 seconds to terminate gracefully
try:
process.kill()  # Force kill if still running
Should this specific scenario be logged somehow? We might want to know that it happened?
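One hedged sketch of how the forced kill could be surfaced, assuming a logger is available in the step; the function name and message are hypothetical, not the current code:

```python
import logging
import subprocess

logger = logging.getLogger(__name__)


def ensure_stopped(process: subprocess.Popen) -> None:
    """Terminate the gradle subprocess, logging if we have to escalate to a kill."""
    process.terminate()
    try:
        process.wait(timeout=5)  # Give it 5 seconds to terminate gracefully.
    except subprocess.TimeoutExpired:
        logger.warning("Gradle subprocess did not exit after SIGTERM; force killing it.")
        process.kill()  # Force kill if still running.
```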
This seems like something worth doing! I'll note that there's more followup work in the form of ripping out all the stuff we did to support dind (wasn't /tmp mounted as a volume at some point?)
cache-write-only: ${{ inputs.gradle_cache_write_only }}
- name: Install system tools (jq, xargs)
  shell: bash
  run: sudo apt-get -y install jq findutils
Is this step really required? Have you tried without? jq is no longer needed (IIRC) and xargs should be present on the system (which is now a full-fledged Ubuntu and not some bare-bones Alpine distro).
I removed it and it still works. Thanks for the suggestion
dind is still used for some python connector testing use cases which rely on docker-compose files.
@@ -61,24 +61,3 @@ def context_with_setup(dagger_client, python_connector_with_setup_not_latest_cdk
@pytest.fixture(scope="module")
def python_connector_base_image_address(python_connector_with_setup_not_latest_cdk):
    return python_connector_with_setup_not_latest_cdk.metadata["connectorBuildOptions"]["baseImage"]


async def test_with_python_connector_installed_from_setup(context_with_setup, python_connector_base_image_address, latest_cdk_version):
This test was failing - removing it as installing from setup.py is now legacy
@postamar @davinchia @frifriSF59 thanks for your comments. Here are my latest changes following your suggestions.
Here's my latest CI test run. I opened a followup PR for CI orchestration optimization: leveraging matrix strategies to run one Java connector per runner and only installing the Java environment when needed (when testing Java connectors). It's still a WIP.
I'd be curious which pieces @postamar sees as no longer needed and can now be removed. However, definitely not blocking.
Apparently docker-in-docker is required for some python connectors so it can't be removed. This was by far the biggest piece. I'll take a look and see what odds and ends I can remove, but they're fairly inconsequential in comparison.
What

We want to move away from containerized gradle execution in CI.
This PR makes airbyte-ci run gradle as a subprocess on the system, while keeping the airbyte-ci orchestration benefits.

Example

I got a 🟢 run on destination-s3 from this branch.
Running a longer job on source-postgres, destination-bigquery and destination-snowflake.

Question for reviewers ❓

Which gradle options should we keep compared to the original containerized implementation?
- --no-watch-fs
- --build-cache and --scan

Do we want live Gradle logging when executing the gradle task in airbyte-ci? I found it really noisy, and piping gradle output to python and then to the console has some performance overhead. I'll try to provide live gradle output in a follow-up PR, but local testing has shown strange overhead when streaming gradle output to the console.

How

New GitHub Action for Java Environment Setup:
- .github/actions/install-java-environment/action.yml: added a new action to install the Java environment, including Java and Gradle setup with configurable cache options.

Updates to Build and Test Steps:
- .github/actions/run-airbyte-ci/action.yml: integrated the new Java environment setup action into the CI workflow.
- airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/connectors/build_image/steps/java_connectors.py: updated the directory path for the distribution tar output.
- airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/connectors/test/steps/java_connectors.py: modified the test steps to use the new distribution directory path.

Refactoring Gradle Task Handling:
- airbyte-ci/connectors/pipelines/pipelines/airbyte_ci/steps/gradle.py:

DX benefits