Migrate OSS to the new scheduler #10021

Closed
5 tasks
benmoriceau opened this issue Feb 2, 2022 · 1 comment · Fixed by #12757

Labels: area/platform, needs-triage, team/platform-move, type/enhancement

Comments

@benmoriceau
Contributor

benmoriceau commented Feb 2, 2022

Tell us about the problem you're trying to solve

A new scheduler has been implemented and added to the cloud project. The migration ended up requiring several manual interventions to unstick stuck jobs.

Describe the solution you’d like

### Migration to the new scheduler

Problem

  • Currently, whether the migration is applied is controlled by a feature flag. The feature flag is set using an environment variable, which leads to issues: the Airbyte applications do not all start at the same time, which can produce unexpected states, such as the server trying to schedule a new connection with the new scheduler while the worker is still configured to use the old one.

Solution

  • Create a new table that will store the feature flags
  • Add a default value of false to the table for OSS, and true in the cloud project
  • Change the feature flag implementation so that it is backed by the DB (see the sketch after this list)
  • Change the scheduler implementation to check whether the feature flag is activated during each periodic run instead of only at startup
  • Switch the feature flag value to true
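
A minimal sketch of what a DB-backed feature flag plus a periodic check could look like, assuming a simple `feature_flags(key, value)` table; the class, table, and flag names here (`DbFeatureFlags`, `use_new_scheduler`) are illustrative stand-ins, not the actual Airbyte implementation.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.Duration;
import javax.sql.DataSource;

public class DbFeatureFlags {

  private final DataSource dataSource;

  public DbFeatureFlags(final DataSource dataSource) {
    this.dataSource = dataSource;
  }

  /** Reads the flag from the DB on every call so all Airbyte applications see the same, current value. */
  public boolean isEnabled(final String flagName) {
    final String sql = "SELECT value FROM feature_flags WHERE key = ?";
    try (Connection conn = dataSource.getConnection();
        PreparedStatement stmt = conn.prepareStatement(sql)) {
      stmt.setString(1, flagName);
      try (ResultSet rs = stmt.executeQuery()) {
        // Default to false (the proposed OSS default) when the row is missing.
        return rs.next() && rs.getBoolean("value");
      }
    } catch (final SQLException e) {
      throw new RuntimeException("Failed to read feature flag: " + flagName, e);
    }
  }

  /** Illustrative periodic loop: the flag is re-checked on every run, not only at startup. */
  public static void schedulerLoop(final DbFeatureFlags flags,
                                   final Runnable newScheduler,
                                   final Runnable legacyScheduler) throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      if (flags.isEnabled("use_new_scheduler")) {
        newScheduler.run();
      } else {
        legacyScheduler.run();
      }
      Thread.sleep(Duration.ofSeconds(60).toMillis());
    }
  }
}
```

Re-reading the flag from the database on every scheduler tick means the server and worker converge on the same value without requiring a coordinated restart, which is the mismatch described in the problem above.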


@lmossman
Contributor

lmossman commented Apr 18, 2022

I talked with Benoit about this ticket; here is a summary of what we concluded:

  • If possible, we would like to avoid doing another "faux major version bump" where we require users to upgrade to an intermediate version before upgrading to a later one (the solution laid out above requires this).
  • The steps laid out above are trying to solve the problem where a new airbyte-server pod is spun up while an old airbyte-scheduler pod is still running. This can only happen if Airbyte operators do not turn off Airbyte before upgrading to a new version.
    • However, our Upgrading Airbyte documentation instructs users to first spin down their existing deployment before upgrading to a new one.
    • Therefore, if OSS users instead try to upgrade their Airbyte deployments in-flight, that behavior is already undefined and we should not try to account for that case. So, we should be able to rely on the fact that OSS deployments will be spun down before being upgraded.

Given the above points, this simplifies the migration plan here to just flipping the feature flag to true and adding back the migration logic that we had in the ServerApp at one point.
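
A rough sketch of what re-adding that startup migration logic could look like; the `ConfigRepository` and `WorkflowStarter` interfaces below are hypothetical stand-ins for the real Airbyte classes, and the actual logic lives in the linked PR.

```java
import java.util.List;
import java.util.UUID;

public class NewSchedulerMigration {

  // Hypothetical: lists the IDs of all active connections.
  interface ConfigRepository {
    List<UUID> listActiveConnectionIds();
  }

  // Hypothetical: starts a connection manager workflow on the new Temporal-based scheduler.
  interface WorkflowStarter {
    void startConnectionManagerWorkflow(UUID connectionId);
  }

  private final ConfigRepository configRepository;
  private final WorkflowStarter workflowStarter;

  public NewSchedulerMigration(final ConfigRepository configRepository, final WorkflowStarter workflowStarter) {
    this.configRepository = configRepository;
    this.workflowStarter = workflowStarter;
  }

  /**
   * Runs once at server start when the feature flag is enabled: every active connection
   * gets a workflow on the new scheduler, so nothing is left on the legacy scheduler.
   */
  public void migrateAllConnections() {
    for (final UUID connectionId : configRepository.listActiveConnectionIds()) {
      workflowStarter.startConnectionManagerWorkflow(connectionId);
    }
  }
}
```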

However, tearing down an existing Airbyte deployment and spinning it back up could leave some jobs in a strange state (e.g. a job's state is RUNNING but nothing is actually handling it). Therefore, the work to make our connection manager temporal workflows properly handle all unexpected job states (tickets coming soon) should be a prerequisite for this migration, to ensure that the new temporal scheduler can properly recover from any weird states that result from it.
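
A minimal sketch of the kind of recovery described above, assuming a hypothetical `JobPersistence` interface: any job still marked RUNNING after the teardown has no worker attached, so it is failed and left for the new scheduler to retry.

```java
import java.util.List;

public class OrphanedJobCleaner {

  enum JobStatus { PENDING, RUNNING, FAILED, SUCCEEDED }

  // Hypothetical persistence interface, for illustration only.
  interface JobPersistence {
    List<Long> listJobIdsWithStatus(JobStatus status);
    void setJobStatus(long jobId, JobStatus status);
  }

  private final JobPersistence jobPersistence;

  public OrphanedJobCleaner(final JobPersistence jobPersistence) {
    this.jobPersistence = jobPersistence;
  }

  /**
   * Run at server boot (or when a connection manager workflow starts): any job still
   * marked RUNNING has no worker attached after the upgrade, so fail it and let the
   * new scheduler create a fresh attempt.
   */
  public void failOrphanedRunningJobs() {
    for (final long jobId : jobPersistence.listJobIdsWithStatus(JobStatus.RUNNING)) {
      jobPersistence.setJobStatus(jobId, JobStatus.FAILED);
    }
  }
}
```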
