RFC: Orchestrator as a native Vitess component #6612
Comments
If we still want to keep the persistence, SQLite scales quite well in WAL mode (https://sqlite.org/wal.html), which just needs to be enabled. It still allows only one writer at a time, but reads can occur concurrently with any other operation. It eliminates the "database is busy" errors that plague anyone trying to use SQLite from multiple threads or processes. The in-memory mode produces fewer locked/busy errors, but its locking model is still coarse-grained, so those errors can still occur; WAL mode uses snapshot isolation to avoid them. That also means that if we are explicitly or implicitly dependent on SQLite's very strict default isolation, we may need more explicit concurrency management. Edit: looks like Orchestrator already does this! https://github.com/openark/orchestrator/blob/de1b1ecd3f65cac447b24067d99dc56a8109fd82/go/db/db.go#L354 WAL mode scales very well: it can handle hundreds of thousands of processes and millions of reads/s, but maybe not at the same time? :D
It does!
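For reference, here is a minimal sketch of enabling WAL mode from Go. It assumes the github.com/mattn/go-sqlite3 driver and an illustrative database file name; Orchestrator's actual setup lives in the db.go file linked above.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3" // assumed driver; not necessarily what Orchestrator uses
)

func main() {
	// "orchestrator.db" is an illustrative path, not Orchestrator's real default.
	db, err := sql.Open("sqlite3", "orchestrator.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Setting the journal mode returns the resulting mode as a row.
	// WAL is persistent: once enabled, it stays set for future connections.
	var mode string
	if err := db.QueryRow("PRAGMA journal_mode = WAL").Scan(&mode); err != nil {
		log.Fatal(err)
	}
	fmt.Println("journal mode:", mode) // prints "wal" on success
}
```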
Introduction
With #6582, we are starting on a Vitess-specific fork of the Orchestrator that was originally created by @shlomi-noach.
Previously, Vitess provided an integration that cooperated loosely with a separate installation of Orchestrator, which managed automated failover for the MySQL instances underlying Vitess. However, users often encountered corner cases where the two sides could make conflicting changes, potentially leaving the system in an inconsistent state.
We intend to resolve these issues and provide a smooth and unified experience to the end-user.
Approach
Many approaches were considered:
Of the above approaches, we have decided on option 4. We reached this conclusion after studying the Orchestrator code and realizing that it was written with the intent of allowing customization.
Architecturally, Orchestrator is closest in function to the work performed by vtctld, and the two have some overlapping functionality: for example, Orchestrator's stop-replica vs. vtctld's StopReplica, and graceful-master-takeover vs. PlannedReparentShard.
On the other hand, for massive clusters with >10K nodes, it may be better to divide Orchestrator's responsibility by specific keyspaces or shards. In that situation, the design starts to deviate from the vtctld model. This requires further brainstorming.
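To make the scoping idea concrete, here is a hypothetical sketch of what restricting an orchestration instance to a subset of keyspaces/shards could look like. Nothing like this exists yet; all names are invented for illustration.

```go
package main

import "fmt"

// Scope restricts which part of the cluster one orchestration
// instance is responsible for. Hypothetical, for illustration only.
type Scope struct {
	Keyspace string
	Shards   []string // empty means all shards in the keyspace
}

// covers reports whether this instance should act on a given shard.
func (s Scope) covers(keyspace, shard string) bool {
	if s.Keyspace != keyspace {
		return false
	}
	if len(s.Shards) == 0 {
		return true
	}
	for _, sh := range s.Shards {
		if sh == shard {
			return true
		}
	}
	return false
}

func main() {
	s := Scope{Keyspace: "commerce", Shards: []string{"-80"}}
	fmt.Println(s.covers("commerce", "-80")) // true
	fmt.Println(s.covers("commerce", "80-")) // false
}
```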
To fork or not to fork
We debated extending the existing Orchestrator code base to accommodate Vitess versus forking the code. The conclusion was to fork, for the following reasons:
End Product
The end product could be a unified component that merges Orchestrator and vtctld. We could either preserve the old vtctld name or give the new binary a brand-new name.
This will extend vtctld’s responsibility to not only execute user commands, but also initiate actions against the cluster to remedy any problems that are automatically solvable.
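As a rough illustration (not the actual design), the merged component might pair its command-serving role with a background remediation loop along these lines; all types and function names here are invented:

```go
package main

import (
	"context"
	"log"
	"time"
)

// Action is a remedial operation the component can apply on its own,
// e.g. the equivalent of an emergency reparent. Hypothetical interface.
type Action interface {
	Apply(ctx context.Context) error
	String() string
}

// analyze would inspect cluster state and propose fixes; stubbed here.
func analyze(ctx context.Context) []Action { return nil }

// runRemediationLoop periodically analyzes the cluster and applies fixes,
// alongside (not shown) the server that handles explicit user commands.
func runRemediationLoop(ctx context.Context) {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for _, a := range analyze(ctx) {
				if err := a.Apply(ctx); err != nil {
					log.Printf("remediation %s failed: %v", a, err)
				}
			}
		}
	}
}

func main() {
	runRemediationLoop(context.Background())
}
```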
Also, vtctld is designed to cooperate with any other vtctlds present, which is in line with Orchestrator's philosophy. Although the two use different methods today, they will eventually be unified.
As for the web UI, Orchestrator has a pleasant and intuitive graphical interface. This will be used as the starting point, and the vtctld functionality will be adapted around it.
There should be links to vttablets from Orchestrator, similar to how there are links from the vtctld web UI to vttablets today.
Orchestrator also has a more structured CLI interface, which will likely be preferred over the CLI that grew organically in vtctld.
If we decide not to merge with vtctld, we'll have to consider either keeping Orchestrator as a separate component or seeing what it would look like running inside vttablets.
Pluggable Durability
Users come with varying tolerances for durability. Additionally, cloud environments define complex topologies; for example, AWS has zones and regions. To meet these requirements and to future-proof ourselves, we will implement a pluggable durability policy, based on an extension of FlexPaxos.
Essentially, this pluggability will allow us to address all existing known use cases as well as future ones we have not heard of. More details will follow.
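To give a flavor of what such pluggability might look like, here is a hypothetical sketch of a durability-policy interface in the spirit of FlexPaxos, where the quorum size and the set of eligible ack sources become configurable. All names are invented, not part of the RFC.

```go
package main

import "fmt"

// Tablet identifies a MySQL instance; Cell could map to an AWS zone.
type Tablet struct {
	Cell  string
	Alias string
}

// DurabilityPolicy is a hypothetical plugin interface: it decides how many
// replication acks make a write durable, and which replicas may supply them.
type DurabilityPolicy interface {
	QuorumSize(primary Tablet, replicas []Tablet) int
	CanAck(primary, replica Tablet) bool
}

// crossCell is a sample policy: one ack from a replica outside the
// primary's cell, so a write survives the loss of an entire zone.
type crossCell struct{}

func (crossCell) QuorumSize(Tablet, []Tablet) int { return 1 }

func (crossCell) CanAck(primary, replica Tablet) bool {
	return replica.Cell != primary.Cell
}

func main() {
	var p DurabilityPolicy = crossCell{}
	primary := Tablet{Cell: "us-east-1a", Alias: "z1-100"}
	replica := Tablet{Cell: "us-east-1b", Alias: "z2-200"}
	fmt.Println(p.QuorumSize(primary, nil)) // 1
	fmt.Println(p.CanAck(primary, replica)) // true
}
```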
Details
This is a preliminary drill-down of what we think is needed. So, it’s presented as somewhat of a laundry list. We will refine this as we move forward.
MVP
A Minimum Viable Product is being developed. The latest code is currently in https://github.com/planetscale/vitess/tree/ss-oc2-vt-mode. It’s likely to evolve. Those who wish to preview upcoming work can look for branches in the PlanetScale repo that are prefixed with ss-oc.
At this point, the MVP has the following functionality: it makes TabletExternallyReparented work. This still does not include everything that EmergencyReparentShard does today, but should be extended.
The MVP is only a starting point. There is more work that needs to be done:
Code culture
Metadata
Improvements
Changes in behavior
New functionality