[Proposal] Upgrades Project Workstreams Update #36
Comments
Does this imply that the knowledge base will be maintained separately from the tests and their execution outcomes?
I'm not sure why predictive assessments don't come for free if you have support for complex testing. You'd need to be able to replicate the customer's environment that you're running complex tests on, but even if that is approximated (version to start w/), it seems like you should be able to drive an assessment immediately. Have I misunderstood some part of the proposal?

Lastly, "Production-Ready" is a very loaded and fuzzy judgment. It will be different for every customer. I'd like to see a different way to describe conformance. Maybe we could say something like "conformant to v1.636 with the following exceptions: ...". In that case, conformant would mean that all known issues in our knowledge base at the point in time that v1.636 existed, for the source/target versions as they were handed to the tool, behaved completely as expected, unless otherwise noted. For both source and target, we should also dump out the configurations that we understand at the point in time that the tool is being run. It will be up to the customer to determine if the level of conformance that the tool is checking for is enough to determine if it is "production ready" or not.
I don't know what you're asking here. In my mind, the expectation must be separable from its implementing test/data, but it might be fine for them to both be represented by files/data in the same repo. Does that answer your question?
Predictive, pre-upgrade assessments come free; validation of an actual, post-upgrade cluster's readiness doesn't (I think).
I think you're making a lot of assumptions here. For example, having access to both the pre- and post-upgrade clusters in some way. Is that a valid assumption we want to drive our design around? Unclear to me at this point.

Also, it would be helpful if you could be clearer about what datapoints you're imagining are included in a determination of "conformance" and how they will be gathered. My cogitation indicates this is a situation where the devil really is in the details.

Finally, I don't think we're going to give them a report with a big checkbox that says "congratulations, your cluster is production ready". We're going to give them a report that enables them to make that assessment themselves. But the goal of the report should be to facilitate that determination, and so I personally have no problem with the current branding, which I think is the source of the disagreement. Open to further thoughts on the topic though.
To answer/rephrase some of this as I understand it (and see if that lines up with others' understandings): I think one of the things we're starting to realize is that there are two types of validation tests:
1. Tests that prove the claims in our knowledge base, run in a controlled test environment.
2. Tests run against a customer's real, upgraded cluster to confirm that their particular migration worked.
The knowledge base and the assessment tool operate in conjunction with the first category. These should be tests that are run against the testing framework as part of CI/CD, and they prove what we're claiming in the knowledge base. The second category is probably what customers care about in a real-life context. They want to know if their migration worked, and the first category doesn't really tell them that. Also, they probably don't want us uploading our test data to their real clusters.
If I'm understanding this correctly, this is related to the first category, but the (very fuzzy!) "is this cluster production ready" is trying to get at the second category.
Thanks for clarifying @mikaylathompson - I agree 100%. I would suggest that the Knowledge Base could inform some of the things customers should check for post-upgrade as part of our solution to (2). We might not be able to conclusively determine, in all areas, whether the upgrade broke something and instead need to leave it as an exercise for the reader based upon a prompt/warning.
@mikaylathompson Incorporated some of your post back into the doc |
Discussed in-depth w/ @dblock.
Also discussed in-depth w/ @mikaylathompson, @sumobrian, @gregschohn, @kartg, @lewijacn, and @okhasawn.
Great proposal. Thanks for putting the effort into drafting this. A few comments/questions:
Could you clarify what shape the output of the knowledge base workstream would take: will it be documentation on the website, a README in a repo, and/or a collection of rules documented in some form? How would users contribute to it?
Whose responsibility will it be to guarantee these "expectations" do not break in OpenSearch software? Today, the backward compatibility framework is a source of validation for upgrades. Would that responsibility shift to the UTF now? Would this be integrated with the OpenSearch CI/CD pipelines to ensure violating PRs are detected before merging and the "expectations" are honored?

Also, the primary user for the UTF appears to be OpenSearch developers, who would use it to test the sanctity of the upgrade process between versions. Do you also see it being used by anyone else? I didn't see this mentioned (or I might have missed it), but I also see Plugin & Extension developers as users of this framework. The UTF could be used to validate Plugin/Extension upgrade compatibility between versions.
I am assuming this would work on real production clusters to provide assessments? Would this cover: identifying breaking changes between the source and target versions, identifying data type incompatibilities, identifying deprecations, and providing recommendations that help the user discover and mitigate potential issues prior to starting the upgrade process?
Had a good convo w/ @setiah on 2022-12-09; I forgot to update this thread with the details at the time.
Stale issue; resolving.
Objective
The purpose of this doc is to spark conversation around the major work initiatives in the Upgrades Project based upon new design work, developments, and data. The author proposes a set of workstreams that mostly overlaps with preceding proposals but deviates in minor ways in terms of scope and sequencing. He hopes to drive alignment around this updated understanding of the project.
At a high level, it is proposed that the Validation Tool is not a "freebie" that naturally results from the development of the Upgrade Testing Framework, but is instead dependent on the Assessment Tool and therefore should be sequenced after it. Additionally, the focus of the Upgrade Testing Framework has shifted, and it is no longer expected to perform validations of “real” clusters.
Upgrade Project Workstreams
Develop Upgrades "Knowledge Base"
This workstream is to develop a centralized understanding of what is "expected" to "go right" and "go wrong" when an Elasticsearch/OpenSearch cluster is upgraded. Within the bounds of this knowledge base are: data, metadata, configuration, the core Elasticsearch/OpenSearch engine, and plugins. At the center of this workstream is developing a library of "expectations" that each express a thing that is expected (e.g. a string field should be converted to geopoint), when it applies (e.g. going from version X to X+2), and data/tests to confirm the expectation matches reality (e.g. the ability to run an actual upgrade and check to see if the expectation is true). The intention is to capture the community's full understanding of what actually happens during a given upgrade in order to provide better guidance/documentation, and to provide solutions to pain points. A further intention is that it should be easy for community members to contribute new expectations to the knowledge base.
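As a purely illustrative sketch (not part of the proposal itself), an individual expectation could be captured as structured data along these lines; the field names, version labels, and test identifier below are hypothetical placeholders:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Expectation:
    """A single knowledge-base entry describing expected upgrade behavior."""
    id: str                         # stable identifier for referencing the expectation
    description: str                # what is expected to happen
    source_versions: List[str]      # pre-upgrade engine versions the expectation applies to
    target_versions: List[str]      # post-upgrade engine versions the expectation applies to
    test_id: Optional[str] = None   # name of the test that confirms the expectation
    tags: List[str] = field(default_factory=list)


# Example entry: a string-typed location field is expected to be converted to geo_point.
string_to_geopoint = Expectation(
    id="KB-0001",
    description="A string field mapped as a location is converted to geo_point",
    source_versions=["ES 6.8"],
    target_versions=["OS 1.x", "OS 2.x"],
    test_id="check_string_to_geopoint",
    tags=["mappings", "data"],
)
```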
Related work:
Develop Tooling to Test Complex Upgrades
This workstream is focused on creating the required tooling to confirm that the expectations presented in the Knowledge Base are true in a repeatable, CI/CD manner. Currently, this workstream is called the Upgrade Testing Framework. Such tooling has the following benefits. First, it ensures that the expectations captured in the knowledge base are accurate and catches when those expectations change. Second, it facilitates development of fixes to pain points by providing a way to test those fixes. Third, it supports development of major new initiatives, like single-hop, multi-version upgrades, that existing test tooling does not focus on. Fourth, it provides the community with a higher-fidelity understanding of backwards compatibility than the existing Backwards Compatibility (BWC) tests are designed to provide, and is intended to replace those existing tests.
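For illustration only, a check confirming one expectation from the knowledge base might look roughly like the following. The `pre_upgrade_cluster` and `post_upgrade_cluster` client objects and their methods (`create_index`, `index_doc`, `get_mapping`) are hypothetical stand-ins for whatever interface the framework ultimately exposes:

```python
# A minimal sketch, assuming hypothetical cluster-client objects supplied by the
# testing framework; the method names are illustrative, not an actual API.

def check_string_to_geopoint(pre_upgrade_cluster, post_upgrade_cluster) -> bool:
    """Confirm the expectation that a string-typed location field is converted by the upgrade."""
    # Pre-upgrade setup phase: create an index whose mapping the expectation targets.
    pre_upgrade_cluster.create_index(
        "kb-0001", mappings={"properties": {"loc": {"type": "text"}}}
    )
    pre_upgrade_cluster.index_doc("kb-0001", {"loc": "52.37,4.89"})

    # The framework performs the actual upgrade between the setup and check phases.

    # Post-upgrade check: the knowledge base expects the field to now be geo_point.
    mapping = post_upgrade_cluster.get_mapping("kb-0001")
    return mapping["properties"]["loc"]["type"] == "geo_point"
```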
While this tooling is not itself intended to be run directly against users' real clusters, the core code library written for the tooling is intended to be easily adapted to support new workstreams, such as providing an assistant that helps walk a user through the process of performing a migration or upgrade.
Related work:
Develop Tooling to Perform Predictive Assessments
This workstream relies on the existence of an accurate, tested knowledge base to provide users a way to predict what issues they may run into when they perform an upgrade of an existing Elasticsearch/OpenSearch cluster to a new, proposed version. The current thinking is that the tool will interrogate the user's existing cluster to determine its configuration, use that understanding to project the expectations in the knowledge base into the subset that applies, and provide a report outlining issues that the user may encounter.
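As a rough sketch of that projection step (reusing the hypothetical `Expectation` structure from the knowledge-base example above, and filtering only on versions for simplicity), the assessment tool might narrow the knowledge base and report as follows:

```python
from typing import List

# Assumes the hypothetical Expectation dataclass from the knowledge-base sketch above.

def applicable_expectations(knowledge_base: List["Expectation"],
                            source_version: str,
                            target_version: str) -> List["Expectation"]:
    """Project the knowledge base onto the subset relevant to this upgrade path."""
    return [
        e for e in knowledge_base
        if source_version in e.source_versions and target_version in e.target_versions
    ]


def assessment_report(knowledge_base: List["Expectation"],
                      source_version: str,
                      target_version: str) -> str:
    """Render a plain-text report of issues the user may encounter."""
    lines = [f"Potential issues for upgrade {source_version} -> {target_version}:"]
    for e in applicable_expectations(knowledge_base, source_version, target_version):
        lines.append(f"  [{e.id}] {e.description}")
    return "\n".join(lines)
```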
Direct effort on this workstream has not yet begun.
Develop Tooling to Validate An Upgraded Cluster As Production-Ready
This workstream is focused on providing tooling to help a user decide whether an upgraded cluster is ready to be promoted to production. While further investigation is required to determine which specific criteria to validate with such tooling, the current thinking is as follows.
First, it is proposed that "is the system behaving as we expect" and "did the upgrade/migration break anything" are different questions requiring different, but probably related, work to answer. The first question is something that the Knowledge Base and Upgrade Testing Framework focus on by actually testing the edge cases in upgrades (e.g. did a field change from string to geopoint). The second is what the user cares about for this workflow when we talk about a Validation Tool (e.g. are the nodes happy and able to read the indices). Some expectations seem to overlap, such as the expectation that the pre- and post-upgrade clusters should have the same number of documents.
Second, attempting to directly confirm the full set of expectations contained in the knowledge base against an already-upgraded cluster does not seem to be either useful or tractable. A more reasonable approach would be to interrogate the post-upgrade cluster and provide a report of issues that may be present, according to the knowledge base, similar to the assessment tool, alongside whatever validation is performed. Many expectations likely require a pre-upgrade setup phase (e.g. data upload) to confirm, which is not applicable to a standalone, post-upgrade cluster and which customers won't want to perform against their production cluster. In the event that both the pre-upgrade and post-upgrade clusters were available, a more sophisticated approach could be taken, but it still seems unlikely that the solution here would be to run the full suite of expectation tests as present in the Upgrade Testing Framework.
Third, while the validation tool would likely not be implemented by just running the Upgrade Testing Framework against the real, post-upgrade cluster, it does seem reasonable to test the validation tool with the Upgrade Testing Framework as one of its steps.
Fourth, investigation needs to be performed to determine which datapoints are most useful for validating that a post-upgrade cluster is ready for production. A proper understanding of those datapoints is key for designing the tool, as the specific datapoints desired necessarily shape the input/setup requirements for the tool. For example, if it is overwhelmingly important that the number of documents is (roughly) the same pre- and post-upgrade, then the tool must have access to both the pre- and post-upgrade clusters in some way and cannot be run on a standalone, post-upgrade cluster.
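As an example of the kind of datapoint check being discussed, and assuming the tool can obtain document counts from both clusters (e.g. via the `_count` API), the comparison might look like this minimal sketch; the tolerance value is an arbitrary placeholder:

```python
def doc_counts_match(pre_upgrade_count: int, post_upgrade_count: int,
                     tolerance: float = 0.001) -> bool:
    """Return True if the post-upgrade document count is within `tolerance`
    (as a fraction) of the pre-upgrade count."""
    if pre_upgrade_count == 0:
        return post_upgrade_count == 0
    drift = abs(post_upgrade_count - pre_upgrade_count) / pre_upgrade_count
    return drift <= tolerance
```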
Direct effort on this workstream has not yet begun.
Proposed Workstream Sequencing