reintroduce `for:` and `core__Purpose` for v0.2 #9

Conversation
very high-level initial reaction based on reading the PR description (which is great!), not the change itself (I should probably leave actual review responsibility to somebody remaining on Polaris): I 100% buy your argument that …
LGTM, thx!
it has struck me that this might happen. the argument against `must:` imo is that it's (1) less declarative and (2) less clear about who exactly must support the feature. explorer for example doesn't care, and maybe explorer can just make up its own mind about how to interpret `must:`, but i can imagine some applications that are a bit more borderline. it's probably worth keeping in mind that all the use cases we have for this at present really are `for: SECURITY`, and we are likely the folks who would discover we need a new enum value to capture some different behavior. we could also pick a more neutral `for: SERVING`, or split the difference with `must: SERVING`?
Something like `SERVING` does make more sense to me. Just my 2 cents, happy to let you make a choice (with input from others if interested).
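To make the naming alternatives concrete, here is a hypothetical sketch (the feature URL, argument names, and enum casing are all illustrative, not taken from the PR; directive definitions are omitted for brevity):

```graphql
schema
  @core(feature: "https://specs.apollo.dev/core/v0.2")
  # option A (this PR): declare the purpose the feature serves
  @core(feature: "https://example.com/auth/v1", for: SECURITY)
  # option B: declare that consumers must support it, without saying why
  # @core(feature: "https://example.com/auth/v1", must: true)
  # option C: a more neutral purpose name
  # @core(feature: "https://example.com/auth/v1", for: SERVING)
{
  query: Query
}

type Query {
  hello: String
}
```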
core.spec.md (Outdated)

```graphql definition
enum core__Purpose {
  security
}
```
I agree with @glasser that it could make sense to categorize these by processor behavior rather than purpose. My first response was also that something like `serviceMustSupport` would be clearer, but we may want to think about the kinds of features and expected behavior first.

A thing to consider: should the metadata be the controlling power here? This could mean that a single subgraph dev could push a change all the way through that can't deploy, potentially creating runtime system disruptions. If I own the infra, do I want the schema to decide what should and shouldn't run, or do I want to control what's required to run, so I can reject earlier in the publishing pipeline based on config I control? This gives a lot of runtime control to individual subgraph devs, who are not necessarily trusted to control things like my authorization configuration. So the surface area of attack, trust, and control seems to end up in a potentially odd place down this road.
updated with a new purpose (…). in particular, we provide a more granular model here, in which individual fields are or are not securely and correctly resolvable:
@ndintenfass to your point, this isolates the "damage" a subgraph owner can do. a subgraph can introduce an unsupported `SECURITY` feature and attach it to some of their fields. those fields will then not be securely resolvable. the supergraph owner may decide to reject any schema with any unresolvable fields, but that's on them; the spec suggests removing such fields from the schema and serving it anyway. this isolation isn't total (at least, not without more changes on the composition side); it shouldn't be considered a security surface, but rather, a "simple changes shouldn't break everything" dx affordance.
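A sketch of the granular model described above, using a hypothetical `auth` feature (all names and URLs here are illustrative, not part of the spec):

```graphql
schema
  @core(feature: "https://specs.apollo.dev/core/v0.2")
  # hypothetical feature, declared as security-relevant
  @core(feature: "https://example.com/auth/v1", for: SECURITY)
{
  query: Query
}

directive @auth(requires: auth__Role!) on FIELD_DEFINITION

enum auth__Role {
  ADMIN
}

type Query {
  publicGreeting: String
  # a consumer that does not support auth/v1 cannot resolve this field
  # securely; under the granular model it may serve the schema with just
  # this field removed (or reject the whole schema, if its operator
  # prefers to fail closed)
  secretReport: String @auth(requires: ADMIN)
}
```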
@sachindshinde I think it'd be valuable to get your perspective on this! (Or maybe someone else from your team?)
options:

1. least granular, fails the entire schema if an unsupported feature is present
2. …
3. …
It seems like 3 is something we can always add later (with the older syntax being basically a shorthand for "all to all directives")? 1 seems simpler to think about and implement consistently than 2. Do users actually want schemas that sort of load but where only some fields/types work?
yeah, i don't think we should do 3, at least not rn.

one use case for serving a schema with some unresolvable fields is during development, when some stuff is maybe broken but you want to best-effort the rest. another is actually at a production edge! i'd argue that if a prod router somehow receives a schema where it can't serve all the fields, it's probably better to fail some fields (partial outage) rather than all of them (total outage). whether you think that's actually a good idea is a separate question from whether the spec should provide enough structure for someone who does think it's a good idea to make the determination in the first place. i think it should.

specifically, including the "cascade" language articulates two principles that i think make a great deal of sense, but which are not currently specified:
specifying (1) doesn't require us to actually include that logic—in fact, the spec is clear that conforming consumers can decide to just reject schemas with e.g. unsupported … features. but including the language in the spec does allow consumers to make more granular determinations if they want.

it also, importantly, gives feature implementers some guidance on how to specify their features. for example, it might be tempting to specify a feature which induces behavioral changes to all fields simply through its presence. this language makes it clear that such a design won't work (because there will be no …).

(edit to add: i think we can make this guidance and the "no strange action at a distance" principle much more explicit, but the bones of it are there)
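A small sketch of that guidance, using a hypothetical `rateLimit` feature: behavior should hang off annotated elements so the cascade has something to act on.

```graphql
directive @rateLimit(max: Int!) on FIELD_DEFINITION  # hypothetical

type Query {
  # well-scoped: the behavior is attached to an annotated element, so a
  # consumer that can't support the feature knows exactly which field to
  # drop (or to reject, per its own policy)
  report: String @rateLimit(max: 10)
}

# by contrast, a feature whose mere presence is meant to change how every
# field is resolved leaves no annotated element to cascade from; such a
# design can only work with fail-closed (reject-the-whole-schema) behavior.
```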
I have a couple of high level questions/concerns which maybe go beyond the scope of this particular PR, but they seem relevant so I'll share them here. With the move to static composition one of our goals was predictability. As someone making or approving a change I want to be able to know what the resulting API schema will be before the change happens. In this context:

a. How can we ensure that the API schema generated by Studio / managed federation matches the API schema that the gateway actually serves? Am I correct in assuming this would only hold true if we choose option 1 of the three? Or...
b. Is there a way we can determine which features the gateway(s) will support at composition time?
c. Regardless of which option we choose, how will errors be reported back to Studio? This seems potentially simpler with option 1 - we could rely on gateway schema reporting to tell us that the new schema hasn't been picked up by the gateway - but probably even more important with options 2 & 3.

Ideally we'd offer a predictable system which allows me (as the person making the change, or as a person monitoring the health of changes) to know what the result of a schema change will be before it goes live in the gateway. While I think that making the gateway fail to update because some subgraph developer introduced a new feature that is not supported in the gateway is bad, I think allowing the gateway to update to a schema that is different from what I wanted to publish, without either warning me ahead of time or alerting me after the fact, is potentially worse.
👍
agree it would be nice for the spec to offer a choice of (1) or (2) and hold back (3) for later.
don't think that's knowable, given that different gateways in a fleet may be running different versions with different directive support -- and lots can change between composition time and the gateway loading a new schema at runtime.

this can be mitigated with immutable deploys (e.g. Packer AMIs, docker images, etc.), blue/green deploys with pre-cutover analysis tests, and declarative config management (Terraform or k8s) where new configs (images plus schema) must undergo admission control (pass some smoke tests) before being admitted to an environment's config repo -- similar to the supergraph-demo-k8s-graphops repo (see its simple smoke tests) or the Analysis step of a ….

One additional smoke test could be an ….

This would catch Gateway vs. supergraph schema mismatches before they were deployed live into an environment (assuming you did immutable image + schema deploys). However, in the event that a supergraph schema did get pushed into a live environment, at that point I would favor (2), with a partial outage vs. a total outage.
Agree this is a property we should strive to uphold. The supergraph schema and API schema should be an intrinsically linked pair, as much as possible, so they could be reasoned about as two sides of one thing.
Yes, option (1) if doing ….

However, in a dev environment you might want (2) to enable free experimentation without breaking composition, as a DX affordance. You might also want (2) in a traditional managed federation ….

Having a way to pick the Gateway runtime behavior, (1) or (2), that is suitable for different Gateway deployments seems both important and something that is best configured as part of the Gateway deployment itself, given that a single supergraph schema artifact on a given variant may be deployed into different Gateway environments.
Agree with @ndintenfass that this needs more thought -- …
Some takeaways from discussion with @queerviolet and @martijnwalraven:

…
@pcmanus Even with your current focus, I think this could stand to benefit from your review and your collaboration with @queerviolet on it, as I believe we need to make considerations for how this would work in the context of authoring subgraph schemas and the changes to composition you're looking at: e.g., how does this flow through the pipeline, namespacing, differing …
To be honest, I'm not sure I'm up to date on our story regarding core schema and subgraph authoring. My most recent understanding is that the end goal would be that subgraph authors do not use ….

I'm not completely clear on whether the intent is for said "import mechanism" to expose the "purposes" introduced by this PR to subgraph authors, or instead to have said "purposes" associated with the feature and automatically imported by the "compiler". I seem however to have understood it was the latter (which makes more sense to me), in which case there is no real authoring consideration. At least at that end goal.

That said, I also do not know if we intend a temporary state where we'd ask subgraph authors to manually author …
This is a bit related to my previous points. If "purposes" are essentially hard-coded to a feature (through some kind of module definition), then there is no reason to get differing values (for a given feature) in practice, and we can simply error out if that's not the case, which is easy enough to implement. In fact, even if subgraph authors manually author those "purposes", it feels like erroring out on value discrepancies between subgraphs is the most reasonable option anyway.
yeah, i think in practice the composer can fail if different subgraphs have different purposes for a given feature. if it's some very well-known feature (i.e. something we have a list of in the composer) we could provide a better error or even just fix it. there isn't currently a "composing" section either here or in the ….

we do want to provide graph modules as a frontend which doesn't expose these details quite so hard (nor require users to correctly input metadata and definitions). i don't know if that will ship before subgraph core schemas. my suspicion is no, so there will be an intermediate period where folks are writing these by hand, but it really depends on release schedule. it'll also always be possible for people to write core schemas by hand or with a tool other than our compiler—and for that matter, it is possible for our compiler to contain bugs—so having some kind of handling for these cases will always be germane.
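For illustration, a sketch of the discrepancy case (hypothetical subgraphs and feature URL), where erroring out seems like the simplest composer behavior:

```graphql
# subgraph A declares the feature as security-relevant
schema
  @core(feature: "https://specs.apollo.dev/core/v0.2")
  @core(feature: "https://example.com/auth/v1", for: SECURITY)
{
  query: Query
}

type Query {
  me: String
}

# subgraph B (shown as a comment) declares the same feature with no
# purpose; rather than picking a winner, the composer can simply reject
# the composition with an error pointing at the mismatch:
#
#   @core(feature: "https://example.com/auth/v1")
```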
Are directives for documentation and informational purposes relevant in this discussion? Among customers, there's a hankering for directives to flow from subgraph schemas to the API schema (and the Studio UI, ideally) without any effect on execution or security. Some examples:

…
@lennyburdette such directives are in fact the default assumption! if purposes explicitly specified with …
(i should note that the gateway doesn't currently work this way, instead failing closed in the presence of any unsupported core features. although that's still allowable under this version of the spec, it's now clearer that such behavior isn't generally desirable.)
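A sketch of that default, with a hypothetical documentation feature: no `for:` means no declared purpose, so a consumer that doesn't recognize the feature can still serve the schema, and tooling like Studio can surface the annotations.

```graphql
schema
  @core(feature: "https://specs.apollo.dev/core/v0.2")
  # no for: argument; purely informational, so an unsupported consumer
  # can safely ignore the feature and serve the schema anyway
  @core(feature: "https://example.com/docs/v1")
{
  query: Query
}

directive @docs(owner: String, ticket: String) on FIELD_DEFINITION

type Query {
  invoices: String @docs(owner: "billing-team", ticket: "BILL-123")
}
```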
Happy to see this land! As discussed in the comments, we need to figure out how this affects subgraph authoring. Avoiding manually specifying the purpose for each feature may be another argument for getting to modules/imports sooner rather than later.
This PR reintroduces a new argument to `@core`, `for:`, which takes a list of `core__Purpose` values indicating the purpose(s) of a core feature. features with non-null purposes are treated differently w.r.t. the fail-open behavior. in particular, consumers MUST NOT serve a schema unless they support ALL features which are referenced for `security` (… a `core__Purpose` which prevents a feature from being removed in composition even if none of its schema elements are referenced, to be applied to features whose very existence is a signal to activate some kind of behavior).

discussion points:

- `core__Purpose` has only one value, `security`. it would thus be simpler to introduce a `forSecurity: Bool` or `must: Bool` argument or similar. i'm not totally opposed to that, but i do think it's reasonably likely that we'll want additional purposes in the future (e.g. `routing`, `tracing`, etc). it's not unheard of for specs to introduce enums with only one value (e.g. `document.designMode`, various vulkan/opengl apis i'm too lazy to look up) with the expectation that additional values will become available in the future.
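For reference, a minimal sketch of the shape under discussion (whether `for:` takes a single value or a list, and the enum value casing, are both still open questions in this thread; the feature URL is hypothetical):

```graphql
enum core__Purpose {
  SECURITY
}

directive @core(
  feature: String!
  as: String
  for: core__Purpose   # possibly [core__Purpose], per the description above
) repeatable on SCHEMA

schema
  @core(feature: "https://specs.apollo.dev/core/v0.2")
  # hypothetical security-relevant feature: consumers that don't support
  # it must not blindly serve this schema
  @core(feature: "https://example.com/auth/v1", for: SECURITY)
{
  query: Query
}

type Query {
  health: String
}
```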