Skip to content
This repository has been archived by the owner on Nov 18, 2021. It is now read-only.

Proposal: don't allow the introduction of new (non-hidden) definitions in a closed struct #543

Closed
mpvl opened this issue Sep 25, 2020 · 13 comments
Labels
Proposal roadmap/language-changes Specific tag for roadmap issue #339

Comments

@mpvl
Copy link
Contributor

mpvl commented Sep 25, 2020

Background

Currently, for regular fields, an author of a definition can make backwards-compatible modifications without fear of breaking its user, as long as users do not use this definition in an embedding. This is a great property to have for large-scale engineering.

A backwards compatible change in general is when an API is modified to be more less specific, with the exception for definitions that existing fields may not be removed, unless paired with a catch-all [string]: T or .... This is because, by default, a user may not add a field to a closed struct, so removing a field from the API will break a usage of this definition that specifies this field.

Note that a user may still add fields to an existing definitions by embedding it. In this case, changes to original definition may break its usage. There is a clear distinction, however, between using something as an embedding or not, and being able to give guarantees on the basis or whether or not a definition is used as an embedding is a clear rule. Also vet rules could warn about the use of definitions from outside the current module as an embedding.

The Problem

The problem is that this desirable property does not hold for nested definitions. Consider this definition:

#D: {
    a: int
}

A user of this template could write

foo: #D & {
    #bar: int
}

introducing a new definition in the definition #D. This is allowed because the current closedness rules do not apply to definitions.

Suppose the original definition is now changed to

#D: {
    a: int
   #bar: string
}

This now results in a breakage of foo downstream. This may be fine if both #D and foo are maintained by a single owner, but it is problematic if they are maintained by different organizations. The problem is two-fold: on the one hand the user of #D has no guarantee that the maintainer of #D will not break their usage, while the maintainer of #D cannot introduce a definition in #D without knowing it may break a user.

Note that the same problem does not occur for this snippet

import "acme.com/pkg"

foo: pkg.#D & {
    _#bar: int
}

because hidden definitions are not visible across package boundaries and live in their own namespace (NOTE: not implemented/enforced yet).

Proposal

We propose disallowing adding definitions to closed structs. In the example above, this means that

#D: a: int

foo: #D
foo: #bar: int // proposal makes this illegal

would be illegal and result in a #bar not allowed error.

The user could still write

#D: a: int

foo: {
    #D
    #bar: int
}

resulting in the same issue. However, this makes the behavior consistent with regular field and results in a single guideline upon which non-breakage guarantees can rely.

Note that users can still also write

#D: {
    a: int
}

foo: #D
foo: _#bar: int

to work around the issue as well.

Other less tangible benefits of this proposal is that conceptually there are less scenarios in how fields behave, as it makes the behavior of definitions more like that of regular fields. This may result in simpler model for the user and results in simpler code.

We propose that the ... operator does not apply to definitions.

Impact and transition

This is a breaking change. An automated rewrite is somewhat complex and would likely need to operate on the semantic level. Detecting breakage cases should be fairly straightforward. An automated rewrite will not likely be as straightforward, but seems in the realm of possibilities. It may be that it requires user interaction to decide if an offending inclusion of a definition should be rewritten as a an embedding or a hidden definition. Always rewriting it as an embedding may be the closest to the original semantics, though.

To minimize changes,

foo: #D
foo: #bar: int

could be rewritten as

foo: { #D, #bar: _ }
foo: #bar: int

This would not be too hard if position information of conflicting values is accurate.

It would be good to first implement the scoping rules of hidden fields before rolling this change out.

Discussion

Default constraints

The spec currently allows ...T to be used in structs as well (not yet implemented). It seems to not make sense to apply T to definitions and regular fields. The most likely reason why one would want to add a definition in a certain scope is because it either is a common type for multiple fields (see #limit below) or a type that is part of an expression at various points in the struct's fields. In the first case, ...T would just be applied multiple time. There seem to be very few favorable outcomes for the latter.

As it doesn't make sense to allow ...T to apply to definitions, it seems consistent to not have ... apply for definitions.

Users can use embedding to work around the constraint of adding definitions.

Possible future extensions

Note that what is discussed in this section is NOT a proposal, but rather a discussion on how the remaining limitations could be addressed in the future in a backwards compatible way.

The introduction of definitions may be useful in some cases. For instance, an author of a definition may want to base its definition on another definition, but also introduce a definition with some common patterns for its users. For instance

// amce.com/foo
package foo

#A: {
  a: int
  b: 
}
// example.com/pkg
package pkg

import "acme.com/foo"

#E: foo.#A & {
   #limit: <10 // illegal under the new proposal
  a: #limit
  b: #limit
}

Users of #E could write

import "example.com/pkg"

e: pkg.#E & { #limit: <5 }

to collectively tighten the limit of all fields.

To work around the limitations of the new proposal, the author could either use embeddings or a hidden definition. In this case, however, neither are satisfactory. It cannot be hidden, because users of the package are supposed to modify #limit and this it cannot be hidden. Using embedding brings us back to square one, as the author of #A could introduce #limit later on and break #E.

To resolve this issue, we could introduce qualified definitions that live in the namespace of the package in which they are defined. Syntactically, this could look like:

// example.com/pkg
package pkg

import "acme.com/foo"

#E: foo.#A & {
  pkg#limit: <10
  a: pkg#limit
  b: pkg#limit
}

Here pkg#limit is a broadening of the identifier syntax, where a # may be preceded not just by _, but any valid regular CUE identifier. The identifier here either refers to the handle of the current package or an imported package, fully qualifying the namespace of the definition. So in the example, pkg#limit is qualified by the tuple ("example.com/pkg", #limit).

A usage of #E would look like:

import "example.com/pkg"

e: pkg.#E & { pkg#limit: <5 }

This mechanism would allow authors to introduce definitions in a backwards compatible way without worrying about breaking things downstream or being broken by an upstream change.

This mechanism would only apply to definitions and not to regular fields. Regular fields are considered what ends up ultimately in a generated configuration and inherently represent a flat space. Definitions don't have this restriction and can benefit from the scoping results as introduced in this paragraph.

Note the close relation of scoped definitions with hidden definitions: 1) both are allowed to introduce a new definition within a closed struct (must be in the current package in both cases), 2) both have an additional qualifier before the # in a definition name, 3) in both cases the meaning of this qualifier is some fully qualified package path (which may be the current package), 4) the implementation of both relies on qualifying a field based on package names available in the current file during compile time. Note that this is mechanism is almost identical to how exported and unexported fields work in Go.

We reiterate, that the discussion of scoped definitions is missing a lot of details and is intended to be a proposal, but serves as an example to allow scoped definitions in a world where closedness rules apply to non-scoped definitions.

@seh
Copy link

seh commented Sep 25, 2020

So in the example, pkg#limit is qualified by the tuple ("acme.com/foo", #limit).

Should that be ("example.com/pkg", #limit) instead? Isn't #limit being defined in example.com/pkg?

We reiterate, that the discussion of scoped definitions is missing a lot of details and is intended to be a proposal,

Earlier you said that this is not a proposal, but here you say it is a proposal. It sounds like a proposal.

@mpvl
Copy link
Contributor Author

mpvl commented Sep 25, 2020

@seh: yes, good catch (I've changed it in the original).

@seh: the scoped/ qualified definitions (pkg#limit) are not part of the proposal. The actual proposal only involves the disallowing new definitions in closed structs discussed in the "Proposal" section.

@mpvl
Copy link
Contributor Author

mpvl commented Sep 25, 2020

@seh, I grouped the non-proposal part in discussions under "possible future extensions"

@eonpatapon
Copy link
Contributor

Quite agree with this proposal as if the def is closed (ie not including ... and friends) it shouldn't allow the addition of regular and def fields.

But it's not clear to me why ... would allow only regular fields. With ... in the def you can have same the problem you described with regular fields.


As for the discussion part and the described used case:

#E: foo.#A & {
  #limit: <10 // illegal under the new proposal
  a: #limit
  b: #limit
}

What's the advantage of declaring #limit in #E ? It could be declared at the root and used in #E without breaking any rule.

@myitcv
Copy link
Contributor

myitcv commented Sep 27, 2020

What's the advantage of declaring #limit in #E ? It could be declared at the root and used in #E without breaking any rule.

I think the context @mpvl is looking to establish here is that the author of #E is not the same author/owner as foo.#A, hence you don't have that control/luxury. But beyond that, the concept of #limit is only something that makes sense in the context of #E as presented (else it would already have been defined in foo.#A).

@myitcv
Copy link
Contributor

myitcv commented Sep 27, 2020

But it's not clear to me why ... would allow only regular fields. With ... in the def you can have same the problem you described with regular fields.

My take on the proposal is that it does not feel/appear like there is a good rationale/use case that suggests it should apply to definitions, and that therefore this issue has been raised to try and solicit others' intuition/examples to test that hypothesis/position.

Asked another (less convoluted!) way, do you have an example/intuition as to why it should apply to definitions?

How does the package qualified definitions approach work to your mind?

@eonpatapon
Copy link
Contributor

eonpatapon commented Sep 27, 2020

What's the advantage of declaring #limit in #E ? It could be declared at the root and used in #E without breaking any rule.

I think the context @mpvl is looking to establish here is that the author of #E is not the same author/owner as foo.#A, hence you don't have that control/luxury. But beyond that, the concept of #limit is only something that makes sense in the context of #E as presented (else it would already have been defined in foo.#A).

Yes I understood that, so the author of #E could write:

#limit: <10
#E: foo.#A & {
  a: #limit
  b: #limit
}

He doesn't need any control on foo.

@myitcv
Copy link
Contributor

myitcv commented Sep 27, 2020

@eonpatapon apologies, I misinterpreted what you wrote:

It could be declared at the root

as meaning "could be declared on the root type, i.e. foo.#A". On second reading I can see that's not what you meant.

the author of #E could write:

#limit: <10
#E: foo.#A & {
  a: #limit
  b: #limit
}

Correct, or use hidden definitions (because sometimes the definition being declared within the scope of #E is significant):

#E: foo.#A & {
    _#limit: <10
    a: _#limit
    b: _#limit
}

@eonpatapon
Copy link
Contributor

But it's not clear to me why ... would allow only regular fields. With ... in the def you can have same the problem you described with regular fields.

Asked another (less convoluted!) way, do you have an example/intuition as to why it should apply to definitions?

No not really

How does the package qualified definitions approach work to your mind?

I'm just thinking that the meaning of ... would not be trivial to grasp as it would imply: any regular field, no definition fields but fully qualified definitions fields (if implemented).

My point is if an upstream definition includes ... then any changes to it (adding a regular field or definition field) could potentially brake downstream definitions. So I don't understand why ... would exclude definitions.

If an author writes the definition

#A: {
  a: int
  b: string
}

He is sure that he won't break any downstream usage when modifying #A as it's not possible for user to add any field or definition to #A (as proposed - which I'm ok with)

But if he writes:

#A: {
  a: int
  b: string
  ...
}

He clearly knows there could be some breakage. And excluding definitions in #A downstream doesn't help on that (at least partially).

@mpvl
Copy link
Contributor Author

mpvl commented Sep 29, 2020

@eonpatapon

What's the advantage of declaring #limit in #E ?

Sometimes one has to. Maybe a more concrete example would be the following list type:

#ListType: {
  #T: _
  #List: {
    value: #T
    next: #List
  }
}

Users could instantiate ListType by setting #T to create a list type with values of a certain type.
This is a parametrized linked list. Firstly, it needs to be pointed out that #T needs to be defined within ListType: if #T had been defined outside of #ListType, it would not have been possible to instantiate a new #ListType with a more specific type like that.

In general there are all kinds of analogous reasons why a definition has to be scoped within a certain struct.

Now, suppose a users wants to have a list of two values. One could augment the ListType as such:

#ListType2:  {
  #ListType
  #T2
  #List: value2: #T2
}

Still a rather hypothetical example, but you can see perhaps why it is necessary that #T2 be included in #ListType2 and not outside it.

@mpvl
Copy link
Contributor Author

mpvl commented Sep 29, 2020

@eonpatapon

"I'm just thinking that the meaning of ... would not be trivial to grasp as it would imply: any regular field, no definition fields but fully qualified definitions fields (if implemented)."

The rationale is based on that it has weird consequences.

Firstly, assume that ...T is implemented. This is already mentioned in the spec. It provides for more consistency between lists and structs and is necessary to model the nuances of JSON Schema in CUE.

Now suppose we would allow ... to open up a struct for definitions. Then I would find it very surprising if ...T did not apply to definitions too. After all, ... is defined as a shorthand of ..._.

If one stops and think about what ...T does, I would find it quite odd if it also applied to definitions. For instance, in the list example one introduces such type most likely to constrain fields within that struct. It does not make sense to have ...T apply to both the definitions and the values in that struct.

Now, even if ...T were to apply to definitions too, given the interplay between [string]: T and ...T, I then would also expect [string]: T to apply to definitions. But that's not possible, because it inherently matches string values, which live in a different namespace. IOW, I would expect [=~"^#"]: T to match regular fields, not definitions. But if ...T can match definitions, I would expect there to be some way to have [string]: T to apply to definitions too.

Anyway, a bit convoluted train of thought, perhaps, but what I'm trying to get at is that allowing ... to apply to definitions results in more warts and questions than not doing so.

All of these complications can be avoided by just not letting ... apply to definitions. The question is whether people see this will create a gap in functionality.

@mpvl mpvl added this to the v0.3.0-evaluator-rewrite milestone Dec 8, 2020
@myitcv
Copy link
Contributor

myitcv commented Jan 14, 2021

Re-opening because there was a partial revert in https://cue-review.googlesource.com/c/cue/+/8201

@cueckoo
Copy link

cueckoo commented Jul 3, 2021

This issue has been migrated to cue-lang/cue#543.

For more details about CUE's migration to a new home, please see cue-lang/cue#1078.

@cueckoo cueckoo closed this as completed Jul 3, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Proposal roadmap/language-changes Specific tag for roadmap issue #339
Projects
None yet
Development

No branches or pull requests

5 participants