-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial support for binding patterns in SemIR #4221
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be sure I understand correctly, is this change supposed to only be adding BindingPattern
without really using it? Can you please use the PR description to elaborate a little on what's being implemented here?
Co-authored-by: Jon Ross-Perkins <jperkins@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be sure I understand correctly, is this change supposed to only be adding
BindingPattern
without really using it?
This PR "really uses it" in the way that it's designed to be used. In particular, we traverse the BindingPattern
s in order to emit the corresponding BindName
s.
Can you please use the PR description to elaborate a little on what's being implemented here?
How's this?
To keep things simple, this also rolls back the changes to use BindingPattern to generate the match IR. That will be added in future changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm particularly looking for feedback on the textual IR changes, because I'm not familiar enough with the conventions to tell if what I have is good, or how to improve them if not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I'm still looking at this today, but wanted to publish comments in case you want to discuss something during the toolchain meeting. Overall I think the direction this PR is taking seems a good start, though there are still some details that may be good to discuss (particularly see DeclInfo).
toolchain/sem_ir/decl.h
Outdated
// Information about a declaration. | ||
struct DeclInfo { | ||
// The pattern block, containing the pattern insts for the parameter pattern. | ||
InstBlockId pattern_block_id; | ||
// The declaration block, containing the declaration's parameters and their | ||
// types. | ||
InstBlockId decl_block_id; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What had you considered as alternatives before the DeclInfo approach? A couple I'm wondering about are:
- Adding an instruction to
decl_block_id
which just declares that there's a pattern- e.g.
struct PatternBlock { InstBlockId block_id; }
- i.e., as a generic approach that could apply to any block that adds a pattern.
- e.g.
- Putting
pattern_block_id
onEntityWithParams
- i.e., tracking it alongside parameters that it's providing a pattern for
To explain why I'm thinking about this, adding more ValueStores feels a little incremental complexity for understanding the toolchain. decl_block_id
was added to instructions because it fit into the instruction and was mainly for formatting. However, as part of splitting out to a new structure, it's no longer taking advantage of free space in the declaration's instruction.
Maybe it'd help for me to understand, what's the access pattern you expect for pattern_block_id
?
If only kept for the printed IR, the PatternBlock
instruction approach might be good: it keeps an association that you could get formatted instructions for, and also provides an approach that would work for arbitrary constructs. The overhead is marginally higher (essentially 12 more bytes), but more flexible beyond the current set of declarations. I expect this approach would need the PatternBlock
to be skipped for lowering, but it'd just be the one instruction.
If you need to go back to the pattern block from the decl instruction later during checking, then EntityWithParams
might be good: we would probably need it to be associated with the params themselves. EntityWithParams
is already shared where you're using DeclInfo
, meaning overhead is marginally lower (essentially 4 fewer bytes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hadn't considered those options. I've added the pattern block to EntityWithParams
, although note that Impl
doesn't derive from EntityWithParams
(yet?), so I had to also add it there. This feels a little awkward to me, because the pattern block is conceptually more closely tied to the decl block than to the rest of the entity, but that may be preferable to adding another ValueStore
.
I don't really expect the toolchain to access pattern_block_id
except when rendering the IR, but part of the rationale for introducing the pattern block and attaching it to some "owner" is because other tools (e.g. static analysis) might want to access it, and that will be much easier if it's a named member of some object rather than an inst they have to scan for. I also don't think the PatternBlock
approach will be as flexible as you say: in other uses of patterns (e.g. let
declarations), there's no block we can add it to while preserving the structural connection between the pattern insts and the pattern-match insts, so it will have to be a named member in those cases.
Side note: this changes the representation of chains of function-style qualifiers like Class(T:! type).F(n: T)
(see toolchain/check/testdata/class/generic/member_out_of_line.carbon
for an example), but I think both forms are probably wrong, and I think the pre-existing representation of the decl is probably wrong too, so I'd rather punt on that whole issue for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note: this changes the representation of chains of function-style qualifiers like
Class(T:! type).F(n: T)
(seetoolchain/check/testdata/class/generic/member_out_of_line.carbon
for an example), but I think both forms are probably wrong, and I think the pre-existing representation of the decl is probably wrong too, so I'd rather punt on that whole issue for now.
I'm not seeing the issue, can you elaborate?
That said, I have an extended comment here, but let me ask two questions:
- Why do declarations always have a pattern block, even declarations that never have a pattern (such as alias and namespace)?
- Considering programmatic access for tools, how would a tool disambiguate between explicit and implicit parameter patterns?
I'm asking this way because I think they're intertwined in how they'd be addressed; let me give some detail on both:
- Why do declarations always have a pattern block, even declarations that never have a pattern (such as alias and namespace)?
In addition to alias
and similar, consider impl
. I see patterns in the example testdata/impl/impl_forall.carbon
, but not in the general case. Here's an example:
impl forall [T:! type] T as Simple {
fn F() {}
}
...
// CHECK:STDOUT: impl_decl @impl {
// CHECK:STDOUT: %T.patt: type = symbolic_binding_pattern T 0
// CHECK:STDOUT: } {
Had you considered only creating a pattern block when there's a forall
, avoiding the creation when there is none? In particular, forall
uses an implicit parameter list, which connects me to the second question.
Sharing PopNameComponent
is fine, but perhaps PopNameComponent
should have an option not to pop a pattern block?
- Considering programmatic access for tools, how would a tool disambiguate between explicit and implicit parameter patterns?
Right now, we do have decl_block_id merging a few different things. However, as you note, it's not really intended for inspection. Should parameter patterns be separated similarly to how parameter references are?
Note, I think this intertwines with the former question. If pattern blocks were split, one option might be to use TuplePattern and ImplicitParamList start/end nodes to push/pop pattern blocks. It looks like the start nodes don't have anything attached at present, so with a little work they might be useable as channels to store block IDs too.
To the extent that we might want different behavior between tuple patterns and explicit parameter lists, we currently don't support tuple patterns in check except as parameter lists. But, maybe these should be using distinct parse nodes instead of the same, or it may also get to a related question of how nested patterns should be represented/stored (maybe a nested tuple pattern actually needs this same behavior?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that
Impl
doesn't derive fromEntityWithParams
(yet?),
Note, #4336 is changing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note: this changes the representation of chains of function-style qualifiers like
Class(T:! type).F(n: T)
(seetoolchain/check/testdata/class/generic/member_out_of_line.carbon
for an example), but I think both forms are probably wrong, and I think the pre-existing representation of the decl is probably wrong too, so I'd rather punt on that whole issue for now.I'm not seeing the issue, can you elaborate?
See %F.decl
in that file: after my revisions last week (specifically this one), there's no longer a symbolic_binding_pattern T
in the pattern block. By contrast, there is a bind_symbolic_name T
in the decl block.
I'm going to take your other questions in reverse order:
- Considering programmatic access for tools, how would a tool disambiguate between explicit and implicit parameter patterns?
My plan is to extend NameComponent
and EntityWithParams
with two new InstBlockId
members that relate to the pattern block in much the same way that implicit_params_id
and params_id
currently relate to the decl block:
- They don't "own" the insts they contain, but refer to insts in the pattern block.
- They have one inst per parameter.
- Each inst is the "root" inst for the corresponding parameter: all the pattern insts relating to that parameter are reachable from it.
This isn't just for future SemIR consumers;check
itself is going to need it once we shift to generating the pattern-match SemIR from the pattern SemIR.
I already have that implemented, but it seems out of scope for this PR.
- Why do declarations always have a pattern block, even declarations that never have a pattern (such as alias and namespace)?
The basic problem is that the AST doesn't give us a reliable context-free signal that a full-pattern is starting (or ending), so we have to push a pattern block whenever one might be about to start, and pop once the full-pattern (if any) has definitely ended. That applies even to alias
and namespace
because the AST allows those to be parameterized.
You're right that we could avoid that problem if we put the implicit and explicit parameter lists in separate pattern blocks, and used separate AST node kinds for parameter lists and other tuple patterns. However, the consensus on Discord seemed to be to have one pattern block for each full-pattern, and to me that seems like the more semantically faithful representation.
Furthermore, I don't know if that change would really gain us much. My strong suspicion is that if a SemIR consumer cares about the distinction between implicit and explicit parameters, they probably care about which parameter a given inst is associated with. That means they aren't going to want to work with the pattern block directly, because it's too unstructured -- they're going to want to work with the one-inst-per-parameter blocks that I mentioned earlier, and traverse from there. That's certainly been my experience so far with consuming the pattern IR in toolchain code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I guess I was almost done looking through, just a couple more comments.
@@ -41,6 +41,8 @@ auto HandleParseNode(Context& context, Parse::ClassIntroducerId node_id) | |||
context.decl_name_stack().PushScopeAndStartName(); | |||
// This class is potentially generic. | |||
StartGenericDecl(context); | |||
// Start a new pattern block for the signature. | |||
context.pattern_block_stack().Push(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note, but something else to consider with storage might be whether we want the parameters and implicit parameters to each have their own pattern block. A single block is fine for now, I'm not asking for a change in this PR, but it might be something that affects the implementation more later.
- Put decl block insts in the scope of the declared entity. - Ignore attempts to name an inst that already has a name (but require those attempts to be in the same scope). - Give `Param` insts a ".param" suffix, to avoid colliding with the corresponding `BindName`.
toolchain/sem_ir/decl.h
Outdated
// Information about a declaration. | ||
struct DeclInfo { | ||
// The pattern block, containing the pattern insts for the parameter pattern. | ||
InstBlockId pattern_block_id; | ||
// The declaration block, containing the declaration's parameters and their | ||
// types. | ||
InstBlockId decl_block_id; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side note: this changes the representation of chains of function-style qualifiers like
Class(T:! type).F(n: T)
(seetoolchain/check/testdata/class/generic/member_out_of_line.carbon
for an example), but I think both forms are probably wrong, and I think the pre-existing representation of the decl is probably wrong too, so I'd rather punt on that whole issue for now.
I'm not seeing the issue, can you elaborate?
That said, I have an extended comment here, but let me ask two questions:
- Why do declarations always have a pattern block, even declarations that never have a pattern (such as alias and namespace)?
- Considering programmatic access for tools, how would a tool disambiguate between explicit and implicit parameter patterns?
I'm asking this way because I think they're intertwined in how they'd be addressed; let me give some detail on both:
- Why do declarations always have a pattern block, even declarations that never have a pattern (such as alias and namespace)?
In addition to alias
and similar, consider impl
. I see patterns in the example testdata/impl/impl_forall.carbon
, but not in the general case. Here's an example:
impl forall [T:! type] T as Simple {
fn F() {}
}
...
// CHECK:STDOUT: impl_decl @impl {
// CHECK:STDOUT: %T.patt: type = symbolic_binding_pattern T 0
// CHECK:STDOUT: } {
Had you considered only creating a pattern block when there's a forall
, avoiding the creation when there is none? In particular, forall
uses an implicit parameter list, which connects me to the second question.
Sharing PopNameComponent
is fine, but perhaps PopNameComponent
should have an option not to pop a pattern block?
- Considering programmatic access for tools, how would a tool disambiguate between explicit and implicit parameter patterns?
Right now, we do have decl_block_id merging a few different things. However, as you note, it's not really intended for inspection. Should parameter patterns be separated similarly to how parameter references are?
Note, I think this intertwines with the former question. If pattern blocks were split, one option might be to use TuplePattern and ImplicitParamList start/end nodes to push/pop pattern blocks. It looks like the start nodes don't have anything attached at present, so with a little work they might be useable as channels to store block IDs too.
To the extent that we might want different behavior between tuple patterns and explicit parameter lists, we currently don't support tuple patterns in check except as parameter lists. But, maybe these should be using distinct parse nodes instead of the same, or it may also get to a related question of how nested patterns should be represented/stored (maybe a nested tuple pattern actually needs this same behavior?).
I've added the TODOs we agreed on in the toolchain discussion yesterday. I wasn't quite sure where to put the TODO to embed more information in the parse tree, so I went with kind of a saturation approach. I believe that's all the changes we agreed on, so this is ready for review. |
Note: the contents of #4333 were merged into this PR after review, owing to a git usage error on my part. My apologies for the confusion. |
Introduces the
BindingPattern
andSymbolicBindingPattern
insts, and a separate stack of pattern blocks that they are emitted into. The intent is to generate the corresponding pattern-matching insts (likeBindName
) from them in a separate pass, but that is deferred to future PRs.See here for the design this is based on, but note that during review we have chosen to deviate from that design by putting the patterns in separate blocks, and omitting the "forward references" from a
BindingPattern
to its correspondingBindName
. This in turn necessitates having separate inst kinds for symbolic and non-symbolic binding patterns.