Initial support for binding patterns in SemIR #4221

geoffromer · 2024-08-16T18:01:33Z

Introduces the BindingPattern and SymbolicBindingPattern insts, and a separate stack of pattern blocks that they are emitted into. The intent is to generate the corresponding pattern-matching insts (like BindName) from them in a separate pass, but that is deferred to future PRs.

See here for the design this is based on, but note that during review we have chosen to deviate from that design by putting the patterns in separate blocks, and omitting the "forward references" from a BindingPattern to its corresponding BindName. This in turn necessitates having separate inst kinds for symbolic and non-symbolic binding patterns.
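For readers unfamiliar with the idea, a separate stack of pattern blocks can be sketched roughly as follows. This is a minimal illustration, not the toolchain's implementation: `PatternInst`, `PatternKind`, `AddInst`, and `Pop` are hypothetical names, and only a `Push()` call on `pattern_block_stack()` actually appears in this PR's visible diff.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT the toolchain's real types.
enum class PatternKind { Binding, SymbolicBinding };

struct PatternInst {
  PatternKind kind;
  int32_t name_id;  // Illustrative handle for the bound name.
};

using PatternBlock = std::vector<PatternInst>;

// A stack of in-progress pattern blocks, kept apart from the regular inst
// block stack so that pattern insts never land in a decl block.
class PatternBlockStack {
 public:
  // Starts a new, empty pattern block.
  void Push() { blocks_.emplace_back(); }

  // Emits a pattern inst into the innermost open pattern block.
  void AddInst(PatternInst inst) {
    assert(!blocks_.empty() && "no open pattern block");
    blocks_.back().push_back(inst);
  }

  // Finishes the innermost pattern block and returns its contents.
  PatternBlock Pop() {
    assert(!blocks_.empty() && "no open pattern block");
    PatternBlock block = std::move(blocks_.back());
    blocks_.pop_back();
    return block;
  }

  bool empty() const { return blocks_.empty(); }

 private:
  std::vector<PatternBlock> blocks_;
};
```

Because each `Push()` opens a fresh block, a nested declaration's pattern insts stay separate from those of the enclosing declaration until the matching `Pop()`.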

jonmeow

To be sure I understand correctly, is this change supposed to only be adding BindingPattern without really using it? Can you please use the PR description to elaborate a little on what's being implemented here?

toolchain/sem_ir/inst_namer.cpp

toolchain/sem_ir/inst_kind.def

toolchain/sem_ir/function.cpp

toolchain/check/eval.cpp

toolchain/check/handle_pattern_list.cpp

toolchain/sem_ir/formatter.cpp

toolchain/check/testdata/class/generic/fail_todo_use.carbon

toolchain/check/import_ref.cpp

toolchain/check/handle_binding_pattern.cpp

toolchain/lower/file_context.cpp

toolchain/sem_ir/typed_insts.h

Co-authored-by: Jon Ross-Perkins <jperkins@google.com>

geoffromer

> To be sure I understand correctly, is this change supposed to only be adding BindingPattern without really using it?

This PR "really uses it" in the way that it's designed to be used. In particular, we traverse the BindingPatterns in order to emit the corresponding BindNames.

> Can you please use the PR description to elaborate a little on what's being implemented here?

How's this?

toolchain/check/eval.cpp

toolchain/check/handle_binding_pattern.cpp

toolchain/check/import_ref.cpp

toolchain/lower/file_context.cpp

toolchain/lower/handle.cpp

toolchain/sem_ir/function.cpp

toolchain/sem_ir/inst_kind.def

toolchain/check/testdata/class/generic/fail_todo_use.carbon

toolchain/sem_ir/typed_insts.h

toolchain/check/handle_pattern_list.cpp

To keep things simple, this also rolls back the changes to use BindingPattern to generate the match IR. That will be added in future changes.

geoffromer

I'm particularly looking for feedback on the textual IR changes, because I'm not familiar enough with the conventions to tell if what I have is good, or how to improve them if not.

toolchain/check/handle_binding_pattern.cpp

toolchain/sem_ir/inst_namer.cpp

jonmeow

FYI, I'm still looking at this today, but wanted to publish comments in case you want to discuss something during the toolchain meeting. Overall I think the direction this PR is taking seems like a good start, though there are still some details that may be good to discuss (particularly see DeclInfo).

toolchain/check/context.cpp

toolchain/check/handle_alias.cpp

toolchain/sem_ir/typed_insts.h

toolchain/sem_ir/inst_namer.cpp

jonmeow · 2024-09-17T18:48:03Z

toolchain/sem_ir/decl.h

+// Information about a declaration.
+struct DeclInfo {
+  // The pattern block, containing the pattern insts for the parameter pattern.
+  InstBlockId pattern_block_id;
+  // The declaration block, containing the declaration's parameters and their
+  // types.
+  InstBlockId decl_block_id;
+};


What had you considered as alternatives before the DeclInfo approach? A couple I'm wondering about are:

- Adding an instruction to decl_block_id which just declares that there's a pattern
  - e.g. struct PatternBlock { InstBlockId block_id; }
  - i.e., as a generic approach that could apply to any block that adds a pattern.
- Putting pattern_block_id on EntityWithParams
  - i.e., tracking it alongside parameters that it's providing a pattern for

To explain why I'm thinking about this: adding more ValueStores feels like incremental complexity for understanding the toolchain. decl_block_id was added to instructions because it fit into the instruction and was mainly for formatting. However, once it is split out into a new structure, it no longer takes advantage of free space in the declaration's instruction.

Maybe it'd help for me to understand, what's the access pattern you expect for pattern_block_id?

If it's only kept for the printed IR, the PatternBlock instruction approach might be good: it keeps an association that you could get formatted instructions for, and also provides an approach that would work for arbitrary constructs. The overhead is marginally higher (essentially 12 more bytes), but the approach is more flexible beyond the current set of declarations. I expect this approach would need the PatternBlock to be skipped for lowering, but it'd just be the one instruction.

If you need to go back to the pattern block from the decl instruction later during checking, then EntityWithParams might be good: we would probably need it to be associated with the params themselves. EntityWithParams is already shared where you're using DeclInfo, meaning overhead is marginally lower (essentially 4 fewer bytes).

geoffromer · 2024-09-18

I hadn't considered those options. I've added the pattern block to EntityWithParams, although note that Impl doesn't derive from EntityWithParams (yet?), so I had to also add it there. This feels a little awkward to me, because the pattern block is conceptually more closely tied to the decl block than to the rest of the entity, but that may be preferable to adding another ValueStore.

I don't really expect the toolchain to access pattern_block_id except when rendering the IR, but part of the rationale for introducing the pattern block and attaching it to some "owner" is because other tools (e.g. static analysis) might want to access it, and that will be much easier if it's a named member of some object rather than an inst they have to scan for. I also don't think the PatternBlock approach will be as flexible as you say: in other uses of patterns (e.g. let declarations), there's no block we can add it to while preserving the structural connection between the pattern insts and the pattern-match insts, so it will have to be a named member in those cases.
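To make the trade-off in this exchange concrete, here is a rough sketch of the two access patterns being compared: scanning a decl block for a marker inst versus reading a named member. All types here are simplified, hypothetical stand-ins (`Inst`, `InstKind`, `EntityWithParamsSketch`, and `FindPatternBlockByScan` are illustrative names, not the toolchain's definitions).

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical simplified model; not the toolchain's real definitions.
struct InstBlockId {
  int32_t index = -1;
  bool is_valid() const { return index >= 0; }
};

enum class InstKind { Param, BindName, PatternBlock };

struct Inst {
  InstKind kind;
  InstBlockId block_id;  // Only meaningful for InstKind::PatternBlock.
};

// Option 1: a marker inst inside the decl block. A consumer has to scan the
// block to discover which pattern block belongs to the declaration.
std::optional<InstBlockId> FindPatternBlockByScan(
    const std::vector<Inst>& decl_block) {
  for (const Inst& inst : decl_block) {
    if (inst.kind == InstKind::PatternBlock) {
      assert(inst.block_id.is_valid() && "marker inst must name a block");
      return inst.block_id;
    }
  }
  return std::nullopt;
}

// Option 2: a named member on the entity. A consumer reads it directly.
struct EntityWithParamsSketch {
  InstBlockId pattern_block_id;
  InstBlockId decl_block_id;
};
```

With the second form, a tool can go straight from the entity to `pattern_block_id` without knowing anything about the decl block's contents, which is the ease-of-access argument made above.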

Side note: this changes the representation of chains of function-style qualifiers like Class(T:! type).F(n: T) (see toolchain/check/testdata/class/generic/member_out_of_line.carbon for an example), but I think both forms are probably wrong, and I think the pre-existing representation of the decl is probably wrong too, so I'd rather punt on that whole issue for now.

jonmeow · 2024-09-19

> Side note: this changes the representation of chains of function-style qualifiers like Class(T:! type).F(n: T) (see toolchain/check/testdata/class/generic/member_out_of_line.carbon for an example), but I think both forms are probably wrong, and I think the pre-existing representation of the decl is probably wrong too, so I'd rather punt on that whole issue for now.

I'm not seeing the issue, can you elaborate?

That said, I have an extended comment here, but let me ask two questions:

Why do declarations always have a pattern block, even declarations that never have a pattern (such as alias and namespace)?

Considering programmatic access for tools, how would a tool disambiguate between explicit and implicit parameter patterns?

I'm asking this way because I think they're intertwined in how they'd be addressed; let me give some detail on both:

> Why do declarations always have a pattern block, even declarations that never have a pattern (such as alias and namespace)?

In addition to alias and similar, consider impl. I see patterns in the example testdata/impl/impl_forall.carbon, but not in the general case. Here's an example:

impl forall [T:! type] T as Simple { fn F() {} }
...
// CHECK:STDOUT: impl_decl @impl {
// CHECK:STDOUT:   %T.patt: type = symbolic_binding_pattern T 0
// CHECK:STDOUT: } {

Had you considered only creating a pattern block when there's a forall, avoiding the creation when there is none? In particular, forall uses an implicit parameter list, which connects me to the second question.

Sharing PopNameComponent is fine, but perhaps PopNameComponent should have an option not to pop a pattern block?

> Considering programmatic access for tools, how would a tool disambiguate between explicit and implicit parameter patterns?

Right now, we do have decl_block_id merging a few different things. However, as you note, it's not really intended for inspection. Should parameter patterns be separated similarly to how parameter references are?

Note, I think this intertwines with the former question. If pattern blocks were split, one option might be to use TuplePattern and ImplicitParamList start/end nodes to push/pop pattern blocks. It looks like the start nodes don't have anything attached at present, so with a little work they might be usable as channels to store block IDs too.

To the extent that we might want different behavior between tuple patterns and explicit parameter lists: we currently don't support tuple patterns in check except as parameter lists. But maybe these should use distinct parse nodes instead of the same one, or this may lead to a related question of how nested patterns should be represented and stored (maybe a nested tuple pattern actually needs this same behavior?).

jonmeow · 2024-09-24

> note that Impl doesn't derive from EntityWithParams (yet?),

Note, #4336 is changing this.

geoffromer · 2024-09-24

> > Side note: this changes the representation of chains of function-style qualifiers like Class(T:! type).F(n: T) (see toolchain/check/testdata/class/generic/member_out_of_line.carbon for an example), but I think both forms are probably wrong, and I think the pre-existing representation of the decl is probably wrong too, so I'd rather punt on that whole issue for now.
>
> I'm not seeing the issue, can you elaborate?

See %F.decl in that file: after my revisions last week (specifically this one), there's no longer a symbolic_binding_pattern T in the pattern block. By contrast, there is a bind_symbolic_name T in the decl block.

I'm going to take your other questions in reverse order:

> Considering programmatic access for tools, how would a tool disambiguate between explicit and implicit parameter patterns?

My plan is to extend NameComponent and EntityWithParams with two new InstBlockId members that relate to the pattern block in much the same way that implicit_params_id and params_id currently relate to the decl block:

- They don't "own" the insts they contain, but refer to insts in the pattern block.
- They have one inst per parameter.
- Each inst is the "root" inst for the corresponding parameter: all the pattern insts relating to that parameter are reachable from it.

This isn't just for future SemIR consumers; check itself is going to need it once we shift to generating the pattern-match SemIR from the pattern SemIR.

I already have that implemented, but it seems out of scope for this PR.
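The plan described above (a single owning pattern block, plus non-owning lists holding one "root" inst per parameter) might look roughly like the following. This is a sketch under stated assumptions: `PatternInst`, `EntitySketch`, the member names, and `CollectReachable` are all hypothetical, and plain indices stand in for inst IDs. It is not the implementation referred to in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified model of the plan; all names are illustrative.
struct PatternInst {
  std::string text;  // e.g. "binding_pattern n"
  // Indices (into the pattern block) of this inst's sub-patterns.
  std::vector<std::size_t> operands;
};

struct EntitySketch {
  // Owns every pattern inst of the full-pattern, in emission order.
  std::vector<PatternInst> pattern_block;
  // Non-owning: one root index per implicit / explicit parameter.
  std::vector<std::size_t> implicit_param_patterns;
  std::vector<std::size_t> param_patterns;
};

// Appends `root` and every pattern inst reachable from it, depth-first.
void CollectReachable(const EntitySketch& entity, std::size_t root,
                      std::vector<std::size_t>& out) {
  assert(root < entity.pattern_block.size() && "root must be in the block");
  out.push_back(root);
  for (std::size_t operand : entity.pattern_block[root].operands) {
    CollectReachable(entity, operand, out);
  }
}
```

A consumer that cares about a specific parameter would start from its entry in `param_patterns` or `implicit_param_patterns` and traverse from there, rather than walking the unstructured pattern block directly.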

> Why do declarations always have a pattern block, even declarations that never have a pattern (such as alias and namespace)?

The basic problem is that the AST doesn't give us a reliable context-free signal that a full-pattern is starting (or ending), so we have to push a pattern block whenever one might be about to start, and pop once the full-pattern (if any) has definitely ended. That applies even to alias and namespace because the AST allows those to be parameterized.

You're right that we could avoid that problem if we put the implicit and explicit parameter lists in separate pattern blocks, and used separate AST node kinds for parameter lists and other tuple patterns. However, the consensus on Discord seemed to be to have one pattern block for each full-pattern, and to me that seems like the more semantically faithful representation.

Furthermore, I don't know if that change would really gain us much. My strong suspicion is that if a SemIR consumer cares about the distinction between implicit and explicit parameters, they probably care about which parameter a given inst is associated with. That means they aren't going to want to work with the pattern block directly, because it's too unstructured -- they're going to want to work with the one-inst-per-parameter blocks that I mentioned earlier, and traverse from there. That's certainly been my experience so far with consuming the pattern IR in toolchain code.

jonmeow

Actually, I guess I was almost done looking through, just a couple more comments.

jonmeow · 2024-09-17T20:14:42Z

toolchain/check/handle_class.cpp

@@ -41,6 +41,8 @@ auto HandleParseNode(Context& context, Parse::ClassIntroducerId node_id)
  context.decl_name_stack().PushScopeAndStartName();
  // This class is potentially generic.
  StartGenericDecl(context);
+  // Start a new pattern block for the signature.
+  context.pattern_block_stack().Push();


Just a note, but something else to consider with storage might be whether we want the parameters and implicit parameters to each have their own pattern block. A single block is fine for now, I'm not asking for a change in this PR, but it might be something that affects the implementation more later.

toolchain/check/pattern_block_stack.h

- Put decl block insts in the scope of the declared entity.
- Ignore attempts to name an inst that already has a name (but require those attempts to be in the same scope).
- Give `Param` insts a ".param" suffix, to avoid colliding with the corresponding `BindName`.
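As a rough illustration of the second and third naming rules listed above, the sketch below ignores repeat naming attempts (while checking that they use the same scope) and appends ".param" to `Param` inst names. `NamerSketch`, `AddName`, and `GetName` are hypothetical names chosen for this example, with plain integers standing in for inst and scope IDs; the real inst namer's interface is not shown in this thread.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical simplified namer; not the real inst namer interface.
class NamerSketch {
 public:
  // Names `inst_id` within `scope_id`. A second attempt to name the same
  // inst is ignored, but it is required to use the same scope.
  void AddName(int inst_id, int scope_id, const std::string& base,
               bool is_param) {
    auto it = names_.find(inst_id);
    if (it != names_.end()) {
      assert(it->second.scope_id == scope_id && "renamed in another scope");
      return;
    }
    // Param insts get a ".param" suffix so they don't collide with the
    // name of the corresponding binding.
    names_.emplace(inst_id, Entry{scope_id, is_param ? base + ".param" : base});
  }

  // Returns the recorded name, or an empty string if the inst is unnamed.
  std::string GetName(int inst_id) const {
    auto it = names_.find(inst_id);
    return it == names_.end() ? std::string() : it->second.name;
  }

 private:
  struct Entry {
    int scope_id;
    std::string name;
  };
  std::unordered_map<int, Entry> names_;
};
```

Here a parameter inst and its binding can both be named from the base "n" without clashing, and a later attempt to rename either one is a no-op.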

toolchain/check/name_component.h

jonmeow · 2024-09-19T16:58:00Z

toolchain/sem_ir/decl.h

geoffromer · 2024-09-25T17:42:25Z

I've added the TODOs we agreed on in the toolchain discussion yesterday. I wasn't quite sure where to put the TODO to embed more information in the parse tree, so I went with kind of a saturation approach.

I believe that's all the changes we agreed on, so this is ready for review.

geoffromer · 2024-09-25T20:21:58Z

Note: the contents of #4333 were merged into this PR after review, owing to a git usage error on my part. My apologies for the confusion.

geoffromer added 3 commits August 16, 2024 10:39

binding_pattern in let

99ac089

Checkpoint: tuple pattern support

585d773

Fix handling of params, and inst formatting

45e1ebd

github-actions bot added the toolchain label Aug 16, 2024

github-actions bot requested a review from josh11b August 16, 2024 18:01

geoffromer marked this pull request as draft August 16, 2024 18:05

geoffromer marked this pull request as ready for review August 16, 2024 18:07

github-actions bot requested a review from jonmeow August 16, 2024 18:07

geoffromer removed the request for review from josh11b August 16, 2024 18:08

Fix clang-tidy

c7cbe2e

jonmeow reviewed Aug 16, 2024

toolchain/sem_ir/typed_insts.h

geoffromer and others added 3 commits August 19, 2024 14:02

Apply suggestions from code review

98130ce

Co-authored-by: Jon Ross-Perkins <jperkins@google.com>

Respond to reviewer comments

d1104e0

Respond to reviewer comments

b07e5a6

geoffromer commented Aug 20, 2024

geoffromer added 6 commits August 20, 2024 17:23

Respond to reviewer comments

53316c2

Record better location for BindingPattern inst

591fc76

Revert unintended changes in let handling

8242f56

Move pattern insts to separate block

81eed2d

Roll back obsolete lowering changes

7585f55

Drop "forward references" in BindingPattern

bee1dcf

geoffromer commented Sep 11, 2024

toolchain/check/handle_binding_pattern.cpp (outdated)

toolchain/sem_ir/inst_namer.cpp

geoffromer requested a review from jonmeow September 13, 2024 20:51

jonmeow reviewed Sep 17, 2024

geoffromer added 2 commits September 18, 2024 11:35

Respond to reviewer comments

8f583fb

Attach pattern blocks to entities instead of insts

ed374dc

geoffromer requested a review from jonmeow September 18, 2024 21:59

Merge branch 'trunk' into inst-name-scope

13f1732

jonmeow reviewed Sep 24, 2024

Respond to reviewer comments

c228a7d

geoffromer requested a review from jonmeow September 24, 2024 19:37

Add TODOs, as per open discussion.

755cfad

jonmeow approved these changes Sep 25, 2024

geoffromer added 2 commits September 25, 2024 11:17

Merge branch 'trunk' into inst-name-scope

1e34d03

Merge branch 'inst-name-scope' into binding-review

7396aed

geoffromer enabled auto-merge September 25, 2024 18:41

geoffromer added this pull request to the merge queue Sep 25, 2024

Merged via the queue into carbon-language:trunk with commit dc32aa2 Sep 25, 2024
8 checks passed

geoffromer deleted the binding-review branch September 25, 2024 19:45

geoffromer mentioned this pull request Sep 25, 2024

SemIR text format cleanups #4333

Closed

zygoloid mentioned this pull request Sep 25, 2024

Don't substitute into the targeted instructions of an associated constant. #4342

Merged
