Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for binding patterns in SemIR #4221

Merged
merged 21 commits into from
Sep 25, 2024

Conversation

geoffromer
Copy link
Contributor

@geoffromer geoffromer commented Aug 16, 2024

Introduces the BindingPattern and SymbolicBindingPattern insts, and a separate stack of pattern blocks that they are emitted into. The intent is to generate the corresponding pattern-matching insts (like BindName) from them in a separate pass, but that is deferred to future PRs.

See here for the design this is based on, but note that during review we have chosen to deviate from that design by putting the patterns in separate blocks, and omitting the "forward references" from a BindingPattern to its corresponding BindName. This in turn necessitates having separate inst kinds for symbolic and non-symbolic binding patterns.

@github-actions github-actions bot requested a review from josh11b August 16, 2024 18:01
@geoffromer geoffromer marked this pull request as draft August 16, 2024 18:05
@geoffromer geoffromer marked this pull request as ready for review August 16, 2024 18:07
@github-actions github-actions bot requested a review from jonmeow August 16, 2024 18:07
@geoffromer geoffromer removed the request for review from josh11b August 16, 2024 18:08
Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be sure I understand correctly, is this change supposed to only be adding BindingPattern without really using it? Can you please use the PR description to elaborate a little on what's being implemented here?

Copy link
Contributor Author

@geoffromer geoffromer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be sure I understand correctly, is this change supposed to only be adding BindingPattern without really using it?

This PR "really uses it" in the way that it's designed to be used. In particular, we traverse the BindingPatterns in order to emit the corresponding BindNames.

Can you please use the PR description to elaborate a little on what's being implemented here?

How's this?

Copy link
Contributor Author

@geoffromer geoffromer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm particularly looking for feedback on the textual IR changes, because I'm not familiar enough with the conventions to tell if what I have is good, or how to improve them if not.

@geoffromer geoffromer requested a review from jonmeow September 13, 2024 20:51
Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I'm still looking at this today, but wanted to publish comments in case you want to discuss something during the toolchain meeting. Overall I think the direction this PR is taking seems a good start, though there are still some details that may be good to discuss (particularly see DeclInfo).

Comment on lines 12 to 19
// Information about a declaration.
struct DeclInfo {
// The pattern block, containing the pattern insts for the parameter pattern.
InstBlockId pattern_block_id;
// The declaration block, containing the declaration's parameters and their
// types.
InstBlockId decl_block_id;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What had you considered as alternatives before the DeclInfo approach? A couple I'm wondering about are:

  • Adding an instruction to decl_block_id which just declares that there's a pattern
    • e.g. struct PatternBlock { InstBlockId block_id; }
    • i.e., as a generic approach that could apply to any block that adds a pattern.
  • Putting pattern_block_id on EntityWithParams
    • i.e., tracking it alongside parameters that it's providing a pattern for

To explain why I'm thinking about this, adding more ValueStores feels a little incremental complexity for understanding the toolchain. decl_block_id was added to instructions because it fit into the instruction and was mainly for formatting. However, as part of splitting out to a new structure, it's no longer taking advantage of free space in the declaration's instruction.

Maybe it'd help for me to understand, what's the access pattern you expect for pattern_block_id?

If only kept for the printed IR, the PatternBlock instruction approach might be good: it keeps an association that you could get formatted instructions for, and also provides an approach that would work for arbitrary constructs. The overhead is marginally higher (essentially 12 more bytes), but more flexible beyond the current set of declarations. I expect this approach would need the PatternBlock to be skipped for lowering, but it'd just be the one instruction.

If you need to go back to the pattern block from the decl instruction later during checking, then EntityWithParams might be good: we would probably need it to be associated with the params themselves. EntityWithParams is already shared where you're using DeclInfo, meaning overhead is marginally lower (essentially 4 fewer bytes).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't considered those options. I've added the pattern block to EntityWithParams, although note that Impl doesn't derive from EntityWithParams (yet?), so I had to also add it there. This feels a little awkward to me, because the pattern block is conceptually more closely tied to the decl block than to the rest of the entity, but that may be preferable to adding another ValueStore.

I don't really expect the toolchain to access pattern_block_id except when rendering the IR, but part of the rationale for introducing the pattern block and attaching it to some "owner" is because other tools (e.g. static analysis) might want to access it, and that will be much easier if it's a named member of some object rather than an inst they have to scan for. I also don't think the PatternBlock approach will be as flexible as you say: in other uses of patterns (e.g. let declarations), there's no block we can add it to while preserving the structural connection between the pattern insts and the pattern-match insts, so it will have to be a named member in those cases.

Side note: this changes the representation of chains of function-style qualifiers like Class(T:! type).F(n: T) (see toolchain/check/testdata/class/generic/member_out_of_line.carbon for an example), but I think both forms are probably wrong, and I think the pre-existing representation of the decl is probably wrong too, so I'd rather punt on that whole issue for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: this changes the representation of chains of function-style qualifiers like Class(T:! type).F(n: T) (see toolchain/check/testdata/class/generic/member_out_of_line.carbon for an example), but I think both forms are probably wrong, and I think the pre-existing representation of the decl is probably wrong too, so I'd rather punt on that whole issue for now.

I'm not seeing the issue, can you elaborate?

That said, I have an extended comment here, but let me ask two questions:

  1. Why do declarations always have a pattern block, even declarations that never have a pattern (such as alias and namespace)?
  2. Considering programmatic access for tools, how would a tool disambiguate between explicit and implicit parameter patterns?

I'm asking this way because I think they're intertwined in how they'd be addressed; let me give some detail on both:

  1. Why do declarations always have a pattern block, even declarations that never have a pattern (such as alias and namespace)?

In addition to alias and similar, consider impl. I see patterns in the example testdata/impl/impl_forall.carbon, but not in the general case. Here's an example:

impl forall [T:! type] T as Simple {
  fn F() {}
}

...

// CHECK:STDOUT:   impl_decl @impl {
// CHECK:STDOUT:     %T.patt: type = symbolic_binding_pattern T 0
// CHECK:STDOUT:   } {

Had you considered only creating a pattern block when there's a forall, avoiding the creation when there is none? In particular, forall uses an implicit parameter list, which connects me to the second question.

Sharing PopNameComponent is fine, but perhaps PopNameComponent should have an option not to pop a pattern block?

  1. Considering programmatic access for tools, how would a tool disambiguate between explicit and implicit parameter patterns?

Right now, we do have decl_block_id merging a few different things. However, as you note, it's not really intended for inspection. Should parameter patterns be separated similarly to how parameter references are?

Note, I think this intertwines with the former question. If pattern blocks were split, one option might be to use TuplePattern and ImplicitParamList start/end nodes to push/pop pattern blocks. It looks like the start nodes don't have anything attached at present, so with a little work they might be useable as channels to store block IDs too.

To the extent that we might want different behavior between tuple patterns and explicit parameter lists, we currently don't support tuple patterns in check except as parameter lists. But, maybe these should be using distinct parse nodes instead of the same, or it may also get to a related question of how nested patterns should be represented/stored (maybe a nested tuple pattern actually needs this same behavior?).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that Impl doesn't derive from EntityWithParams (yet?),

Note, #4336 is changing this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: this changes the representation of chains of function-style qualifiers like Class(T:! type).F(n: T) (see toolchain/check/testdata/class/generic/member_out_of_line.carbon for an example), but I think both forms are probably wrong, and I think the pre-existing representation of the decl is probably wrong too, so I'd rather punt on that whole issue for now.

I'm not seeing the issue, can you elaborate?

See %F.decl in that file: after my revisions last week (specifically this one), there's no longer a symbolic_binding_pattern T in the pattern block. By contrast, there is a bind_symbolic_name T in the decl block.

I'm going to take your other questions in reverse order:

  1. Considering programmatic access for tools, how would a tool disambiguate between explicit and implicit parameter patterns?

My plan is to extend NameComponent and EntityWithParams with two new InstBlockId members that relate to the pattern block in much the same way that implicit_params_id and params_id currently relate to the decl block:

  • They don't "own" the insts they contain, but refer to insts in the pattern block.
  • They have one inst per parameter.
  • Each inst is the "root" inst for the corresponding parameter: all the pattern insts relating to that parameter are reachable from it.
    This isn't just for future SemIR consumers; check itself is going to need it once we shift to generating the pattern-match SemIR from the pattern SemIR.

I already have that implemented, but it seems out of scope for this PR.

  1. Why do declarations always have a pattern block, even declarations that never have a pattern (such as alias and namespace)?

The basic problem is that the AST doesn't give us a reliable context-free signal that a full-pattern is starting (or ending), so we have to push a pattern block whenever one might be about to start, and pop once the full-pattern (if any) has definitely ended. That applies even to alias and namespace because the AST allows those to be parameterized.

You're right that we could avoid that problem if we put the implicit and explicit parameter lists in separate pattern blocks, and used separate AST node kinds for parameter lists and other tuple patterns. However, the consensus on Discord seemed to be to have one pattern block for each full-pattern, and to me that seems like the more semantically faithful representation.

Furthermore, I don't know if that change would really gain us much. My strong suspicion is that if a SemIR consumer cares about the distinction between implicit and explicit parameters, they probably care about which parameter a given inst is associated with. That means they aren't going to want to work with the pattern block directly, because it's too unstructured -- they're going to want to work with the one-inst-per-parameter blocks that I mentioned earlier, and traverse from there. That's certainly been my experience so far with consuming the pattern IR in toolchain code.

Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I guess I was almost done looking through, just a couple more comments.

@@ -41,6 +41,8 @@ auto HandleParseNode(Context& context, Parse::ClassIntroducerId node_id)
context.decl_name_stack().PushScopeAndStartName();
// This class is potentially generic.
StartGenericDecl(context);
// Start a new pattern block for the signature.
context.pattern_block_stack().Push();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, but something else to consider with storage might be whether we want the parameters and implicit parameters to each have their own pattern block. A single block is fine for now, I'm not asking for a change in this PR, but it might be something that affects the implementation more later.

@geoffromer geoffromer requested a review from jonmeow September 18, 2024 21:59
- Put decl block insts in the scope of the declared entity.
- Ignore attempts to name an inst that already has a name (but require those attempts to be in the same scope).
- Give `Param` insts a ".param" suffix, to avoid colliding with the corresponding `BindName`.
Comment on lines 12 to 19
// Information about a declaration.
struct DeclInfo {
// The pattern block, containing the pattern insts for the parameter pattern.
InstBlockId pattern_block_id;
// The declaration block, containing the declaration's parameters and their
// types.
InstBlockId decl_block_id;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: this changes the representation of chains of function-style qualifiers like Class(T:! type).F(n: T) (see toolchain/check/testdata/class/generic/member_out_of_line.carbon for an example), but I think both forms are probably wrong, and I think the pre-existing representation of the decl is probably wrong too, so I'd rather punt on that whole issue for now.

I'm not seeing the issue, can you elaborate?

That said, I have an extended comment here, but let me ask two questions:

  1. Why do declarations always have a pattern block, even declarations that never have a pattern (such as alias and namespace)?
  2. Considering programmatic access for tools, how would a tool disambiguate between explicit and implicit parameter patterns?

I'm asking this way because I think they're intertwined in how they'd be addressed; let me give some detail on both:

  1. Why do declarations always have a pattern block, even declarations that never have a pattern (such as alias and namespace)?

In addition to alias and similar, consider impl. I see patterns in the example testdata/impl/impl_forall.carbon, but not in the general case. Here's an example:

impl forall [T:! type] T as Simple {
  fn F() {}
}

...

// CHECK:STDOUT:   impl_decl @impl {
// CHECK:STDOUT:     %T.patt: type = symbolic_binding_pattern T 0
// CHECK:STDOUT:   } {

Had you considered only creating a pattern block when there's a forall, avoiding the creation when there is none? In particular, forall uses an implicit parameter list, which connects me to the second question.

Sharing PopNameComponent is fine, but perhaps PopNameComponent should have an option not to pop a pattern block?

  1. Considering programmatic access for tools, how would a tool disambiguate between explicit and implicit parameter patterns?

Right now, we do have decl_block_id merging a few different things. However, as you note, it's not really intended for inspection. Should parameter patterns be separated similarly to how parameter references are?

Note, I think this intertwines with the former question. If pattern blocks were split, one option might be to use TuplePattern and ImplicitParamList start/end nodes to push/pop pattern blocks. It looks like the start nodes don't have anything attached at present, so with a little work they might be useable as channels to store block IDs too.

To the extent that we might want different behavior between tuple patterns and explicit parameter lists, we currently don't support tuple patterns in check except as parameter lists. But, maybe these should be using distinct parse nodes instead of the same, or it may also get to a related question of how nested patterns should be represented/stored (maybe a nested tuple pattern actually needs this same behavior?).

@geoffromer geoffromer requested a review from jonmeow September 24, 2024 19:37
@geoffromer
Copy link
Contributor Author

geoffromer commented Sep 25, 2024

I've added the TODOs we agreed on in the toolchain discussion yesterday. I wasn't quite sure where to put the TODO to embed more information in the parse tree, so I went with kind of a saturation approach.

I believe that's all the changes we agreed on, so this is ready for review.

@geoffromer geoffromer added this pull request to the merge queue Sep 25, 2024
Merged via the queue into carbon-language:trunk with commit dc32aa2 Sep 25, 2024
8 checks passed
@geoffromer geoffromer deleted the binding-review branch September 25, 2024 19:45
@geoffromer
Copy link
Contributor Author

Note: the contents of #4333 were merged into this PR after review, owing to a git usage error on my part. My apologies for the confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants