Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MassaLabs: Allow named constants range notation #355

Merged

Conversation

Leo-Besancon
Copy link

@Leo-Besancon Leo-Besancon commented Aug 13, 2024

Draft PR for #266

I wanted some input for the design there @bitwalker.
The main issue is that AirScript aggregate types contain the size in their definitions, and so we don't know the type of ranges containing named constants until we resolve their value.
Moreover, the semantic analysis pass expects to know the various sizes.
For example, while checking list comprehension, we check that all iterator types are the same. Some code use placeholder (u32::MAX as usize) when the size is unknown, (I think this is currently unused), but I'm thinking of two options:

  • Either try and deduce the expected type or use the placeholder if needed for the semantic analysis pass. Then, during the constant propagation pass, we resolve the named constant and update the type (perhaps also make additional checks that
    we couldn't do during semantic analysis without knowing the full type). I tried to do this at first but it seemed a bit ugly
  • Or we resolve the constant value before / during semantic analysis (done in this draft PR)

What are your thoughts? Is there anything else you think I'm missing?

TODO:

  • Improve the constant value resolution (for ranges) during semantic analysis
  • Improve robustness (e.g various commented sections / todo!() that should become either become unreachable or better handled)
  • Parser tests rely on PartialEq, and there is some inconsistency between ranges that have a resolved type and the others. As a workaround I changed the PartialEq for RangeExpr to not check the type.
  • Clean commit history
  • Add documentation

@bitwalker
Copy link
Contributor

bitwalker commented Aug 15, 2024

What are your thoughts? Is there anything else you think I'm missing?

@Leo-Besancon I needed to get my head back in the gritty details a bit, so I ended up prototyping a solution for this in my bitwalker/range-constants branch. Feel free to build or riff off of it as you see fit.

My implementation is based on the assumption that named constants can be replaced with the actual values very early during compilation. This is because we have no reason to preserve the use of the name, and everywhere else relies on being able to access the actual value (as you've discovered). So this is exactly what I've implemented in my branch - after parsing, but during the initial visit of a module during semantic analysis, any time an expression is evaluated, we are first resolving all symbols of that expression, so we take that opportunity to ensure that references to constant identifiers are valid. Then, in the case of ranges, the next thing we do is visit the range bounds, and after resolving the symbol, rewrite the bound with the actual constant that was referenced, or raise an error if the reference was invalid (either the symbol doesn't exist, isn't a constant, or the value is not a valid constant scalar value).

Then in all subsequent parts of the pipeline, it is safe to assume that range bounds are always constant, and panic if that assumption is violated.

The solution is a bit hacky, but largely because we are using the AST as an IR as well, so we end up having to encode a lot of information in the same data structure, which lends to it feeling hacky. Once we have a proper IR in place, the IR can assume all references to named constants are reified and things become a lot cleaner.

Let me know if you have further questions, or run into any issues with your implementation!

@Leo-Besancon
Copy link
Author

What are your thoughts? Is there anything else you think I'm missing?

@Leo-Besancon I needed to get my head back in the gritty details a bit, so I ended up prototyping a solution for this in my bitwalker/range-constants branch. Feel free to build or riff off of it as you see fit.

My implementation is based on the assumption that named constants can be replaced with the actual values very early during compilation. This is because we have no reason to preserve the use of the name, and everywhere else relies on being able to access the actual value (as you've discovered). So this is exactly what I've implemented in my branch - after parsing, but during the initial visit of a module during semantic analysis, any time an expression is evaluated, we are first resolving all symbols of that expression, so we take that opportunity to ensure that references to constant identifiers are valid. Then, in the case of ranges, the next thing we do is visit the range bounds, and after resolving the symbol, rewrite the bound with the actual constant that was referenced, or raise an error if the reference was invalid (either the symbol doesn't exist, isn't a constant, or the value is not a valid constant scalar value).

Then in all subsequent parts of the pipeline, it is safe to assume that range bounds are always constant, and panic if that assumption is violated.

The solution is a bit hacky, but largely because we are using the AST as an IR as well, so we end up having to encode a lot of information in the same data structure, which lends to it feeling hacky. Once we have a proper IR in place, the IR can assume all references to named constants are reified and things become a lot cleaner.

Let me know if you have further questions, or run into any issues with your implementation!

Hello, perfect, thank you for answering my question in such a detailed manner! It helps clear things up and aligns with what I thought was the more realistic approach.
I see some things in your code that's certainly cleaner than what I was trying to do, so I'll build on some of it.

sydhds and others added 7 commits August 22, 2024 09:15
This makes it possible to reference named constants in a range, or slice
indexing operation.

NOTE: The named constants are rewritten with the values they refer to
early during compilation, so in most of the pipeline you will never be
able to observe anything but constant range bounds. However, rather than
complicate the AST further by making that distinction explicit, we reuse
a single type for all ranges, which means we have a lot of
apparently-fallible operations in places where they are actually
infallible, but this seems like a decent compromise.
@Leo-Besancon Leo-Besancon force-pushed the allow-named-constants-range-notation branch from 20e2ea4 to 0a7796c Compare August 22, 2024 07:16
@Leo-Besancon Leo-Besancon marked this pull request as ready for review August 22, 2024 07:45
@bobbinth bobbinth requested review from bobbinth and bitwalker August 25, 2024 18:50
Copy link
Contributor

@bitwalker bitwalker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Definitely a handy improvement!

@bitwalker
Copy link
Contributor

@bobbinth Let me know if you still plan to review this, otherwise I'll try to get this merged tomorrow

Copy link
Contributor

@bobbinth bobbinth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thank you! Though this is a pretty shallow review from me.

@Leo-Besancon
Copy link
Author

Hello @bitwalker did you want to merge this one? :)

@bitwalker bitwalker merged commit 900a80a into 0xPolygonMiden:next Sep 18, 2024
4 checks passed
@bitwalker
Copy link
Contributor

@Leo-Besancon Sorry about that, forgot to follow up on this, thanks for the ping!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants