Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Name mangling in IRModules #84

Merged
merged 3 commits into from
Jul 25, 2022
Merged

Conversation

gigiblender
Copy link
Contributor

@gigiblender gigiblender marked this pull request as ready for review June 29, 2022 15:47
@tqchen
Copy link
Member

tqchen commented Jun 29, 2022

Thanks @gigiblender ! a few points for consideration:

  • Ideally NameSupply should interact well with our current linkage spec, e.g. if a function have an attribute global_symbol, it means the name is "final"(because external users are expecting to look it up with that name), likely we need to have global. Anything that does not have a global_symbol attr can subject to change.
  • Would be useful to initialize a NameSupply from an existing IRModule, so it does not need to be carried through out passes and is only needed for pass that regenerate global vars.

In addition to that, it would be really nice to get some smart naming resolve mechanism.

For example, it would be really nice to have the following effect (by parsing the suffix integer separately from prefix string

x = get_next_name("fun12", existing_names=["func12", "func13", "func4"])
assert x == "func14"

This is to avoid the case where we have "func12_0_0_0_0" when multiple set of prefixes are called and they usually looks confusion.

@gigiblender
Copy link
Contributor Author

Thanks @gigiblender ! a few points for consideration:

* Ideally NameSupply should interact well with our current linkage spec, e.g. if a function have an attribute `global_symbol`, it means the name is "final"(because external users are expecting to look it up with that name), likely we need to have global. Anything that does not have a `global_symbol` attr can subject to change.

* Would be useful to initialize a NameSupply from an existing IRModule, so it does not need to be carried through out passes and is only needed for pass that regenerate global vars.

In addition to that, it would be really nice to get some smart naming resolve mechanism.

For example, it would be really nice to have the following effect (by parsing the suffix integer separately from prefix string

x = get_next_name("fun12", existing_names=["func12", "func13", "func4"])
assert x == "func14"

This is to avoid the case where we have "func12_0_0_0_0" when multiple set of prefixes are called and they usually looks confusion.

Thank you for feedback @tqchen! I agree with your suggestions and amended the text with further clarifications.

@areusch
Copy link
Contributor

areusch commented Jun 30, 2022

Copy link
Contributor

@mbs-octoml mbs-octoml left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks so much for doing this -- you should get a nice (and hopefully satisfying) cascade of simplifications and code deletion.

@kparzysz-quic
Copy link

A couple of thoughts...

  • We should have a common facility that represents a scope in which names need to be unique. Conceptually it could be a set of "known" strings (that are presumed used), and something (like a private counter) to generate a unique suffix. The use would be new_name = GenerateName("foo"). If foo had already been generated, new_name would be set to "foo4" (for example). This should not produce global variables (or any IR objects), just names.
  • This RFC still doesn't settle the question whether "name_hint" is a final name or not. I'm open to requiring that global symbols are given final names from the beginning, not hints.

Copy link

@kparzysz-quic kparzysz-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If nothing else, this should not generate new IR, only names.

@kparzysz-quic
Copy link

IRModule already has functions to create new globals. With the name-generating NameSupply added to the module, these functions could use that implicitly.

@gigiblender
Copy link
Contributor Author

A couple of thoughts...

* We should have a common facility that represents a scope in which names need to be unique.  Conceptually it could be a set of "known" strings (that are presumed used), and something (like a private counter) to generate a unique suffix.  The use would be `new_name = GenerateName("foo")`.  If `foo` had already been generated, `new_name` would be set to `"foo4"` (for example).  This should not produce global variables (or any IR objects), just names.


* This RFC still doesn't settle the question whether "name_hint" is a final name or not.  I'm open to requiring that global symbols are given final names from the beginning, not hints.

Thanks for the comments @kparzysz-quic. I agree that we could have the NameSupply methods return Strings and make sure that GlobalVars are created using strings provided by a NameSupply (eventually one contained within the IRModule). This way, the NameSupply could be used for other name mangling scenarios where GlobalVars are not needed.

Regarding the name_hint being a final name. I am too of this opinion (it should be a final name) and believe that it already works this way. If I am not wrong, the only edge case is where we use the global_symbol attribute instead.

@mbs-octoml @tqchen any thoughts?

@tqchen
Copy link
Member

tqchen commented Jul 8, 2022

Regarding the name_hint being a final name. I am too of this opinion (it should be a final name) and believe that it already works this way. If I am not wrong, the only edge case is where we use the global_symbol attribute instead.

In this context, it is helpful to look at existing practices. This is a topic related to linking in most compilers, and they are closely related to the linkage type.

  • For a function that have ExternalLinkage, the name cannot change, and whatever name being supplied is the final name, this is because the users of the library needs to be able to refer to the function using the provided name.
  • For a function that have PrivateLinkage, the name can and should be able to change, this is because when we link multiple functions together, there can be two private functions that comes with the same name, as a result, the conflicted function needs to be renamed. Private functions are only referred within the module, so we can afford to rewrite all the callers of the function to the new name.
  • During optimization, if a function with PrivateLinkage is not called by any functions, we can safely remove that function, but we cannot do that for ExternalLinkage.

There are more linkage types in LLVM but this two types roughly are sufficient for our discussion purposes.

Now come back to our context, we roughly need two things:

  • N0: We need an IR node to uniquely identify a function, storing hints to the name, so new names can be generated base on the hint, but not necessarily pinning down the name (due to the need to support PrivateLinkage).
  • N1: We need a way to uniquely specify the final name (during BYOC we need to settle down on a name so different compilations interact based on that name), or specify that the linkage type of the function is external.

Note that N0 means we cannot simply enforce a name or name_hint field to be final, it have to be a combination with another attribute that specifies or derives the linkage type. Our current convention is:

  • When a function have global_symbol attribute, it specifies the "final name"(as in symbol in the symbol tabel), and indicate that the LinkageType is External
  • When a function does not have global_symbol attribute, it means the LinkageType is private, at some time point, the compiler can choose to pick the final name, by attaching global symbol based on the name_hint of the function. But that should be deferred to as late as possible so we can perform optimizations like eliminate dead functions.

@areusch
Copy link
Contributor

areusch commented Jul 11, 2022

@tqchen i think one goal we should have with this RFC is to codify things like

When a function does not have global_symbol attribute, it means the LinkageType is private

the purpose is not to codify it so that we can't change it, but just so that we are all on the same page here. perhaps we can incorporate definitions like this into the RFC text and into helper functions?

@areusch
Copy link
Contributor

areusch commented Jul 11, 2022

i've put this RFC on the meeting agenda for this week's Community Meeting

@tqchen
Copy link
Member

tqchen commented Jul 12, 2022

the purpose is not to codify it so that we can't change it, but just so that we are all on the same page here. perhaps we can incorporate definitions like this into the RFC text and into helper functions?

I agree

@gigiblender
Copy link
Contributor Author

Adding a draft PR to this discussion.

@areusch
Copy link
Contributor

areusch commented Jul 14, 2022

we discussed this at the community meeting yesterday. here are notes:

  • @areusch: one of suggestions raised was to make it possible to reconstruct a NameSupply from an IRModule. we might need to make it possible to initialize it from multiple IRmodules since we currently lower Relay to many separate IRModules in the compiler.
  • @kparzysz-quic : what's the use of having multiple modules, unless they're built for separate devices. do we need mod_name as a field?
    • mod_name is not to distinguish different IRModules at compile time; more to distinguish separate models at runtime via namespace prefix.
  • @kparzysz-quic : what is the connection between the name that we attach to a global and the name that actually ends up in an object file? there is a strong need to have a degree of control over the names that end up in the object file and their visibility. how are we addressing this need? i think there is a lack of this in TVM now. we don't have a way
    • @tqchen: good point. we should document the linkage type. right now there is an implicit convention: the name_hint is not final (it's just an internal-to-compiler concept) and there's a GlobalVar which enforces that there is external linkage type with a fixed name. e.g. if we are splitting the compilation flow, then there is an expectation that when calling into a symbol, the name won't change. another rule: if a function has attribute kGlobalSymbol, you cannot change it or even delete it.
    • @manupa-arm : when we say we want to have control: do we want this to extend to Relay? if a Relay module contains a GlobalVar, should that be emitted with exactly that name?
  • @areusch some context for this RFC: in the C++ runtime, we previously mostly cared that a GlobalVar name could be passed to Module.GetFunction() and then that function would find that implemented PrimFunc. When we implemented AOT in the C runtime, we changed to expecting to be able to directly call those names as C symbols. This caused a bit of a problem when we moved to implement both the C and C++ AOTExecutor wrappers, where metadata lookups could be different because of name mangling. To solve that, we intended to move in a direction where the names of all symbols were fixed at birth, thus NameSupply.
    • @kparzysz-quic: who is typically the caller of these functions?
    • @areusch typically an executor
    • @kparzysz-quic can we modify the names of the global? shouldn't matter so long as everyone agrees

@areusch
Copy link
Contributor

areusch commented Jul 25, 2022

@gigiblender is this ready for a last review?

@gigiblender
Copy link
Contributor Author

@gigiblender is this ready for a last review?

@areusch yes

@areusch areusch merged commit 831d702 into apache:main Jul 25, 2022
@areusch
Copy link
Contributor

areusch commented Jul 25, 2022

thanks @gigiblender , the RFC has been accepted! please create a tracking issue and mention this PR in that issue to link it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants