Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an overload of CSharpSyntaxTree.Create that allows to specify checksum algorithm #62923

Closed
tmat opened this issue Jul 25, 2022 · 8 comments
Closed
Assignees
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation Area-Compilers blocking API needs to reviewed with priority to unblock work Concept-API This issue involves adding, removing, clarification, or modification of an API. Feature Request
Milestone

Comments

@tmat
Copy link
Member

tmat commented Jul 25, 2022

Background and Motivation

The existing API always used SourceHashAlgorithm.Sha1 but the IDE needs to be able to create parsed syntax trees backed by text with specific checksum algorithm.

See #62840 for the change, including usage in the IDE.

Proposed API

class CSharpSyntaxTree
{
  public SyntaxTree Create(
   CSharpSyntaxNode root,
   CSharpParseOptions? options = null,
   string? path = "",
   Encoding? encoding = null);

+ public SyntaxTree Create(
+   CSharpSyntaxNode root,
+   CSharpParseOptions? options = null,
+   string? path = "",
+   Encoding? encoding = null, 
+   SourceHashAlgorithm checksumAlgorithm = SourceHashAlgorithm.Sha1);
)

In addition, similarly to SyntaxTree.Encoding make checksum algorithm available from SyntaxTree instance:

class SyntaxTree
{
+ public abstract SourceHashAlgorithm ChecksumAlgorithm { get; }
}

and add a factory method to SyntaxFactory:

class SyntaxFactory
{
- public static SyntaxTree SyntaxTree(
-    SyntaxNode root, ParseOptions? options = null, string path = "", Encoding? encoding = null)

+ public static SyntaxTree SyntaxTree(
+   SyntaxNode root, ParseOptions? options, string path, Encoding? encoding)

+ public static SyntaxTree SyntaxTree(
+   SyntaxNode root,
+   ParseOptions? options = null,
+   string path = "",
+   Encoding? encoding = null,
+   SourceHashAlgorithm checksumAlgorithm = SourceHashAlgorithm.Sha1)
}


Similarly for VB.
@tmat tmat added Concept-API This issue involves adding, removing, clarification, or modification of an API. Feature Request labels Jul 25, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Jul 25, 2022
@jcouv jcouv removed the untriaged Issues and PRs which have not yet been triaged by a lead label Jul 28, 2022
@jcouv jcouv added this to the 17.4 milestone Jul 28, 2022
@jcouv
Copy link
Member

jcouv commented Jul 28, 2022

Assigned to @333fred and @CyrusNajmabadi to triage/drive as needed, as area owners for api-syntax.

@333fred
Copy link
Member

333fred commented Aug 5, 2022

@tmat what's the motivation for using a different hashing algorithm? Not criticizing, just curious.

@tmat
Copy link
Member Author

tmat commented Aug 5, 2022

Need to use the same hash algorithm as specified in the project settings. As explained in #62840, Hot Reload needs to check for a given Document whether or not it matches the compiled code. It does so by comparing the checksum stored in the PDB with the checksum of the Document. The checksum algorithm thus needs to flow through the system.

@tmat
Copy link
Member Author

tmat commented Aug 5, 2022

Another reason is that it should be possible for users of our API to avoid using SHA1 entirely. Having SHA1 as a default parameter of an API that can't be changed makes that impossible.

@333fred 333fred added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Aug 5, 2022
@333fred 333fred assigned tmat and unassigned 333fred and CyrusNajmabadi Aug 8, 2022
@jaredpar jaredpar modified the milestones: 17.4, C# 12.0 Aug 16, 2022
@333fred 333fred added the blocking API needs to reviewed with priority to unblock work label Aug 17, 2022
@333fred
Copy link
Member

333fred commented Aug 18, 2022

API Review

  • This seems like it would be a constant value of the input text, rather than needing to be recalculated by the tree.
  • We need more background info on where the checksum is needed and when it needs to be calculated.

@333fred 333fred added api-needs-work API needs work before it is approved, it is NOT ready for implementation api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-ready-for-review API is ready for review, it is NOT ready for implementation api-needs-work API needs work before it is approved, it is NOT ready for implementation labels Aug 18, 2022
@tmat
Copy link
Member Author

tmat commented Aug 19, 2022

This seems like it would be a constant value of the input text, rather than needing to be recalculated by the tree.

The tree either holds on a SourceText, if it's created directly from SourceText or it creates SourceText lazily if it's created from a syntax node:
https://sourceroslyn.io/#Microsoft.CodeAnalysis.CSharp/Syntax/CSharpSyntaxTree.ParsedSyntaxTree.cs,63.

In the latter case the checksum algorithm is passed from the SyntaxTree to the SourceText being created. Currently ParsedSyntaxTree is only created with hardcoded checksum algorithm.

The property is added to SyntaxTree in order to avoid creating the underlying SourceText unnecessarily, only when the checksum algorithm is needed.

@333fred
Copy link
Member

333fred commented Sep 6, 2022

API Review

  • Common case is the SourceText is what is calculating the checksum
  • Default implementation forwards to SourceText's property
  • However, IDE creates trees without the SourceText in some occasions
    • Don't want to rehydrate the SourceText for this scenario
  • We'll make sure doc comments on the property explain the usage
  • Can't calculate from the nodes: BOM can't be round-tripped, for example, as it's not represented in in-memory string
  • Might consider changing the return of Encoding to be none for trees from binary data or other non-file sources, but that would be a breaking change.
  • Do we regret having Encoding on the tree as well?
    • Neither Encoding or Checksum are really components of the tree itself. The tree isn't influenced by them, and just turns the tree into a property bag
    • Could the implementation PR keep this info in a different location?
  • Maybe we could create SourceTextProvider component that gets a callback when we need to make a tree?
    • But GetText is already virtual, derived types can override as appropriate
  • Next steps will be attempting to make Encoding obsolete in the IDE and see what progress we can make with plumbing extra info in that way.

@333fred 333fred added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Sep 6, 2022
@tmat
Copy link
Member Author

tmat commented Sep 13, 2022

Changes merged in #63832 enabled defining a subclass of CSharpSyntaxTree/VisualBasicSyntaxTree that customizes creation of SourceText. This approach can be used instead of passing checksum algorithm to SyntaxFactory.CreateTree.

@tmat tmat closed this as completed Sep 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation Area-Compilers blocking API needs to reviewed with priority to unblock work Concept-API This issue involves adding, removing, clarification, or modification of an API. Feature Request
Projects
None yet
Development

No branches or pull requests

5 participants