Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Captures Invalid Dimensions from an array variable declaration. #32843

Conversation

YairHalberstadt
Copy link
Contributor

Initially this pull request enables the dimensions specified in error in a variable declaration such as this:

int[10] x;

To be captured in a BoundTypeExpression as BoundDimensionsOpt, and to be consumed via an added IgnoredDimensions Property in IVariableDeclarationOperation.

See #32464

In future pull requests I may look at including support for capturing the invalid dimensions in:

  • parameters
  • foreach and for loop variables
  • typeof expressions
  • type arguments
  • is type expressions
  • casts
  • as expressions
  • Return types
  • Field types
    etc.

It is unlikely that I will do all of these however, so suggestions as to which would be highest priority would be useful.

@YairHalberstadt YairHalberstadt requested a review from a team as a code owner January 26, 2019 22:55
@jcouv jcouv added Area-Compilers Community The pull request was submitted by a contributor who is not a Microsoft employee. labels Jan 27, 2019
@gafter gafter self-assigned this Jan 27, 2019
@gafter gafter added this to the 16.0.P4 milestone Jan 27, 2019
@gafter
Copy link
Member

gafter commented Jan 28, 2019

@YairHalberstadt There are a few failed tests.

@YairHalberstadt
Copy link
Contributor Author

Should be done now. They were broken by the merge from Master.

/// Array dimensions supplied to an array declaration in error cases, ignored by the compiler. This is only used for the C# case of
/// RankSpecifierSyntax nodes on an ArrayTypeSyntax.
/// </summary>
ImmutableArray<IOperation> IgnoredDimensions { get; }
Copy link
Contributor

@AlekseyTs AlekseyTs Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ImmutableArray IgnoredDimensions { get; } [](start = 8, length = 53)

It looks like we already expose this information through IVariableDeclaratorOperation.IgnoredArguments #Closed

Copy link
Contributor Author

@YairHalberstadt YairHalberstadt Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's for

int x[10]

Not

int[10] x
``` #Closed

var boundDeclType = new BoundTypeExpression(typeSyntax, aliasOpt, inferredType: isVar, type: declTypeOpt.TypeSymbol);

ImmutableArray<BoundExpression> dimensionsOpt;
if (typeSyntax is ArrayTypeSyntax arrayTypeSyntax && arrayTypeSyntax.RankSpecifiers.Count > 0)
Copy link
Contributor

@AlekseyTs AlekseyTs Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typeSyntax is ArrayTypeSyntax arrayTypeSyntax [](start = 16, length = 45)

Do we need to worry about type syntax other than ArrayTypeSyntax? For example, a generic type with an array type as a type argument. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an array creation that would also be invalid. For now I'm just trying to capture cases where the syntax would be valid in an ArrayCreationExpression, but invalid in a variable declaration, as I feel these are probably the most common cases.

I think there a huge amount of scope to do more here. This is meant to be an initial pull request with the minimum changes required to be useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I'm just trying to capture cases where the syntax would be valid in an ArrayCreationExpression, but invalid in a variable declaration, as I feel these are probably the most common cases.

Why are these scenarios more interesting? What has changed in variable declarations around array type syntax?

I think there a huge amount of scope to do more here. This is meant to be an initial pull request with the minimum changes required to be useful.

I do not think I support this approach. It becomes very hard to make changes to features that has shipped due to backward compatibility considerations. This PR goes to master. It is not clear to me what is the final plan, how do we think it should look like at the end and what is the plan for us to get there before we ship. I'd rather have clarity about all the scenarios that we should handle here.


In reply to: 251617370 [](ancestors = 251617370)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we would have to put the member on a different node of the IOperation tree, so I view this as two seperate APIs.

I listed a large number of potential places where we could expose this information at the top of the pull request. I think we have to decide exactly which of them should go in the initial PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should handle all types here, not just array types. Dig through any type and collect sizes from all array types within it.


In reply to: 251623089 [](ancestors = 251623089,251617370)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to worry about type syntax other than ArrayTypeSyntax? For example, a generic type with an array type as a type argument.

This always ends up being parsed as a BinaryExpression, as when a generic type declaration cannot be parsed correctly, it seems we parse < as Less Than.

I can't think of any cases where we would parse an expression type as type syntax which would contain an invalid dimension other than an ArrayTypeSyntax.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This always ends up being parsed as a BinaryExpression, as when a generic type declaration cannot be parsed correctly, it seems we parse < as Less Than.

I can't think of any cases where we would parse an expression type as type syntax which would contain an invalid dimension other than an ArrayTypeSyntax.

Can one create such a tree manually by using SyntaxFactory? Also, I might be wrong, but I think with the recent changes to the parser an element type of the array type could be a nullable array type with sizes as well. Also, since we are going to do the collection of misplaced sizes for other contexts too, and in those contexts parser's behavior around handling of < might be slightly different, or it might change in the future, I think it would be more robust to create a helper that doesn't make any assumption on how the parser works, simply visits the syntax (by using a syntax visitor) and collects all size expressions from array types in the syntax.


In reply to: 253308012 [](ancestors = 253308012)

var sizes = ArrayBuilder<BoundExpression>.GetInstance();
// We don't want to report any errors from this, as the syntax is invalid anyway
var dummyDiagnostics = DiagnosticBag.GetInstance();
foreach (var expressionSyntax in arrayTypeSyntax.RankSpecifiers[0].Sizes)
Copy link
Contributor

@AlekseyTs AlekseyTs Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RankSpecifiers[0] [](start = 65, length = 17)

What about other rank specifiers? #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

In an array creation that would also be invalid. For now I'm just trying to capture cases where the syntax would be valid in an ArrayCreationExpression, but invalid in a variable declaration, as I feel these are probably the most common cases.

I think there a huge amount of scope to do more here. This is meant to be an initial pull request with the minimum changes required to be useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above

I do not think I support this approach. I see no good reason to handle some sizes, but not the others.


In reply to: 251617530 [](ancestors = 251617530)

Copy link
Contributor Author

@YairHalberstadt YairHalberstadt Feb 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it important that we distinguish in the bound tree and the Semantic Model between:

int[10][] x;
int[][10] x;

If so we will need to add a new type of BoundNode which contains this information as well as the BoundExpression for the invalid dimensions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it important that we distinguish in the bound tree and the Semantic Model between:

csharp
int[10][] x;
int[][10] x;

I cannot think of any need for this at the moment. The sizes are misplaced expressions.


In reply to: 253308442 [](ancestors = 253308442)

@@ -2858,6 +2848,26 @@ private BoundExpression BindArrayCreationExpression(ArrayCreationExpressionSynta
: BindArrayCreationWithInitializer(diagnostics, node, node.Initializer, type, arraySizes);
}

private BoundExpression BindArrayRankSpecifier(ExpressionSyntax arrayRankSpecifier, SyntaxNode node, DiagnosticBag diagnostics)
Copy link
Contributor

@AlekseyTs AlekseyTs Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arrayRankSpecifier [](start = 72, length = 18)

The name feels misleading, this isn't a syntax for a rank specifier. #Closed

var dummyDiagnostics = DiagnosticBag.GetInstance();
foreach (var expressionSyntax in arrayTypeSyntax.RankSpecifiers[0].Sizes)
{
var size = BindArrayRankSpecifier(expressionSyntax, typeSyntax, dummyDiagnostics);
Copy link
Contributor

@AlekseyTs AlekseyTs Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var size = BindArrayRankSpecifier(expressionSyntax, typeSyntax, dummyDiagnostics); [](start = 20, length = 82)

It looks like, when a declaration statement contains multiple declarators, we are going to bind sizes multiple times. Moreover, the bound tree is going to have multiple distinct bound nodes associated with the same syntax. This is likely to create problems with SemanticModel and probably in other components. I think the sizes should be bound once (where the typeSyntax is bound) and the bound tree shouldn't contain duplicate nodes associated with sizes. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we do, and I have tested for that in the semantic model, but I will take another look when I get the chance.

What I believe happens is that multiple declarations all have access to the same BoundTypeExpression. There is no obvious way to avoid this. In the Semantic model we just take the first one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking back to it, a Multiple Local Declaration just contains a list of boundLocalDeclarations, and no BoundTypeExpression. Unless we plan to change that, the current approach is I think most sensible.

We already repeat the BoundTypeExpression multiple times in the tree, and the dimensions is a member of the BoundTypeExpression, so I do not see that this is any worse than what we already have.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already repeat the BoundTypeExpression multiple times in the tree, and the dimensions is a member of the BoundTypeExpression, so I do not see that this is any worse than what we already have.

Yes, we probably have duplicates for BoundTypeExpression today, this is not good and we'd rather not make problem worse by duplicating even more nodes. SemanticModel builds a map from a syntax node to bound nodes associated with the syntax. In general, it is not Ok to have duplicates in that list of associated bound nodes. This might not cause observable problems with BoundTypeExpression because those are not real expressions. Or problems exist, but we simply not aware of them. I am not comfortable with duplicating bound nodes for expressions, because a lot more things happen for regular expressions.

In any case, It is not OK to bind the same expression over and over again when we can easily avoid that.


In reply to: 251624426 [](ancestors = 251624426)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give it a try when I get a chance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AlekseyTs

I think the sizes should be bound once (where the typeSyntax is bound) and the bound tree shouldn't contain duplicate nodes associated with sizes.

Where exactly do you mean by this?

As far as I can see there is no single location where we Bind a Type Syntax. Instead we create a BoundTypeExpression in a number of places, including here.

As far as I can see in a variable declaration the only time the TypeSyntax is Bound is when Binding the BoundLocalDeclaration.

Since a BoundMultipleLocalDeclarations is just a list of of BoundlocalDeclarations, there is no way that I can see to avoid this being bound multiple times.

Perhaps the real problem is with our definition of a BoundMultipleLocalDeclaration and BoundlocalDeclaration.

  <!--
  Bound node that represents a single local declaration:
    int x = Foo();
    
  NOTE: The node does NOT introduce the referenced local into surrounding scope.
        A local must be explicitly declared in a BoundBlock to be usable inside it.
        
  NOTE: In an error recovery scenario we might have a local declaration parsed as
        int x[123] - This is an error commonly made by C++ developers who come to
        C#. We will give a good error about it at parse time, but we should preserve
        the semantic analysis of the argument list in the bound tree.
  -->
  <Node Name="BoundLocalDeclaration" Base="BoundStatement">
    <Field Name="LocalSymbol" Type="LocalSymbol"/>
    <Field Name="DeclaredType" Type="BoundTypeExpression"/>
    <Field Name="InitializerOpt" Type="BoundExpression" Null="allow"/>
    <Field Name="ArgumentsOpt" Type="ImmutableArray&lt;BoundExpression&gt;" Null="allow"/>
  </Node>

  <!--
  Bound node that represents multiple local declarations:
    int x =1, y =2;
    
  Works like multiple BoundLocalDeclaration nodes.  
  -->
  <Node Name="BoundMultipleLocalDeclarations" Base="BoundStatement">
    <Field Name="LocalDeclarations" Type="ImmutableArray&lt;BoundLocalDeclaration&gt;"/>
  </Node>
            if (variableCount == 1)
            {
                return BindVariableDeclaration(kind, isVar, variableList[0], typeSyntax, declType, alias, diagnostics, node);
            }
            else
            {
                BoundLocalDeclaration[] boundDeclarations = new BoundLocalDeclaration[variableCount];
                int i = 0;
                foreach (var variableDeclarationSyntax in variableList)
                {
                    boundDeclarations[i++] = BindVariableDeclaration(kind, isVar, variableDeclarationSyntax, typeSyntax, declType, alias, diagnostics);
                }
                return new BoundMultipleLocalDeclarations(node, boundDeclarations.AsImmutableOrNull());
            }

We are making a fundamental distinction in the tree between a Declaration of multiple locals and a declaration of only one. When we declare multiple locals we create a list of local declarations, and when we are declaring only one we bind the single declaration directly. I don't see why such a distinction should exist.

Instead every declaration can contain an arbitrary number of locals. There is no need to special case the situation where it contains only one. This means that every single declaration should be a BoundMultipleLocalDeclarations, and every single BoundLocalDeclaration is just an item in the list of BoundMultipleLocalDeclarations. This means we can move DeclaredType to the containing BoundMultipleLocalDeclarations, which will avoid it being duplicated in the tree.

If you agree this should be done, I will obviously do so in a seperate PR as part of a seperate issue.

Copy link
Contributor

@AlekseyTs AlekseyTs Feb 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means we can move DeclaredType to the containing BoundMultipleLocalDeclarations, which will avoid it being duplicated in the tree.

I think this would be fine. It looks like the BoundTypeExpression.InferredType can be moved to BoundLocalDeclaration.
Alternatively, we could make BoundLocalDeclaration.DeclaredType optional, and within BoundMultipleLocalDeclarations set it only for the first BoundLocalDeclaration. The BoundTypeExpression.InferredType should be moved in this case too. This approach would require less changes in the bound tree structure and you might find that attractive.

If you agree this should be done, I will obviously do so in a seperate PR as part of a seperate issue.

I think it would be fine to make this change in context of this PR because it is blocked by this issue. It would be fine to do this in a separate PR, but then we will have to wait for it to go through and then continue this PR on top of that work. #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried my suggested solution first. Whilst I preferred this from a purity perspective, and it also simplifies the binding code in a number of situations, it leads to issues in error recovery for cases like this:

public class C {
    public void M() {
        var a = 14, b = 78.3; //error CS0819: Implicitly-typed variables cannot have multiple declarators
        
        int c = b; //error CS0266: Cannot implicitly convert type 'double' to 'int'. An explicit conversion exists (are you missing a cast?)
        int d = a;
    }
}

https://sharplab.io/#v2:C4LglgNgPgAgzAAhgJgQYQQbwLACgEFKIwAsCAsgBQCUWehDCAbgIYBOCLCAvAgIwkANAgBGPBAHYAHADo4AbnqMCS5WAB2wBAGNxIxfmUINWgCbiWBhgF881oA=

Currently a is inferred as int, and b as double. Hence int c = b would give us an error, whilst int d = a would not;.

However this change would make that impossible, as type would now be per declaration statement, not per declaration.

So I'm going to try your approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However this change would make that impossible, as type would now be per declaration statement, not per declaration.

So I'm going to try your approach.

I am pretty sure that with either approach we could keep the same types (as the types before the change) for the locals, but I let you to decide what approach to use.


In reply to: 253298803 [](ancestors = 253298803)

{
void M1()
{
/*<bind>*/int[10] x;/*</bind>*/
Copy link
Contributor

@AlekseyTs AlekseyTs Jan 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10 [](start = 22, length = 2)

It would be good to have tests that try to get IOperation nodes for the sizes directly. #Closed

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Jan 28, 2019

    public override void VisitVariableDeclaration(IVariableDeclarationOperation operation)

I would expect to see some changes in TestOperationVisitor.VisitVariableDeclaration(IVariableDeclarationOperation operation). Make sure all compiler tests pass with IOperation hook enabled build -testDesktop -testIOperation. #Closed


Refers to: src/Test/Utilities/Portable/Compilation/OperationTreeVerifier.cs:474 in d2e7c06. [](commit_id = d2e7c06, deletion_comment = False)

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Jan 28, 2019

It looks like we need some tests for GetTypeInfo, GetSymbolInfo and other SemanticModel APIs applicable to expressions. The tests should target the illegal dimension sizes #Closed

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Jan 28, 2019

Done with review pass (iteration 4) #Closed

@AlekseyTs
Copy link
Contributor

AlekseyTs commented Jan 28, 2019

In an offline discussion with @gafter we thought it would be good to test scenarios where expression locals are declared within the invalid size expression (declaration pattern or out variable declarations can be used for that). Including cases when the locals are referenced later in otherwise legal expressions. #Closed

@YairHalberstadt
Copy link
Contributor Author

I have made the changes suggested in the review. I have also run -testIOperation, and have fixed all failing tests that looked like they were relevant (some appear to have been failing already, mostly to do with new language features.)

@YairHalberstadt
Copy link
Contributor Author

By the way, I am going away for a week and a half tomorrow, so I might be able to reply to messages here, but I wont be able to work on this.

[CompilerTrait(CompilerFeature.IOperation)]
[Fact]
public void IVariableDeclaration_InvalidIgnoredDimensions_TestSemanticModel()
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine this test shouldn't go here, but I wasn't sure where it should go, so I put it here for now.

@@ -1184,7 +1184,7 @@ public override BoundNode VisitLocalDeclaration(BoundLocalDeclaration node)
return null;
}

bool inferredType = node.DeclaredType.InferredType;
bool inferredType = node.DeclaredTypeOpt?.InferredType ?? false; // a declaration with an inferred type will always store its DeclaredType
Copy link
Contributor

@AlekseyTs AlekseyTs Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

node.DeclaredTypeOpt?.InferredType ?? false; // a declaration with an inferred type will always store its DeclaredType [](start = 32, length = 118)

Is InferredType value the only reason for duplicating BoundTypeExpression for the var case? Please move the property to BoundLocalDeclaration and get rid of the duplication. #Closed

Copy link
Contributor Author

@YairHalberstadt YairHalberstadt Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A different type may be inferred for each local declaration, so without duplication we will lose that information.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A different type may be inferred for each local declaration, so duplication will pose that information.

How and where does that help? We still have only one syntax node for the var keyword. I am not suggesting to make any changes to how we infer the type of the local and what type do we capture in the LocalSymbol.


In reply to: 255623589 [](ancestors = 255623589)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How and where does that help? We still have only one syntax node for the var keyword. I am not suggesting to make any changes to how we infer the type of the local and what type do we capture in the LocalSymbol.

In a var case, for every bound local we do this:

 if (isVar)
            {
                aliasOpt = null;
                bindType = true;

                initializerOpt = BindInferredVariableInitializer(diagnostics, value, valueKind, localSymbol.RefKind, declarator);

                // If we got a good result then swap the inferred type for the "var" 
                TypeSymbol initializerType = initializerOpt?.Type;
                if ((object)initializerType != null)
                {
                    declTypeOpt = TypeSymbolWithAnnotations.Create(initializerType);

We then use the declTypeOpt when creating the boundDeclType as follows:

                boundDeclType = new BoundTypeExpression(typeSyntax, aliasOpt, inferredType: isVar, dimensionsOpt: invalidDimensions.ToImmutableAndFree(), type: declTypeOpt.TypeSymbol);
            }

            return new BoundLocalDeclaration(associatedSyntaxNode, localSymbol, boundDeclType, initializerOpt, arguments, hasErrors);

So every boundNode will have a different type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So every boundNode will have a different type.

How and where does that help? I understand we were doing this before. The question is why do we have to keep doing this? We are changing things for other scenarios.


In reply to: 257821406 [](ancestors = 257821406)

@gafter gafter added this to the 16.1.P1 milestone Mar 14, 2019
case SyntaxKind.PredefinedType:
break;
default:
throw ExceptionUtilities.Unreachable;
Copy link
Contributor

@AlekseyTs AlekseyTs Mar 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw ExceptionUtilities.Unreachable; [](start = 20, length = 37)

Since we are switching on the Kind, it might be better to UnexpectedValue```` helper instead of Unreachable```. #Closed

Copy link
Contributor

@AlekseyTs AlekseyTs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (iteration 19)

@AlekseyTs
Copy link
Contributor

@YairHalberstadt Could you please resolve the merge conflicts? Also, would it be fine if we squash commits into one in the process of merging this PR?

@YairHalberstadt
Copy link
Contributor Author

Will do, and that will be fine.

@AlekseyTs
Copy link
Contributor

@YairHalberstadt It looks like DeeplyNestedGeneric unit-test (https://github.com/dotnet/roslyn/blob/master/src/Compilers/CSharp/Test/Emit/Emit/EndToEndTests.cs#L103) fails on Windows_Desktop_Unit_Tests release_64. Does it fail for you locally?

@YairHalberstadt
Copy link
Contributor Author

I've investigated the issue properly now, and can confirm, that whilst previously it was safe to visit a generic to depth of 730, as a result of VisitRankSpecifiers, it is now only safe to visit a generic to a depth of 680.

More worryingly the stack guard no longer works.

I will make two commits: The first will fix the test, the second will add a stack guard to VisitRankSpecifiers.

If you don't think a stack guard is necessary, I will revert the second commit.

@YairHalberstadt
Copy link
Contributor Author

I think a better idea, is that since VisitRankSpecifiers is only necessary to provide better tooling and information, to simply not visit beyond a depth of 20 at all.
Does that make sense?

… unit test.

EndToEndTests.DeeplyNestedGeneric is now back to testing a generic to a depth of 730.
@YairHalberstadt
Copy link
Contributor Author

I've made my suggested change of only visiting rank specifiers to a depth of 20.
An alternative would be to make VisitRankSpecifiers mostly tail recursive as follows:

        internal static void VisitRankSpecifiers<TArg>(this TypeSyntax type, Action<ArrayRankSpecifierSyntax, TArg> action, TArg argument)
        {
start:
            switch (type.Kind())
            {
                case SyntaxKind.ArrayType:
                    var arrayTypeSyntax = (ArrayTypeSyntax)type;
                    arrayTypeSyntax.ElementType.VisitRankSpecifiers(action, argument);
                    foreach (var rankSpecifier in arrayTypeSyntax.RankSpecifiers)
                    {
                        action(rankSpecifier, argument);
                    }
                    break;
                case SyntaxKind.NullableType:
                    var nullableTypeSyntax = (NullableTypeSyntax)type;
                    type = nullableTypeSyntax.ElementType;
                    goto start;
                case SyntaxKind.PointerType:
                    var pointerTypeSyntax = (PointerTypeSyntax)type;
                    type = pointerTypeSyntax.ElementType;
                    goto start;
                case SyntaxKind.TupleType:
                    var tupleTypeSyntax = (TupleTypeSyntax)type;
                    var elementsCount = tupleTypeSyntax.Elements.Count;
                    if (elementsCount == 0)
                        break;

                    for (int index = 0; index < elementsCount - 1; index++)
                    {
                        var element = tupleTypeSyntax.Elements[index];
                        element.Type.VisitRankSpecifiers(action, argument);
                    }

                    type = tupleTypeSyntax.Elements[elementsCount - 1].Type;
                    goto start;
                case SyntaxKind.RefType:
                    var refTypeSyntax = (RefTypeSyntax)type;
                    type = refTypeSyntax.Type;
                    goto start;
                case SyntaxKind.GenericName:
                    var genericNameSyntax = (GenericNameSyntax)type;
                    var argsCount = genericNameSyntax.TypeArgumentList.Arguments.Count;
                    Debug.Assert(argsCount != 0);

                    for (int index = 0; index < argsCount - 1; index++)
                    {
                        var typeArgument = genericNameSyntax.TypeArgumentList.Arguments[index];
                        typeArgument.VisitRankSpecifiers(action, argument);
                    }

                    type = genericNameSyntax.TypeArgumentList.Arguments[argsCount - 1];
                    goto start;
                case SyntaxKind.QualifiedName:
                    var qualifiedNameSyntax = (QualifiedNameSyntax)type;
                    qualifiedNameSyntax.Left.VisitRankSpecifiers(action, argument);
                    type = qualifiedNameSyntax.Right;
                    goto start;
                case SyntaxKind.AliasQualifiedName:
                    var aliasQualifiedNameSyntax = (AliasQualifiedNameSyntax)type;
                    type = aliasQualifiedNameSyntax.Name;
                    goto start;
                case SyntaxKind.IdentifierName:
                case SyntaxKind.OmittedTypeArgument:
                case SyntaxKind.PredefinedType:
                    break;
                default:
                    throw ExceptionUtilities.UnexpectedValue(type.Kind());
            }

Since 70% of generic types in CoreFx have only one type argument, this would significantly reduce the risk of a stack overflow.

{
// Currently we only use this method to provide a better tooling experience when compiling incorrect syntax.
// Since this is not a must-have, we don't risk causing a stack overflow exception, so don't visit to a depth greater than 20.
const int MAX_RECURSION_DEPTH = 20;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const int MAX_RECURSION_DEPTH = 20; [](start = 16, length = 35)

I do not think we should use a hard-coded limit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make sure it is always called with the same limit, or some locals will not be bound properly. What is the best way to do that without hard coding the limit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to make sure it is always called with the same limit, or some locals will not be bound properly. What is the best way to do that without hard coding the limit?

No limit based on depth.


In reply to: 266654989 [](ancestors = 266654989)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying I should revert this commit, and use a stack guard instead?

Copy link
Contributor

@AlekseyTs AlekseyTs Mar 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you saying I should revert this commit, and use a stack guard instead?

Perhaps we should try getting rid of the recursion and using an array builder to manually keep track of the stack of TypeSyntax nodes to look at. Another thing that might help reducing the stack frame size for this method is to avoid using locals in the switch cases by inlining expressions where it makes sense and extracting body of the case into a separate function in other cases.


In reply to: 266656829 [](ancestors = 266656829)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it might be worth passing the argument to the delegate by reference since we are using a tuple for that.


In reply to: 266658685 [](ancestors = 266658685,266656829)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I showed above how we could use jumps to significantly reduce the depth of recursion by removing all the tail recursive calls.

That combined with inlining expressions and passing the tuple by ref would probably avoid issues in all but the most pathological of cases.

If that is not enough on its own, then we could use an array builder to store the type syntaxes as suggested, but that would risk allocations on the common code path for the sake of a very rare code case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing the argument by ref increases the depth of recursion by 8.
Inlining and extracting local functions made no difference.
Replacing recursion with jumps avoids the problem entirely in the test case, as no stack overflow occurs inside VisitRankSpecifiers.

A pathological case would look like this:

(((((((..., int), int), int), int), int), int), int) x = default;

//or

ValueTuple<ValueTuple<ValueTuple<..., int>,int>,int> x = default;

However testing them, a stack overflow occurred in a completely unrelated bit of code long before VisitRankSpecifiers caused an issue, so I will not add a stack guard for now unless you think it's necessary.

{
VisitRankSpecifiersInternal(type, action, argument, 0);

void VisitRankSpecifiersInternal(TypeSyntax typeInternal, Action<ArrayRankSpecifierSyntax, TArg> actionInternal, TArg argumentInternal, int recursionDepth)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VisitRankSpecifiersInternal [](start = 17, length = 27)

Our convention is to start names of local functions with a small letter.

{
VisitRankSpecifiersInternal(type, action, argument, 0);

void VisitRankSpecifiersInternal(TypeSyntax typeInternal, Action<ArrayRankSpecifierSyntax, TArg> actionInternal, TArg argumentInternal, int recursionDepth)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VisitRankSpecifiersInternal [](start = 17, length = 27)

Can this local function be marked as static?

case SyntaxKind.GenericName:
var genericNameSyntax = (GenericNameSyntax)type;
var argsCount = genericNameSyntax.TypeArgumentList.Arguments.Count;
Debug.Assert(argsCount != 0);
Copy link
Contributor

@AlekseyTs AlekseyTs Mar 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug.Assert(argsCount != 0); [](start = 20, length = 29)

Is it not possible to violate this assumption by building the node manually? #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider replacing the assert with:

                    if (argsCount == 0)
                        break;

similar to the way TupleType is handled


In reply to: 266938083 [](ancestors = 266938083)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@AlekseyTs
Copy link
Contributor

Done with review pass (iteration 27)

Copy link
Contributor

@AlekseyTs AlekseyTs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (iteration 28)

@AlekseyTs
Copy link
Contributor

@YairHalberstadt Could you please resolve the merge conflict?

@YairHalberstadt
Copy link
Contributor Author

I have fixed the merge conflicts.

@AlekseyTs AlekseyTs merged commit 7faa12f into dotnet:master Mar 21, 2019
@AlekseyTs
Copy link
Contributor

@YairHalberstadt Thank you for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Community The pull request was submitted by a contributor who is not a Microsoft employee.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants