Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semantic Syntax is incorrectly highlighting arguments as commands #2860

Open
Jaykul opened this issue Aug 4, 2020 · 20 comments
Open

Semantic Syntax is incorrectly highlighting arguments as commands #2860

Jaykul opened this issue Aug 4, 2020 · 20 comments

Comments

@Jaykul
Copy link

Jaykul commented Aug 4, 2020

Any bare (unquoted) string argument that has a - in it is being highlighted as a command:

image

@ghost ghost added the Needs: Triage Maintainer attention needed! label Aug 4, 2020
@SydneyhSmith SydneyhSmith added Issue-Bug A bug to squash. Area-Semantic Highlighting and removed Needs: Triage Maintainer attention needed! labels Aug 4, 2020
@justinytchen
Copy link
Contributor

The problem is that there is no way to distinguish between a function name in a declaration and a parameter from just the token information alone. I think that we'll need to add other checks (i.e. position or looking into the AST)

image

image

@ghost ghost added the Needs: Maintainer Attention Maintainer attention needed! label Aug 5, 2020
@rjmholt
Copy link
Contributor

rjmholt commented Aug 5, 2020

I think the important part is that the Test token and the That-Thing token have different flags:

> $ast = [System.Management.Automation.Language.Parser]::ParseInput('get-banana this-thing', [ref]$tok
s, [ref]$errs)
C:\Users\Robert Holt\Documents\Dev\Microsoft\PowerShellEditorServices [async-ps-consumer +0 ~4 -0 !]
> $toks[0].TokenFlags
CommandName
C:\Users\Robert Holt\Documents\Dev\Microsoft\PowerShellEditorServices [async-ps-consumer +0 ~4 -0 !]
> $toks[1].TokenFlags
None

That's how PSReadLine is able to differentiate them I imagine

@SeeminglyScience
Copy link
Collaborator

SeeminglyScience commented Aug 5, 2020

@rjmholt I think what they're getting at is that the current regex based highlighter treats the name in a function definition like a command (which doesn't have the flag).

Personally I'd rather have the name in a function definition be parsed as a bare word than the current situation (that's how both the ISE and PSRL work currently). But a fix for both might be just keeping track of if the previous token kind was Function. Then also use TokenFlags.CommandName.

@justinytchen
Copy link
Contributor

justinytchen commented Aug 5, 2020

There's no way to differentiate between the Get-AA token (should be labelled as a command/function) and the That-Thing token (which should be labelled as something else) with only the Token info. So without using additional data, we either have the situation we have now, or Get-AA would not be highlighted

@SeeminglyScience
Copy link
Collaborator

SeeminglyScience commented Aug 5, 2020

@justinytchen You also could change this up a bit:

https://github.com/PowerShell/PowerShellEditorServices/blob/c40734354631043cf3c390d36c9b1ef6a80b3949/src/PowerShellEditorServices/Services/TextDocument/Handlers/PsesSemanticTokensHandler.cs#L51-L54

to be:

Token[] tokens = file.ScriptTokens;
for (int i = 0; i < tokens.Length; i++)
{
    Token token = tokens[i];
    PushToken(token, builder);
    if (token.Kind == TokenKind.Function)
    {
        // + bounds checks
        Token next = tokens[++i];
        PushTokenAsFunctionNameIfBareWord(next, builder);
    }
}

@rjmholt
Copy link
Contributor

rjmholt commented Aug 5, 2020

I think what they're getting at is that the current regex based highlighter treats the name in a function definition like a command (which doesn't have the flag).

Yes, I noticed that. But in @Jaykul's original issue description he seems to be referring to the CommandAst situation rather than the FunctionDefinitionAst situation.

The FunctionDefinitionAst scenario is easy to address: we look back a token and see if it's the function keyword. Given that the grammar defines the full nature of the context, rather than just the tokens, we can't do a simple token search for all tokens, but I believe we can in this case.

A more sophisticated approach is to track our position in both the AST and the token array simulataneously and for each token look into where we are in the AST to determine the token semantics.

I've written an example where we track through tokens as we move through an AST here.

Or the converse, where we find the AST element corresponding to our token:

    /// <summary>
    /// An AST visitor to find the smallest AST containing a given IScriptPosition.
    /// </summary>
    internal class FindAstFromPositionVisitor : AstVisitor2
    {
        private readonly IScriptPosition _position;


        private Ast _astAtPosition;


        /// <summary>
        /// Create a FindAstFromPositionVisitor around a given position,
        /// ready to find the smallest AST containing that position.
        /// </summary>
        /// <param name="position">The position to find the containing AST of.</param>
        public FindAstFromPositionVisitor(IScriptPosition position)
        {
            _position = position;
        }


        /// <summary>
        /// Retrieve the computed smallest containing AST after a visit has been performed.
        /// </summary>
        /// <returns>The smallest AST containing the originally given position in the visited AST.</returns>
        public Ast GetAst()
        {
            return _astAtPosition;
        }


        public override AstVisitAction VisitArrayExpression(ArrayExpressionAst arrayExpressionAst) => VisitAst(arrayExpressionAst);


        public override AstVisitAction VisitArrayLiteral(ArrayLiteralAst arrayLiteralAst) => VisitAst(arrayLiteralAst);


        public override AstVisitAction VisitAssignmentStatement(AssignmentStatementAst assignmentStatementAst) => VisitAst(assignmentStatementAst);


        public override AstVisitAction VisitAttribute(AttributeAst attributeAst) => VisitAst(attributeAst);


        public override AstVisitAction VisitAttributedExpression(AttributedExpressionAst attributedExpressionAst) => VisitAst(attributedExpressionAst);


        public override AstVisitAction VisitBaseCtorInvokeMemberExpression(BaseCtorInvokeMemberExpressionAst baseCtorInvokeMemberExpressionAst) => AstVisitAction.SkipChildren;


        public override AstVisitAction VisitBinaryExpression(BinaryExpressionAst binaryExpressionAst) => VisitAst(binaryExpressionAst);


        public override AstVisitAction VisitBlockStatement(BlockStatementAst blockStatementAst) => VisitAst(blockStatementAst);


        public override AstVisitAction VisitBreakStatement(BreakStatementAst breakStatementAst) => VisitAst(breakStatementAst);


        public override AstVisitAction VisitCatchClause(CatchClauseAst catchClauseAst) => VisitAst(catchClauseAst);


        public override AstVisitAction VisitCommand(CommandAst commandAst) => VisitAst(commandAst);


        public override AstVisitAction VisitCommandExpression(CommandExpressionAst commandExpressionAst) => VisitAst(commandExpressionAst);


        public override AstVisitAction VisitCommandParameter(CommandParameterAst commandParameterAst) => VisitAst(commandParameterAst);


        public override AstVisitAction VisitConfigurationDefinition(ConfigurationDefinitionAst configurationDefinitionAst) => VisitAst(configurationDefinitionAst);


        public override AstVisitAction VisitConstantExpression(ConstantExpressionAst constantExpressionAst) => VisitAst(constantExpressionAst);


        public override AstVisitAction VisitContinueStatement(ContinueStatementAst continueStatementAst) => VisitAst(continueStatementAst);


        public override AstVisitAction VisitConvertExpression(ConvertExpressionAst convertExpressionAst) => VisitAst(convertExpressionAst);


        public override AstVisitAction VisitDataStatement(DataStatementAst dataStatementAst) => VisitAst(dataStatementAst);


        public override AstVisitAction VisitDoUntilStatement(DoUntilStatementAst doUntilStatementAst) => VisitAst(doUntilStatementAst);


        public override AstVisitAction VisitDoWhileStatement(DoWhileStatementAst doWhileStatementAst) => VisitAst(doWhileStatementAst);


        public override AstVisitAction VisitDynamicKeywordStatement(DynamicKeywordStatementAst dynamicKeywordStatementAst) => VisitAst(dynamicKeywordStatementAst);


        public override AstVisitAction VisitErrorExpression(ErrorExpressionAst errorExpressionAst) => VisitAst(errorExpressionAst);


        public override AstVisitAction VisitErrorStatement(ErrorStatementAst errorStatementAst) => VisitAst(errorStatementAst);


        public override AstVisitAction VisitExitStatement(ExitStatementAst exitStatementAst) => VisitAst(exitStatementAst);


        public override AstVisitAction VisitExpandableStringExpression(ExpandableStringExpressionAst expandableStringExpressionAst) => VisitAst(expandableStringExpressionAst);


        public override AstVisitAction VisitFileRedirection(FileRedirectionAst redirectionAst) => VisitAst(redirectionAst);


        public override AstVisitAction VisitForEachStatement(ForEachStatementAst forEachStatementAst) => VisitAst(forEachStatementAst);


        public override AstVisitAction VisitForStatement(ForStatementAst forStatementAst) => VisitAst(forStatementAst);


        public override AstVisitAction VisitFunctionDefinition(FunctionDefinitionAst functionDefinitionAst) => VisitAst(functionDefinitionAst);


        public override AstVisitAction VisitFunctionMember(FunctionMemberAst functionMemberAst) => VisitAst(functionMemberAst);


        public override AstVisitAction VisitHashtable(HashtableAst hashtableAst) => VisitAst(hashtableAst);


        public override AstVisitAction VisitIfStatement(IfStatementAst ifStmtAst) => VisitAst(ifStmtAst);
    
        public override AstVisitAction VisitIndexExpression(IndexExpressionAst indexExpressionAst) => VisitAst(indexExpressionAst);


        public override AstVisitAction VisitInvokeMemberExpression(InvokeMemberExpressionAst methodCallAst) => VisitAst(methodCallAst);


        public override AstVisitAction VisitMemberExpression(MemberExpressionAst memberExpressionAst) => VisitAst(memberExpressionAst);


        public override AstVisitAction VisitMergingRedirection(MergingRedirectionAst redirectionAst) => VisitAst(redirectionAst);


        public override AstVisitAction VisitNamedAttributeArgument(NamedAttributeArgumentAst namedAttributeArgumentAst) => VisitAst(namedAttributeArgumentAst);


        public override AstVisitAction VisitNamedBlock(NamedBlockAst namedBlockAst) => VisitAst(namedBlockAst);


        public override AstVisitAction VisitParamBlock(ParamBlockAst paramBlockAst) => VisitAst(paramBlockAst);


        public override AstVisitAction VisitParameter(ParameterAst parameterAst) => VisitAst(parameterAst);


        public override AstVisitAction VisitParenExpression(ParenExpressionAst parenExpressionAst) => VisitAst(parenExpressionAst);


        public override AstVisitAction VisitPipeline(PipelineAst pipelineAst) => VisitAst(pipelineAst);


        public override AstVisitAction VisitPropertyMember(PropertyMemberAst propertyMemberAst) => VisitAst(propertyMemberAst);


        public override AstVisitAction VisitReturnStatement(ReturnStatementAst returnStatementAst) => VisitAst(returnStatementAst);


        public override AstVisitAction VisitScriptBlock(ScriptBlockAst scriptBlockAst) => VisitAst(scriptBlockAst);


        public override AstVisitAction VisitScriptBlockExpression(ScriptBlockExpressionAst scriptBlockExpressionAst) => VisitAst(scriptBlockExpressionAst);


        public override AstVisitAction VisitStatementBlock(StatementBlockAst statementBlockAst) => VisitAst(statementBlockAst);


        public override AstVisitAction VisitStringConstantExpression(StringConstantExpressionAst stringConstantExpressionAst) => VisitAst(stringConstantExpressionAst);


        public override AstVisitAction VisitSubExpression(SubExpressionAst subExpressionAst) => VisitAst(subExpressionAst);


        public override AstVisitAction VisitSwitchStatement(SwitchStatementAst switchStatementAst) => VisitAst(switchStatementAst);


        public override AstVisitAction VisitThrowStatement(ThrowStatementAst throwStatementAst) => VisitAst(throwStatementAst);


        public override AstVisitAction VisitTrap(TrapStatementAst trapStatementAst) => VisitAst(trapStatementAst);


        public override AstVisitAction VisitTryStatement(TryStatementAst tryStatementAst) => VisitAst(tryStatementAst);


        public override AstVisitAction VisitTypeConstraint(TypeConstraintAst typeConstraintAst) => VisitAst(typeConstraintAst);


        public override AstVisitAction VisitTypeDefinition(TypeDefinitionAst typeDefinitionAst) => VisitAst(typeDefinitionAst);


        public override AstVisitAction VisitTypeExpression(TypeExpressionAst typeExpressionAst) => VisitAst(typeExpressionAst);


        public override AstVisitAction VisitUnaryExpression(UnaryExpressionAst unaryExpressionAst) => VisitAst(unaryExpressionAst);


        public override AstVisitAction VisitUsingExpression(UsingExpressionAst usingExpressionAst) => VisitAst(usingExpressionAst);


        public override AstVisitAction VisitUsingStatement(UsingStatementAst usingStatementAst) => VisitAst(usingStatementAst);


        public override AstVisitAction VisitVariableExpression(VariableExpressionAst variableExpressionAst) => VisitAst(variableExpressionAst);


        public override AstVisitAction VisitWhileStatement(WhileStatementAst whileStatementAst) => VisitAst(whileStatementAst);


        private AstVisitAction VisitAst(Ast ast)
        {
            if (!AstContainsPosition(ast))
            {
                return AstVisitAction.SkipChildren;
            }


            _astAtPosition = ast;
            return AstVisitAction.Continue;
        }


        private bool AstContainsPosition(Ast ast)
        {
            return IsBefore(ast.Extent.StartScriptPosition, _position)
                && IsBefore(_position, ast.Extent.EndScriptPosition);
        }


        private static bool IsBefore(IScriptPosition left, IScriptPosition right)
        {
            return left.LineNumber < right.LineNumber
                || (left.LineNumber == right.LineNumber && left.ColumnNumber <= right.ColumnNumber);
        }
    }

@SydneyhSmith SydneyhSmith added Needs: Maintainer Attention Maintainer attention needed! and removed Needs: Maintainer Attention Maintainer attention needed! labels Aug 6, 2020
@rjmholt
Copy link
Contributor

rjmholt commented Aug 6, 2020

Ok so the problem is actually that generic tokens with no flags are currently highlighted as functions.

Bareword arguments without -s are tokenised as identifiers rather than generic tokens, so they fall through to the grammar.

We can recognise a bareword identifier most correctly as an identifier or a generic token that is subordinate to a command token... Harder to do though.

We need to pick a new value based on this list. I would guess Member, Property or EnumMember would work.

@TylerLeonhardt
Copy link
Member

TylerLeonhardt commented Aug 6, 2020

Property is probably the least threatening because Member in the Monokai theme was lime green lol

@rjmholt
Copy link
Contributor

rjmholt commented Aug 6, 2020

because Member in the Monokai theme was lime green

Ugh, yeah, this is what I mean by themes requiring hacks

@Jaykul
Copy link
Author

Jaykul commented Aug 7, 2020

A "generic token" that's a parameter ... is actually a string, not a member.

@SeeminglyScience
Copy link
Collaborator

@Jaykul true but nothing currently highlights a bare word string literal command arg like every other string literal. member is probably the closest to what existing tools highlight it as.

@Jaykul
Copy link
Author

Jaykul commented Aug 7, 2020

Are you really arguing that since everyone else is wrong, this should be wrong too? It's a string. The fact that PowerShell allows you to leave off the quotes doesn't change the fact that it's a string. It should be highlighted as a string if it's Kind = "Identifier" or Kind = "Generic" with TokenFlags = "None" (and follows a command?).

It's most definitely not a member. Highlighting it as member is worse than nothing (although not worse than highlighting it as a command) in my opinion.

If you really decide you need to use member, please instead send a PR to the SemanticTokenTypes enum to add an a "unspecified" or "otherliteral" or something.

@SeeminglyScience
Copy link
Collaborator

Are you really arguing that since everyone else is wrong, this should be wrong too?

Nah, I'm saying people will be annoyed if it's highlighted like every other string literal, because that's what they're used to.

It's a string. The fact that PowerShell allows you to leave off the quotes doesn't change the fact that it's a string.

Yeah, no one here thinks it isn't a string. Doesn't change the fact that folks expect it to be highlighted differently 🤷

@rjmholt
Copy link
Contributor

rjmholt commented Aug 7, 2020

Yeah it's not about how PowerShell uses it, it's about how it's highlighted. Categorising it as a string for highlighting purposes will make 'input' and input the same colour, whereas in PSReadLine for example, they are not at all (here vs 'there'):

image

Are you really arguing that since everyone else is wrong, this should be wrong too? It's a string.

Please keep your tone civil. We can be more constructive without rhetoric like this.

@Jaykul
Copy link
Author

Jaykul commented Aug 8, 2020

I don't think there's that much backward compatibility to worry about. It's already drastically changing most of the highlighting... and not just breaking changes like using statements with paths, escaped characters, #requires, etc.

And PSReadline is wrong about that too. Both your here and 'there' are strings. Why would you want to make them seem different? No offense intended. I mean, I did most of the regex parser, I'm not trying to blame anyone. I'm just pointing out that it's literally a string (and now, we know that) but it's not highlighted as one just because ... we never bothered to get it right.

@SeeminglyScience
Copy link
Collaborator

SeeminglyScience commented Aug 8, 2020

@Jaykul that's all fair and I don't necessarily disagree. Tbh, I think it would bother people. I don't really have any specific logic to back that up with, but I think it would get a lot of complaints.

Personally I'd also rather stick closer to what PSRL does, even if it isn't correct. PSRL highlighting barewords differently changed how they feel vs quoted strings imo, for better or worse.

@rjmholt
Copy link
Contributor

rjmholt commented Aug 10, 2020

Both your here and 'there' are strings. Why would you want to make them seem different?

Yeah, I definitely understand your reasoning here.

We're in a strange set because bareword strings are a rare occurrence among languages, so there's not good precedent or mechanisms built in (in an environment mainly designed to cater to C# and TypeScript) for them.

I don't think there's that much backward compatibility to worry about

Agreed. Certainly as a visual feature, I don't personally see any concerns about "breaking changes" so to speak.

I think of highlighting as a visual representation of tokeniser mode. Even though bareword strings and quoted strings represent the same type of value, when I'm typing them or looking at my program, the thing I care about is what delimits them. A quote is easy to miss, but the specific colour of a quoted string tells me that I'm in a special mode where the rules are different. That's not true of bareword strings, because the tokeniser will find a lot more reasons to end the string token.

Certainly I think that if we made a change to classify quoted strings and bareword strings the same way, the majority of our users would have complaints due to struggling to differentiate the different types of strings in their scripts.

In an ideal world, we would have a BareWordString token type that a VSCode theme could pick up on so that anyone could choose to make them the same or different colours. But in the real world we need a fallback that will cater to common themes out there that are mainly built for programming languages with tame syntaxes. I think that's the essence of my proposal in #2852 (comment).

(The unfortunate side-effect of the success of the C-syntax-family languages is that much tooling forgets just how different some languages are)

@Jaykul
Copy link
Author

Jaykul commented Aug 10, 2020

While you're convincing me this whole product is not for me, I disagree fundamentally with the base assumption that your users want what they already have.

Here's PSReadLine, Semantic, and EditorSyntax:

image

PSReadLine is better than the others, and (when it's not coloring strings as commands) sematic is better than editor syntax, but in all three the (lack of) highlighting of bare strings is incorrect and does not help authors understand their code.

I have no problem with the idea of having a language-specific token (or, in this case, a shell-language-specific token) and a fallback token type for themes that don't understand the type of language that PowerShell is, but if you want to do that, you also need to provide a first-class theme that you can point to as an example (and in this day and age, you need a dark version of it too). However, that's not what's happening here. And in any case, the ISE theme doesn't provide better information. If anything, it's worse in this case, because keywords are virtually indistinquishable from bareword strings.

If you wanted to have different colors for bareword, single quoted, and double quoted strings, I would would be down with that. But that falls down if Single and Double quoted strings are one kind of string, and bare strings are the only one that's totally different...

@TylerLeonhardt
Copy link
Member

So I think we need to take this in two stages.

  1. The first stage is parity: being as close to PSReadLine as possible. This will allow both the console and editing experience to be on the same playing field. (this was actually the original goal of doing this work from the beginning)

  2. The second stage is improving: Once both experiences are roughly aligned, then we can go back and make the changes in both to support the language better.

@SydneyhSmith SydneyhSmith removed the Needs: Maintainer Attention Maintainer attention needed! label Aug 13, 2020
@jessiewestlake
Copy link

To make matters worse... You also have the same problem when the parameter argument is not a string, but a negative number (e.g. -1 or -3.14).

Invoke-Calculation -Operator Subtract -Minuend -3 -Subtrahend 2

@ghost ghost added the Needs: Maintainer Attention Maintainer attention needed! label Sep 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants