-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
StringLiteral
#4
Conversation
src/lex.ts
Outdated
@@ -40,6 +40,10 @@ export function lex(s: string): Lexer { | |||
text in keywords | |||
? keywords[text as keyof typeof keywords] | |||
: Token.Identifier; | |||
} else if (['"', "'"].includes(s.charAt(pos))) { | |||
scanForward((c) => /[_a-zA-Z0-9'"]/.test(c)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strings should be able to contain any character except the matching "
or '
. And you'll need to allow for escaping like with \'
as well.
src/lex.ts
Outdated
case CharCodes.doubleQuote: | ||
return '"'; | ||
default: | ||
return ''; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't remember what invalid escapes are supposed to do, but I think it's returning char
itself, or maybe slash+char.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly, it returns the char
itself using the String.fromCharCode()
fn (ts sourcecode)
Just made the changes here imteekay/mini-typescript@bc76e30
@@ -66,6 +65,62 @@ export function lex(s: string): Lexer { | |||
function scanForward(pred: (x: string) => boolean) { | |||
while (pos < s.length && pred(s.charAt(pos))) pos++; | |||
} | |||
|
|||
function scanString() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feels like this code needs a lot more tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just made some adjustments to how I scan, parse, and emit escape characters. And separated the StringLiteral
node from the Literal
(I will rename the Literal
to NumericLiteral
and refactor the code in the future) (imteekay/mini-typescript@c5e8220).
Also added more tests as you recommended (imteekay/mini-typescript@ead71e2 and imteekay/mini-typescript@b04fc8a).
2a76e7e
to
ead71e2
Compare
@@ -0,0 +1 @@ | |||
"var singleQuote = 'singleQuote';\nvar doubleQuote = \"doubleQuote\";\nvar escapedSingleQuote = 'escapedSingle\\'Quote';\nvar escapedDoubleQuote = \"escapedDouble\\\"Quote\";\n(missing)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should remove the (missing)
after merging the EmptyStatement
PR.
{ | ||
"kind": "ExpressionStatement", | ||
"expr": { | ||
"kind": "Identifier", | ||
"text": "(missing)" | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should replace this node with the EmptyStatement
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason that you have empty statements is that ;
is parsed as a separator, not a terminator, so a complete program looks like X;Y;Z
not X;Y;Z;
.
(separators have become less and less popular over the history of programming languages)
if (pos >= s.length) { | ||
// report unterminated string literal error | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not sure how to report this kind of error in the lexer scope 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you'll probably have to make the lexer able to report errors the same way the other phases do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to add a new exercise to add the support for the lexer to report errors (imteekay/mini-typescript@65a3270). I also saw an interesting tweet by Maria about reporting errors for string literals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. As you noted, you just need a way to report errors, and some refactoring of the literal types.
export type StringLiteral = Location & { | ||
kind: Node.StringLiteral; | ||
value: string; | ||
isSingleQuote: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a decent choice; you could also have the emitter always produce double quotes or single quotes (and, in theory, make that configurable).
{ | ||
"kind": "ExpressionStatement", | ||
"expr": { | ||
"kind": "Identifier", | ||
"text": "(missing)" | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason that you have empty statements is that ;
is parsed as a separator, not a terminator, so a complete program looks like X;Y;Z
not X;Y;Z;
.
(separators have become less and less popular over the history of programming languages)
if (pos >= s.length) { | ||
// report unterminated string literal error | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you'll probably have to make the lexer able to report errors the same way the other phases do
@sandersn just decided to add a new exercise to refactor/rename from edit: just opened a new PR to handle that imteekay/mini-typescript#6 |
String
literalsResources
String
literals sandersn/mini-typescript#15