Formatting of grammar.js is unusual #39
Comments
One thing to note: Emacs is less available/portable than JavaScript formatters. |
@NoahTheDuke Thanks for taking a look and commenting. I think the target audience here is people who might work on the grammar. More generally, perhaps it is for entities that might modify it with the intention of having the results merged in. But maybe there are cases I have overlooked. I usually use some sort of Linux box but I also end up having to work with Windows. The outlined approach works with those and I have a hard time imagining it's not going to work on macOS or the BSDs -- which historically have been able to run Emacs. Note that this approach doesn't require that anyone use Emacs as an editor interactively -- it's just used to execute a script. It is true however that the formatting that (setq js-indent-align-list-continuation t) has a long and not-well-documented past [1]. In that sense it may be fragile. OTOH, if that stops working in a future Emacs, perhaps we can point at the old version of the code. I'd like an approach that doesn't depend on an editor, so I'm happy to consider other formatting options. Probably I should also mention that I'm leaning toward removing the use of Do specific situations come to mind where you think the outlined approach might be an issue? [1] It may have come from code in |
Only that most Linux distributions, macOS, and Windows don't come with Emacs installed by default. I personally use neovim (and vim before that), so having to install Emacs merely to format the code is a much bigger step than using the language I'm currently writing to format the code. Seems like a bad trade to go from relying on the de facto standard tooling (npm) to a specific editor. On the other hand, I'm not an active dev so maybe this doesn't matter. |
To do development for a tree-sitter grammar, in my experience, I've found that what I needed included at least:
These are not all available on any of the platforms I use out-of-the-box [1] -- I don't think there is a platform that has them all by default. So IIUC there is likely installation work involved no matter what your environment is. Windows is the platform where I found that getting bits to work took the most effort. I tend to use scoop there and it was a matter of Relatively speaking, installing Emacs is, in my experience, less problematic compared to some other pieces (e.g. Emscripten) and further, I have found it to be not much work. Installation isn't the only factor of course -- updating is also something one needs to consider. As you've probably noticed from the other issues, Emscripten takes much more effort and is cumbersome to get right for tree-sitter grammars. As I said earlier though, I'm not averse to choosing another non-editor-dependent approach. I just haven't found a suitable one yet. The option of formatting things manually is also available -- it's what I did for most of the life of this repository. Having an automated way is nice to be able to use, but one does not have to use it all of the time. If someone is asked to change their formatting in a PR, then it seems a good option to provide a way to make that easier. Regarding:
AFAIU, Thanks for sharing your thoughts by the way. I think the resulting communication can help to explore and spell certain things out. [1] Note that getting an appropriate version is also relevant and one may need to install a different version of something that is already installed. |
I'd agree with Noah that using Emacs for formatting is a barrier for other contributors. The way I see this, we could keep using Emacs or switch to prettier, which is the de-facto standard js formatter. If we use Emacs, we don't need to require users to install it. We can apply the formatting ourselves after other people develop with normal tools. I don't mind doing that at all. |
I think there may be a misunderstanding here -- I don't think I ever stated that there is a requirement for installing Emacs. I'm sorry if I gave that impression, it was not intended. Also, I stated this:
but I'm not necessarily averse to performing some reformatting manually nor do I think the stated style is particularly difficult to learn. I personally find reading the other grammars I've looked at quite taxing in spots. When I have to look at many rules I am not familiar with (which is likely to be the case if I come back to look here too), the existing styles really don't help. There is a bit of oddness to the proposed one here, but it's far easier for me to understand at a glance than the alternatives I have looked at. Having said all of this, I'm still fine to investigate looking for alternatives that might work better. I very much doubt that prettier will be a good choice though -- I say this based on having tried it out as well as based on what it claims:
It's fine to be opinionated but it's opining about a different use of JavaScript than how it's being used for tree-sitter grammars as a DSL. Having few options means it's not likely to be that tweakable IIUC. I don't think it's a good match. I think it's more likely that |
Some other formatting-related programs I've looked at include:
Looked through options and tried some things out but not much luck yet. I did get some descriptions that might be handy when searching:
|
I've collected links, vocabulary, samples, and other bits and placed them in a gist, as it's a bit much for here perhaps: https://gist.github.com/sogaiu/75411c556eba685ea4dfa6043970cfed |
I'm trying out It seems to be able to handle the concern mentioned above regarding nested calls. There are some other bits that I haven't tamed yet though. |
As part of the reformatting effort, I tried replacing instances of the There are some tweaks that are necessary to do this (e.g.
const KEYWORD_HEAD =
  RegExp('[^' +
         '\\f\\n\\r\\t ' +
         '(){}' +
         '\\[\\]' + // double-backslashes for re escapes
         '\\\\' + // double-backslashes for re escapes
         '"' +
         '~^;`,:/' +
         '@' +
         '\\u000B\\u001C\\u001D\\u001E\\u001F' +
         '\\u2028\\u2029\\u1680' +
         '\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008\\u2009' +
         '\\u200a\\u205f\\u3000' +
         ']');
Here's what we have on 262d6d6 for comparison: tree-sitter-clojure/grammar.js Lines 107 to 108 in 262d6d6
|
By adding the following definition in
function regex(patt) {
  return RegExp(patt);
}
The sample above can be written:
const KEYWORD_HEAD =
  regex('[^' +
        '\\f\\n\\r\\t ' +
        '(){}' +
        '\\[\\]' + // double-backslashes for re escapes
        '\\\\' + // double-backslashes for re escapes
        '"' +
        '~^;`,:/' +
        '@' +
        '\\u000B\\u001C\\u001D\\u001E\\u001F' +
        '\\u2028\\u2029\\u1680' +
        '\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008\\u2009' +
        '\\u200a\\u205f\\u3000' +
        ']');
Here's what
const STRING =
  token(seq('"',
            repeat(regex('[^"\\\\]')),
            repeat(seq("\\",
                       regex('.'),
                       repeat(regex('[^"\\\\]')))),
            '"'));
|
Maybe I'll start a new issue with these regular expression bits. |
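The effect of the doubled backslashes in the samples above can be checked in plain Node.js, outside of tree-sitter. The following is a minimal sketch: regex and the KEYWORD_HEAD character class are copied from the comments above, while isKeywordHead is a hypothetical helper added only for illustration and is not part of the grammar.

```javascript
// Thin wrapper around RegExp, as defined in the comment above.
function regex(patt) {
  return RegExp(patt);
}

// Character class copied from the KEYWORD_HEAD sample above.
const KEYWORD_HEAD =
  regex('[^' +
        '\\f\\n\\r\\t ' +
        '(){}' +
        '\\[\\]' + // string holds \[\] so the regex sees escaped brackets
        '\\\\' +   // string holds \\ so the regex sees one escaped backslash
        '"' +
        '~^;`,:/' +
        '@' +
        '\\u000B\\u001C\\u001D\\u001E\\u001F' +
        '\\u2028\\u2029\\u1680' +
        '\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008\\u2009' +
        '\\u200a\\u205f\\u3000' +
        ']');

// Hypothetical helper (not in the grammar): can this single character
// start a keyword name according to the class above?
function isKeywordHead(ch) {
  return ch.length === 1 && KEYWORD_HEAD.test(ch);
}

console.log(isKeywordHead('a'));  // a letter is outside the excluded set
console.log(isKeywordHead('('));  // a delimiter is inside the excluded set
console.log(isKeywordHead('\\')); // a single backslash is excluded too

// The string form and the literal form describe the same pattern.
console.log(regex('[^"\\\\]').source === /[^"\\]/.source);
```

Building the pattern from a string is what forces the extra level of escaping; a regex literal would need only one backslash per escape, but it could not be split across lines with + in the same way.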
As a fairly speculative idea, I tried translating I don't know if this will get pursued much, but there are other folks who generate For this to be useful I imagine one would need at least:
I haven't looked into which part of the As a side note, using this approach (alone) would mean that Node.js is no longer necessary as that is only used by Being able to generate Anyway, below is what it could look like (though I did tweak the default formatting for vectors). (I made up the
{:name "clojure"
;; a comment
:extras []
:conflicts []
:inline [:_kwd_leading_slash
:_kwd_just_slash
:_kwd_qualified
:_kwd_unqualified
:_kwd_marker
:_sym_qualified
:_sym_unqualified]
:_tokens
{:WHITESPACE_CHAR
[:regex "["
"\\f\\n\\r\\t, "
"\\u000B\\u001C\\u001D\\u001E\\u001F"
"\\u2028\\u2029\\u1680"
"\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008\\u2009"
"\\u200a\\u205f\\u3000"
"]"]
:WHITESPACE [:token [:repeat1 :WHITESPACE_CHAR]]
:COMMENT [:token [:regex "(;|#!)"
".*"
"\\n?"]]
:DIGIT [:regex "[0-9]"]
:ALPHANUMERIC [:regex "[0-9a-zA-Z]"]
:HEX_DIGIT [:regex "[0-9a-fA-F]"]
:OCTAL_DIGIT [:regex "[0-7]"]
:HEX_NUMBER [:seq "0"
[:regex "[xX]"]
[:repeat1 :HEX_DIGIT]
[:optional "N"]]
:OCTAL_NUMBER [:seq "0"
[:repeat1 :OCTAL_DIGIT]
[:optional "N"]]
:RADIX_NUMBER [:seq [:repeat1 :DIGIT]
[:regex "[rR]"]
[:repeat1 :ALPHANUMERIC]]
:RATIO [:seq [:repeat1 :DIGIT]
"/"
[:repeat1 :DIGIT]]
:DOUBLE [:seq [:repeat1 :DIGIT]
[:optional [:seq "."
[:repeat :DIGIT]]]
[:optional [:seq [:regex "[eE]"]
[:optional [:regex "[+-]"]]
[:repeat1 :DIGIT]]]
[:optional "M"]]
:INTEGER [:seq [:repeat1 :DIGIT]
[:optional [:regex "[MN]"]]]
:NUMBER [:token [:prec 10
[:seq [:optional [:regex "[+-]"]]
[:choice :HEX_NUMBER
:OCTAL_NUMBER
:RADIX_NUMBER
:RATIO
:DOUBLE
:INTEGER]]]]
:NIL [:token "nil"]
:BOOLEAN [:token [:choice "false"
"true"]]
:KEYWORD_HEAD
[:regex "[^"
"\\f\\n\\r\\t "
"/"
"()"
"\\[\\]"
"{}"
"\""
"@~^;`"
"\\\\"
",:"
"\\u000B\\u001C\\u001D\\u001E\\u001F"
"\\u2028\\u2029\\u1680"
"\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008\\u2009"
"\\u200a\\u205f\\u3000"
"]"]
:KEYWORD_BODY [:choice [:regex "[:']"]
:KEYWORD_HEAD]
:KEYWORD_NAMESPACED_BODY
[:token [:repeat1 [:choice [:regex "[:'/]"]
:KEYWORD_HEAD]]]
:KEYWORD_NO_SIGIL
[:token [:seq :KEYWORD_HEAD
[:repeat :KEYWORD_BODY]]]
:KEYWORD_MARK [:token ":"]
:AUTO_RESOLVE_MARK [:token "::"]
:STRING
[:token [:seq "\""
[:repeat [:regex "[^"
"\""
"\\\\"
"]"]]
[:repeat [:seq "\\"
[:regex "."]
[:repeat [:regex "[^"
"\""
"\\\\"
"]"]]]]
"\""]]
:OCTAL_CHAR [:seq "o"
[:choice [:seq :DIGIT :DIGIT :DIGIT]
[:seq :DIGIT :DIGIT]
[:seq :DIGIT]]]
:NAMED_CHAR [:choice "backspace"
"formfeed"
"newline"
"return"
"space"
"tab"]
:UNICODE [:seq "u"
:HEX_DIGIT
:HEX_DIGIT
:HEX_DIGIT
:HEX_DIGIT]
:ANY_CHAR [:regex ".|\\n"]
:CHARACTER [:token [:seq "\\"
[:choice :OCTAL_CHAR
:NAMED_CHAR
:UNICODE
:ANY_CHAR]]]
:SYMBOL_HEAD [:regex "[^"
"\\f\\n\\r\\t "
"/"
"()"
"\\[\\]"
"{}"
"\""
"@~^;`"
"\\\\"
",:"
"#'"
"0-9"
"\\u000B\\u001C\\u001D\\u001E\\u001F"
"\\u2028\\u2029\\u1680"
"\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2008"
"\\u2009\\u200a\\u205f\\u3000"
"]"]
:NS_DELIMITER [:token "/"]
:SYMBOL_BODY [:choice :SYMBOL_HEAD
[:regex "[:#'0-9]"]]
:SYMBOL_NAMESPACED_NAME
[:token [:repeat1 [:choice :SYMBOL_HEAD
[:regex "[/:#'0-9]"]]]]
:SYMBOL
[:token [:seq :SYMBOL_HEAD
[:repeat :SYMBOL_BODY]]]
}
:rules
{:source [:repeat [:choice :_form
:_gap]]
:_gap [:choice :_ws
:comment
:dis_expr]
:_ws :WHITESPACE
:comment :COMMENT
:dis_expr [:seq [:field "marker" "#_"]
[:repeat :_gap]
[:field "value" :_form]]
:_form [:choice :num_lit ;; atom-ish
:kwd_lit
:str_lit
:char_lit
:nil_lit
:bool_lit
:sym_lit
;; basic collection-ish
:list_lit
:map_lit
:vec_lit
;; dispatch reader macros
:set_lit
:anon_fn_lit
:regex_lit
:read_cond_lit
:splicing_read_cond_lit
:ns_map_lit
:var_quoting_lit
:sym_val_lit
:evaling_lit
:tagged_or_ctor_lit
;; some other reader macros
:derefing_lit
:quoting_lit
:syn_quoting_lit
:unquote_splicing_lit
:unquoting_lit]
:num_lit :NUMBER
:kwd_lit [:choice :_kwd_leading_slash
:_kwd_just_slash
:_kwd_qualified
:_kwd_unqualified]
:_kwd_leading_slash [:seq [:field "marker" :_kwd_marker]
[:field "delimiter" :NS_DELIMITER]
[:field "name"
[:alias :KEYWORD_NAMESPACED_BODY
:kwd_name]]]
:_kwd_just_slash [:seq [:field "marker" :_kwd_marker]
[:field "name" [:alias :NS_DELIMITER :kwd_name]]]
:_kwd_qualified
[:prec 2
[:seq [:field "marker" :_kwd_marker]
[:field "namespace"
[:alias :KEYWORD_NO_SIGIL :kwd_ns]]
[:field "delimiter" :NS_DELIMITER]
[:field "name"
[:alias :KEYWORD_NAMESPACED_BODY :kwd_name]]]]
:_kwd_unqualified
[:prec 1
[:seq [:field "marker" :_kwd_marker]
[:field "name" [:alias :KEYWORD_NO_SIGIL :kwd_name]]]]
:_kwd_marker [:choice :KEYWORD_MARK
:AUTO_RESOLVE_MARK]
:str_lit :STRING
:char_lit :CHARACTER
:nil_lit :NIL
:bool_lit :BOOLEAN
:sym_lit [:seq [:repeat :_metadata_lit]
[:choice :_sym_qualified :_sym_unqualified]]
:_sym_qualified
[:prec 1 [:seq [:field "namespace" [:alias :SYMBOL :sym_ns]]
[:field "delimiter" :NS_DELIMITER]
[:field "name" [:alias :SYMBOL_NAMESPACED_NAME :sym_name]]]]
:_sym_unqualified
[:field "name"
[:alias [:choice :NS_DELIMITER
:SYMBOL]
:sym_name]]
:_metadata_lit
[:seq [:choice [:field "meta" :meta_lit]
[:field "old_meta" :old_meta_lit]]
[:optional [:repeat :_gap]]]
:meta_lit
[:seq [:field "marker" "^"]
[:repeat :_gap]
[:field "value" [:choice :read_cond_lit
:map_lit
:str_lit
:kwd_lit
:sym_lit]]]
:old_meta_lit
[:seq [:field "marker" "#^"]
[:repeat :_gap]
[:field "value" [:choice :read_cond_lit
:map_lit
:str_lit
:kwd_lit
:sym_lit]]]
:list_lit [:seq [:repeat :_metadata_lit]
:_bare_list_lit]
:_bare_list_lit [:seq [:field "open" "("]
[:repeat [:choice [:field "value" :_form]
:_gap]]
[:field "close" ")"]]
:map_lit [:seq [:repeat :_metadata_lit]
:_bare_map_lit]
:_bare_map_lit [:seq [:field "open" "{"]
[:repeat [:choice
[:field "value" :_form]
:_gap]]
[:field "close" "}"]]
:vec_lit [:seq [:repeat :_metadata_lit]
:_bare_vec_lit]
:_bare_vec_lit [:seq [:field "open" "["]
[:repeat [:choice [:field "value" :_form]
:_gap]]
[:field "close" "]"]]
:set_lit [:seq [:repeat :_metadata_lit]
:_bare_set_lit]
:_bare_set_lit [:seq [:field "marker" "#"]
[:field "open" "{"]
[:repeat [:choice [:field "value" :_form]
:_gap]]
[:field "close" "}"]]
:anon_fn_lit [:seq [:repeat :_metadata_lit]
[:field "marker" "#"]
:_bare_list_lit]
:regex_lit [:seq [:field "marker" "#"]
:STRING]
:read_cond_lit [:seq [:repeat :_metadata_lit]
[:field "marker" "#?"]
[:repeat :_ws]
:_bare_list_lit]
:splicing_read_cond_lit [:seq [:repeat :_metadata_lit]
[:field "marker" "#?@"]
[:repeat :_ws]
:_bare_list_lit]
:auto_res_mark :AUTO_RESOLVE_MARK
:ns_map_lit [:seq [:repeat :_metadata_lit]
[:field "marker" "#"]
[:field "prefix" [:choice :auto_res_mark
:kwd_lit]]
[:repeat :_gap]
:_bare_map_lit]
:var_quoting_lit [:seq [:repeat :_metadata_lit]
[:field "marker" "#'"]
[:repeat :_gap]
[:field "value" :_form]]
:sym_val_lit [:seq [:field "marker" "##"]
[:repeat :_gap]
[:field "value" :sym_lit]]
:evaling_lit [:seq [:repeat :_metadata_lit]
[:field "marker" "#="]
[:repeat :_gap]
[:field "value" [:choice :list_lit
:read_cond_lit
:sym_lit]]]
:tagged_or_ctor_lit [:seq [:repeat :_metadata_lit]
[:field "marker" "#"]
[:repeat :_gap]
[:field "tag" :sym_lit]
[:repeat :_gap]
[:field "value" :_form]]
:derefing_lit [:seq [:repeat :_metadata_lit]
[:field "marker" "@"]
[:repeat :_gap]
[:field "value" :_form]]
:quoting_lit [:seq [:repeat :_metadata_lit]
[:field "marker" "'"]
[:repeat :_gap]
[:field "value" :_form]]
:syn_quoting_lit [:seq [:repeat :_metadata_lit]
[:field "marker" "`"]
[:repeat :_gap]
[:field "value" :_form]]
:unquote_splicing_lit [:seq [:repeat :_metadata_lit]
[:field "marker" "~@"]
[:repeat :_gap]
[:field "value" :_form]]
:unquoting_lit [:seq [:repeat :_metadata_lit]
[:field "marker" "~"]
[:repeat :_gap]
[:field "value" :_form]]
}} |
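To make the translation idea a bit more concrete, here is a minimal sketch of how vector-style forms like the ones above might be expanded into nested objects in the style of tree-sitter's generated grammar.json. It is not the tool described in this thread. JavaScript arrays stand in for EDN vectors, and kw, expand, and the tokens map are hypothetical names chosen for illustration. Only a handful of form types are handled.

```javascript
// Hypothetical stand-in for an EDN keyword such as :DIGIT or :seq.
const kw = (name) => ({ kw: name });

// Expand one vector-style form into a grammar.json-style node.
// `tokens` plays the role of the :_tokens map in the sample: names found
// there are inlined, any other keyword becomes a reference to a rule.
function expand(form, tokens) {
  if (typeof form === 'string') {
    return { type: 'STRING', value: form };
  }
  if (!Array.isArray(form)) {
    return tokens.has(form.kw)
      ? expand(tokens.get(form.kw), tokens)
      : { type: 'SYMBOL', name: form.kw };
  }
  const [head, ...args] = form;
  const sub = (x) => expand(x, tokens);
  switch (head.kw) {
    case 'regex':
      // Multiple string pieces are concatenated into one pattern.
      return { type: 'PATTERN', value: args.join('') };
    case 'seq':
      return { type: 'SEQ', members: args.map(sub) };
    case 'choice':
      return { type: 'CHOICE', members: args.map(sub) };
    case 'repeat':
      return { type: 'REPEAT', content: sub(args[0]) };
    case 'repeat1':
      return { type: 'REPEAT1', content: sub(args[0]) };
    case 'optional':
      return { type: 'CHOICE', members: [sub(args[0]), { type: 'BLANK' }] };
    case 'token':
      return { type: 'TOKEN', content: sub(args[0]) };
    default:
      throw new Error(`unhandled form: ${head.kw}`);
  }
}

// Two entries transcribed from the :_tokens map above.
const tokens = new Map([
  ['HEX_DIGIT', [kw('regex'), '[0-9a-fA-F]']],
  ['HEX_NUMBER', [kw('seq'),
                  '0',
                  [kw('regex'), '[xX]'],
                  [kw('repeat1'), kw('HEX_DIGIT')],
                  [kw('optional'), 'N']]],
]);

const hexNumber = expand(kw('HEX_NUMBER'), tokens);
console.log(JSON.stringify(hexNumber, null, 2));
```

Forms such as :field, :alias, and :prec are left out to keep the sketch short; each would need its own case, and a real translator would also have to emit the top-level keys (name, rules, extras, and so on).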
I love edn and do a similar edn-to-json translation for https://github.com/mtgred/netrunner. However, isn't this just trading one dev tool for another? Instead of node, they now need babashka or jet or what have you. Unless you write it to be usable with clojure alone, which is probably a safe bet to assume for someone working on a clojure grammar 😉 |
Thanks for that link -- I will check it out. Ah, would you mind providing a hint or two about where in the codebase:
might be? I did some searching for
Yes, I think there could be a trade occurring here -- though depending on how it's done, it might be that there are just more options (e.g. generating a readable Also, it seems to me that sometimes trades / swaps are worth it. It's not clear yet whether that would be the case here, but if there are issues with a setup that involved Clojure, edn, babashka, etc., what the chances of borkdude (or other Clojure folks) helping might be compared to chances that Node.js (or other non-Clojure-using) folks helping might be? I have no magical crystal ball but I have my experience to look back at :) Another consideration one might have is whether Clojure-using folks who might have some interest in contributing would prefer to work in Still investigating what the cost of this kind of approach could be though. |
Thinking about potential contributions, I would think that whoever wants to contribute, it will be easier to convince a clojure dev to work with javascript tooling than it will be to convince a js developer to work with edn/clj tooling. Many clojure devs probably are working in js ecosystem to some extent already. |
I certainly don't mind working with EDN. Instead of writing a custom tool we could also consider compiling a grammar.cljs to grammar.js For example: (js/grammar (clj->js {:name "clojure" ...})) then use whatever the right cljs configuration magic is needed to set Edit: |
Once you're using shadow-cljs, you might as well not change to anything else, as shadow-cljs requires having node installed lol. After all of this back and forth, I think the |
I appreciate the work that thheller has done with shadow-cljs and I used it for a number of projects, but my recollection is that I needed to update frequently. Do you know if that has changed? |
I don't, I haven't used it much. |
Thanks -- maybe I'll try to catch up on the state of things. |
It doesn't look that different to me when looking at: https://github.com/thheller/shadow-cljs/commits/master Search for "bump". |
Yes, and still easier would be to have mostly Clojure-ish tooling for Clojure devs I would think :) When working with Clojure-ish hosted languages, I think it's not uncommon for devs to try to avoid doing things in the underlying language for the most part, but using what's beneath when necessary (go interop). I'm thinking of the Incidentally, IIUC, today marks the 3rd year since the first meaningful commit of this project (yay!). Looking back, I think the reality of the situation for a tree-sitter grammar is that there are bits that change / don't work which are outside of the grammar writer's / maintainer's control [1]. Worse, some of those bits change unexpectedly or in hard to predict ways and broken things don't necessarily get fixed for a long time. I think sometimes it's worth creating or adopting code to take over some of those bits to reduce churn / breakage (current and potential). If there is such code that we can create / control / maintain, perhaps we'd prefer using certain sorts of languages / tooling. [1] My sense is that a typical path for a grammar repository is to end up using:
Except for the last set of items, these all have had various problems (at least from my perspective). The first 3 I think most of us have little chance of influencing. The |
Ok, I managed to make something [1] that takes a The The generated The main thing I didn't account for initially was that order within I didn't write it in Clojure, but I think it might not be too bad. [1] The overall flow looks like this. |
I'm considering an approach where it's possible to add developer-related items (e.g. using babashka tasks) to the repository and leave (some of?) the existing bits in place perhaps after some "cleaning". The idea is mostly leaning in the direction of "co-existence" of methods rather than outright immediate supplantation. This will allow experimentation, which, if it works out, may lead to retirement of some of the remaining older bits. I've been working on and trying something similar in tree-sitter-janet-simple. There, I made the tooling in Janet and it's been working pretty nicely so far [1]. I think a similar thing may be doable with babashka and its task feature. I thought that the mentioned talk was pretty good at motivating why one might want to consider the tasks feature (starting around 2:21). I think if this approach works out, it might help with folks with a Clojure-leaning getting involved -- which I believe might help for the upkeep / maintenance of the project. Also, I think it's more fun and I would guess others might find that to be the case too. [1] Some things I've implemented include:
|
Perhaps we've settled on a decent compromise from the perspective of readability. At least I think it's much easier to understand than what I've seen elsewhere :) I'm going to close this for the moment. It can be reopened later if necessary. |
The formatting of grammar.js now has portions that look like this:
tree-sitter-clojure/grammar.js Lines 219 to 221 in 262d6d6
AFAIK, this arrangement is not typical. However, it is the result of trying to achieve:
grammar.js, and
We tried some formatters including prettier and js-beautify, but failed to get them to handle expressing nested calls with indentation that was sufficiently readable to us.
As an example of something that we would prefer but failed to manage with other methods, consider:
tree-sitter-clojure/grammar.js Lines 335 to 342 in 262d6d6
We didn't figure out how to get that sort of result from any of the formatters we tried, nor, it seems, did any of the other tree-sitter grammar repositories we checked.
Although JavaScript is being used to express things in grammar.js, perhaps the number of nested calls is somewhat unusual and thus (some?) existing formatters are not likely to have considered this kind of use (i.e. to express a grammar description).
We did have some luck with Emacs' js.el using code like:
This allowed us to arrange for nested function calls to appear as we liked, but a side effect was that it resulted in the unusual type of indentation demonstrated at the beginning of this post. While that is unfortunate, we think the alternative is less desirable.
We may eventually add code for formatting to the repository and a method to invoke it from the command line, perhaps a script like:
For invocation from cmd.exe, perhaps something like the following will do: