Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: comment and discard are treated as whitespace in most macros #75

Merged
merged 12 commits into from
Aug 24, 2020

Conversation

carocad
Copy link
Owner

@carocad carocad commented Aug 12, 2020

fixes #74

@carocad carocad self-assigned this Aug 12, 2020
@carocad carocad marked this pull request as ready for review August 13, 2020 20:19
@sogaiu
Copy link

sogaiu commented Aug 14, 2020

On a related note for using ignore*, I think something can be between ^ (or #^) and what follows:

$ clj
Clojure 1.10.1
user=> (binding [*print-meta* true]
  (prn (read-string "^ #_ :a {:a true} [:a]")))
^{:a true} [:a]
nil

To spell things out, in this example, between ^ and {:a true}, there is #_ :a. (I also tested with ;; hi\n and just whitespace.)

So perhaps metadata_entry and deprecated_metadata_entry could use similar treatment as backtick, quote, etc.?

Although old, here is an example from the wild: https://github.com/technomancy/swank-clojure/blob/master/src/swank/util/class_browse.clj#L64

@sogaiu
Copy link

sogaiu commented Aug 14, 2020

If that path is taken, I guess that may have consequences for "reconstructing" source.

That is, if ast is given code that contains something like:

^ {:a true} [:a]

to reconstruct the source string using code, the whitespace between the ^ and {:a true} needs to be accounted for appropriately.

May be something like this:

    :metadata_entry
    (do (. string-builder (append "^"))
        (doseq [child (rest ast)] (code* child string-builder)))

    :deprecated_metadata_entry
    (do (. string-builder (append "#^"))
        (doseq [child (rest ast)] (code* child string-builder)))

@carocad
Copy link
Owner Author

carocad commented Aug 14, 2020

@sogaiu thanks a lot for the input. Totally didn't consider the case of # ^ as it seems so crazy 🤯. I will modify this PR accordingly

@sogaiu
Copy link

sogaiu commented Aug 16, 2020

I did a bit of testing and I think symbolic might benefit from having ignore* as well:

$ clj
Clojure 1.10.1
user=> ## Inf
##Inf
user=> ## #_ :a Inf
##Inf
user=> ## ;; oh my
NaN
##NaN
user=> 

https://github.com/carocad/parcera/blob/master/src/Clojure.g4#L134

@carocad
Copy link
Owner Author

carocad commented Aug 17, 2020

I would like to understand the reasoning for moving the ignore* from the end of metadata to the end of metadata_entry and deprecated_metadata_entry. If you don't mind, please share your thoughts on this matter.

In general, I was trying to see which patterns would work best for parsing metadata. Since (by now) it is quite a complex rule I wanted to avoid making it too complex. On second thought it probably would be better to stick with the way it was before just to avoid a more complex metadata-entry.

Also on a tangential note, I noticed the bump in version number. If I find any other things would you prefer to hear about them sooner or after another release?

By all means, feel free to report issues as you see them. In this case, all these issues were related to each other so I want to fix them together. If you find an issue that is not related to this discussion then I would move it to another PR/release 😉

@sogaiu
Copy link

sogaiu commented Aug 17, 2020

Thank you.

I very much appreciate the explanation.

Regarding other (I think mostly unrelated) things, I will create new issues.

@sogaiu
Copy link

sogaiu commented Aug 18, 2020

I have done a bit more testing and it looks to me at the moment like reader conditionals and the splicing reader conditionals don't allow any discards or comments to be between the initial #? / #?@ and the corresponding list (at least at the repl):

;; conditional
#?(:clj 1 :cljr 2)
; => 1

#? #_ 1 (:clj 1 :cljr 2)
Syntax error reading source at (REPL:10:5).
read-cond body must be a list
Syntax error compiling at (REPL:0:0).
Unable to resolve symbol: _ in this context
1
Execution error (IllegalArgumentException) at user/eval7 (REPL:1).
Wrong number of args passed to keyword: :clj

#? (:clj 1 :cljr 2)
; => 1

;; conditional_splicing
[#?@(:clj [:a] :cljr [:b])]
; => [:a]

[#?@ #_ 1 (:clj [:a] :cljr [:b])]
Syntax error reading source at (REPL:14:7).
read-cond body must be a list
Syntax error compiling at (REPL:0:0).
Unable to resolve symbol: _ in this context
1
Execution error (IllegalArgumentException) at user/eval11 (REPL:1).
Wrong number of args passed to keyword: :clj
Syntax error reading source at (REPL:14:34).
Unmatched delimiter: ]

[#?@ (:clj [:a] :cljr [:b])]
; => [:a]

So it looks like #_ doesn't work, but whitespace does. I believe ;-comments don't work either, but am less sure of my testing there. I tried putting the following in a file:

(print #? ;; hi
(:clj 1 :cljr 2))

and executing it via clj to get:

Syntax error reading source at (rc.cljc:1:12).
read-cond body must be a list

Is this what you see too?

FWIW, I verified separately for var quote, discard, tag and eval that they do seem to allow each of discard, whitespace, and comment:

;; var_quote
(def a 1)
; => #'user/a

#'a
; => #'user/a

#' #_ 1 a
; => #'/user/a

#' a
; => #'/user/a

#' ;;hi
a
; => #'user/a

;; discard
#_ 1 :a
; => :a

#_ #_ a 1 :a
; => :a

#_ ;;hi
1 :a
; => :a

;; tag
#uuid "00000000-0000-0000-0000-000000000000"
; => #uuid "00000000-0000-0000-0000-000000000000"

#uuid #_ 1 "00000000-0000-0000-0000-000000000000"
; => #uuid "00000000-0000-0000-0000-000000000000"

#uuid ;;hi
"00000000-0000-0000-0000-000000000000"
; => #uuid "00000000-0000-0000-0000-000000000000"

;; eval
#=(+ 1 1)
; => 2

#= #_ 1 (+ 1 1)
; => 2

#= ;;hi
(+ 1 1)
; => 2

#= (+ 1 1)
; => 2

However, since the grammar is more permissive in various ways anyway, may be there is nothing to be done?

@carocad
Copy link
Owner Author

carocad commented Aug 18, 2020

hey, @sogaiu thanks a lot for the extra testing. Totally didn't notice that case.

However, since the grammar is more permissive in various ways anyway, maybe there is nothing to be done?

In general, what I try to do in parcera is to keep it as close as possible to the "real grammar" so that it can be useful for non-Clojure developers (just use antlr4 and generate to your favorite language). So whenever possible I would stick to a more strict rule.

@sogaiu
Copy link

sogaiu commented Aug 18, 2020

Thank you for the explanation.

@carocad carocad merged commit 28c1810 into master Aug 24, 2020
@carocad carocad deleted the reader/ignore branch August 24, 2020 19:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Comment and metadata issue
2 participants