No tests to express expected failure for templates and other things #42

sogaiu · 2023-02-10T12:33:29Z

There was a query about how expected failures might be tested for at the tree-sitter repository.

It got me thinking that we might be able to capture and express some of our intents about what we don't support.

Some possible examples of such things might include:

template files that use {{ and }} pairs
ClojureCLR's pipe-delimited symbols
~~ClojureDart's new(?) #/ construct~~ [1]

Additions based on further discussion:

zero byte (based on discussion starting here) - found an additional source file that seems to use a zero byte -- 3rd byte

[1] I think we now know what construct this is and ATM it seems reasonable to try to support it.

The text was updated successfully, but these errors were encountered:

sogaiu · 2023-02-11T12:09:20Z

Below is an example corpus test for {{ }} templates:

================================================================================
Templates with `{{` `}}` Should Fail
================================================================================

(defproject codebrutale/{{name}} "0.1.0-SNAPSHOT"
  :description "FIXME"
  :url "https://github.com/codebrutale/{{name}}"
  :license {:name "Eclipse Public License"
            :url "https://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.9.0"]])

--------------------------------------------------------------------------------

(source
  (list_lit
    (sym_lit
      (sym_name))
    (sym_lit
      (sym_name))
    (ERROR)
    (map_lit
      (map_lit
        (sym_lit
          (sym_name))))
    (str_lit)
    (kwd_lit
      (kwd_name))
    (str_lit)
    (kwd_lit
      (kwd_name))
    (str_lit)
    (kwd_lit
      (kwd_name))
    (map_lit
      (kwd_lit
        (kwd_name))
      (str_lit)
      (kwd_lit
        (kwd_name))
      (str_lit))
    (kwd_lit
      (kwd_name))
    (vec_lit
      (vec_lit
        (sym_lit
          (sym_ns)
          (sym_name))
        (str_lit)))))

sogaiu · 2023-02-11T15:14:28Z

Note the following quote by the tree-sitter creator:

Yeah, there is intentionally not much that you as a grammar author can do to manipulate the error recovery.

Also, looking at the first comment in this, it seems possible that tweaks to one's grammar might change whether and where there might be ERROR and/or MISSING nodes.

If that is the case, then having too many corpus tests that indicate precise failure patterns might not be so great from a maintenance perspective...

dannyfreeman · 2023-02-13T18:44:49Z

I don't think we should test for expected failures, I think we should just let them be failures. Like you point out, we can't really control how things fail. Changes to tree sitter versions and the grammar may change what the failures look like. To me, the important cases to test are what we expect tree sitter to parse correctly. That's all consumers of the parser care about, is that valid clojure produces a usable parse tree. Errors are expected to occur while the library is in use, and those errors are completely ignored most of the time.

sogaiu · 2023-02-14T00:29:21Z

That sounds mostly good.

I want to document that we don't necessarily support some things though, like the items mentioned earlier.

If the testing could have been reliable, that would have been a bit more convenient.

dannyfreeman · 2023-02-14T17:29:13Z

There is always the option of some more hand crafted tests. Like a script that parses a file with the grammar, and we assert that it returns a non-zero exit code, but also that it doesn't crash the parsers (like the null byte did). There is no reason it needs to go through the corpus testing.

sogaiu · 2023-02-14T23:55:51Z

Good point.

So perhaps the zero byte [1] thing is something we should have a test for too.

Regarding the ~~nul~~ zero byte thing, I found this discussion:

(I'm posting multiple links though it's the same discussion because GitHub discussions tends to hide things -- not a helpful UI / UX sometimes.)

I'm not sure whether we're supposed to be able to handle them...

[1] At this point, from the perspective of widespread use, I'm not sure whether "nul" or "null" is an appropriate term. I started using "zero byte" at some point to avoid the whole topic, but apparently not as far back as the referenced discussion :) See here for some more on this if interested.

dannyfreeman · 2023-02-15T18:17:05Z

I suspect NUL byte came from the days when c-niles did everything in their power to eliminate redundant characters in all their identifiers. I'll try to stick with zero byte from here on out.

Regardless of that one scenario, some simple tests like this might be worth throwing into a script.

sogaiu · 2023-02-15T21:47:54Z

Ok, I'll update the initial post to include mention of zero bytes so there's a place we can look that fits on most normal screens :)

sogaiu · 2023-02-27T14:00:40Z

I added some preliminary expected failure testing in 51ca143 (on the dev branch).

There isn't anything yet for ClojureCLR's pipe-delimited symbols, but I did add something for a template file that leads to a parse error and something for a file with a zero byte in it.

The tests can be invoked by bb expect-failures.

Not crazy about the output but at least the testing occurs.

On a related note, I experimented with using TAP in tree-sitter-janet-simple as mentioned at the bottom of #39 (comment) (see the link named "described here" if interested) [1].

Possibly it would be interesting to consider using TAP here too.

[1] I'm avoiding putting the link here because then this issue will show up at the tree-sitter repository unnecessarily :P

sogaiu · 2023-02-27T14:08:54Z

For the ClojureCLR pipe-delimited symbol testing, it's not necessarily that we get an error parsing, it's that the parsed result may be quite different from correct.

Here is a comment that has an example along with the resulting parse output.

Testing for that would be different from the other two things where there was checking for an exit code of 1 (possibly parsing the output to look for ERROR would be good too, but that's not there ATM).

dannyfreeman · 2023-02-28T17:39:14Z

The failure tests work great so far.

For the CLR test case, it's not a failure, but a strange parse tree that is technically does not produce errors. I can pass this into the JVM clojure repl

'(|System.Collections.Generic.IList`1[System.Int32]|)

And it gives me a valid result. I don't think it should produce an error then. Since that is how the reference implementation behaves, perhaps this should just be described in a corups test??

Edit: here is a PR against dev. Feel free to press merge if you like it #48

sogaiu · 2023-03-01T06:17:59Z

Thanks for trying things out.

I think that:

|System.Collections.Generic.IList`1[System.Int32]|

is supposed to be a single symbol.

Interacting with clojure via the command line I get:

$ rlwrap clojure
Clojure 1.11.1
user=> '|System.Collections.Generic.IList`1[System.Int32]|
|System.Collections.Generic.IList
1
Syntax error (ClassNotFoundException) compiling at (REPL:0:0).
System.Int32
Syntax error compiling at (REPL:0:0).
Unable to resolve symbol: | in this context
user=>

I think what's happening here is that:

|System.Collections.Generic.IList is recognized as a symbol [1]
1 (actually backtick followed by 1) is recognized as a number
[System.Int32] is recognized as a vector with a single symbol in it
The last | is being interpreted as a symbol

The ClojureCLR docs here have this bit:

Vertical bars are used in pairs to surround the name (or part of the name) of a symbol that has any special characters (from the standard Clojure symbol viewpoint) in it. For example, |A(B)|, A|(|B|)|, and A|(B)| all mean the symbol whose name consists of the four characters A, (, B, and ).

So the parsing doesn't seem correct to me.

I suspect if this is to be handled that doing it via an external scanner might be worth considering. I'm not yet convinced effort should be spent on it at this stage though.

[1] I tried the following too:

$ rlwrap clojure
Clojure 1.11.1
user=> (def |System.Collections.Generic.IList 1)
#'user/|System.Collections.Generic.IList

so that seems to be evidence in favor of |System.Collections.Generic.IList being interpreted as a symbol.

dannyfreeman · 2023-03-01T14:03:08Z

In JVM clojure it is definitely more than one symbol, and it is possible to parse those symbols without errors (executing is another matter). It seems that in clojure CLR it is a single symbol. We can't support both because while scanning we don't know what platform the code is intended to run on. Maybe it's CLR, maybe JVM, maybe it's both, but we don't have that info while parsing.

I think in this case we support the more common platform. Errors are not produced when parsing, so the only downside I can see is for CLR clojure they might get some funny syntax highlighting on these types of symbols. Since the CLR implementation has a different reader, then I think this would be the case for a special grammar that inherits this grammar as a parent and adds custom rules for these symbols. I doubt anyone will think this is enough of a problem to actually write the new grammar. It would be a lot of hassle to use just for the CLR. Based on how clojure CLR is used, it would be difficult to know when to use a CLR grammar since a lot of the code uses the normal .clj extension.

NoahTheDuke · 2023-03-01T14:18:20Z

@dmiller might have thoughts here. I know he's currently in the process of rewriting the ClojureCLR in F#, maybe he'd be willing to shed some insights on this stuff for us. If the cljr community can move to relying on a different filetype, we could help fork this for them. Otherwise, I agree we're at an impasse.

dannyfreeman · 2023-03-01T14:23:48Z

Yeah, it's would be an involved process. Not only would we need a sub-grammar based on this one, editors would also need some support to use it. It would result in a special tree-sitter-clojure-clr.so dynamic library after all. In Emacs I believe that would be at least a derived major mode.

Unrelated: It is very cool to see ClojureCLR getting rewritten in F#.

dmiller · 2023-03-01T14:59:00Z

I'm not sure of all the questions being asked, but here goes from what I can figure out.
The |...| syntax I would love to see going away in the rewrite, but unfortunately I may need to keep it for backward compatibility.
If tagged literals had been available at the time, I would have used them instead of | ... |, but so it goes.

Regarding |System.Collections.Generic.IList`1[System.Int32]| -- this unfortunately is the actual CLR name for the type.
I didn't know what kind of sweetened syntax to put in for typenames, a la C# or F# -- no experience in the real world at that point -- so i went with the official CLR syntax. Given all the strange characters, I had to introduce | ... | quoting for symbol names. Stolen with some modification from Common Lisp.

How JVM handles it -- well, you can see it is weird.

Does this help?

dannyfreeman · 2023-03-01T17:44:28Z

_It is good to know that this syntax is probably not going away.

One question that I think would be good to know: will clojure CLR be moving to a new file extension .cljr? Right now it looks like a lot of it uses .clj, same as the JVM. Without a distinct file extension I don't think we will be able to provide a separate tree sitter grammar for clojure CLR that would handle these kinds of differences in the reader.

If the plan is to stick with .clj that is fine, the only consequence is that type names like |System.Collections.Generic.IList`1[System.Int32]| might have some strange syntax highlighting in certain editors.

dmiller · 2023-03-01T18:21:52Z

The plan is to move to a new extension .cljr.
The code to do that is in the latest alpha release of ClojureCLR (the current version, not the rewrite).
I will make that transition in all the libraries I maintain by the next release.
Though I'm sure folks will be happy enough to have any support and can live with some strange syntax highlighting.

sogaiu · 2023-03-07T10:28:36Z

In c86581f [1] I've added a babashka task to demonstrate an expected misparse.

bb show-expected-misparses should produce some output. Not that illustrative at the moment. Probbably it could use some work, but at least something is recorded in a way that can be checked programmatically.

[1] This is on a temporary branch where I'm working on various babashka tasks.

sogaiu · 2023-03-12T15:09:15Z

Most (if not all) of the testing for these bits will be living in a separate repository, so I'll close this issue for now.

Thanks to all the participants!

sogaiu changed the title ~~No tests to express expected failure for templates~~ No tests to express expected failure for templates and other things Feb 10, 2023

dannyfreeman mentioned this issue Feb 28, 2023

Add test for a .NET type that can be found in clojure CLR #48

Closed

sogaiu closed this as completed Mar 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

No tests to express expected failure for templates and other things #42

No tests to express expected failure for templates and other things #42

sogaiu commented Feb 10, 2023 •

edited

Loading

sogaiu commented Feb 11, 2023

sogaiu commented Feb 11, 2023

dannyfreeman commented Feb 13, 2023

sogaiu commented Feb 14, 2023

dannyfreeman commented Feb 14, 2023

sogaiu commented Feb 14, 2023 •

edited

Loading

dannyfreeman commented Feb 15, 2023

sogaiu commented Feb 15, 2023

sogaiu commented Feb 27, 2023 •

edited

Loading

sogaiu commented Feb 27, 2023 •

edited

Loading

dannyfreeman commented Feb 28, 2023 •

edited

Loading

sogaiu commented Mar 1, 2023 •

edited

Loading

dannyfreeman commented Mar 1, 2023 •

edited

Loading

NoahTheDuke commented Mar 1, 2023

dannyfreeman commented Mar 1, 2023

dmiller commented Mar 1, 2023

dannyfreeman commented Mar 1, 2023 •

edited

Loading

dmiller commented Mar 1, 2023

sogaiu commented Mar 7, 2023

sogaiu commented Mar 12, 2023

No tests to express expected failure for templates and other things #42

No tests to express expected failure for templates and other things #42

Comments

sogaiu commented Feb 10, 2023 • edited Loading

sogaiu commented Feb 11, 2023

sogaiu commented Feb 11, 2023

dannyfreeman commented Feb 13, 2023

sogaiu commented Feb 14, 2023

dannyfreeman commented Feb 14, 2023

sogaiu commented Feb 14, 2023 • edited Loading

dannyfreeman commented Feb 15, 2023

sogaiu commented Feb 15, 2023

sogaiu commented Feb 27, 2023 • edited Loading

sogaiu commented Feb 27, 2023 • edited Loading

dannyfreeman commented Feb 28, 2023 • edited Loading

sogaiu commented Mar 1, 2023 • edited Loading

dannyfreeman commented Mar 1, 2023 • edited Loading

NoahTheDuke commented Mar 1, 2023

dannyfreeman commented Mar 1, 2023

dmiller commented Mar 1, 2023

dannyfreeman commented Mar 1, 2023 • edited Loading

dmiller commented Mar 1, 2023

sogaiu commented Mar 7, 2023

sogaiu commented Mar 12, 2023

sogaiu commented Feb 10, 2023 •

edited

Loading

sogaiu commented Feb 14, 2023 •

edited

Loading

sogaiu commented Feb 27, 2023 •

edited

Loading

sogaiu commented Feb 27, 2023 •

edited

Loading

dannyfreeman commented Feb 28, 2023 •

edited

Loading

sogaiu commented Mar 1, 2023 •

edited

Loading

dannyfreeman commented Mar 1, 2023 •

edited

Loading

dannyfreeman commented Mar 1, 2023 •

edited

Loading