Add a LALR grammar for Rust with testing support #21452

Merged: 1 commit, Jan 25, 2015

Conversation

bleibig
Contributor

@bleibig bleibig commented Jan 21, 2015

This adds a new lexer/parser combo for the entire Rust language that can be generated with flex and bison, taken from my project at https://github.com/bleibig/rust-grammar. There is also a testing script that runs the generated parser on all *.rs files in the repository (except for tests in compile-fail or ones that are marked as "ignore-test" or "ignore-lexer-test"). If you have flex and bison installed, you can run these tests using the new "check-grammar" make target.
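The file-selection rule described here can be sketched roughly as follows. This is only an illustration in Rust, not the actual testing script: the function name `should_check` and the idea of passing the path and file contents as strings are assumptions made for the example.

```rust
/// Decide whether a file should be fed to the generated parser.
/// Hypothetical helper: the real testing script may apply the rule differently.
fn should_check(path: &str, contents: &str) -> bool {
    // Only Rust source files are candidates.
    if !path.ends_with(".rs") {
        return false;
    }
    // Tests under compile-fail are excluded from the run.
    // Split on both separators so the check does not depend on the platform.
    if path
        .split(|c: char| c == '/' || c == '\\')
        .any(|part| part == "compile-fail")
    {
        return false;
    }
    // Files that opt out with either marker are skipped as well.
    !(contents.contains("ignore-test") || contents.contains("ignore-lexer-test"))
}
```

Matching on a whole path component rather than a substring avoids skipping an unrelated file whose name merely contains "compile-fail".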

This does not depend on or interact with the existing testing code in the grammar, which only provides and tests a lexer specification.

OS X users should take note that the version of bison that comes with the Xcode toolchain (2.3) is too old to work with this grammar; they need to download and install version 3.0 or later.

The parser builds up an S-expression-based AST, which can be displayed by giving the "-v" argument to parser-lalr (normally it only gives output on error). It is only a rough approximation of what is parsed and doesn't capture every detail and nuance of the program.

Hopefully this should be sufficient for issue #2234, or at least a good starting point.

@rust-highfive
Collaborator

r? @nikomatsakis

(rust_highfive has picked a reviewer for you, use r? to override)

@steveklabnik
Member

😍

@emberian
Member

+10000, so glad to finally see this work start to move in-tree.

@erickt
Contributor

erickt commented Jan 21, 2015

Wow, this is awesome. Did any of these tools find any ambiguity in the grammar?

@hawkw
Contributor

hawkw commented Jan 21, 2015

+1

@aturon
Member

aturon commented Jan 21, 2015

Whoa...!

@zwarich

zwarich commented Jan 21, 2015

Strictly speaking, this isn't a LALR grammar for Rust, because it relies on the use of a push_back function in semantic actions to push tokens back onto the token stream.
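To make the objection concrete, here is a minimal sketch of a token stream with a push-back buffer. It is not the code from this PR: `TokenStream`, `next_token`, and `next_closing_angle` are hypothetical names, and splitting `>>` into two `>` tokens is only an assumed example of what a semantic action might do. The point is that the parser, not the lexer, ends up deciding which tokens are seen next, which a pure LALR parser never does.

```rust
/// A token stream that lets the parser return tokens to the input.
struct TokenStream<I: Iterator> {
    source: I,
    pushed_back: Vec<I::Item>,
}

impl<I: Iterator> TokenStream<I> {
    fn new(source: I) -> Self {
        TokenStream { source, pushed_back: Vec::new() }
    }

    /// Pushed-back tokens are served first, most recent first.
    fn next_token(&mut self) -> Option<I::Item> {
        self.pushed_back.pop().or_else(|| self.source.next())
    }

    fn push_back(&mut self, tok: I::Item) {
        self.pushed_back.push(tok);
    }
}

/// Hypothetical action: accept one closing angle bracket, splitting `>>`
/// by pushing the second `>` back onto the stream.
fn next_closing_angle<I>(ts: &mut TokenStream<I>) -> bool
where
    I: Iterator<Item = &'static str>,
{
    match ts.next_token() {
        Some(">") => true,
        Some(">>") => {
            ts.push_back(">");
            true
        }
        Some(other) => {
            // Not a closing bracket: leave the stream unchanged.
            ts.push_back(other);
            false
        }
        None => false,
    }
}
```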

@brson
Contributor

brson commented Jan 21, 2015

This looks pretty awesome. What are the prospects for testing that files the Rust parser rejects are also rejected by the grammar?

@steveklabnik steveklabnik mentioned this pull request Jan 21, 2015
@jbclements
Contributor

I love where this is headed!

@bleibig
Contributor Author

bleibig commented Jan 23, 2015

The grammar as it stands has a number of shift/reduce (S/R) conflicts, but they are all resolved through use of the precedence features in bison to (hopefully) match how the production Rust parser works in these situations.

As far as testing goes, the testing script does not do negative tests for programs that should fail to parse, but that feature can easily be added to the script. We can do that using programs in the compile-fail directory; however, not all files there fail to parse, since many are meant to fail in a later stage of compilation. We can check whether a file is supposed to parse first with rustc -Z parse-only, but it might be a better idea to split the compile-fail directory so that files that fail to parse are in a new "parse-fail" directory.
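The comparison such a negative test would make can be sketched as below. This is an assumed shape, not part of the existing script: `Verdict` and `compare` are hypothetical names, and the two booleans stand in for the outcome of rustc -Z parse-only and of the generated parser on the same file.

```rust
/// Outcome of comparing the reference parser with the grammar on one file.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Both accept or both reject the file.
    Agree,
    /// rustc parses the file but the grammar rejects it.
    GrammarTooStrict,
    /// rustc rejects the file but the grammar accepts it.
    GrammarTooLenient,
}

/// `rustc_parses` and `grammar_parses` are assumed to be gathered elsewhere,
/// for example from the exit status of each tool.
fn compare(rustc_parses: bool, grammar_parses: bool) -> Verdict {
    match (rustc_parses, grammar_parses) {
        (true, true) | (false, false) => Verdict::Agree,
        (true, false) => Verdict::GrammarTooStrict,
        (false, true) => Verdict::GrammarTooLenient,
    }
}
```

Only the `GrammarTooLenient` case needs the negative tests; the current positive-only run can already detect `GrammarTooStrict`.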

@nikomatsakis
Contributor

@bors r+ f39297f

@bors
Contributor

bors commented Jan 24, 2015

⌛ Testing commit f39297f with merge 4e4e8cf...

bors added a commit that referenced this pull request Jan 24, 2015
@bors bors merged commit f39297f into rust-lang:master Jan 25, 2015
@sanxiyn
Member

sanxiyn commented Jan 27, 2015

Where should one send changes to the grammar now? bleibig/rust-grammar or rust-lang/rust?

@steveklabnik
Member

rust-lang/rust

@keleshev

This grammar is likely not LALR(1). When an ambiguity exists in an LALR or LR grammar, it could be resolved in two ways, either by:

  1. rewriting the grammar to resolve ambiguity, or by
  2. employing heuristics, for example, deciding whether to shift or reduce based on operator precedence.

In the first case, the grammar is guaranteed to be (LA)LR, but in the second it might or might not be. This Rust grammar resolves ambiguities with precedence, so it might not be (LA)LR.

Example

expr:
 | expr "+" expr
 | expr "*" expr
 | NUMBER

It is ambiguous because 1 + 2 * 3 could be parsed into either
(1 + 2) * 3 or 1 + (2 * 3), so it is not (LA)LR.

To resolve the ambiguity, you can rewrite this grammar as:

expr:
 | sum

sum:
 | sum "+" product
 | product

product:
 | product "*" NUMBER
 | NUMBER

Now, it is LALR(1) and it will parse the expression only as 1 + (2 * 3).

But, practically speaking, most (LA)LR parser generators allow you to resolve grammar ambiguities with precedence, so this is probably not a big deal.
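As an illustration of how stratifying the grammar fixes the grouping, here is a small hand-written sketch in Rust. It is a recursive-descent parser, not a bison-generated LALR table, so it shows only the resulting parse tree, not the parser-generator mechanics. It assumes one stratified form, `sum: product ("+" product)*` and `product: NUMBER ("*" NUMBER)*`, and all function names are made up for the example. The output is an S-expression string.

```rust
/// sum: product ("+" product)*
fn parse_sum(tokens: &[&str], pos: &mut usize) -> String {
    let mut left = parse_product(tokens, pos);
    while *pos < tokens.len() && tokens[*pos] == "+" {
        *pos += 1;
        let right = parse_product(tokens, pos);
        left = format!("(+ {} {})", left, right);
    }
    left
}

/// product: NUMBER ("*" NUMBER)*
fn parse_product(tokens: &[&str], pos: &mut usize) -> String {
    let mut left = parse_number(tokens, pos);
    while *pos < tokens.len() && tokens[*pos] == "*" {
        *pos += 1;
        let right = parse_number(tokens, pos);
        left = format!("(* {} {})", left, right);
    }
    left
}

/// Takes the next token as a number; no validation, since this is a sketch.
fn parse_number(tokens: &[&str], pos: &mut usize) -> String {
    let tok = tokens.get(*pos).copied().unwrap_or("<eof>");
    *pos += 1;
    tok.to_string()
}

/// Entry point: render the whole token list as an S-expression.
fn to_sexpr(tokens: &[&str]) -> String {
    let mut pos = 0;
    parse_sum(tokens, &mut pos)
}
```

Because `*` is handled one level below `+`, `to_sexpr(&["1", "+", "2", "*", "3"])` can only group the product first; there is no rule that would let `1 + 2` become an operand of `*`.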

@keleshev

This grammar seems to define assignment and compound assignment operators as left-associative (which corresponds to the reference description); however, this example confirms that = is right-associative:

fn main() {
  let mut u: ();
  let mut a: u8;

  u = (a = 2);  // right associativity
//(u = a) = 2;     left associativity, doesn't work
  u = a = 2;    // this works, so it must be right-associative

  print!("{} {}", a, u == ());
}

// prefix_exprs
%precedence RETURN

%left '=' SHLEQ SHREQ MINUSEQ ANDEQ OREQ PLUSEQ STAREQ SLASHEQ CARETEQ PERCENTEQ

Here is the place where the erroneous left-associativity of assignment operators is defined.
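The difference between the two associativities can be shown with a short sketch that groups a chain of assignments both ways. The names `Assoc` and `group_assignments` are hypothetical and the output is an S-expression string; nothing here is taken from the grammar file itself. In bison, the right-associative grouping would presumably correspond to declaring these tokens with %right instead of %left.

```rust
/// Which way a chain like `u = a = 2` is grouped.
enum Assoc {
    Left,
    Right,
}

/// Group the operands of a chained assignment into an S-expression.
fn group_assignments(operands: &[&str], assoc: Assoc) -> String {
    match assoc {
        Assoc::Left => {
            // Fold from the left: ((u = a) = 2)
            let mut iter = operands.iter();
            let first = iter.next().map(|s| s.to_string()).unwrap_or_default();
            iter.fold(first, |acc, rhs| format!("(= {} {})", acc, rhs))
        }
        Assoc::Right => {
            // Fold from the right: (u = (a = 2))
            let mut iter = operands.iter().rev();
            let last = iter.next().map(|s| s.to_string()).unwrap_or_default();
            iter.fold(last, |acc, lhs| format!("(= {} {})", lhs, acc))
        }
    }
}
```

For the operands `u`, `a`, `2`, the left grouping yields `(= (= u a) 2)`, which mirrors the form that does not compile in the example program, while the right grouping yields `(= u (= a 2))`.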
