Skip to content

Commit

Permalink
Add vartan-go command
Browse files Browse the repository at this point in the history
  • Loading branch information
nihei9 committed Mar 27, 2022
1 parent d3867e0 commit d0431e3
Show file tree
Hide file tree
Showing 12 changed files with 1,016 additions and 108 deletions.
279 changes: 227 additions & 52 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,80 +1,111 @@
# vartan

vartan provides a compiler that generates a LALR(1) or SLR(1) parsing table and a driver for golang.
vartan is a parser generator for golang and supports LALR(1) and SLR(1). vartan also provides a command to perform syntax analysis to allow easy debugging of your grammar.

[![Test](https://github.com/nihei9/vartan/actions/workflows/test.yml/badge.svg?branch=main)](https://github.com/nihei9/vartan/actions/workflows/test.yml)

## Status

🚧 Now Developing

## Installation

Compiler:

```sh
$ go install github.com/nihei9/vartan/cmd/vartan@latest
```

Code Generator:

```sh
$ go install github.com/nihei9/vartan/cmd/vartan-go@latest
```

## Usage

### 1. Define your grammar

vartan uses BNF-like DSL to define your grammar. As an example, let's write a grammar that represents a simple expression.

```
%name expr
%left mul div
%left add sub
expr
: expr add_op term
| term
;
term
: term mul_op factor
| factor
;
factor
: number
| id
;
: expr add expr
| expr sub expr
| expr mul expr
| expr div expr
| func_call
| integer
| id
| '(' expr ')' #ast #(expr $2)
;
func_call
: id '(' args ')' #ast #(func_call $1 $3)
| id '(' ')' #ast #(func_call $1)
;
args
: args ',' expr #ast #(args $1... $3)
| expr
;
whitespaces: "[\u{0009}\u{0020}]+" #skip;
number: "[0-9]+";
id: "[A-Za-z_]+";
add_op: '+';
mul_op: '*';
integer: "0|[1-9][0-9]*";
id: "[A-Za-z_][0-9A-Za-z_]*";
add: '+';
sub: '-';
mul: '*';
div: '/';
```

Save the above grammar to a file in UTF-8. In this explanation, the file name is `expr.vr`.

⚠️ The input file must be encoded in UTF-8.

### 2. Compile the grammar

Next, generate a parsing table using `vartan compile` command.

```sh
$ vartan compile -g expr.vr -o expr.json
16 conflicts
```

### 3. Debug

#### 3.1. Parse

If you want to make sure that the grammar behaves as expected, you can use `vartan parse` command to try parse without implementing a driver.

⚠️ An encoding that `vartan parse` command and the driver can handle is only UTF-8.

```sh
$ echo -n 'foo + bar * baz * 100' | vartan parse expr.json
$ echo -n 'foo(10, bar(a)) + 99 * x' | vartan parse expr.json
expr
├─ expr
│ └─ term
│ └─ factor
│ └─ id "foo"
├─ add_op "+"
└─ term
├─ term
│ ├─ term
│ │ └─ factor
│ │ └─ id "bar"
│ ├─ mul_op "*"
│ └─ factor
│ └─ id "baz"
├─ mul_op "*"
└─ factor
└─ number "100"
│ └─ func_call
│ ├─ id "foo"
│ └─ args
│ ├─ expr
│ │ └─ integer "10"
│ └─ expr
│ └─ func_call
│ ├─ id "bar"
│ └─ args
│ └─ expr
│ └─ id "a"
├─ add "+"
└─ expr
├─ expr
│ └─ integer "99"
├─ mul "*"
└─ expr
└─ id "x"
```

When `vartan parse` command successfully parses the input data, it prints a CST or an AST (if any).

## Debug
#### 3.2. Resolve conflicts

`vartan compile` command also generates a description file having `-description.json` suffix along with a parsing table. This file describes each state in the parsing table in detail. If your grammar contains conflicts, see `Conflicts` and `States` sections of this file. Using `vartan show` command, you can see the description file in a readable format.

Expand All @@ -86,39 +117,183 @@ LALR(1)

# Conflicts

No conflict was detected.
16 conflicts were detected.

# Terminals

1 - - <eof>
2 - - error
3 - - whitespaces
4 - - number
5 - - id
6 - - add_op (+)
7 - - mul_op (*)
3 - - x_1 (\()
4 - - x_2 (\))
5 - - x_3 (,)
6 - - whitespaces
7 - - integer
8 - - id
9 2 l add (+)
10 2 l sub (-)
11 1 l mul (*)
12 1 l div (/)

# Productions

1 - - expr' → expr
2 - - expr → expr + term
3 - - expr → term
4 - - term → term * factor
5 - - term → factor
6 - - factor → number
7 - - factor → id
2 2 l expr → expr + expr
3 2 l expr → expr - expr
4 1 l expr → expr * expr
5 1 l expr → expr / expr
6 - - expr → func_call
7 - - expr → integer
8 - - expr → id
9 - - expr → \( expr \)
10 - - func_call → id \( args \)
11 - - func_call → id \( \)
12 - - args → args , expr
13 - - args → expr
# States
## State 0
1 expr' → ・ expr

shift 4 on number
shift 3 on \(
shift 4 on integer
shift 5 on id
goto 1 on expr
goto 2 on term
goto 3 on factor
goto 2 on func_call


## State 1

1 expr' → expr ・
2 expr → expr ・ + expr
3 expr → expr ・ - expr
4 expr → expr ・ * expr
5 expr → expr ・ / expr
shift 6 on +
shift 7 on -
shift 8 on *
shift 9 on /
reduce 1 on <eof>
## State 2
6 expr → func_call ・
reduce 6 on <eof>, \), ,, +, -, *, /
...
```
### 4. Generate a parser
Using `vartan-go` command, you can generate a source code of a parser to recognize your grammar.
```sh
$ vartan-go expr.json
```
Then you will get the following files.
* `expr_parser.go`
* `expr_lexer.go`
* `expr_semantic_action.go`
You need to implement a driver to use the parser. An example is below.
```go
package main
import (
"fmt"
"io"
"os"
)
func main() {
toks, err := NewTokenStream(os.Stdin)
if err != nil {
fmt.Println(err)
os.Exit(1)
}
gram := NewGrammar()
treeAct := NewSyntaxTreeActionSet(gram, true, false)
p, err := NewParser(toks, gram, SemanticAction(treeAct))
if err != nil {
fmt.Println(err)
os.Exit(1)
}
err = p.Parse()
if err != nil {
fmt.Println(err)
os.Exit(1)
}
synErrs := p.SyntaxErrors()
if len(synErrs) > 0 {
for _, synErr := range synErrs {
printSyntaxError(os.Stderr, synErr, gram)
}
os.Exit(1)
}
fmt.Println("accepted")
PrintTree(os.Stdout, treeAct.AST())
}
func printSyntaxError(w io.Writer, synErr *SyntaxError, gram Grammar) {
var msg string
tok := synErr.Token
switch {
case tok.EOF():
msg = "<eof>"
case tok.Invalid():
msg = fmt.Sprintf("'%v' (<invalid>)", string(tok.Lexeme()))
default:
if alias := gram.TerminalAlias(tok.TerminalID()); alias != "" {
msg = fmt.Sprintf("'%v' (%v)", string(tok.Lexeme()), alias)
} else {
msg = fmt.Sprintf("'%v' (%v)", string(tok.Lexeme()), gram.Terminal(tok.TerminalID()))
}
}
fmt.Fprintf(w, "%v:%v: %v: %v", synErr.Row+1, synErr.Col+1, synErr.Message, msg)
if len(synErr.ExpectedTerminals) > 0 {
fmt.Fprintf(w, "; expected: %v", synErr.ExpectedTerminals[0])
for _, t := range synErr.ExpectedTerminals[1:] {
fmt.Fprintf(w, ", %v", t)
}
}
fmt.Fprintf(w, "\n")
}
```
Please save the above source code to `main.go` and create a directory structure like the following.
```
/project_root
├── expr_parser.go
├── expr_lexer.go
├── expr_semantic_action.go
└── main.go (the driver you implemented)
```
Now, you can perform the syntax analysis.
```sh
$ echo -n 'foo+99' | go run .
accepted
expr
├─ expr
│ └─ id "foo"
├─ add "+"
└─ expr
└─ integer "99"
```
```sh
$ echo -n 'foo+99?' | go run .
1:7: unexpected token: '?' (<invalid>); expected: <eof>, +, -, *, /
exit status 1
```
Loading

0 comments on commit d0431e3

Please sign in to comment.