Skip to content

Commit

Permalink
Add maleeni-go command
Browse files Browse the repository at this point in the history
maleeni-go generates a lexer that recognizes a specific lexical specification.
  • Loading branch information
nihei9 committed Sep 13, 2021
1 parent 96a555a commit f691b5c
Show file tree
Hide file tree
Showing 4 changed files with 668 additions and 37 deletions.
108 changes: 71 additions & 37 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,17 +1,27 @@
# maleeni

maleeni provides a command that generates a portable DFA for lexical analysis and a driver for golang. maleeni also provides a command to perform lexical analysis to allow easy debugging of your lexical specification.
maleeni is a lexer generator for golang. maleeni also provides a command to perform lexical analysis to allow easy debugging of your lexical specification.

[![Test](https://github.com/nihei9/maleeni/actions/workflows/test.yml/badge.svg?branch=main)](https://github.com/nihei9/maleeni/actions/workflows/test.yml)

## Installation

Compiler:

```sh
$ go install github.com/nihei9/maleeni/cmd/maleeni@latest
```

Code Generator:

```sh
$ go install github.com/nihei9/maleeni/cmd/maleeni-go@latest
```

## Usage

### 1. Define your lexical specification

First, define your lexical specification in JSON format. As an example, let's write the definitions of whitespace, words, and punctuation.

```json
Expand All @@ -35,14 +45,17 @@ First, define your lexical specification in JSON format. As an example, let's wr

Save the above specification to a file in UTF-8. In this explanation, the file name is lexspec.json.

### 2. Compile the lexical specification

Next, generate a DFA from the lexical specification using `maleeni compile` command.

```sh
$ maleeni compile -l lexspec.json -o clexspec.json
```

If you want to make sure that the lexical specification behaves as expected, you can use `maleeni lex` command to try lexical analysis without having to implement a driver.
`maleeni lex` command outputs tokens in JSON format. For simplicity, print significant fields of the tokens in CSV format using jq command.
### 3. Debug (Optional)

If you want to make sure that the lexical specification behaves as expected, you can use `maleeni lex` command to try lexical analysis without having to generate a lexer. `maleeni lex` command outputs tokens in JSON format. For simplicity, print significant fields of the tokens in CSV format using jq command.

⚠️ An encoding that `maleeni lex` and the driver can handle is only UTF-8.

Expand Down Expand Up @@ -76,50 +89,71 @@ The JSON format of tokens that `maleeni lex` command prints is as follows:
| eof | bool | When this field is `true`, it means the token is the EOF token. |
| invalid | bool | When this field is `true`, it means the token is an error token. |

When using the driver, please import `github.com/nihei9/maleeni/driver` and `github.com/nihei9/maleeni/spec` package.
You can use the driver easily in the following way:
### 4. Generate the lexer

Using `maleeni-go` command, you can generate a source code of the lexer to recognize your lexical specification.

```sh
$ maleeni-go clexspec.json > lexer.go
```

The above command generates the lexer and saves it to `lexer.go` file. To use the lexer, you need to call `NewLexer` function defined in `lexer.go`. The following code is a simple example. In this example, the lexer reads a source code from stdin and writes the result, tokens, to stdout.

```go
// Read your lexical specification file.
f, err := os.Open(path)
if err != nil {
// error handling
}
data, err := ioutil.ReadAll(f)
if err != nil {
// error handling
}
clexspec := &spec.CompiledLexSpec{}
err = json.Unmarshal(data, clexspec)
if err != nil {
// error handling
}
package main

// Generate a lexer.
lex, err := driver.NewLexer(clexspec, src)
if err != nil {
// error handling
}
import (
"fmt"
"os"
)

// Perform lexical analysis.
for {
tok, err := lex.Next()
func main() {
lex, err := NewLexer(NewLexSpec(), os.Stdin)
if err != nil {
// error handling
}
if tok.Invalid {
// An error token appeared.
// error handling
}
if tok.EOF {
// The EOF token appeared.
break
fmt.Fprintln(os.Stderr, err)
os.Exit(1)
}

// Do something using `tok`.
for {
tok, err := lex.Next()
if err != nil {
fmt.Fprintln(os.Stderr, err)
os.Exit(1)
}
if tok.EOF {
break
}
if tok.Invalid {
fmt.Printf("invalid: '%v'\n", string(tok.Lexeme))
} else {
fmt.Printf("valid: %v: '%v'\n", tok.KindName, string(tok.Lexeme))
}
}
}
```

Please save the above source code to `main.go` and create a directory structure like the one below.

```
/project_root
├── lexer.go ... Lexer generated from the compiled lexical specification (the result of `maleeni-go`).
└── main.go .... Caller of the lexer.
```

Now, you can perform the lexical analysis.

```sh
$ echo -n 'I want to believe.' | go run main.go lexer.go
valid: word: 'I'
valid: whitespace: ' '
valid: word: 'want'
valid: whitespace: ' '
valid: word: 'to'
valid: whitespace: ' '
valid: word: 'believe'
valid: punctuation: '.'
```

## More Practical Usage

See also [this example](example/README.md).
Expand Down
68 changes: 68 additions & 0 deletions cmd/maleeni-go/generate.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
package main

import (
_ "embed"
"encoding/json"
"fmt"
"io/ioutil"
"os"

"github.com/nihei9/maleeni/driver"
"github.com/nihei9/maleeni/spec"
"github.com/spf13/cobra"
)

func Execute() error {
err := generateCmd.Execute()
if err != nil {
fmt.Fprintf(os.Stderr, "%v\n", err)
return err
}

return nil
}

var generateFlags = struct {
pkgName *string
}{}

var generateCmd = &cobra.Command{
Use: "maleeni-go",
Short: "Generate a lexer for Go",
Long: `maleeni-go generates a lexer for Go. The lexer recognizes the lexical specification specified as the argument.`,
Example: ` maleeni-go clexspec.json > lexer.go`,
Args: cobra.ExactArgs(1),
RunE: runGenerate,
SilenceErrors: true,
SilenceUsage: true,
}

func init() {
generateFlags.pkgName = generateCmd.Flags().StringP("package", "p", "main", "package name")
}

func runGenerate(cmd *cobra.Command, args []string) (retErr error) {
clspec, err := readCompiledLexSpec(args[0])
if err != nil {
return fmt.Errorf("Cannot read a compiled lexical specification: %w", err)
}

return driver.GenLexer(clspec, *generateFlags.pkgName)
}

func readCompiledLexSpec(path string) (*spec.CompiledLexSpec, error) {
f, err := os.Open(path)
if err != nil {
return nil, err
}
data, err := ioutil.ReadAll(f)
if err != nil {
return nil, err
}
clspec := &spec.CompiledLexSpec{}
err = json.Unmarshal(data, clspec)
if err != nil {
return nil, err
}
return clspec, nil
}
12 changes: 12 additions & 0 deletions cmd/maleeni-go/main.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
package main

import (
"os"
)

func main() {
err := Execute()
if err != nil {
os.Exit(1)
}
}
Loading

0 comments on commit f691b5c

Please sign in to comment.