Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor date detection in lexer #440

Closed
moorereason opened this issue Sep 12, 2020 · 3 comments · Fixed by #448 or #447
Closed

Refactor date detection in lexer #440

moorereason opened this issue Sep 12, 2020 · 3 comments · Fixed by #448 or #447
Assignees
Labels
performance Issue related to a performance problem or pull request improving performance.

Comments

@moorereason
Copy link
Contributor

Describe the Issue

As of 5c94d86, 18-20% of memory consumption is occurring inside of the dateRegexp matching in lexRvalue when unmarshalling the benchmark.toml doc. There are very few dates in the TOML document.

sh$ go tool pprof mytest mem.pprof
(pprof) top1
Showing nodes accounting for 36.09kB, 17.95% of 201.03kB total
Dropped 81 nodes (cum <= 1kB)
Showing top 1 nodes out of 95
      flat  flat%   sum%        cum   cum%
   36.09kB 17.95% 17.95%    36.09kB 17.95%  regexp.(*bitState).reset

To Reproduce

Run the unmarshal happening in BenchmarkUnmarshalToml once with pprof profiling enabled. Here's the jist of it:

package main

import (
        "io/ioutil"
        "log"
        "time"

        "github.com/pelletier/go-toml"
        "github.com/pkg/profile"
)

func main() {
        defer profile.Start(
                profile.MemProfile,
                profile.MemProfileRate(1),
                profile.ProfilePath(".")).Stop()

        benchmarkUnmarshalToml()
}

func benchmarkUnmarshalToml() {
        bytes, err := ioutil.ReadFile("benchmark.toml")
        if err != nil {
                log.Fatal(err)
        }

        target := benchmarkDoc{}
        err = toml.Unmarshal(bytes, &target)
        if err != nil {
                log.Fatal(err)
        }
}

type benchmarkDoc struct {
// ... see benchmark/benchmark_test.go

Expected Behavior

No regular expressions in the lexer. Better memory performance.

Versions

  • go-toml 5c94d86
  • go version go1.15.2 linux/amd64

Additional Context

Part of the issue may be that the regexp is executed every time a number or date is expected.

Related to #167

@pelletier
Copy link
Owner

Totally right, that needs to go. I was lazy when I put that in back then, now it's biting us in the back. Sorry 😬

@pelletier pelletier added the performance Issue related to a performance problem or pull request improving performance. label Sep 12, 2020
@moorereason
Copy link
Contributor Author

As an additional side-effect, factoring out the regexp would get rid of init() usage in the lexer, which is generally frowned upon for libraries. The same issue applies to numberUnderscoreInvalidRegexp and hexNumberUnderscoreInvalidRegexp in the parser.

@pelletier pelletier self-assigned this Oct 11, 2020
pelletier added a commit that referenced this issue Oct 11, 2020
Fixes #440.

```
benchmark                    old ns/op     new ns/op     delta
BenchmarkUnmarshalToml-8     269582        257032        -4.66%

benchmark                    old allocs     new allocs     delta
BenchmarkUnmarshalToml-8     2650           2650           +0.00%

benchmark                    old bytes     new bytes     delta
BenchmarkUnmarshalToml-8     127761        127030        -0.57%
```
@pelletier
Copy link
Owner

With the two patches:

benchmark                    old ns/op     new ns/op     delta
BenchmarkUnmarshalToml-8     297364        257032        -13.56%

benchmark                    old allocs     new allocs     delta
BenchmarkUnmarshalToml-8     2746           2650           -3.50%

benchmark                    old bytes     new bytes     delta
BenchmarkUnmarshalToml-8     133359        127030        -4.75%

@pelletier pelletier linked a pull request Oct 11, 2020 that will close this issue
pelletier added a commit that referenced this issue Oct 11, 2020
* Remove underscore regexps

Fixes #440.

```
benchmark                    old ns/op     new ns/op     delta
BenchmarkUnmarshalToml-8     269582        257032        -4.66%

benchmark                    old allocs     new allocs     delta
BenchmarkUnmarshalToml-8     2650           2650           +0.00%

benchmark                    old bytes     new bytes     delta
BenchmarkUnmarshalToml-8     127761        127030        -0.57%
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Issue related to a performance problem or pull request improving performance.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants