Skip to content
This repository has been archived by the owner on Jan 28, 2021. It is now read-only.

Make oniguruma a default regex engine #618

Merged
merged 2 commits into from
Feb 25, 2019
Merged

Make oniguruma a default regex engine #618

merged 2 commits into from
Feb 25, 2019

Conversation

kuba--
Copy link
Contributor

@kuba-- kuba-- commented Feb 13, 2019

Signed-off-by: kuba-- kuba@sourced.tech
Closes #615

Integrate hyperscan as a new (default) regex engine.

@kuba-- kuba-- changed the title Integrate hyperscan as a new regex engine Integrate hyperscan and onigmo as a new regex engines Feb 15, 2019
@@ -75,7 +89,7 @@ jobs:
- make TEST=ruby integration

- language: java
jdk: oraclejdk9
jdk: openjdk10
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to fix a build

go.mod Outdated
@@ -8,13 +8,15 @@ require (
github.com/boltdb/bolt v1.3.1
github.com/cespare/xxhash v1.1.0 // indirect
github.com/codahale/hdrhistogram v0.0.0-20161010025455-3a0bb77429bd // indirect
github.com/flier/gohs v1.0.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binding for hyperscan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we update this file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"go mod tidy"

go.mod Outdated
github.com/go-ole/go-ole v1.2.2 // indirect
github.com/go-sql-driver/mysql v1.4.1
github.com/gogo/protobuf v1.2.0 // indirect
github.com/google/go-cmp v0.2.0 // indirect
github.com/gorilla/handlers v1.4.0 // indirect
github.com/gorilla/mux v1.7.0 // indirect
github.com/hashicorp/memberlist v0.1.3 // indirect
github.com/linyows/go-onigmo v0.1.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binding for onigmo

@kuba-- kuba-- requested a review from ajnavarro February 15, 2019 00:57
@kuba-- kuba-- added the enhancement New feature or request label Feb 15, 2019

if n == "hyperscan" {
return "hyperscan"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all these lines can be transformed into:

switch n {
case "onigmo", "oniguruma", "hyperscan":
        return n
}

if err != nil {
return nil, err
}
runtime.SetFinalizer(re, (*onigmo.Regexp).Free)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finalizers are not guaranteed to be called. I don't think it's a proper mechanism to free the regexps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The safest refactoring (IMO) is to extend our matcher interface by adding close/free function. Or maybe we can sacrifice some performance and have only one Match function which will be registered and do all - compile/match/free.
WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can check in the usage place if the matcher implements io.Closer and then implement Close() error method only in the matchers that need some closing.

That way, current matchers do not need any refactoring and nothing needs to change. Then, in the place we use the matcher we add the io.Closer spec and call it if needed.

if err != nil {
return nil, err
}
runtime.SetFinalizer(db, (hyperscan.Database).Close)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same with this

.travis.yml Outdated
- cd onigmo-6.2.0
- ./configure
- make
- sudo make install
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also update the docs about how to install onigmo. Isn't there a more user-friendly way of installing it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly I could not find any convenient way (like apt-get or brew), so I compiled from source. The good think - there is no dependencies, so it compiles smoothly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated a README, as well.

@kuba--
Copy link
Contributor Author

kuba-- commented Feb 22, 2019

@ajnavarro, @erizocosmico - guys take a look into the next iteration.
After y'da's discussion about regex engine(s), we made a decision to make oniguruma a default one (if you want go's regex, you need to compile mysql with --tags go).
Apart from that, I've added a Disposer (next to Matcher) to our API. it's mainly for C-libs.
Basically Disposer will be called if we don't want to cache compiled matchers.
PTAL

@erizocosmico
Copy link
Contributor

Didn't oniguruma use to segfault/panic in certain times? That's why we removed it. Was that fixed? If not, why are we adding this back again?

@kuba--
Copy link
Contributor Author

kuba-- commented Feb 22, 2019

The next step is to integrate it in gitbase. On gitbase only oniguruma will be available. We can think about GOREGEXP queries (where we'll use go std. lib), but it will require changes in vitess parser.

@kuba--
Copy link
Contributor Author

kuba-- commented Feb 22, 2019

Regarding #618 (comment)
In my opinion it was fixed after I added a Pool but maybe @jfontan knows more details.

@ajnavarro
Copy link
Contributor

@kuba-- could you change the PR title please? It's a little bit misleading now.

README.md Outdated
make oniguruma # on linux you may need sudo
```

If you want to use regex engine from go's standard library, you have to compile **go-mysql-server** with `--tags go` argument.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we change the tag to something more specific, like golang_regex_engine or something like that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are a lot of possibilities that another library is using the same go tag

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is a risk, so lets prefix it with mysql_

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion, mysql_go is too broad, but maybe is the go way ;)

@kuba-- kuba-- changed the title Integrate hyperscan and onigmo as a new regex engines Make the oniguruma a default regex engine Feb 22, 2019
@kuba-- kuba-- changed the title Make the oniguruma a default regex engine Make oniguruma a default regex engine Feb 22, 2019
README.md Outdated
```

If you want to use regex engine from go's standard library, you have to compile **go-mysql-server** with `--tags go` argument.
Besides standard go's library and [oniguruma](github.com/kkos/oniguruma) the **go-mysql-server** also supports multiple engines originaly implemented in C.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove any other regex engine and focus on only one.

@jfontan
Copy link
Contributor

jfontan commented Feb 22, 2019

As @kuba says in #618 (comment) I believe it was fixed with the regexp Pool.

@kuba--
Copy link
Contributor Author

kuba-- commented Feb 22, 2019

@ajnavarro the build finally passed
Shall I change go tag to something like mysql-go-regex?
Also question - totally remove other engines, or rename tags to follow the same pattern:
mysql-onigmo-regex
mysql-hyperscan-regex
?

@ajnavarro
Copy link
Contributor

I would say to totally remove old engines. Fewer engines, less code to maintain. If we need them in the future we can go to the commit history and get them.

mysql-go-regex sounds good.

Signed-off-by: kuba-- <kuba@sourced.tech>
@kuba--
Copy link
Contributor Author

kuba-- commented Feb 25, 2019

@ajnavarro - PTAL. The new tag is mysql_go_regex.

Signed-off-by: kuba-- <kuba@sourced.tech>
@kuba--
Copy link
Contributor Author

kuba-- commented Feb 25, 2019

Updated after go mod tidy

Copy link
Contributor

@ajnavarro ajnavarro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for the go.mod file, that should be updated.

@kuba--
Copy link
Contributor Author

kuba-- commented Feb 25, 2019

@kuba-- - it was just updated (go mod tidy) - PTAL

@ajnavarro ajnavarro merged commit bdd9e0c into src-d:master Feb 25, 2019
@kuba-- kuba-- deleted the perf-615/regex branch February 25, 2019 21:27
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants