Speed up enry.IsVendor
#15213
`enry.IsVendor` is kinda slow as it simply iterates across all regexps. This PR adjusts the regexps to combine them to make this process a little quicker.
Related go-gitea#15143
Signed-off-by: Andrew Thornton <art27@cantab.net>
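To illustrate the idea of combining patterns, here is a minimal sketch in Go. It is not the code from this PR: the pattern list is a small hypothetical subset rather than enry's real vendor list, and `isVendorLoop` and `isVendorCombined` are illustrative names. It only shows the difference between testing each regexp in turn and testing one alternation built from all of them.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// vendorPatterns is a small, hypothetical subset of vendor patterns.
// It is NOT the list shipped by enry.
var vendorPatterns = []string{
	`(^|/)vendor/`,
	`(^|/)node_modules/`,
	`(^|/)bower_components/`,
	`\.min\.js$`,
}

// individual holds one compiled regexp per pattern (the slow approach).
var individual = compileAll(vendorPatterns)

// combined joins every pattern into a single alternation so the regexp
// engine scans the path once instead of once per pattern.
var combined = regexp.MustCompile("(?:" + strings.Join(vendorPatterns, ")|(?:") + ")")

func compileAll(patterns []string) []*regexp.Regexp {
	out := make([]*regexp.Regexp, 0, len(patterns))
	for _, p := range patterns {
		out = append(out, regexp.MustCompile(p))
	}
	return out
}

// isVendorLoop tries each regexp in turn until one matches.
func isVendorLoop(path string) bool {
	for _, re := range individual {
		if re.MatchString(path) {
			return true
		}
	}
	return false
}

// isVendorCombined tests the single combined regexp.
func isVendorCombined(path string) bool {
	return combined.MatchString(path)
}

func main() {
	paths := []string{
		"vendor/github.com/x/y.go",
		"src/main.go",
		"static/app.min.js",
		"src/vendored.go",
	}
	for _, p := range paths {
		fmt.Println(p, isVendorLoop(p), isVendorCombined(p))
	}
}
```

Both functions should agree on every input; whether the combined form is actually faster depends on the patterns and on how well the regexp engine simplifies the alternation, so it is worth benchmarking.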
This currently is about twice as fast as |
Seems like a good idea. Linguist does similar, but they simply |
This iterative process should suggest that some further simplifications should speed things up further. However if we drop full regexp compatibility we should be able to make it faster with a path trie. (I recently came across a library that looked reasonable for this - but its name escapes me.) |
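For reference, a path trie along the lines suggested here could look roughly like the sketch below. This is an assumption about what such a structure might be, not the library mentioned above and not part of this PR. The type `dirTrie`, its methods, and the directory names are hypothetical, and it only handles literal directory names, so it deliberately gives up full regexp compatibility.

```go
package main

import (
	"fmt"
	"strings"
)

// dirTrie is a hypothetical trie keyed on path segments. Each inserted
// entry is a directory path such as "vendor" or "Godeps/_workspace".
type dirTrie struct {
	children map[string]*dirTrie
	terminal bool // true if an inserted directory ends at this node
}

// insert adds a literal directory path, split on "/".
func (t *dirTrie) insert(dir string) {
	cur := t
	for _, seg := range strings.Split(dir, "/") {
		if cur.children == nil {
			cur.children = make(map[string]*dirTrie)
		}
		next, ok := cur.children[seg]
		if !ok {
			next = &dirTrie{}
			cur.children[seg] = next
		}
		cur = next
	}
	cur.terminal = true
}

// matchesAt reports whether some inserted directory is a prefix of segs.
func (t *dirTrie) matchesAt(segs []string) bool {
	cur := t
	for _, seg := range segs {
		next, ok := cur.children[seg]
		if !ok {
			return false
		}
		if next.terminal {
			return true
		}
		cur = next
	}
	return false
}

// isVendor reports whether any inserted directory appears at any depth
// among the directory segments of path. The final segment is treated as
// the file name and is never matched.
func (t *dirTrie) isVendor(path string) bool {
	segs := strings.Split(path, "/")
	dirs := segs[:len(segs)-1]
	for i := range dirs {
		if t.matchesAt(dirs[i:]) {
			return true
		}
	}
	return false
}

func main() {
	root := &dirTrie{}
	for _, d := range []string{"vendor", "node_modules", "Godeps/_workspace"} {
		root.insert(d)
	}
	for _, p := range []string{"a/vendor/b.go", "src/Godeps/x.go", "src/Godeps/_workspace/x.go"} {
		fmt.Println(p, root.isVendor(p))
	}
}
```

Suffix-style rules such as minified file names would still need separate handling, for example a small set of regexps or `strings.HasSuffix` checks run after the trie lookup.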
My suspicion is that Go's regexp simplifier is just not as good as Ruby's, or even that GitHub is using something like Hyperscan to make it faster. It's really worth noting that twice as fast here is still not really enough; it would be better to be on the order of at least 10 times as fast. Now, in the associated issue the profile indicated more than a few other places that needed speed-ups; however, I suspect that some of these may already be on their way to being improved. (For example, I think since the chi migration we shouldn't be listing branches or tags to do an HTTP push anymore, and we don't do a hash to assert if a password is set. We do, however, test password authentication twice during pushing!!) |
If |
I don't think we can assume that, but I guess there are likely to be some common paths that could be captured. It's possible, though, that a trie can be just as fast as a cache. Certainly, though, there are lots of places we could be memoizing by throwing stuff into caches or precomputing, in particular commit verification. |
Maybe this could be contributed upstream? |
Signed-off-by: Andrew Thornton <art27@cantab.net>
Should definitely upstream this, including the tests, which already seem more extensive than enry's own. Though I guess as a short-term solution it's fine. |
🚀 |
Backport go-gitea#15213 `enry.IsVendor` is kinda slow as it simply iterates across all regexps. This PR adjusts the regexps to combine them to make this process a little quicker. Related go-gitea#15143 Signed-off-by: Andrew Thornton <art27@cantab.net>