-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
various string search perf improvements #29678
Conversation
@nanosoldier |
base/strings/search.jl
Outdated
@@ -149,7 +153,7 @@ _nthbyte(a::Union{AbstractVector{UInt8},AbstractVector{Int8}}, i) = a[i] | |||
|
|||
function _searchindex(s::String, t::String, i::Integer) | |||
# Check for fast case of a single byte | |||
lastindex(t) == 1 && return something(findnext(isequal(t[1]), s, i), 0) | |||
sizeof(t) == 1 && return something(findnext(isequal(first(t)), s, i), 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be equivalent, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it is quite:
sizeof(t) == 1
tests thatt
has one code unit. It should probably be written asncodeunits(t) == 1
, which just happens to be the same forString
.lastindex(t) == 1
tests thatt
has only one character since the last valid index is at the start of the string. It could equivalently be written aslength(t) == 1
which might be a bit more efficient.
It might make sense to special-case ncodeunits(t) == 1
since that implies length(t) == 1
and is more efficient to compute, so (ncodeunits(t) == 1 || length(t) == 1)
could be more efficient? Or not since it's more complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, but why is the comment # Check for fast case of a single byte
then? I feel like the code I wrote adheres more to the comment than the previous code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does but I think the logic works fine for any single-char string, no?
nothing_sentinel(i) = i == 0 ? nothing : i | ||
|
||
function findnext(pred::Fix2{<:Union{typeof(isequal),typeof(==)},<:AbstractChar}, | ||
s::String, i::Integer) | ||
if i < 1 || i > sizeof(s) | ||
i == sizeof(s) + 1 && return nothing | ||
throw(BoundsError(s, i)) | ||
__str_throw_boundserror(s, i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope the compiler gurus can eventually fix having to manually include this non-intuitive and ugly hack that is regrettably present in a lot of base function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is mostly related to
julia/base/compiler/optimize.jl
Lines 344 to 349 in 0a685d3
elseif head === :enter | |
# try/catch is a couple function calls, | |
# but don't inline functions with try/catch | |
# since these aren't usually performance-sensitive functions, | |
# and llvm is more likely to miscompile them when these functions get large | |
return typemax(Int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh? There is no try/catch here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, my bad. I got confused.
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
base/strings/search.jl
Outdated
@@ -253,7 +257,14 @@ julia> findnext("Lang", "JuliaLang", 2) | |||
6:9 | |||
``` | |||
""" | |||
findnext(t::AbstractString, s::AbstractString, i::Integer) = _search(s, t, i) | |||
function findnext(t::AbstractString, s::AbstractString, i::Integer) | |||
if sizeof(t) == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This cant be correct for AbstractString
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely not. What is it trying to check for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the comment above
ff401e8
to
105b55e
Compare
105b55e
to
fdecddc
Compare
Upon rethinking this, I don't think this fast path is really useful. |
Before
After:
Fixes #29555