various string search perf improvements #29678

Closed
wants to merge 1 commit into from
KristofferC
Member

@KristofferC KristofferC commented Oct 16, 2018

Before:

julia> @btime findnext(isequal('v'), "abcdefghijklmnopqrstuv", 1)
  20.982 ns (0 allocations: 0 bytes)
22

julia> @btime findnext(",", "fsdfdsfdsfsdfds, sdfsfd", 1)
  39.258 ns (0 allocations: 0 bytes)
16:16

julia> @btime split("str,str,str,str,str,str,str", ",")
  662.151 ns (10 allocations: 432 bytes)

julia> @btime findlast(",", "fsdfds,fdsfsdfdssdfsfd")
  44.284 ns (0 allocations: 0 bytes)
7:7

julia> @btime findlast(isequal(','), "fsdfds,fdsfsdfdssdfsfd")
  25.692 ns (0 allocations: 0 bytes)
7

After:

julia> @btime findnext(isequal('v'), "abcdefghijklmnopqrstuv", 1)
  11.596 ns (0 allocations: 0 bytes)
22

julia> @btime findnext(",", "fsdfdsfdsfsdfds, sdfsfd", 1)
  19.166 ns (0 allocations: 0 bytes)
16:16

julia> @btime split("str,str,str,str,str,str,str", ",")
  406.869 ns (10 allocations: 432 bytes)

julia> @btime findlast(",", "fsdfds,fdsfsdfdssdfsfd")
  19.495 ns (0 allocations: 0 bytes)
7:7

julia> @btime findlast(isequal(','), "fsdfds,fdsfsdfdssdfsfd")
  22.811 ns (0 allocations: 0 bytes)
7

Fixes #29555

@KristofferC
Member Author

@nanosoldier runbenchmarks("string", vs = ":master")

@@ -149,7 +153,7 @@ _nthbyte(a::Union{AbstractVector{UInt8},AbstractVector{Int8}}, i) = a[i]

 function _searchindex(s::String, t::String, i::Integer)
     # Check for fast case of a single byte
-    lastindex(t) == 1 && return something(findnext(isequal(t[1]), s, i), 0)
+    sizeof(t) == 1 && return something(findnext(isequal(first(t)), s, i), 0)
Member Author

These should be equivalent, right?

Member

I don't think it is quite:

  • sizeof(t) == 1 tests that t has one code unit. It should probably be written as ncodeunits(t) == 1, which just happens to be the same for String.
  • lastindex(t) == 1 tests that t has only one character since the last valid index is at the start of the string. It could equivalently be written as length(t) == 1 which might be a bit more efficient.

It might make sense to special-case ncodeunits(t) == 1 since that implies length(t) == 1 and is more efficient to compute, so (ncodeunits(t) == 1 || length(t) == 1) could be more efficient? Or not since it's more complicated.
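To make the distinction above concrete, here is a small sketch comparing the four checks on an ASCII pattern and a non-ASCII one. The variable names are made up for illustration, and it assumes `String` (UTF-8), where `sizeof` and `ncodeunits` agree.

```julia
# Compare the candidate "single unit" checks on two one-character patterns.
ascii_pat = ","    # one character, one UTF-8 code unit
multi_pat = "α"    # one character, two UTF-8 code units

for t in (ascii_pat, multi_pat)
    println(repr(t), ": sizeof=", sizeof(t), " ncodeunits=", ncodeunits(t),
            " lastindex=", lastindex(t), " length=", length(t))
end
```

For `"α"` the byte-based checks report 2 while `lastindex` and `length` report 1, so `sizeof(t) == 1` only takes the fast path for single-byte patterns.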

Member Author

@KristofferC KristofferC Oct 17, 2018

Hm, but why is the comment # Check for fast case of a single byte then? I feel like the code I wrote adheres more to the comment than the previous code?

Member

It does but I think the logic works fine for any single-char string, no?
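A rough sketch of that point, using a hypothetical helper (`single_char_search` is not a Base function) that mirrors the original fast path but tests `length(t) == 1`:

```julia
# Hypothetical helper, for illustration only: take the character fast path
# whenever the pattern is a single character, regardless of its byte width.
function single_char_search(s::String, t::String, i::Integer)
    length(t) == 1 || return nothing   # not a single-character pattern
    # findnext returns `nothing` when there is no match; map that to 0
    return something(findnext(isequal(first(t)), s, i), 0)
end

single_char_search("naïve", "ï", 1)   # two-byte pattern, still handled
single_char_search("naïve", "z", 1)   # no match
```

The first call finds the match at byte index 3 and the second returns 0, so the character search itself does not depend on the pattern being a single byte.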

@KristofferC KristofferC added performance Must go faster strings "Strings!" labels Oct 16, 2018

 nothing_sentinel(i) = i == 0 ? nothing : i

 function findnext(pred::Fix2{<:Union{typeof(isequal),typeof(==)},<:AbstractChar},
                   s::String, i::Integer)
     if i < 1 || i > sizeof(s)
         i == sizeof(s) + 1 && return nothing
-        throw(BoundsError(s, i))
+        __str_throw_boundserror(s, i)
Contributor

I hope the compiler gurus can eventually remove the need to manually include this non-intuitive and ugly hack, which is regrettably present in a lot of Base functions.
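For readers unfamiliar with the hack: the definition of `__str_throw_boundserror` is not shown in this thread, so the following is only a sketch of the general pattern, with hypothetical names. It assumes the idea is to move the `throw` into a separate `@noinline` function so that the error construction stays out of the hot function body.

```julia
# Hypothetical names; a sketch of the "outlined throw" pattern,
# not the code from this PR.
@noinline throw_bounds(s, i) = throw(BoundsError(s, i))

function checked_byte(s::String, i::Integer)
    # The hot path is only a range check plus a call to the outlined thrower.
    1 <= i <= sizeof(s) || throw_bounds(s, i)
    return codeunit(s, i)
end

checked_byte("abc", 2)   # returns the byte for 'b'
```

Whether this actually helps in a given function is something to confirm with a benchmark rather than assume.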

Member Author

@KristofferC KristofferC Oct 16, 2018

I think this is mostly related to:

    elseif head === :enter
        # try/catch is a couple function calls,
        # but don't inline functions with try/catch
        # since these aren't usually performance-sensitive functions,
        # and llvm is more likely to miscompile them when these functions get large
        return typemax(Int)

Member

Huh? There is no try/catch here.

Member Author

Sorry, my bad. I got confused.

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@KristofferC
Member Author

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@@ -253,7 +257,14 @@ julia> findnext("Lang", "JuliaLang", 2)
6:9
```
"""
-findnext(t::AbstractString, s::AbstractString, i::Integer) = _search(s, t, i)
+function findnext(t::AbstractString, s::AbstractString, i::Integer)
+    if sizeof(t) == 1
Member Author

This can't be correct for AbstractString.

Member

It's definitely not. What is it trying to check for?

Member Author

Same as the comment above

@KristofferC KristofferC force-pushed the kc/search_str_perf branch 3 times, most recently from ff401e8 to 105b55e Compare October 17, 2018 16:17
@KristofferC
Member Author

Upon rethinking this, I don't think this fast path is really useful.

@DilumAluthge DilumAluthge deleted the kc/search_str_perf branch August 24, 2021 05:19
Successfully merging this pull request may close these issues.

findnext for char in string is slower than in 0.6
5 participants