-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a faster string-search method with simd instructions than std::find #10858
Conversation
✅ Deploy Preview for meta-velox canceled.
|
e99e699
to
457860e
Compare
@Yuhta from #10731
I found that the performance of the code will be reduced by 30% when placed in the cpp file. So for the case, I change the code like this :
I haven't found a good way to handle this situation, could you give me some advice?
RedHat define the valid values:
|
991be1a
to
54422e3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please consider the other suggestions in #10731 (review)
54422e3
to
2cf06d7
Compare
Updates code for the suggestions in #10731:
done
done
I have extended StringBenchmark to support different algrithm, and kmp is always slower than simdStrStr(maybe my implement is not good so I add std::boyer_moore_searcher for comparison):
I update ut refer to folly uts, add random test cases. Could you have look at this once more? |
@Yuhta ping~ |
58a2c79
to
3ae8be1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The benchmark result looks very good
3ae8be1
to
13220d3
Compare
@Yuhta update code
As we inline kPageSize, pageSafe method is faster than before, I move the check to each batch iteration, we only miss the comparison-span-pages case, maybe better, like clickhouse:
new benchmark result:
|
13220d3
to
40a8df4
Compare
0e2a09d
to
b6c6706
Compare
@Yuhta ping~ |
Any more comments about this ? |
7aa4d26
to
446f09d
Compare
cb6e208
to
ce1f33c
Compare
@Yuhta ping ~ |
ce1f33c
to
480c3ae
Compare
480c3ae
to
a16bf39
Compare
@assignUser @majetideepak @Yuhta |
@skadilover CMake looks fine, can't comment on the rest :) |
a16bf39
to
08faa90
Compare
add ut / benchmark check the last page
08faa90
to
1c11800
Compare
@Yuhta has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
…nd (facebookincubator#10858) Summary: ### Description In practice, c_strstr is always faster than std::find in most cases, refer to [1]. However, the c_strstr interface will truncate '\0', which makes it unable to handle unicode characters. In addition, glibc uses two-way- search, see[2], but it does not seem to be exposed as a common interface. It requires some additional code to determine the best path. In some scenarios, the decision-making code may cause performance degradation. ### Solution There are many optimization algorithms for string-search, refer to [7], but they cannot change the essence of the problem, which is a search process of O(n*m) complexity. I think the most common way to improve performance is to optimize the search instructions. For example, doris (the same is true for clickhouse, this part of the code of doris was copied from clickhouse, and the Volnitsky[11] part was removed), see [5]. This PR refers to the implementation of [5], [6], [8] : [5] doris : first-two comapre first, and the issue of page-safe was also mentioned [6] stringzilla: first-mid-lst compare first [8] simd-strstr implement by WojciechMula, first-lst comapre first. They all look few chars first, the more chars you choose to look the higher probability of hitting the optimized path, and the more cost of 'optimized path' itself. In the end, the solution I chose is to first compare first-last-chars and then compare the remaining bytes. This is because in practice, our users seem to be able to get the matching results through first-last-chars. **ut & benchmark** This part of the code refers to folly’s test code, see [9] [10], of course, some adjustments have been made based on the code implementation. ### Benchmark : std::find vs simdStrStr: findSuccessful(opt_50_5) 31.66ms 31.59 findSuccessful(opt_100_10) 36.81ms 27.16 findSuccessful(opt_100_20) 96.47ms 10.37 findSuccessful(opt_1k_10) 63.27ms 15.81 findSuccessful(opt_1k_100) 156.95ms 6.37 findSuccessful(std_50_5) 45.24ms 22.11 findSuccessful(std_100_10) 191.23ms 5.23 findSuccessful(std_100_20) 322.11ms 3.10 findSuccessful(std_1k_10) 122.69ms 8.15 findSuccessful(std_1k_100) 168.10ms 5.95 findUnsuccessful(std_first_char_match) 819.13ms 1.22 findUnsuccessful(opt_first_char_match) 261.46ms 3.82 findUnsuccessful(std_first_char_unmatch) 249.12ms 4.01 findUnsuccessful(opt_first_char_unmatch) 199.02ms 5.02 ### References [1] gcc std::search issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66414 [2] glibc strstr implements: https://github.com/lattera/glibc/blob/master/string/strstr.c [3] two way search: http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260 [4] std::boyer_moore_searcher https://en.cppreference.com/w/cpp/utility/functional/boyer_moore_searcher https://github.com/mclow/search-library?tab=readme-ov-file [5] doris string-search https://github.com/apache/doris/blob/4e326e5f496109edb253d094869d057573ab7a59/be/src/vec/common/string_searcher.h#L44 [6] stringzilla https://github.com/ashvardanian/StringZilla/blob/main/include/stringzilla/stringzilla.h#L2187 [7] string-search algorithms https://www-igm.univ-mlv.fr/~lecroq/string/ [8] sse4 strst https://github.com/WojciechMula/sse4-strstr [9] folly benchmark https://github.com/facebook/folly/blob/ce5edfb9b08ead9e78cb46879e7b9499861f7cd2/folly/test/FBStringTestBenchmarks.cpp.h [10] folly ut https://github.com/facebook/folly/blob/ce5edfb9b08ead9e78cb46879e7b9499861f7cd2/folly/test/FBStringTest.cpp#L1277 [11] clickhouse Volnitsky https://github.com/ClickHouse/ClickHouse/blob/a51f867ce014a4cb8f97c46e004b90f5feafd80f/src/Common/Volnitsky.h#L482 Pull Request resolved: facebookincubator#10858 Reviewed By: pedroerp Differential Revision: D65949876 Pulled By: Yuhta fbshipit-source-id: e48929a7f61694da93fe3df4ce777909118b9312
Description
In practice, c_strstr is always faster than std::find in most cases, refer to [1]. However, the c_strstr interface will truncate '\0', which makes it unable to handle unicode characters. In addition, glibc uses two-way- search, see[2], but it does not seem to be exposed as a common interface. It requires some additional code to determine the best path. In some scenarios, the decision-making code may cause performance degradation.
Solution
There are many optimization algorithms for string-search, refer to [7], but they cannot change the essence of the problem, which is a search process of O(n*m) complexity. I think the most common way to improve performance is to optimize the search instructions. For example, doris (the same is true for clickhouse, this part of the code of doris was copied from clickhouse, and the Volnitsky[11] part was removed), see [5].
This PR refers to the implementation of [5], [6], [8] :
[5] doris : first-two comapre first, and the issue of page-safe was also mentioned
[6] stringzilla: first-mid-lst compare first
[8] simd-strstr implement by WojciechMula, first-lst comapre first.
They all look few chars first, the more chars you choose to look the higher probability of hitting the optimized path, and the more cost of 'optimized path' itself.
In the end, the solution I chose is to first compare first-last-chars and then compare the remaining bytes. This is because in practice, our users seem to be able to get the matching results through first-last-chars.
ut & benchmark
This part of the code refers to folly’s test code, see [9] [10], of course, some adjustments have been made based on the code implementation.
Benchmark : std::find vs simdStrStr:
findSuccessful(opt_50_5) 31.66ms 31.59
findSuccessful(opt_100_10) 36.81ms 27.16
findSuccessful(opt_100_20) 96.47ms 10.37
findSuccessful(opt_1k_10) 63.27ms 15.81
findSuccessful(opt_1k_100) 156.95ms 6.37
findSuccessful(std_50_5) 45.24ms 22.11
findSuccessful(std_100_10) 191.23ms 5.23
findSuccessful(std_100_20) 322.11ms 3.10
findSuccessful(std_1k_10) 122.69ms 8.15
findSuccessful(std_1k_100) 168.10ms 5.95
findUnsuccessful(std_first_char_match) 819.13ms 1.22
findUnsuccessful(opt_first_char_match) 261.46ms 3.82
findUnsuccessful(std_first_char_unmatch) 249.12ms 4.01
findUnsuccessful(opt_first_char_unmatch) 199.02ms 5.02
References
[1] gcc std::search issue:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66414
[2] glibc strstr implements:
https://github.com/lattera/glibc/blob/master/string/strstr.c
[3] two way search:
http://www-igm.univ-mlv.fr/~lecroq/string/node26.html#SECTION00260
[4] std::boyer_moore_searcher
https://en.cppreference.com/w/cpp/utility/functional/boyer_moore_searcher
https://github.com/mclow/search-library?tab=readme-ov-file
[5] doris string-search
https://github.com/apache/doris/blob/4e326e5f496109edb253d094869d057573ab7a59/be/src/vec/common/string_searcher.h#L44
[6] stringzilla
https://github.com/ashvardanian/StringZilla/blob/main/include/stringzilla/stringzilla.h#L2187
[7] string-search algorithms
https://www-igm.univ-mlv.fr/~lecroq/string/
[8] sse4 strst
https://github.com/WojciechMula/sse4-strstr
[9] folly benchmark
https://github.com/facebook/folly/blob/ce5edfb9b08ead9e78cb46879e7b9499861f7cd2/folly/test/FBStringTestBenchmarks.cpp.h
[10] folly ut
https://github.com/facebook/folly/blob/ce5edfb9b08ead9e78cb46879e7b9499861f7cd2/folly/test/FBStringTest.cpp#L1277
[11] clickhouse Volnitsky
https://github.com/ClickHouse/ClickHouse/blob/a51f867ce014a4cb8f97c46e004b90f5feafd80f/src/Common/Volnitsky.h#L482