Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize String#compare in case of ASCII only #7352

Merged
merged 1 commit into from
Apr 6, 2019

Conversation

wooster0
Copy link
Contributor

This optimizes String#compare when other and self is ASCII:

Benchmark.ips do |bm|
  bm.report "String#new_compare with ASCII" do
    "HeLlo".new_compare("hELlO", true) == 0
  end

  bm.report "String#compare with ASCII" do
    "HeLlo".compare("hELlO", true) == 0
  end
end
String#new_compare with ASCII 504.84M (  1.98ns) (±31.44%)  0 B/op        fastest
    String#compare with ASCII  12.06M ( 82.89ns) (±33.57%)  0 B/op  41.85× slower

@ysbaddaden
Copy link
Contributor

What about longer and much longer strings?

src/string.cr Outdated Show resolved Hide resolved
@wooster0
Copy link
Contributor Author

wooster0 commented Jan 31, 2019

@ysbaddaden then it's even slower:

string = "aA" * 500
Benchmark.ips do |bm|
  bm.report "String#new_compare for ASCII" do
    string.new_compare(string, true) == 0
  end

  bm.report "String#compare for ASCII" do
    string.compare(string, true) == 0
  end
end
String#new_compare for ASCII 367.12M (  2.72ns) (±22.18%)  0 B/op          fastest
    String#compare for ASCII  92.58k (  10.8µs) (±19.51%)  0 B/op  3965.56× slower

string1 = "aA" * 5000
string2 = "aa" * 5000
Benchmark.ips do |bm|
  bm.report "String#new_compare for ASCII" do
    string1.new_compare(string2, true) == 0
  end

  bm.report "String#compare for ASCII" do
    string1.compare(string2, true) == 0
  end
end
String#new_compare for ASCII 148.19M (  6.75ns) (±31.68%)  0 B/op           fastest
    String#compare for ASCII   6.81k ( 146.8µs) (±26.48%)  0 B/op  21755.59× slower

@ysbaddaden
Copy link
Contributor

Great!

src/string.cr Outdated Show resolved Hide resolved
@wooster0
Copy link
Contributor Author

wooster0 commented Feb 2, 2019

I changed String#compare to String#equals? now and it returns a Bool because I couldn't really find a way to make compare as fast as equals? while making it have the same behavior as <=> in the ASCII-only branch.

But also, I've scrolled for quite some time through this: https://github.com/search?l=C&o=asc&q=strcasecmp&s=indexed&type=Code. strcasecmp is similar to String#compare. It returns a positive number when the first string is greater than the other, zero if they are equal and a negative number if the string is less than other (e.g. strcasecmp("hi", "h"); returns 105, that's the codepoint of h). But I couldn't find a single instance in all this code where the user didn't only want to know whether the strings are equal regardless of the case. The codepoint itself that is returned by that method is never actually used. They only check if it's negative or positive (you often see ! before strcasecmp or == 0, != 0 at the end).
So I think it would be better to just let this method return a bool already.

Copy link
Contributor

@ysbaddaden ysbaddaden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't drop #compare because you don't like it: you're throwing away the possibility to compare strings case insensitively.

@asterite
Copy link
Member

asterite commented Feb 2, 2019

A use case: sort strings case insensitive.

src/string.cr Outdated Show resolved Hide resolved
@wooster0
Copy link
Contributor Author

wooster0 commented Feb 2, 2019

ok I think I've found a relatively efficient way to do it now.

src/string.cr Show resolved Hide resolved
src/string.cr Outdated Show resolved Hide resolved
@Sija
Copy link
Contributor

Sija commented Feb 2, 2019

Having String#equals?(other : String) : Bool delegating to compare(other, case_insensitive: true) == 0 would be nice tho'.

@wooster0
Copy link
Contributor Author

wooster0 commented Feb 3, 2019

A bug in the Crystal compiler occurred in the test_darwin CircleCI. Doesn't seem to be related.

src/string.cr Outdated Show resolved Hide resolved
src/string.cr Outdated Show resolved Hide resolved
@Sija
Copy link
Contributor

Sija commented Feb 4, 2019

CI failed on Darwin with:

./bin/crystal build  -o .build/std_spec spec/std_spec.cr

while requiring "prelude" (Exception)
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::TopLevelVisitor#visit<Crystal::Expressions>:Bool
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::Program#top_level_semantic<Crystal::ASTNode+>:Tuple(Crystal::ASTNode+, Crystal::TypeDeclarationProcessor)
  from Crystal::Program#semantic<Crystal::ASTNode+, Bool>:Crystal::ASTNode+
  from Crystal::Compiler#compile<Array(Crystal::Compiler::Source), String>:Crystal::Compiler::Result
  from Crystal::Command#run:(Bool | Crystal::Compiler::Result | Nil)
  from __crystal_main
  from main
Caused by: while requiring "exception" (Exception)
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::TopLevelVisitor#visit<Crystal::Expressions>:Bool
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::TopLevelVisitor#visit<Crystal::Expressions>:Bool
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::Program#top_level_semantic<Crystal::ASTNode+>:Tuple(Crystal::ASTNode+, Crystal::TypeDeclarationProcessor)
  from Crystal::Program#semantic<Crystal::ASTNode+, Bool>:Crystal::ASTNode+
  from Crystal::Compiler#compile<Array(Crystal::Compiler::Source), String>:Crystal::Compiler::Result
  from Crystal::Command#run:(Bool | Crystal::Compiler::Result | Nil)
  from __crystal_main
  from main
Caused by: while requiring "callstack" (Exception)
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::TopLevelVisitor#visit<Crystal::Expressions>:Bool
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::TopLevelVisitor#visit<Crystal::Expressions>:Bool
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::TopLevelVisitor#visit<Crystal::Expressions>:Bool
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::Program#top_level_semantic<Crystal::ASTNode+>:Tuple(Crystal::ASTNode+, Crystal::TypeDeclarationProcessor)
  from Crystal::Program#semantic<Crystal::ASTNode+, Bool>:Crystal::ASTNode+
  from Crystal::Compiler#compile<Array(Crystal::Compiler::Source), String>:Crystal::Compiler::Result
  from Crystal::Command#run:(Bool | Crystal::Compiler::Result | Nil)
  from __crystal_main
  from main
Caused by: while requiring "c/dlfcn" (Exception)
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::TopLevelVisitor#visit<Crystal::Expressions>:Bool
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::TopLevelVisitor#visit<Crystal::Expressions>:Bool
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::TopLevelVisitor#visit<Crystal::Expressions>:Bool
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::TopLevelVisitor#visit<Crystal::Expressions>:Bool
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::Program#top_level_semantic<Crystal::ASTNode+>:Tuple(Crystal::ASTNode+, Crystal::TypeDeclarationProcessor)
  from Crystal::Program#semantic<Crystal::ASTNode+, Bool>:Crystal::ASTNode+
  from Crystal::Compiler#compile<Array(Crystal::Compiler::Source), String>:Crystal::Compiler::Result
  from Crystal::Command#run:(Bool | Crystal::Compiler::Result | Nil)
  from __crystal_main
  from main
Caused by: can't find file 'c/dlfcn'

If you're trying to require a shard:
- Did you remember to run `shards install`?
- Did you make sure you're running the compiler in the same directory as your shard.yml? (Crystal::CrystalPath::Error)
  from Crystal::Program#find_in_path<String, (String | Nil)>:Array(String)
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::TopLevelVisitor#visit<Crystal::Expressions>:Bool
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::TopLevelVisitor#visit<Crystal::Expressions>:Bool
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::TopLevelVisitor#visit<Crystal::Expressions>:Bool
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::TopLevelVisitor#visit<Crystal::Expressions>:Bool
  from Crystal::ASTNode+@Crystal::ASTNode#accept<Crystal::TopLevelVisitor>:Nil
  from Crystal::Program#top_level_semantic<Crystal::ASTNode+>:Tuple(Crystal::ASTNode+, Crystal::TypeDeclarationProcessor)
  from Crystal::Program#semantic<Crystal::ASTNode+, Bool>:Crystal::ASTNode+
  from Crystal::Compiler#compile<Array(Crystal::Compiler::Source), String>:Crystal::Compiler::Result
  from Crystal::Command#run:(Bool | Crystal::Compiler::Result | Nil)
  from __crystal_main
  from main

Error: you've found a bug in the Crystal compiler. Please open an issue, including source code that will allow us to reproduce the bug: https://github.com/crystal-lang/crystal/issues

Copy link
Member

@straight-shoota straight-shoota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation might be fast, but it doesn't seem to be anywhere complete.
Please add an example where a case-insensitive comparing character is not the first one but somewhere further inside the string.

spec/std/string_spec.cr Outdated Show resolved Hide resolved
src/string.cr Outdated Show resolved Hide resolved
src/string.cr Outdated Show resolved Hide resolved
src/string.cr Outdated Show resolved Hide resolved
src/string.cr Outdated Show resolved Hide resolved
src/string.cr Outdated Show resolved Hide resolved
src/string.cr Show resolved Hide resolved
@bcardiff
Copy link
Member

bcardiff commented Apr 5, 2019

@r00ster91 would you mind doing a rebase on master so the CI can be happy?

@ysbaddaden ysbaddaden dismissed their stale review April 5, 2019 21:43

old review

@bcardiff bcardiff merged commit 11aca41 into crystal-lang:master Apr 6, 2019
@bcardiff
Copy link
Member

bcardiff commented Apr 6, 2019

Thanks @r00ster91 🙇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants