-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LibraryScanner: Improve hashing of directory contents #2497
Merged
Merged
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
3171c9c
Add utility functions and typedefs for cache keys
uklotzde 686a229
Use a cryptographic hash function for directory hashes
uklotzde 2ebaede
Add reasoning for cache key calculation
uklotzde 44ee5df
Use prepare/bind/exec for cleanup SQL query
uklotzde 160618c
Truncate message digest into cache key
uklotzde 5c6996d
Use static_assert from C++11
uklotzde 866704d
Merge branch 'master' into libraryscannerhash
uklotzde File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
#include "util/cache.h" | ||
|
||
#include <gtest/gtest.h> | ||
|
||
#include <QtDebug> | ||
|
||
namespace { | ||
|
||
class CacheTest : public testing::Test { | ||
}; | ||
|
||
TEST_F(CacheTest, InvalidCacheKey) { | ||
EXPECT_FALSE(mixxx::isValidCacheKey( | ||
mixxx::invalidCacheKey())); | ||
} | ||
|
||
TEST_F(CacheTest, InvalidCacheKeyEmptyByteArray) { | ||
EXPECT_FALSE(mixxx::isValidCacheKey( | ||
mixxx::cacheKeyFromMessageDigest(QByteArray()))); | ||
EXPECT_FALSE(mixxx::isValidCacheKey( | ||
mixxx::cacheKeyFromMessageDigest(""))); | ||
EXPECT_FALSE(mixxx::isValidCacheKey( | ||
mixxx::cacheKeyFromMessageDigest("\0"))); | ||
} | ||
|
||
TEST_F(CacheTest, ValidCacheKey) { | ||
EXPECT_TRUE(mixxx::isValidCacheKey( | ||
mixxx::cacheKeyFromMessageDigest(QByteArrayLiteral("test")))); | ||
} | ||
|
||
TEST_F(CacheTest, ValidCacheKeySingleZeroAscii) { | ||
EXPECT_TRUE(mixxx::isValidCacheKey( | ||
mixxx::cacheKeyFromMessageDigest(QByteArray("0")))); | ||
} | ||
|
||
TEST_F(CacheTest, ValidCacheKeySingleZeroByte) { | ||
EXPECT_TRUE(mixxx::isValidCacheKey( | ||
mixxx::cacheKeyFromMessageDigest(QByteArray("\0", 1)))); | ||
} | ||
|
||
} // anonymous namespace |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you made a performance check for toUtf8() compared to use the QString bytes?
I can Imagine that the later is faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raw bytes of QString are not platform-independent when considering endianess, but UTF-8 is. Does this matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no function to obtain the raw bytes of a QString. Using toLatin1() may lose characters and toLocal8Bit() is platform dependent while the database is portable. Please clarify which option you consider here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was just the idea to check if something like
hasher.addData(reinterprete_cast<char*>(currentFile.utf16()))
is faster or not.
This way there is the double amount to hash but no utf8 conversion.
I don't care about endianes, because this would be only apply if the DB is on an external HD, accessing the same OS with different CPU architectures.
Thinking of this a independent solution is not bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that the additional allocations really matter here and won't invest hours to figure out how to do a realistic performance test to prove this. The optimized solution would also produce non-portable data, in contrast to all other data in the database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double the data should not be an issue. sha256 runtime not increase 1:1 with the amount of input data. I'm on my phone, but you should see the same effect on desktop pcs:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual number of bytes that we feed into QCryptographicHash should not be an issue. But QString might need to allocate a temporary QByteArray for the UTF-8 representation and the conversion itself also takes some time.
Yes, utf16() would be faster, but it is non-portable. If someone is able to proof that it is much faster and worth the drawback, please go ahead ;) I will not do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curious I did the test. Here are the results:
Conclusion:
As expected there is only performance gain for ASCII characters. Except in non Latin charter sets two byte utf8 characters are a minority.
For a typical file name of 30 charters, the unicode() version is way faster. If we append all strings first, the utf8 version catches up at one point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those numbers also need to be put in relation to the file system access. I guess that the differences on the overall performance are negligible. That's what I had in mind with "realistic performance test".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes of cause. I was just interested if we have a low hanging fruit here.