Feature request - multi-threaded indexing and search #227
@denis-bogdanas could you benchmark klogg under the same conditions? In this fork I've added parallel search and tuned the indexing code a bit. Precompiled builds are available on Bintray. Also, "matches overview" is updated on the UI thread, so turning it off should affect the benchmark results.
@variar Which version? Is klogg-17.12.0.245-setup.exe OK? I would rather not mess with compiling from source.
Either klogg-17.12.0.245-setup.exe or klogg-17.12.0.245-portable.zip. The portable version does not require any installation; just unpack the zip archive. These precompiled binaries are for 64-bit systems only.
Results for klogg: searching for the word "aziza": 264-286 MB/s, average ~270 MB/s. Summary: indexing is 9% slower and searching 30% faster than glogg, but there are still no signs of parallelism.
Thanks for the feedback. Several things come to mind:
I'll try to hack together some benchmarking code and make a PR, at least for the QFile replacement.
By the way, I'm using Process Explorer to get the numbers. Data for EmEditor: indexing ~425 MB/s, search 309 MB/s, one core used.
I tried a 50K-line buffer for search. It's even a bit slower. From your description, it looks like reading and processing are done sequentially: you first read a buffer, then process it, and while processing is going on the SSD sits idle. I guess the way I'd do it is to have one thread that reads buffers into a synchronized queue, several worker threads that take buffers from the queue and process them, and another thread that combines the results.
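The read/process/combine pipeline proposed above can be sketched as a producer-consumer setup. This is a minimal illustration under stated assumptions, not klogg's or glogg's actual code: ChunkQueue and countMatchingLines are hypothetical names, the input is any std::istream, and "processing" is a plain substring search per line. For brevity, the combining step is a mutex-protected sum performed by each worker rather than a dedicated combiner thread. Requires C++17 (std::optional).

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <istream>
#include <mutex>
#include <optional>
#include <queue>
#include <sstream>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded, thread-safe queue of text chunks.
class ChunkQueue {
public:
    explicit ChunkQueue(std::size_t capacity) : capacity_(capacity) {}

    // Blocks while the queue is full, so the reader cannot race far ahead.
    void push(std::string chunk) {
        std::unique_lock<std::mutex> lock(m_);
        notFull_.wait(lock, [&] { return q_.size() < capacity_; });
        q_.push(std::move(chunk));
        notEmpty_.notify_one();
    }

    // Called by the reader once the input is exhausted.
    void close() {
        std::lock_guard<std::mutex> lock(m_);
        closed_ = true;
        notEmpty_.notify_all();
    }

    // Blocks until a chunk is available; std::nullopt means closed and drained.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(m_);
        notEmpty_.wait(lock, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        std::string chunk = std::move(q_.front());
        q_.pop();
        notFull_.notify_one();
        return chunk;
    }

private:
    std::mutex m_;
    std::condition_variable notEmpty_, notFull_;
    std::queue<std::string> q_;
    std::size_t capacity_;
    bool closed_ = false;
};

// Counts lines of `in` that contain `needle`. The reader thread enqueues
// chunks that always end on a line boundary; worker threads search them.
std::size_t countMatchingLines(std::istream& in, const std::string& needle,
                               unsigned workers, std::size_t chunkSize) {
    assert(workers > 0 && chunkSize > 0);
    ChunkQueue queue(2 * static_cast<std::size_t>(workers));

    std::thread reader([&] {
        std::string carry;  // bytes read but not yet terminated by '\n'
        std::vector<char> buf(chunkSize);
        for (;;) {
            in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
            std::streamsize got = in.gcount();
            if (got <= 0) break;
            carry.append(buf.data(), static_cast<std::size_t>(got));
            std::size_t lastNl = carry.rfind('\n');
            if (lastNl == std::string::npos) continue;  // no complete line yet
            queue.push(carry.substr(0, lastNl + 1));
            carry.erase(0, lastNl + 1);
        }
        if (!carry.empty()) queue.push(std::move(carry));  // unterminated last line
        queue.close();
    });

    std::mutex totalMutex;
    std::size_t total = 0;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            std::size_t local = 0;
            while (std::optional<std::string> chunk = queue.pop()) {
                std::istringstream lines(*chunk);
                for (std::string line; std::getline(lines, line);)
                    if (line.find(needle) != std::string::npos) ++local;
            }
            // Combining step: merge this worker's partial count.
            std::lock_guard<std::mutex> lock(totalMutex);
            total += local;
        });
    }

    reader.join();
    for (std::thread& t : pool) t.join();
    return total;
}
```

Bounding the queue (here at twice the worker count) keeps memory use flat: the reader blocks when the workers fall behind, and the workers block when the disk does, so neither side idles while the other has work.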
I've done some research. Adding posix_fadvise(..., POSIX_FADV_SEQUENTIAL) actually makes file reading slower on my PC. I'll test FILE_FLAG_SEQUENTIAL_SCAN on Windows later. Adding a separate thread that reads data from the file and passes it on for indexing improves file loading time. You can try the new portable build. Use something like 8-16 MiB for the file loading buffer in settings (this is now the size of the "readahead" buffer, not the size of the chunk read from disk at a time). The index is still built in one thread, but I'll try to make it parallel.
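Since the comment above reports that the sequential-access hint made reading slower on that machine, it is convenient to make the hint a switch and benchmark both modes. A minimal sketch for POSIX systems that provide posix_fadvise (Linux does; macOS does not); the function name readFileChunked is hypothetical, the separate reader thread is omitted, and the Windows counterpart (FILE_FLAG_SEQUENTIAL_SCAN passed to CreateFile) is not shown.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: reads the whole file in fixed-size chunks and returns
// the number of bytes read, or -1 on error. `sequentialHint` toggles
// POSIX_FADV_SEQUENTIAL so both modes can be timed against each other.
long long readFileChunked(const char* path, std::size_t chunkSize, bool sequentialHint) {
    assert(chunkSize > 0);
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return -1;
    if (sequentialHint) {
        // Advisory only; offset 0 with len 0 covers the whole file. The call
        // returns an error number on failure, ignored because the hint is optional.
        (void)::posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    }
    std::vector<char> buf(chunkSize);
    long long total = 0;
    for (;;) {
        ssize_t n = ::read(fd, buf.data(), buf.size());
        if (n < 0 && errno == EINTR) continue;  // interrupted by a signal: retry
        if (n < 0) { ::close(fd); return -1; }
        if (n == 0) break;                      // end of file
        total += n;
        // An indexer would hand buf[0..n) to its line-scanning code here.
    }
    ::close(fd);
    return total;
}
```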
Thanks! I'm actually using EmEditor, at least until the trial expires. But superior performance is something that might attract many other users. Hell, it might even serve as a real-life benchmarking tool for SSD performance.
FYI, switching from a naive loop to std::memchr when searching for line breaks and tabs makes the initial file load IO-bound on my 850 EVO. The changes are needed in only one loop; I'll try to cherry-pick them into glogg without all my multi-threaded reading/processing experiments.
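The std::memchr change described above amounts to letting the C library find each '\n' instead of testing every byte in a hand-written loop; C library implementations typically optimise memchr (word-at-a-time or SIMD scanning). A minimal sketch of building a line-start index this way; indexLineStarts is a hypothetical name, and tabs could be located the same way by searching for '\t'.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: returns the byte offset at which each line of
// data[0..size) starts, using std::memchr to jump from one '\n' to the next.
std::vector<std::size_t> indexLineStarts(const char* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    std::vector<std::size_t> starts;
    if (size == 0) return starts;
    starts.push_back(0);
    const char* p = data;
    const char* end = data + size;
    while (p < end) {
        const void* nl = std::memchr(p, '\n', static_cast<std::size_t>(end - p));
        if (nl == nullptr) break;                   // last line has no terminator
        p = static_cast<const char*>(nl) + 1;
        if (p < end) starts.push_back(static_cast<std::size_t>(p - data));
    }
    return starts;
}
```

A trailing newline does not produce an extra empty line in this sketch; whether it should is a design choice for the viewer.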
Please consider implementing indexing and search on multiple threads to take advantage of SSD speed. This would greatly speed up operations on 10 GB+ files.
My benchmarks: I have a Samsung 850 EVO SSD, and both indexing and search are CPU-limited on a single thread. SSD read speed on a large file is ~340 MB/s during indexing and 210 MB/s when searching for a short word. The benchmarked sequential read speed of my SSD is 450 MB/s, and there are NVMe SSDs on the market with read speeds on the order of 2-3 GB/s.
EmEditor, by the way, is faster at both operations but is still CPU-limited on a single thread.
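One way to get the multi-threaded search requested above, once the text is in memory (for example via a memory-mapped file, not shown), is to split the buffer at line boundaries and search each part concurrently. A minimal sketch under those assumptions: countMatchingLinesParallel and countInRange are hypothetical names, std::async stands in for whatever thread pool a real viewer would use, and "matching" is a plain substring test. Requires C++17 (std::string_view).

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string_view>
#include <vector>

// Counts lines in text[begin, end) that contain `needle`; `begin` and `end`
// must fall on line boundaries.
static std::size_t countInRange(std::string_view text, std::size_t begin,
                                std::size_t end, std::string_view needle) {
    std::size_t count = 0;
    while (begin < end) {
        std::size_t nl = text.find('\n', begin);
        std::size_t lineEnd = (nl == std::string_view::npos || nl > end) ? end : nl;
        if (text.substr(begin, lineEnd - begin).find(needle) != std::string_view::npos)
            ++count;
        begin = lineEnd + 1;
    }
    return count;
}

// Splits `text` into `parts` ranges, moving each cut forward to the next
// line start, then searches the ranges concurrently and sums the counts.
std::size_t countMatchingLinesParallel(std::string_view text,
                                       std::string_view needle, unsigned parts) {
    assert(parts > 0);
    std::vector<std::size_t> bounds{0};
    for (unsigned i = 1; i < parts; ++i) {
        std::size_t cut = text.size() * i / parts;
        if (cut < bounds.back()) cut = bounds.back();  // keep bounds monotonic
        std::size_t nl = text.find('\n', cut);
        bounds.push_back(nl == std::string_view::npos ? text.size() : nl + 1);
    }
    bounds.push_back(text.size());

    std::vector<std::future<std::size_t>> futures;
    for (std::size_t i = 0; i + 1 < bounds.size(); ++i)
        futures.push_back(std::async(std::launch::async, countInRange, text,
                                     bounds[i], bounds[i + 1], needle));
    std::size_t total = 0;
    for (std::future<std::size_t>& f : futures) total += f.get();
    return total;
}
```

Cutting only at line starts means no line is ever split between two workers, so per-range counts can simply be added together.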