make crypto functions execute in the threadpool asynchronously #678
Comments
I don't think so. Unlike zlib, the APIs in the crypto module are commonly O(n) or O(n log n) in time. |
cc @indutny? |
I don't think that there will be any performance benefit for AES, hashing and the like. Maybe for DiffieHellman, but it does not have a stream API. |
Is there a good case for keeping this request open? |
Yeah, for DH. |
I would like to second this, but for symmetric ciphers (createCipher, createCipheriv, createDecipher, createDecipheriv). It would be great to have a way to do cipher and decipher ops off the main thread, not for any performance benefit, but so as not to block the event loop. I know a single cipher op over 64KB takes around 0.2ms (AES-256-GCM), but for a server doing crypto at rest on an SSD, 5000-10000 cipher ops a second would quickly burn through the entire main thread budget. Being able to do cipher and decipher ops in the threadpool would make it easier to get more throughput without blocking the event loop, leaving the event loop as the control plane and not the data plane. |
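To put a number on the main-thread budget described above, here is a minimal sketch that times synchronous AES-256-GCM encryption of 64 KiB buffers. The helper names `encryptSync` and `measureBlockingMs` are made up for illustration, and the timing you get depends entirely on your hardware and Node build, so treat it as a way to measure, not as a benchmark result.

```javascript
const crypto = require('node:crypto');

// Encrypt one buffer with AES-256-GCM using the synchronous cipher API.
// Everything in here runs on the main thread.
function encryptSync(key, plaintext) {
  const iv = crypto.randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Run `ops` encryptions back to back and return how many milliseconds
// the main thread (and therefore the event loop) was kept busy.
function measureBlockingMs(ops, size) {
  const key = crypto.randomBytes(32);
  const plaintext = Buffer.alloc(size);
  const start = process.hrtime.bigint();
  for (let i = 0; i < ops; i++) {
    encryptSync(key, plaintext);
  }
  return Number(process.hrtime.bigint() - start) / 1e6;
}

const ops = 1000;
const ms = measureBlockingMs(ops, 64 * 1024);
console.log(`${ops} ops of 64 KiB kept the event loop busy for ${ms.toFixed(1)} ms`);
```

At the 0.2ms per op quoted above, 5000 ops a second would already account for a full second of main-thread time per second, which is the point being made.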
I created a module, @ronomon/crypto-async, to test out the idea of doing cipher, hash and hmac operations in the threadpool. Latency per operation is only slightly affected (or not at all) through interaction with the thread pool, but the throughput gains are up to 3x. Best of all, the event loop is not blocked. The benchmark is here. |
@jorangreef Looks interesting and the numbers are impressive but I'm curious how it holds up in real-world scenarios. Since it uses thread pool, it's going to suffer from head-of-line blocking if there are slow DNS or file operations queued up. |
Thanks @bnoordhuis, would increasing UV_THREADPOOL_SIZE to 64 or 128 help? I think UV_THREADPOOL_SIZE should usually be more than the number of CPU cores because most of the threads will be waiting and will not be hot on the CPU? |
That should help when most threads are blocked on I/O but latency will probably go through the roof when you have all 64 or 128 threads doing encryption. With computationally bound workloads you normally don't want more than N or N-1 threads (where N = number of cores) because otherwise you pay too much in scheduling overhead. |
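The N or N-1 rule of thumb can be applied from JavaScript by capping how many threadpool-backed crypto calls are in flight at once. A rough sketch, using `crypto.pbkdf2` only because it is an existing async crypto call that runs in the threadpool; `runLimited` is a hypothetical helper, not a Node API, and actual parallelism is still bounded by UV_THREADPOOL_SIZE (default 4).

```javascript
const crypto = require('node:crypto');
const os = require('node:os');
const { promisify } = require('node:util');

const pbkdf2 = promisify(crypto.pbkdf2);

// Run the given task functions with at most `limit` of them in flight.
// Each task is a function returning a promise; results keep task order.
async function runLimited(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// N-1 concurrent CPU-bound jobs, but never fewer than one.
const limit = Math.max(1, os.cpus().length - 1);

const tasks = Array.from({ length: 8 }, (_, i) => () =>
  pbkdf2(`password-${i}`, 'salt', 10000, 32, 'sha256'));

runLimited(tasks, limit).then((keys) => {
  console.log(`derived ${keys.length} keys with at most ${limit} in flight`);
});
```

Capping in-flight work this way does not remove head-of-line blocking behind slow DNS or file operations; it only keeps the crypto jobs themselves from oversubscribing the cores.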
Yes, with the benchmark I set concurrency to 1/2 number of available cores to keep latency within reasonable bounds: https://github.com/ronomon/crypto-async/blob/master/benchmark.js#L6 |
I was surprised that latency increased already when using N, and N-1 threads (where N = number of cores). I thought it would only increase from N-1. Is there some contention in the threadpool that can be optimized? |
|
I ran
|
--prof only profiles the main thread. |
I ran How can one get the page fault rate from the raw output? |
You can collect it with |
Just an update: after more tests and benchmarks, there may be a case for not limiting crypto in Node to 1 core by design. It would probably be good if Node's crypto could scale with the number of cores available. For example, with @ronomon/crypto-async on 4 cores (and with 1 MB buffers), you can do AES-256-CTR encryption at 3050.40 MB/s vs 945.20 MB/s for Node restricted to 1 core. For buffers larger than 1024 bytes, doing the crypto asynchronously, off the event loop, may give better throughput, and there are ways to solve threadpool competition with dns and fs operations. Here are some notes around controlling concurrency and adjusting threadpool size. |
Should this remain open? |
Yes, this should definitely remain open. There are huge (2x-3x) real-world throughput gains to be had for cipher, hash, and hmac operations on large (1MB+) buffers, scaling linearly with the number of cores, see crypto-async:
There are some tuning considerations, but it's logical, since you can start leveraging more CPU cores for your crypto, not just a single core. A single core just can't keep up with doing crypto for high throughput network links or non-volatile memory devices these days. Node's threadpool does have issues with conflating IO tasks (which are run cold) with CPU tasks (which are run hot), but that does not mean async crypto is not worth pursuing, it just means the threadpool needs some work. |
@Trott what's the status on this? Are we working on it? |
@ryzokuken I don’t think anybody is working on that. The things that are discussed here, about some separation in the thread pool between I/O- and CPU-bound work might be a good idea, but essentially it’s a different topic, and if you feel comfortable tackling this particular issue, go for it! |
@addaleax let me give this a try. |
There's some discussion on the API in this libuv PR. On my backburner... |
Fixes: nodejs#678
Refs: nodejs#26854
Signed-off-by: James M Snell <jasnell@gmail.com>
PR-URL: nodejs#35093
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
For anyone who's still following this issue, Node.js now supports the Web Crypto API, which is asynchronous. See also here |
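For reference, a small sketch of what the promise-based Web Crypto calls look like in Node, using `webcrypto.subtle.digest` and `webcrypto.subtle.encrypt`. The helper names `sha256Hex` and `encryptGcm` are illustrative only. The API returning promises does not by itself say where the work runs; that is an implementation detail of the Node version in use.

```javascript
const { webcrypto } = require('node:crypto');

// Hash a buffer with SHA-256 and return the digest as a hex string.
async function sha256Hex(data) {
  const digest = await webcrypto.subtle.digest('SHA-256', data);
  return Buffer.from(digest).toString('hex');
}

// Encrypt a buffer with a freshly generated AES-256-GCM key.
// The returned ciphertext has the 16-byte authentication tag appended.
async function encryptGcm(plaintext) {
  const key = await webcrypto.subtle.generateKey(
    { name: 'AES-GCM', length: 256 },
    true,
    ['encrypt', 'decrypt']
  );
  const iv = webcrypto.getRandomValues(new Uint8Array(12));
  const ciphertext = await webcrypto.subtle.encrypt({ name: 'AES-GCM', iv }, key, plaintext);
  return { key, iv, ciphertext: Buffer.from(ciphertext) };
}

sha256Hex(Buffer.from('abc')).then((hex) => console.log(hex));
encryptGcm(Buffer.alloc(64 * 1024)).then(({ ciphertext }) => {
  console.log(`ciphertext bytes: ${ciphertext.length}`);
});
```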
i would expect:
.update() and .digest() methods
readable event)
btw i have no idea how much performance benefit this would provide, if any. i just like the idea of having everything executing in the threadpool...
nodejs/node-v0.x-archive#4298