Leverage accelerated vector hardware instructions in Vector Search #96370
Comments
Pinging @elastic/es-search (Team:Search)
Sample microbenchmark results here, to give a sense of the potential performance impact of this change. Model name: 11th Gen Intel(R) Core(TM) i9-11900F @ 2.50GHz (AVX-512).
Thanks @ChrisHegarty for the great work and for opening this issue. I think the first step is to upgrade Elasticsearch to a Lucene 9.7 snapshot. I will open a PR.
I need to mention: at the moment this ONLY works with Java 20. Support for Java 21 is coming a bit later, when Java 21 goes into its RC phase and all APIs are finalized (the same applies to the new memory segment MMapDirectory support). We won't backport to Java 19. It does not hurt to enable the module on other Java versions, but it is best done conditionally. The startup warning about the incubator module is printed to stderr; it is unrelated to Lucene and cannot be disabled.
In addition, when doing performance tests, use large indexes and many queries, and let benchmarks run for a longer time with appropriate warmup. The vector features take a while before the JVM enables its optimizations. The first few queries will be way slower than with the legacy Lucene vector implementation (we measured dot products 40 times slower until the optimizations kick in; after that it gets 2-5 times faster than the classical Lucene code, depending on hardware capabilities and vector size). Also keep in mind to update the documentation: vector sizes should fit the hardware infrastructure. Vector dimensions of 100 are bad, 128 is perfect! We have some formulas depending on the hardware. As a good default: for AVX2 CPUs with 256-bit vector support, the calculation (4 bytes per float, 4 lanes executed in parallel) means dimensions of float vectors should be multiples of 32 (so 32, 64, 96, 128, ... are good sizes). With AVX-512 it is multiples of 64. For byte/binary vectors it is a bit different. So possibly tell users to use vector dimensions that are multiples of 64 to catch all variants.
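A hedged sketch of the sizing arithmetic in the comment above. The register widths and the 4-way unroll factor are assumptions taken from the comment, not values queried from Lucene or the JVM:

```java
// Illustrates the dimension-sizing rule described in the comment above.
// The constants (register widths, 4-way unroll) are assumptions from
// that comment, not values read from Lucene or the JVM.
public class VectorDimensionHint {

    /** Floats packed into one SIMD register of the given width. */
    static int floatsPerRegister(int registerBits) {
        return registerBits / Float.SIZE; // Float.SIZE == 32 bits
    }

    /** Recommended dimension multiple, assuming the inner loop
     *  processes `unroll` registers per iteration. */
    static int recommendedMultiple(int registerBits, int unroll) {
        return floatsPerRegister(registerBits) * unroll;
    }

    public static void main(String[] args) {
        System.out.println(recommendedMultiple(256, 4)); // AVX2 -> 32
        System.out.println(recommendedMultiple(512, 4)); // AVX-512 -> 64
    }
}
```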
Yeah, and this is fine. Elasticsearch currently bundles JDK 20.0.1, and will bundle 20.0.2 when it comes out. You can set your own JDK, but ... too bad if you set it to something older! ;-)
Yeah, we should certainly do this soon, as you say. 21 RDP 1 is early June, and we wanna make sure that all this works well.
Good point. I had overlooked this.
The log messages for MMap do end up in the ES logs. I assume it will be the same for the Vector API, but I'll need to double-check. I think that our usage of JVM flags is ok, but this could be worth checking, especially in different test environments.
I am sorry that this is not suppressible - yes, it is my fault, I added it to the JDK. In general, I think that these kinds of warnings should be suppressible from the command line. But that is another discussion, best left for another day.
A note on vector dimension sizes. Some dimensions perform better than others, though all dimension sizes behave correctly; e.g. 1536 will perform better than 1535. The Lucene VectorUtilPanamaProvider (where the JDK Vector API is leveraged) supports preferred bit sizes of 128, 256, and 512, for best alignment on hardware that supports them.
Given this, vector dimensions that are multiples of the number of packed values per element_type, for the given hardware's preferred bit size, perform best. We've not tried to over-optimise for "downsizing" on tail cases, preferring instead to keep the code simple. The logs contain a message noting the preferred bit size in use for the hardware we're running on. E.g. running on my Mac I can see the following in the Elasticsearch logs:
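For dimensions that are not already a good multiple (e.g. 1535), one simple option is zero-padding to the next multiple, since trailing zeros do not change a dot product. A minimal sketch; whether padding is acceptable depends on your indexing pipeline and similarity function, which is an assumption here:

```java
import java.util.Arrays;

// Zero-pads a float vector up to the next multiple of the preferred
// packed-value count, e.g. 1535 dims -> 1536 dims. Sketch only:
// whether padding fits a given pipeline/similarity is an assumption.
public class VectorPadding {

    static float[] padToMultiple(float[] v, int multiple) {
        int padded = ((v.length + multiple - 1) / multiple) * multiple;
        return Arrays.copyOf(v, padded); // new trailing slots are 0.0f
    }

    public static void main(String[] args) {
        System.out.println(padToMultiple(new float[1535], 64).length); // 1536
        System.out.println(padToMultiple(new float[128], 64).length);  // 128
    }
}
```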
A note on existing Elasticsearch Vector Search benchmarks.
The ES nightly benchmark updates the jvm.options file; it needed a change to enable the Vector API, see elastic/rally-teams#79
@ChrisHegarty Impressive results!
…6617) Lucene has integrated hardware-accelerated vector calculations, meaning calculations like `dot_product` can be much faster when using the Lucene-defined functions. When a `dense_vector` is indexed, we already support this. However, when `index: false`, we store float vectors as binary fields in Lucene and decode them ourselves, so we don't use the underlying Lucene structures or functions. To take advantage of the large performance boost, this PR refactors the binary vector values in the following way:
- Eagerly decode the binary blobs when iterated
- Call the Lucene-defined VectorUtil functions when possible

related to: #96370
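The refactor described in the commit above can be sketched roughly as follows. The little-endian byte order and the inlined scalar dot product are illustrative assumptions for self-containment; in Elasticsearch the scoring call would be Lucene's `VectorUtil.dotProduct`, and the on-disk encoding is whatever the binary doc-values field uses:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: eagerly decode a stored binary blob into a float[] once,
// then score with a single dot-product call. The byte order here is
// an illustrative assumption, not the exact on-disk encoding; the
// scalar dotProduct stands in for Lucene's VectorUtil.dotProduct.
public class BinaryVectorDecode {

    static byte[] encode(float[] v) {
        ByteBuffer buf = ByteBuffer.allocate(v.length * Float.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (float f : v) buf.putFloat(f);
        return buf.array();
    }

    static float[] decode(byte[] blob, int dims) {
        float[] v = new float[dims];
        ByteBuffer.wrap(blob).order(ByteOrder.LITTLE_ENDIAN)
                  .asFloatBuffer().get(v);
        return v;
    }

    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    public static void main(String[] args) {
        float[] v = decode(encode(new float[] {1f, 2f, 3f}), 3);
        System.out.println(dotProduct(v, new float[] {4f, 5f, 6f})); // 32.0
    }
}
```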
@ChrisHegarty this is completed, right? Can we close this issue?
The upcoming Lucene 9.7.0 release has support for SIMD vectorized implementations of the low-level primitives used by Vector Search. The vectorized implementations use the currently Incubating Panama Vector API, see apache/lucene#12311 apache/lucene#12327. The Lucene changelog notes say it all [1]
We should evaluate the impact of enabling this in Elasticsearch. Specifically:

1. Refactor similar usages in Elasticsearch to use the Lucene VectorUtil functions, since they are much faster. E.g. `DenseVectorFieldMapper.java` can use `VectorUtil::dotProduct` (rather than its own slower scalar implementation). #96617
2. Merge Lucene 9.7.0 without enabling the new Panamaized vectorized implementations. We want to validate and baseline the upgrade to 9.7.0 independently of this change. Allow, say, 24+ hours to get at least one nightly benchmark run.
3. Add `--add-modules jdk.incubator.vector` to the Elasticsearch startup; this will enable the faster Lucene VectorUtil implementations. https://github.com/elastic/elasticsearch/blob/main/distribution/tools/server-cli/src/main/java/org/elasticsearch/server/cli/ServerProcess.java#L222. This will raise a warning at startup; document that this is ok (similar to the security manager warning - yes, we know, it is ok!)

[1] GITHUB#12302, GITHUB#12311: Add vectorized implementations of VectorUtil.dotProduct(), squareDistance(), cosine() with Java 20 jdk.incubator.vector APIs. Applications started with command line parameter "java --add-modules jdk.incubator.vector" on exactly Java 20 will automatically use the new vectorized implementations if running on a supported platform (x86 AVX2 or later, ARM SVE or later). This is an opt-in feature and requires explicit Java command line flag! When enabled, Lucene logs a notice using java.util.logging. Please test thoroughly and report bugs/slowness to Lucene's mailing list.
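For reference, the flag from the last step can also be set via a `jvm.options` fragment. A sketch only; the exact file location and conventions depend on the distribution:

```
## Enable the (incubating) Panama Vector API so Lucene can use its
## vectorized VectorUtil implementations. Prints a one-line incubator
## warning at startup, which is expected and harmless.
--add-modules=jdk.incubator.vector
```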