Leverage accelerated vector hardware instructions in Vector Search #96370

Closed
8 tasks done
ChrisHegarty opened this issue May 26, 2023 · 12 comments
Labels
>feature release highlight :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@ChrisHegarty
Contributor

ChrisHegarty commented May 26, 2023

The upcoming Lucene 9.7.0 release has support for SIMD-vectorized implementations of the low-level primitives used by Vector Search. The vectorized implementations use the currently incubating Panama Vector API; see apache/lucene#12311 and apache/lucene#12327. The Lucene changelog entry says it all [1].

We should evaluate the impact of enabling this in Elasticsearch. Specifically,

[1] GITHUB#12302, GITHUB#12311: Add vectorized implementations of VectorUtil.dotProduct(), squareDistance(), cosine() with Java 20 jdk.incubator.vector APIs. Applications started with command line parameter "java --add-modules jdk.incubator.vector" on exactly Java 20 will automatically use the new vectorized implementations if running on a supported platform (x86 AVX2 or later, ARM SVE or later). This is an opt-in feature and requires explicit Java command line flag! When enabled, Lucene logs a notice using java.util.logging. Please test thoroughly and report bugs/slowness to Lucene's mailing list.
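As a reference for what the three primitives compute, here is a minimal scalar sketch for float vectors. It is illustrative plain Java (the class `ScalarVectorOps` is hypothetical), not Lucene's code. The vectorized versions compute the same quantities but process several values per instruction with multiple accumulators, so their summation order differs and results can differ slightly in the last bits.

```java
/** Hypothetical scalar reference for the three VectorUtil primitives (illustration only). */
final class ScalarVectorOps {
  private ScalarVectorOps() {}

  /** Sum of element-wise products. */
  static float dotProduct(float[] a, float[] b) {
    checkLengths(a, b);
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Squared Euclidean distance: sum of squared element-wise differences. */
  static float squareDistance(float[] a, float[] b) {
    checkLengths(a, b);
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float diff = a[i] - b[i];
      sum += diff * diff;
    }
    return sum;
  }

  /** Cosine similarity: dot(a, b) / (|a| * |b|). */
  static float cosine(float[] a, float[] b) {
    checkLengths(a, b);
    float dot = 0f, normA = 0f, normB = 0f;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return (float) (dot / Math.sqrt((double) normA * (double) normB));
  }

  private static void checkLengths(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ: " + a.length + " != " + b.length);
    }
  }

  public static void main(String[] args) {
    float[] a = {1f, 2f, 3f};
    float[] b = {4f, 5f, 6f};
    System.out.println(dotProduct(a, b));     // 32.0
    System.out.println(squareDistance(a, b)); // 27.0
    System.out.println(cosine(a, b));
  }
}
```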

@ChrisHegarty ChrisHegarty added :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team labels May 26, 2023
@ChrisHegarty ChrisHegarty self-assigned this May 26, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@ChrisHegarty
Contributor Author

ChrisHegarty commented May 26, 2023

Here are sample microbenchmark results, to give a sense of the potential performance impact of this change.

Model name: 11th Gen Intel(R) Core(TM) i9-11900F @ 2.50GHz : AVX-512

java -jar target/vectorbench.jar -p size=1024
...
Benchmark                                (size)   Mode  Cnt   Score   Error   Units
BinaryCosineBenchmark.cosineDistanceNew    1024  thrpt    5  10.637 ± 0.068  ops/us
BinaryCosineBenchmark.cosineDistanceOld    1024  thrpt    5   1.115 ± 0.008  ops/us
BinaryDotProductBenchmark.dotProductNew    1024  thrpt    5  22.050 ± 0.007  ops/us
BinaryDotProductBenchmark.dotProductOld    1024  thrpt    5   3.349 ± 0.041  ops/us
BinarySquareBenchmark.squareDistanceNew    1024  thrpt    5  16.215 ± 0.129  ops/us
BinarySquareBenchmark.squareDistanceOld    1024  thrpt    5   2.479 ± 0.032  ops/us
FloatCosineBenchmark.cosineNew             1024  thrpt    5   9.394 ± 0.048  ops/us
FloatCosineBenchmark.cosineOld             1024  thrpt    5   0.750 ± 0.002  ops/us
FloatDotProductBenchmark.dotProductNew     1024  thrpt    5  25.657 ± 2.105  ops/us
FloatDotProductBenchmark.dotProductOld     1024  thrpt    5   3.320 ± 0.079  ops/us
FloatSquareBenchmark.squareNew             1024  thrpt    5  19.437 ± 0.122  ops/us
FloatSquareBenchmark.squareOld             1024  thrpt    5   2.355 ± 0.003  ops/us
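To make the table easier to read, this sketch divides each `...New` score by its `...Old` counterpart. The scores are copied from the JMH output above; the class name `SpeedupTable` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Computes New/Old throughput ratios from the benchmark results quoted above. */
final class SpeedupTable {
  private SpeedupTable() {}

  /** Ratio of two throughputs measured in the same unit (ops/us). */
  static double speedup(double newOpsPerUs, double oldOpsPerUs) {
    return newOpsPerUs / oldOpsPerUs;
  }

  public static void main(String[] args) {
    // {new, old} scores in ops/us, size=1024, AVX-512 machine.
    Map<String, double[]> scores = new LinkedHashMap<>();
    scores.put("BinaryCosine", new double[] {10.637, 1.115});
    scores.put("BinaryDotProduct", new double[] {22.050, 3.349});
    scores.put("BinarySquare", new double[] {16.215, 2.479});
    scores.put("FloatCosine", new double[] {9.394, 0.750});
    scores.put("FloatDotProduct", new double[] {25.657, 3.320});
    scores.put("FloatSquare", new double[] {19.437, 2.355});

    for (Map.Entry<String, double[]> e : scores.entrySet()) {
      double[] s = e.getValue();
      System.out.printf("%-16s %5.1fx%n", e.getKey(), speedup(s[0], s[1]));
    }
  }
}
```

Run as-is, it reports speedups from roughly 6.5x (BinarySquare) to 12.5x (FloatCosine). These are isolated kernel numbers at size=1024 on one AVX-512 machine, so they say little about end-to-end indexing or query latency.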

@javanna
Member

javanna commented May 26, 2023

Thanks @ChrisHegarty for the great work and for opening this issue. I think the first step is to upgrade Elasticsearch to a Lucene 9.7 snapshot. I will open a PR.

@uschindler
Contributor

uschindler commented May 26, 2023

I need to mention: at the moment this ONLY works with Java 20. Support for Java 21 is coming a bit later, when Java 21 enters the RC phase and all APIs are finalized (the same applies to the new memory-segment-based MMapDirectory support).

We won't backport to Java 19. It does no harm to enable the module on other Java versions too, but it is best done conditionally.
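One way to enable the module conditionally is to decide on the launcher side, based on the running JDK's feature release. This is a hypothetical helper (`VectorModuleArgs` is not part of Elasticsearch or Lucene), and its supported set reflects this thread: Java 20 only, with Java 21 support still pending.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical launcher helper: adds the incubator module only on JDK feature
 * releases that Lucene's vectorized code supports.
 */
final class VectorModuleArgs {
  private VectorModuleArgs() {}

  // Assumption: the supported set as stated in this thread (Java 20 only).
  private static final Set<Integer> SUPPORTED_FEATURE_RELEASES = Set.of(20);

  static List<String> jvmArgsFor(int featureRelease) {
    if (SUPPORTED_FEATURE_RELEASES.contains(featureRelease)) {
      return List.of("--add-modules", "jdk.incubator.vector");
    }
    return List.of(); // other JDKs: leave the incubator module out entirely
  }

  public static void main(String[] args) {
    int feature = Runtime.version().feature();
    System.out.println("JDK " + feature + " -> " + jvmArgsFor(feature));
  }
}
```

Keeping the supported releases in one set means that adding Java 21 later is a one-line change.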

As with MMapDirectory, the code uses java.util.logging to report status and warnings. Those warnings should be fed into the normal Elasticsearch logs with a wrapper (@ChrisHegarty: did you add the log4j-jul glue code?) to give users feedback that all is sane. Don't hide the messages! There are also some combinations of JVM settings that automatically disable the vector support, such as missing or disabled AVX2 on x86, or non-tiered compilation (e.g., C2 disabled).
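Using only the JDK, the forwarding idea can be sketched with a java.util.logging `Handler` attached to a parent logger; records logged by child loggers propagate up to it. In this sketch the "application log" is just a list, standing in for whatever a bridge such as log4j-jul would feed. The `org.apache.lucene` logger namespace is an assumption about how Lucene names its loggers, and `JulForwarding` is hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/** Sketch: forward java.util.logging records from Lucene's loggers into an application log. */
final class JulForwarding {
  private JulForwarding() {}

  // Assumption: Lucene's loggers live under "org.apache.lucene".
  // Keep a strong reference so the JUL logger (and its handler) is not garbage collected.
  static final Logger LUCENE_ROOT = Logger.getLogger("org.apache.lucene");

  // Stand-in for the real application log.
  static final List<String> APP_LOG = new CopyOnWriteArrayList<>();

  static void install() {
    LUCENE_ROOT.addHandler(new Handler() {
      @Override
      public void publish(LogRecord record) {
        // Forward level, logger name and message so users can see what Lucene decided.
        APP_LOG.add(record.getLevel() + " [" + record.getLoggerName() + "] " + record.getMessage());
      }

      @Override
      public void flush() {}

      @Override
      public void close() {}
    });
  }

  public static void main(String[] args) {
    install();
    // Simulated message; the text is taken from the Elasticsearch log shown later in this thread.
    Logger.getLogger("org.apache.lucene.util.VectorUtilPanamaProvider")
        .info("Java vector incubator API enabled; uses preferredBitSize=128");
    APP_LOG.forEach(System.out::println);
  }
}
```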

The startup warning about the incubator module is printed to stderr by the JVM; it is unrelated to Lucene and cannot be disabled.

@uschindler
Contributor

uschindler commented May 26, 2023

In addition, when doing performance tests, use large indexes and many queries, and let benchmarks run for a longer time with appropriate warmup. The vectorized code takes a while before the JVM's optimizations kick in, so the first few queries will be much slower than with the legacy Lucene vector implementation. We measured dotProduct being 40 times slower until the optimizations kicked in; after that it became 2-5 times faster than the classic Lucene code, depending on hardware capabilities and vector size.
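The benchmark output earlier in this thread is in JMH's format, and JMH is the right tool for real measurements. Still, the warm-up effect can be sketched in plain Java: run the workload untimed for a while, then time it. Treat this `WarmupHarness` as a hypothetical illustration only; it does no forking, its only protection against dead-code elimination is a volatile sink, and its numbers are not comparable to JMH's.

```java
import java.util.function.DoubleSupplier;

/** Minimal warm-up-then-measure sketch; not a substitute for JMH. */
final class WarmupHarness {
  private WarmupHarness() {}

  // Writing results here keeps the JIT from discarding the workload as dead code.
  static volatile double sink;

  /** Runs op warmupIters times untimed, then returns the average ns per op over measureIters. */
  static double nanosPerOp(DoubleSupplier op, int warmupIters, int measureIters) {
    if (measureIters <= 0) {
      throw new IllegalArgumentException("measureIters must be > 0");
    }
    for (int i = 0; i < warmupIters; i++) {
      sink = op.getAsDouble(); // gives the JIT time to compile and optimize the hot path
    }
    long start = System.nanoTime();
    for (int i = 0; i < measureIters; i++) {
      sink = op.getAsDouble();
    }
    return (System.nanoTime() - start) / (double) measureIters;
  }

  public static void main(String[] args) {
    float[] a = new float[1024], b = new float[1024];
    for (int i = 0; i < a.length; i++) {
      a[i] = i * 0.001f;
      b[i] = 1f - i * 0.0005f;
    }
    DoubleSupplier dot = () -> {
      float s = 0f;
      for (int j = 0; j < a.length; j++) {
        s += a[j] * b[j];
      }
      return s;
    };
    System.out.printf("cold: %.1f ns/op%n", nanosPerOp(dot, 0, 1_000));
    System.out.printf("warm: %.1f ns/op%n", nanosPerOp(dot, 200_000, 1_000));
  }
}
```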

Also remember to update the documentation to say that vector dimensions should fit the hardware. Vector dimensions of 100 are bad; 128 is perfect!

We have some formulas that depend on the hardware. As a good default: for AVX2 CPUs with 256-bit vector support, the calculation (4 bytes per float, 4 vectors processed in parallel) means that float vector dimensions should be multiples of 32 (so 32, 64, 96, 128, ... are good sizes). With AVX3 (AVX-512) it is multiples of 64. For byte/binary vectors it is a bit different. So it may be best to tell users to use vector dimensions that are multiples of 64, to cover all variants.
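The float-vector arithmetic can be written out as a tiny sketch: values per register = register bits / 32, multiplied by the 4-way parallelism from the comment. The class and method names are hypothetical, and byte vectors are deliberately left out, since the comment says their rule differs.

```java
/** Worked version of the float-vector dimension rule of thumb (hypothetical names). */
final class DimensionRule {
  private DimensionRule() {}

  static final int BITS_PER_FLOAT = Float.SIZE; // 32 bits = 4 bytes
  static final int PARALLEL_VECTORS = 4;         // 4-way parallelism, taken from the comment (assumption)

  /** Dimension multiple that keeps every parallel register full for float vectors. */
  static int floatDimensionMultiple(int registerBits) {
    return (registerBits / BITS_PER_FLOAT) * PARALLEL_VECTORS;
  }

  static boolean isFriendly(int dims, int registerBits) {
    return dims % floatDimensionMultiple(registerBits) == 0;
  }

  public static void main(String[] args) {
    System.out.println(floatDimensionMultiple(256)); // 32 (AVX2)
    System.out.println(floatDimensionMultiple(512)); // 64 (AVX3 / AVX-512)
    System.out.println(isFriendly(128, 512));        // true
    System.out.println(isFriendly(100, 256));        // false
  }
}
```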

@ChrisHegarty
Contributor Author

> I need to mention: at the moment this ONLY works with Java 20.

Yeah, and this is fine. Elasticsearch currently bundles JDK 20.0.1 - and will bundle 20.0.2 when it comes out. You can set your own JDK, but ... too bad if you set it to something older! ;-)

> Support for Java 21 is coming a bit later, when Java 21 enters the RC phase and all APIs are finalized (the same applies to the new memory-segment-based MMapDirectory support).

Yeah, we should certainly do this soon, as you say. JDK 21 Rampdown Phase One (RDP 1) is in early June, and we wanna make sure that all this works well.

> We won't backport to Java 19. It does no harm to enable the module on other Java versions too, but it is best done conditionally.

Good point. I had overlooked this.

> As with MMapDirectory, the code uses java.util.logging to report status and warnings. Those warnings should be fed into the normal Elasticsearch logs with a wrapper (@ChrisHegarty: did you add the log4j-jul glue code?) to give users feedback that all is sane. Don't hide the messages! There are also some combinations of JVM settings that automatically disable the vector support, such as missing or disabled AVX2 on x86, or non-tiered compilation (e.g., C2 disabled).

The log messages for MMap do end up in the ES logs. I assume it will be the same for the vector support, but I'll need to double-check. I think our usage of JVM flags is OK, but it could be worth checking, especially in different test environments.

> The startup warning about the incubator module is printed to stderr by the JVM; it is unrelated to Lucene and cannot be disabled.

I am sorry that this is not suppressible. Yes, it is my fault; I added it to the JDK. In general, I think these kinds of warnings should be suppressible from the command line, but that is another discussion, best left for another day.

@ChrisHegarty
Contributor Author

ChrisHegarty commented Jun 2, 2023

A note on vector dimension sizes.

Some dimensions perform better than others, though of course all dimension sizes behave correctly. For example, 1536 will perform better than 1535.

The Lucene VectorUtilPanamaProvider (where the JDK Vector API is leveraged) has support for 128, 256, and 512 preferred bit sizes. For best alignment on hardware that supports:

  • 128-bit sizes (e.g. Apple M1, NEON), we pack 4 individual values of element_type float or 16 of element_type byte;
  • 256-bit sizes (e.g. Skylake, AVX2), we pack 8 individual values of element_type float or 32 of element_type byte;
  • 512-bit sizes (e.g. Rocket Lake, AVX-512), we pack 16 individual values of element_type float or 64 of element_type byte.

Given this, vector dimensions that are a multiple of the number of values packed per register (for the given element_type and the hardware's preferred bit size) perform best. We've not tried to over-optimise for "downsizing" on tail cases, preferring instead to keep the code simple.
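The lane counts in the list follow from dividing the preferred bit size by the element width (32 bits for float, 8 for byte). This sketch (hypothetical names) recomputes them and shows the tail left over for 1536 versus 1535 float dimensions. It assumes the tail is handled by a simple scalar loop, which fits the "no downsizing" remark; those leftover elements are where the slower path comes from.

```java
/** Lanes per register and leftover tail for the preferred bit sizes listed above. */
final class LanePacking {
  private LanePacking() {}

  /** Number of values of the given element width that fit in one register. */
  static int lanes(int preferredBitSize, int elementBits) {
    return preferredBitSize / elementBits;
  }

  /** Elements left after the last full register (assumed to go through a scalar tail loop). */
  static int tail(int dims, int lanes) {
    return dims % lanes;
  }

  public static void main(String[] args) {
    for (int bits : new int[] {128, 256, 512}) {
      int floatLanes = lanes(bits, Float.SIZE);
      int byteLanes = lanes(bits, Byte.SIZE);
      System.out.printf("%d-bit: %d floats, %d bytes; float tail for 1536 dims = %d, for 1535 dims = %d%n",
          bits, floatLanes, byteLanes, tail(1536, floatLanes), tail(1535, floatLanes));
    }
  }
}
```

For 1535 dimensions the float tail is 3, 7 and 15 elements at 128, 256 and 512 bits respectively; for 1536 it is always 0.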

The logs contain a message noting the preferred bit size in use on the hardware we're running on. For example, running on my Mac I can see the following in the Elasticsearch logs:

[..][WARN ][stderr ..] [y..] Jun 01, 2023 4:01:26 PM org.apache.lucene.util.VectorUtilPanamaProvider <init>
[..][WARN ][stderr ..] [y..] INFO: Java vector incubator API enabled; uses preferredBitSize=128
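To check programmatically which bit size was picked, for example in a test environment, the value can be pulled out of that log line. This is a hypothetical helper that relies only on the `preferredBitSize=` text shown above, which could change between Lucene versions.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extract the preferred bit size from a Lucene startup log line. */
final class PreferredBitSizeLog {
  private PreferredBitSizeLog() {}

  private static final Pattern BIT_SIZE = Pattern.compile("preferredBitSize=(\\d+)");

  static OptionalInt preferredBitSize(String logLine) {
    Matcher m = BIT_SIZE.matcher(logLine);
    return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
  }

  public static void main(String[] args) {
    String line = "INFO: Java vector incubator API enabled; uses preferredBitSize=128";
    System.out.println(preferredBitSize(line)); // OptionalInt[128]
  }
}
```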

@ChrisHegarty
Contributor Author

ChrisHegarty commented Jun 2, 2023

A note on existing / current Elasticsearch Vector Search benchmarks.

  1. The SO Vector benchmark track tests vector search performance with vectors (of element_type float) with 768 dimensions.
     "titleVector": {
       "type": "dense_vector",
       "dims": 768,
       "index": true,
       "similarity": "dot_product"
     }
  2. The Dense Vector benchmark track tests vector search performance with vectors (of element_type float) with 96 dimensions.
     "vector": {
       "type": "dense_vector",
       "dims": 96,
       "index": true,
       "similarity": "dot_product",
       "index_options": {
         "type": "hnsw",
         "m": 32,
         "ef_construction": 100
       }
     }

@ChrisHegarty
Contributor Author

The ES nightly benchmarks update the jvm.options file, so they needed a change to enable the Vector API; see elastic/rally-teams#79.

@ChrisHegarty
Contributor Author

ChrisHegarty commented Jun 6, 2023

(This benchmark was run on an N2 Ice Lake VM instance on GCP)

Effects on the SO Vector benchmark:

  • indexing-throughput improved by 30%
  • merge-time decreased by 40%
  • script-score-query-java-latency improved by 40%

[Three screenshots of the benchmark results, taken 2023-06-06]

@mayya-sharipova
Copy link
Contributor

@ChrisHegarty Impressive results!

elasticsearchmachine pushed a commit that referenced this issue Jun 8, 2023
…6617)

Lucene has integrated hardware-accelerated vector calculations. This means
calculations like `dot_product` can be much faster when using the
Lucene-defined functions.

When a `dense_vector` is indexed, we already support this. However, when
`index: false`, we store float vectors as binary fields in Lucene and
decode them ourselves, which means we don't use the underlying Lucene
structures or functions.

To take advantage of the large performance boost, this PR refactors the
binary vector values in the following way:

 - Eagerly decode the binary blobs when iterated
 - Call the Lucene defined VectorUtil functions when possible

related to: #96370
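The "eagerly decode, then call the library" idea from this commit can be sketched in plain Java: decode the binary blob into a float[] once, then hand plain arrays to a dot-product kernel. In the actual change that kernel is Lucene's VectorUtil; here a scalar loop stands in so the example has no dependencies. The class name and the byte-order parameter are assumptions; this is not Elasticsearch's encoding code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch: decode a binary vector blob once, then compute on plain arrays. */
final class BinaryVectorDecode {
  private BinaryVectorDecode() {}

  /** Packs floats into bytes using the given byte order (layout is an assumption). */
  static byte[] encode(float[] vector, ByteOrder order) {
    ByteBuffer buf = ByteBuffer.allocate(vector.length * Float.BYTES).order(order);
    buf.asFloatBuffer().put(vector);
    return buf.array();
  }

  /** Eagerly decodes the first dims floats of the blob into a new array. */
  static float[] decode(byte[] blob, int dims, ByteOrder order) {
    if (blob.length < dims * Float.BYTES) {
      throw new IllegalArgumentException("blob too short for " + dims + " dims");
    }
    float[] out = new float[dims];
    ByteBuffer.wrap(blob, 0, dims * Float.BYTES).order(order).asFloatBuffer().get(out);
    return out;
  }

  /** Scalar stand-in for a library kernel such as Lucene's VectorUtil.dotProduct. */
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    float[] stored = {0.5f, -1.25f, 3f};
    byte[] blob = encode(stored, ByteOrder.BIG_ENDIAN);
    float[] decoded = decode(blob, stored.length, ByteOrder.BIG_ENDIAN);
    System.out.println(dotProduct(decoded, new float[] {2f, 0f, 1f})); // 4.0
  }
}
```

Decoding once per document means the kernel sees plain float arrays, which is what lets an accelerated implementation be swapped in without touching the decoding path.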
@javanna
Member

javanna commented Jun 9, 2023

@ChrisHegarty this is completed, right? Can we close this issue?

6 participants