
Reduce long[] array allocation for bitset in readBitSetIterator #13828

Merged (6 commits, merged Oct 8, 2024)

easyice (Contributor) commented on Sep 26, 2024

I captured a memory allocation flame graph from my cluster. It shows DocIdsWriter#readBitSetIterator allocating about 14 GB in 30 seconds. This proposal reuses a scratch long[] array (scratchLongs) instead of allocating a new array on every call.

[Image: memory allocation flame graph; attachment: mem.html.zip]
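A minimal, self-contained sketch of the allocation pattern being changed. It is not the actual DocIdsWriter code: the hypothetical LongSource class stands in for Lucene's IndexInput, and readAllocating / readReusing are illustrative names. The "before" path allocates a fresh long[] per block; the "after" path keeps one scratch array and only grows it when a block needs more words.

```java
import java.util.Arrays;

// Hypothetical stand-in for Lucene's IndexInput: copies longs from an in-memory array,
// playing the role of readLongs(long[], int, int) in the PR.
final class LongSource {
  private final long[] data;
  private int pos;

  LongSource(long[] data) {
    this.data = data;
  }

  void readLongs(long[] dst, int offset, int len) {
    System.arraycopy(data, pos, dst, offset, len);
    pos += len;
  }
}

final class BitSetBlockReader {
  // Scratch buffer reused across calls; grown only when a block needs more words.
  private long[] scratchLongs = new long[0];

  // Before: each call allocates a fresh long[] that becomes garbage once the block is consumed.
  static long[] readAllocating(LongSource in, int longLen) {
    long[] bits = new long[longLen];
    in.readLongs(bits, 0, longLen);
    return bits;
  }

  // After: reuse scratchLongs. The returned array is only valid until the next call.
  long[] readReusing(LongSource in, int longLen) {
    if (longLen > scratchLongs.length) {
      scratchLongs = new long[longLen];
    }
    in.readLongs(scratchLongs, 0, longLen);
    // Zero any words left over from an earlier, longer block.
    Arrays.fill(scratchLongs, longLen, scratchLongs.length, 0L);
    return scratchLongs;
  }
}
```

The trade-off is that the reusing variant is only safe if each block is fully consumed before the next one is read, since the same array is handed out again.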

Using mock data that makes DocIdsWriter#readBitSetIterator allocate a long[] of size 96, JMH reports:

main:

Benchmark                                Mode  Cnt        Score     Error   Units
BKDWithBitMap.search                     avgt    5       65.001 ±   0.579   us/op
BKDWithBitMap.search:gc.alloc.rate       avgt    5    18626.467 ± 166.049  MB/sec
BKDWithBitMap.search:gc.alloc.rate.norm  avgt    5  1269568.090 ±   0.001    B/op
BKDWithBitMap.search:gc.count            avgt    5      423.000            counts
BKDWithBitMap.search:gc.time             avgt    5      218.000                ms

pr:

Benchmark                                Mode  Cnt       Score     Error   Units
BKDWithBitMap.search                     avgt    5      63.622 ±   0.784   us/op
BKDWithBitMap.search:gc.alloc.rate       avgt    5    9582.419 ± 118.030  MB/sec
BKDWithBitMap.search:gc.alloc.rate.norm  avgt    5  639272.088 ±   0.001    B/op
BKDWithBitMap.search:gc.count            avgt    5     209.000            counts
BKDWithBitMap.search:gc.time             avgt    5     111.000                ms
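Comparing the two runs using only the numbers reported above, the normalized allocation per operation roughly halves, as do GC count and GC time, while average time moves only slightly:

```latex
\begin{aligned}
\text{gc.alloc.rate.norm:}\quad & \frac{1\,269\,568.090 - 639\,272.088}{1\,269\,568.090} = \frac{630\,296.002}{1\,269\,568.090} \approx 49.6\%\ \text{less per op} \\
\text{gc.count:}\quad & \frac{423 - 209}{423} \approx 50.6\%\ \text{fewer} \\
\text{gc.time:}\quad & \frac{218 - 111}{218} \approx 49.1\%\ \text{less} \\
\text{score (avg time):}\quad & \frac{65.001 - 63.622}{65.001} \approx 2.1\%\ \text{lower}
\end{aligned}
```

The latency change is small and close to the reported error (65.001 ± 0.579 vs 63.622 ± 0.784 us/op), so the main effect of the change is on allocation and GC pressure.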
JMH code:
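import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.ScorerSupplier;
import org.apache.lucene.search.Weight;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Benchmark)
@Warmup(iterations = 3, time = 3)
@Measurement(iterations = 5, time = 5)
@Fork(
    value = 1,
    jvmArgsPrepend = {"--add-modules=jdk.unsupported"})
public class BKDWithBitMap {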
  IndexReader reader;
  IndexSearcher searcher;
  Query q = LongPoint.newSetQuery("f", 0);

  @Setup
  public void setup() throws IOException {
    Path path = Files.createTempDirectory("points");
    Directory dir = new MMapDirectory(path);

    try (IndexWriter w =
        new IndexWriter(
            dir,
            new IndexWriterConfig()
                .setMaxBufferedDocs(IndexWriterConfig.DISABLE_AUTO_FLUSH)
                .setRAMBufferSizeMB(256.0))) {
      int actualIndexed = 0;
      for (int i = 0; i < 5_000_000; ++i) {
        Document doc = new Document();
        doc.add(new StringField("id", Long.toString(i), Field.Store.NO));
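        // Index a point on every 12th doc; per the PR description, this data makes
        // readBitSetIterator allocate long[] arrays of size 96.
        if (i % 12 == 0) {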
          doc.add(new LongPoint("f", 0));
        }
        w.addDocument(doc);
        if (actualIndexed++ % 1_000_000 == 0) {
          System.out.println("Indexed: " + actualIndexed);
        }
      }
      w.commit();
      w.forceMerge(1);
      w.commit();
    }

    reader = DirectoryReader.open(dir);
    searcher = new IndexSearcher(reader);
    searcher.setQueryCache(null);
  }

  @Benchmark
  public void search(Blackhole bh) throws IOException {
    Weight weight = searcher.createWeight(searcher.rewrite(q), ScoreMode.COMPLETE_NO_SCORES, 1f);
    ScorerSupplier ss = weight.scorerSupplier(reader.leaves().get(0));
    if (ss != null) {
      Scorer scorer = ss.get(Long.MAX_VALUE);
      bh.consume(scorer);
    }
  }
}
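A rough sketch of why this data yields 96-word bitsets. It rests on two assumptions not stated in the PR: each BKD leaf holds 512 points (assumed default), and a leaf's bitset covers the words from the one containing its smallest doc ID to the one containing its largest. The class and method names are hypothetical.

```java
// Hypothetical helper: number of 64-bit words needed to cover doc IDs [minDoc, maxDoc],
// assuming the bitset starts at the word containing minDoc (an assumption about the encoding).
final class LeafBitsetSize {
  static int wordsFor(int minDoc, int maxDoc) {
    int offsetWords = minDoc >> 6; // index of the first word
    int lastWord = maxDoc >> 6;    // index of the last word
    return lastWord - offsetWords + 1;
  }

  // Words for the leaf-th leaf when every `stride`-th doc has the point and each leaf
  // holds `pointsPerLeaf` points (512 is assumed to be the BKD default).
  static int wordsForLeaf(int leaf, int pointsPerLeaf, int stride) {
    int minDoc = leaf * pointsPerLeaf * stride;
    int maxDoc = minDoc + (pointsPerLeaf - 1) * stride;
    return wordsFor(minDoc, maxDoc);
  }
}
```

Under these assumptions one leaf spans 512 × 12 = 6144 doc IDs, which is exactly 96 × 64, so every full leaf starts on a word boundary and needs exactly 96 words. With 5,000,000 documents the benchmark indexes about 416,667 points, or roughly 814 such leaves.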

gf2121 (Contributor) left a comment:


I left some minor comments; otherwise LGTM. Thanks @easyice!

@@ -36,6 +38,7 @@ final class DocIdsWriter {
private static final byte LEGACY_DELTA_VINT = (byte) 0;

private final int[] scratch;
+private long[] scratchLongs = new long[0];
Contributor:

You can use LongsRef.EMPTY_LONGS here.
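A small sketch of the reviewer's suggestion: initialize the scratch field from one shared zero-length array rather than allocating new long[0] per instance. In Lucene the shared constant is LongsRef.EMPTY_LONGS; here a local EMPTY_LONGS in a hypothetical ScratchHolder class stands in so the snippet compiles on its own.

```java
// Sketch: all instances start from the same zero-length array. Sharing is safe because
// an empty array has no elements that could be mutated; the field is replaced on first growth.
final class ScratchHolder {
  // Local stand-in for LongsRef.EMPTY_LONGS.
  static final long[] EMPTY_LONGS = new long[0];

  long[] scratchLongs = EMPTY_LONGS;
}
```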

-long[] bits = new long[longLen];
-in.readLongs(bits, 0, longLen);
-FixedBitSet bitSet = new FixedBitSet(bits, longLen << 6);
+if (longLen > scratchLongs.length) {
Contributor:

Do we need this comparison, given that growNoCopy checks this as well?

easyice (Contributor, PR author):

It's an interesting question. Currently, some call sites perform the comparison before growNoCopy and some don't. In #13171, I found a slight performance impact from the redundant assignment when no growth is needed (effectively scratchLongs = scratchLongs). But indeed, keeping things simple is more important.
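To make the two options in this thread concrete, here is a sketch under stated assumptions. Grow.growNoCopy is a simplified stand-in for Lucene's ArrayUtil.growNoCopy(long[], int): it returns the same array when it is already large enough, otherwise a new zero-filled array without copying. Its "grow by about 1/8" policy is an assumption and differs from Lucene's actual oversizing rule. GuardStyles and its methods are hypothetical names.

```java
// Simplified stand-in for ArrayUtil.growNoCopy: same array if big enough, else a new,
// larger, zero-filled array (old contents are not copied). Growth policy is assumed.
final class Grow {
  static long[] growNoCopy(long[] array, int minSize) {
    if (array.length >= minSize) {
      return array;
    }
    return new long[minSize + (minSize >> 3)];
  }
}

final class GuardStyles {
  long[] scratchLongs = new long[0];

  // No guard: always calls growNoCopy and always writes the field, even when the
  // result is the same array (effectively scratchLongs = scratchLongs).
  void ensureUnguarded(int longLen) {
    scratchLongs = Grow.growNoCopy(scratchLongs, longLen);
  }

  // Guard: the call and the field write only happen when growth is actually needed.
  void ensureGuarded(int longLen) {
    if (longLen > scratchLongs.length) {
      scratchLongs = Grow.growNoCopy(scratchLongs, longLen);
    }
  }
}
```

Both variants leave the same array in scratchLongs; the guard only skips a call and a redundant field write on the common no-growth path, which is the small cost measured in #13171.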

+}
+in.readLongs(scratchLongs, 0, longLen);
+// make ghost bits clear
+Arrays.fill(scratchLongs, longLen, scratchLongs.length, 0);
Contributor:

Can we make scratchLongs a LongsRef, so we know how many words were used last time and can skip this clear when it's unnecessary?
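For context on the clear being discussed: when the reused array previously held a longer block, the words past the new longLen still contain that block's bits. As far as we understand, Lucene's FixedBitSet expects all bits beyond numBits in its backing array to be zero (and checks this when assertions are enabled), so those "ghost" words must be zeroed before wrapping the array. A self-contained illustration with hypothetical GhostBits helpers:

```java
import java.util.Arrays;

// Illustrates the stale "ghost" words left in a reused buffer after a shorter block is loaded.
final class GhostBits {
  // Counts set bits across the whole backing array, not just the first longLen words.
  static int cardinality(long[] words) {
    int count = 0;
    for (long w : words) {
      count += Long.bitCount(w);
    }
    return count;
  }

  // Copies `block` into the front of `scratch` and optionally zeroes the remainder.
  static long[] load(long[] scratch, long[] block, boolean clearTail) {
    System.arraycopy(block, 0, scratch, 0, block.length);
    if (clearTail) {
      Arrays.fill(scratch, block.length, scratch.length, 0L);
    }
    return scratch;
  }
}
```

Loading a 4-word block of all ones and then a 1-word block without clearing leaves 3 full words (192 bits) of stale data behind the new block; with the clear, only the new block's bits remain.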

easyice (Contributor, PR author), Sep 27, 2024:

Good idea. If the current longLen is greater than the previous value, we don’t need to perform the clear.
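A sketch of the idea agreed on here, not the merged code (which this page does not show). The hypothetical TrackedScratch class mirrors LongsRef's public longs/length fields but is a local stand-in. It keeps the invariant that every word at index length or beyond is zero, so the tail only needs clearing when the new block is shorter than the previous one, and only over the range the previous block actually used.

```java
import java.util.Arrays;

// Local stand-in for a LongsRef-style holder. Invariant: longs[i] == 0 for all i >= length.
final class TrackedScratch {
  long[] longs = new long[0];
  int length; // words used by the previous block

  // Returns the buffer holding `block` in its first block.length words, zeros after.
  long[] load(long[] block) {
    int longLen = block.length;
    if (longLen > longs.length) {
      longs = new long[longLen]; // a fresh array is already all zeros
    }
    System.arraycopy(block, 0, longs, 0, longLen);
    if (longLen < length) {
      // Only [longLen, length) can hold stale bits; everything past `length` is already zero.
      Arrays.fill(longs, longLen, length, 0L);
    }
    length = longLen;
    return longs;
  }
}
```

When blocks keep the same size or grow, as in the benchmark where every leaf needs the same number of words, no clearing happens at all.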

gf2121 (Contributor) left a comment:

LGTM, thanks @easyice!

easyice merged commit 50bf845 into apache:main on Oct 8, 2024
3 checks passed
expani added a commit to expani/lucene that referenced this pull request Feb 11, 2025
2 participants