Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement method to bulk add all collection elements to a PriorityQueue #770

Merged
merged 2 commits into from
Mar 30, 2022

Conversation

BaurzhanSakhariev
Copy link
Contributor

Hi!

This PR doesn't have a link to a Jira Issue as according to contribution guidelines I can either create a patch and attach it to an issue on Jira or open a pull request - let me know if I misunderstood things and have to create a Jira issue anyway.

Description

PriorityQueue class doesn't have a constructor accepting array/collection and I created one for the cases, when heap elements are known in advance to save some time on queue creation: O(N) (heapify) vs Nlog(N) (calling add() in loop).

Solution

JDK PriorityQueue has a constructor, accepting Collection and it internally uses heapify - this PR is basically adaptation of this method (index is not 0 but 1 based and no need to check comparator presence).

Tests

I added a test creating a queue filled by random elements with the new method and checking its validity. Also added a potential usage (just took randomly first usage of add) and it didn't cause any failures.

Tried to run benchmarks using https://github.com/mikemccand/luceneutil but
got an error

src/python/competition.py", line 302
    raise ValueError(f'mode must be "cpu" or "heap" but got: {mode}')

may I ask is there any detailed guide on running Lucene benchmarks?

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Lucene maintainers access to contribute to my PR branch. (optional but recommended)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.

@dweiss
Copy link
Contributor

dweiss commented Mar 27, 2022

Maybe add a method called addAll(Collection) which would be smart enough to detect when it can use this accelerated addition instead of element-by-element addition? I'm not sure an additional constructor is needed here.

@BaurzhanSakhariev
Copy link
Contributor Author

Maybe add a method called addAll(Collection) which would be smart enough to detect when it can use this accelerated addition instead of element-by-element addition? I'm not sure an additional constructor is needed here.

Good point with Collection, thanks. No need to convert list to array if we don't use it directly anyway (because of 1-index shift) - can rather copy from collection.

It's always possible to build a heap if elements are provided and comparable - in JDK they check what kind of collection is provided and take comparator from it - but here we have lessThan. My understanding is that user decides whether it's possible to use faster addition - one of use cases when add() are not interrupted by pop() or any other structure changing operations, i.e when we build first and only then do something.

I added a constructor as other option would be to use existing ones and then do smth like addAll(items) - but I wanted to avoid usage of default constructor, filling it with sentinel null objects. But I actually found an issue in my constructor - need to add maxSize >= ArrayUtil.MAX_ARRAY_LENGTH check there, thanks!

Would it it be fine if I keep the constructor but adjust it to copy directly from Collection and add a size check?

@dweiss
Copy link
Contributor

dweiss commented Mar 27, 2022

I wanted to avoid usage of default constructor, filling it with sentinel null objects

In all honesty, this is probably not something you can ever, ever notice when you compare the cost of building the heap with clearing a chunk of memory. And it makes the API simpler in my opinion. But sure, you can keep the constructor.

@BaurzhanSakhariev BaurzhanSakhariev force-pushed the baur/constructor-with-array branch from 2c744a6 to 815e2ee Compare March 27, 2022 21:24
@BaurzhanSakhariev
Copy link
Contributor Author

Fair point, I changed it to addAll. Also added a case to add into partially filled queue - just realized that you probably meant this use case and I misunderstood it.

@BaurzhanSakhariev BaurzhanSakhariev force-pushed the baur/constructor-with-array branch from 815e2ee to 47a2c13 Compare March 27, 2022 21:28
@BaurzhanSakhariev BaurzhanSakhariev changed the title Add PriorityQueue constructor accepting array as argument Implement method to add all collection elements to a PriorityQueue Mar 27, 2022
@BaurzhanSakhariev BaurzhanSakhariev force-pushed the baur/constructor-with-array branch from 73abca1 to be0ca3e Compare March 28, 2022 21:47
@BaurzhanSakhariev BaurzhanSakhariev changed the title Implement method to add all collection elements to a PriorityQueue Implement method to bulk add all collection elements to a PriorityQueue Mar 29, 2022
@BaurzhanSakhariev BaurzhanSakhariev force-pushed the baur/constructor-with-array branch from ba1f019 to d3d7d5b Compare March 29, 2022 21:13
Copy link
Contributor

@dweiss dweiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thank you! Let's wait for checks to pass. Please add CHANGES.txt entry under the corresponding version (main or 9x, if you'd like this backported).

@BaurzhanSakhariev BaurzhanSakhariev force-pushed the baur/constructor-with-array branch from b8ad734 to ae2caca Compare March 30, 2022 07:48
@BaurzhanSakhariev BaurzhanSakhariev force-pushed the baur/constructor-with-array branch from ae2caca to f390314 Compare March 30, 2022 07:58
… from IllegalArgumentException to ArrayIndexOutOfBoundsException for consistency with add().
@dweiss
Copy link
Contributor

dweiss commented Mar 30, 2022

I've added a proper issue number (https://issues.apache.org/jira/browse/LUCENE-10494) and tweaked minor details of the implementation (exception types thrown in addAll and add were different). I think it's good to go.

@dweiss dweiss merged commit 69b040f into apache:main Mar 30, 2022
@dweiss
Copy link
Contributor

dweiss commented Mar 30, 2022

Thank you!

asfgit pushed a commit that referenced this pull request Mar 30, 2022
…ue (#770)

Implement method to add all collection elements to a PriorityQueue
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
@BaurzhanSakhariev
Copy link
Contributor Author

Thanks!

wjp719 added a commit to wjp719/lucene that referenced this pull request Apr 5, 2022
* main:
  LUCENE-10456: Implement Weight#count for MultiRangeQuery (apache#731)
  LUCENE-10484: Move CHANGES entry to 9.2.
  LUCENE-10484: Add support for concurrent facets random sampling (apache#765)
  LUCENE-10002: Replaces usages of FacetsCollector with FacetsCollectorManager (apache#764)
  LUCENE-10466: Move changelog entry to 9.2
  LUCENE-10466: Ensure IndexSortSortedNumericDocValuesRangeQuery handles sort types besides LONG
  LUCENE-10498: don't count num hits nor accumulate scores unless necessary (apache#782)
  LUCENE-10184: mention of opening a Jira issue (apache#781)
  Update link to contribution guide
  Remove redundant index (apache#776)
  Implement method to bulk add all collection elements to a PriorityQueue (apache#770)
  LUCENE-10491: Fix correctness bug in TaxonomyFacetSumValueSource score providing (apache#775)
  Replace usages of search(Query, Collector) in CheckHits (apache#763)
  LUCENE-10002: Add FixedBitSetCollector and corresponding collector manager to test framework (apache#766)
  LUCENE-10475: Merge o.a.l.a.[ja|ko].util into o.a.l.a.[ja|ko].dict (apache#772)
  LUCENE-10184: add CONTRIBUTING.md; reorganize README. (apache#771)
  Add CHANGES entry for LUCENE-10325
  LUCENE-10325: Add getTopDims functionality to Facets (apache#747)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants