Implement BloomFilter skipping index building logic #242
Signed-off-by: Chen Dai <daichen@amazon.com>
 * Bloom filter interface inspired by [[org.apache.spark.util.sketch.BloomFilter]] but adapts to
 * Flint index use and remove unnecessary API.
 */
public interface BloomFilter {
Curious: aren't there any open-source library implementations we can leverage?
Is this because we need custom serialization to write to OpenSearch?
Yes, indeed. As the Javadoc says, we're porting Spark's built-in BloomFilter, BitArray, and Murmur3_x86_32 to our flint-core library. We maintain only the minimal API needed and implement it within flint-core, with the following future possibilities in mind:
- Integration with other query engines: we can implement the bloom filter index in another query engine using the flint-core library.
- Users creating a Flint index with our library in an ingestion pipeline: we can add a BloomFilter field type so that users can generate a Flint index at ingestion time.
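To illustrate what a "minimal API" of this kind could look like, here is a rough, self-contained sketch of a bloom filter with only put, membership test, and byte serialization. It is not the code in this PR: the class name `SketchBloomFilter`, the methods `toBytes`/`fromBytes`, and the serialized layout are all hypothetical, and a splitmix64-style mixer stands in for Murmur3_x86_32 to keep the example short.

```java
import java.nio.ByteBuffer;

/** Hypothetical minimal bloom filter sketch; NOT the flint-core implementation. */
final class SketchBloomFilter {
  private final long[] words;          // bit array packed into 64-bit words
  private final int numHashFunctions;  // number of bit positions set per item

  SketchBloomFilter(int numBits, int numHashFunctions) {
    if (numBits <= 0 || numHashFunctions <= 0) {
      throw new IllegalArgumentException("numBits and numHashFunctions must be positive");
    }
    this.words = new long[(numBits + 63) / 64];
    this.numHashFunctions = numHashFunctions;
  }

  private SketchBloomFilter(long[] words, int numHashFunctions) {
    this.words = words;
    this.numHashFunctions = numHashFunctions;
  }

  // Assumption: splitmix64-style finalizer used in place of Murmur3_x86_32.
  private static long mix(long x) {
    x += 0x9E3779B97F4A7C15L;
    x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
    x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
    return x ^ (x >>> 31);
  }

  // Double hashing: derive the i-th bit position from two 32-bit halves of one hash.
  private long bitIndex(long item, int i) {
    long h = mix(item);
    int h1 = (int) h;
    int h2 = (int) (h >>> 32);
    long combined = h1 + (long) i * h2;
    return Math.floorMod(combined, (long) words.length * 64);
  }

  /** Adds an item; returns true if any bit changed. */
  boolean put(long item) {
    boolean changed = false;
    for (int i = 1; i <= numHashFunctions; i++) {
      long idx = bitIndex(item, i);
      int word = (int) (idx >>> 6);
      long mask = 1L << (int) (idx & 63);
      if ((words[word] & mask) == 0) {
        words[word] |= mask;
        changed = true;
      }
    }
    return changed;
  }

  /** Returns false only if the item was definitely never added. */
  boolean mightContain(long item) {
    for (int i = 1; i <= numHashFunctions; i++) {
      long idx = bitIndex(item, i);
      if ((words[(int) (idx >>> 6)] & (1L << (int) (idx & 63))) == 0) {
        return false;
      }
    }
    return true;
  }

  /** Hypothetical layout: numHashFunctions, word count, then the words (big-endian). */
  byte[] toBytes() {
    ByteBuffer buf = ByteBuffer.allocate(8 + words.length * 8);
    buf.putInt(numHashFunctions).putInt(words.length);
    for (long w : words) {
      buf.putLong(w);
    }
    return buf.array();
  }

  static SketchBloomFilter fromBytes(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    int k = buf.getInt();
    long[] words = new long[buf.getInt()];
    for (int i = 0; i < words.length; i++) {
      words[i] = buf.getLong();
    }
    return new SketchBloomFilter(words, k);
  }
}
```

Because a sketch like this depends only on the JDK, the same bytes could in principle be produced by an ingestion pipeline and read back by a query engine, which is the portability argument made above.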
Description
This is the first PR for BloomFilter skipping index support. It focuses on the bloom filter building side and introduces the core classes. Please read #206 for the big picture, including the final user experience, design decisions, proof of concept, and benchmark.
PR Planned
Documentation
Updated user manual: https://github.com/dai-chen/opensearch-spark/blob/add-bloom-filter-building-logic/docs/index.md#feature-highlights
Class Diagram
Issues Resolved
#206
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.