
[Pull-based Ingestion] Introduce the new pull-based ingestion engine, APIs, and Kafka plugin #16958

Merged: 40 commits merged into opensearch-project:main on Jan 29, 2025


@yupeng9 (Contributor) commented Jan 6, 2025

Description

This PR implements the basics of the pull-based ingestion described in this RFC, including:

  1. The APIs for the pull-based ingestion source
  2. A Kafka plugin that implements the ingestion source API
  3. A new IngestionEngine that pulls data from the ingestion sources

This is currently a work in progress: a few improvements remain, and test coverage still needs to be increased.

Related Issues

Resolves #16927, #16929, #16928

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

github-actions bot added the labels enhancement (Enhancement or improvement to existing feature or request) and Indexing (Indexing, Bulk Indexing and anything related to indexing) on Jan 6, 2025

github-actions bot (Contributor) commented Jan 6, 2025

❌ Gradle check result for 16dd9d0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@Bukhtawar (Collaborator) left a comment

I'm curious how the FGAC (fine-grained access control) security model would work, especially with the security plugin, which intercepts transport actions to validate whether authorised users can perform bulk actions on certain indices. Is the intent to handle permissions at the Kafka "partition level"?
Another aspect is maintaining Kafka checkpoints durably. I have yet to read that part, but it would be good to understand how we are handling failovers and recoveries.

Signed-off-by: Yupeng Fu <yupeng@uber.com>

@andrross (Member) commented

@mch2 @msfroh @yupeng9 I just pushed a rebase to pull in the commit that fixes the Spotless-related failure in the precommit check. Hopefully everything will pass now.

github-actions bot (Contributor) commented

❕ Gradle check result for acb627e: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.cluster.MinimumClusterManagerNodesIT.testThreeNodesNoClusterManagerBlock

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@mch2 merged commit a1a1596 into opensearch-project:main on Jan 29, 2025
29 of 30 checks passed
@getsaurabh02 added the labels Roadmap:Cost/Performance/Scale (Project-wide roadmap label) and v3.0.0 (Issues and PRs related to version 3.0.0) on Jan 30, 2025
Labels: enhancement, Indexing, Roadmap:Cost/Performance/Scale, v3.0.0
Linked issue: [Feature Request] Pull-based ingestion source APIs
8 participants