Support FlintTable batch write #1653
Signed-off-by: Peng Huo <penghuo@gmail.com>
Codecov Report
@@ Coverage Diff @@
## feature/flint #1653 +/- ##
================================================
Coverage 97.19% 97.19%
Complexity 4107 4107
================================================
Files 371 371
Lines 10464 10464
Branches 706 706
================================================
Hits 10170 10170
Misses 287 287
Partials 7 7
/**
 * copy from spark {@link JacksonGenerator}.
 */
case class FlintJacksonGenerator(dataType: DataType, writer: Writer, options: JSONOptions) {
To reviewer:
This class is copied from Spark's JacksonGenerator. I did not find an easy way to use it directly. You only need to review the function I defined:
def writeAction(action: String, idOrdinal: Option[Int], row: InternalRow): Unit = {}
flint/flint-core/src/main/scala/org/opensearch/flint/core/storage/OpenSearchWriter.java
Signed-off-by: Peng Huo <penghuo@gmail.com>
Thanks for the changes!
Description
Implementation Detail
FlintTable is capable of batch write operations in overwrite mode. This interacts with the FlintOpenSearchClient within the FlintCore package. During this process, we utilize the CREATE action within the OpenSearch bulk request. Users have the capability to provide an ID field within their options. If no ID is provided, OpenSearch will generate one automatically. When writing to FlintCore, the following conditions are checked:
- If a document with the same ID already exists, the system will skip this entry and do nothing.
- If no document with the same ID is found, the system will index the new document.
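For illustration only, here is a rough sketch of what a bulk request body using the CREATE action could look like, with and without a user-supplied ID field. It is not the code in this PR: the class `BulkCreateSketch`, its helpers, and the sample field names are hypothetical. It assumes flat rows of string fields, that the ID field is present in every row, and that the target index is given in the request URL.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical sketch, not the FlintJacksonGenerator / OpenSearchWriter code.
class BulkCreateSketch {

  // Minimal JSON string quoting; enough for this sketch, not a full JSON encoder.
  static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  // Serializes a flat row of string fields as one JSON object.
  static String toJson(Map<String, String> row) {
    return row.entrySet().stream()
        .map(e -> quote(e.getKey()) + ":" + quote(e.getValue()))
        .collect(Collectors.joining(",", "{", "}"));
  }

  // Builds an NDJSON bulk body: one "create" action line plus one source line per row.
  // When idField is empty, no _id is sent and OpenSearch generates one.
  static String bulkBody(List<Map<String, String>> rows, Optional<String> idField) {
    StringBuilder body = new StringBuilder();
    for (Map<String, String> row : rows) {
      String meta = idField
          .map(f -> "{\"_id\":" + quote(row.get(f)) + "}")
          .orElse("{}");
      body.append("{\"create\":").append(meta).append("}\n");
      body.append(toJson(row)).append("\n");
    }
    return body.toString();
  }

  public static void main(String[] args) {
    Map<String, String> row = new LinkedHashMap<>();
    row.put("id", "1");
    row.put("name", "flint");
    System.out.print(bulkBody(List.of(row), Optional.of("id")));
    System.out.print(bulkBody(List.of(row), Optional.empty()));
  }
}
```

With an ID field, the action line carries `_id`, so a second write of the same row is rejected by the CREATE action instead of replacing the existing document.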
Why not use INDEX action
The INDEX action will delete the doc with the same ID and index the new doc. In case Lucene does not really delete the doc, the storage size is doubled.
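The difference can be sketched with a toy in-memory model. This is only an illustration of the argument above, not how Lucene or OpenSearch is implemented: `ToyIndex` and its counter of deleted-but-not-reclaimed docs are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical): contrasts CREATE and INDEX semantics for a repeated ID.
class ToyIndex {
  private final Map<String, String> live = new HashMap<>();
  private int deletedNotReclaimed = 0;

  // CREATE: write only if the ID is absent; returns false when the entry is skipped.
  boolean create(String id, String doc) {
    return live.putIfAbsent(id, doc) == null;
  }

  // INDEX: always write; an existing doc with the same ID is marked deleted first.
  void index(String id, String doc) {
    if (live.put(id, doc) != null) {
      deletedNotReclaimed++;
    }
  }

  // Docs still occupying storage: live docs plus deleted docs not yet reclaimed.
  int storedDocs() {
    return live.size() + deletedNotReclaimed;
  }

  String get(String id) {
    return live.get(id);
  }
}
```

In this model, writing the same ID twice with `create` leaves one stored doc, while writing it twice with `index` leaves two until the deleted one is reclaimed.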
Usage Example
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.