Add refresh index API for partition skipping index #1658
Signed-off-by: Chen Dai <daichen@amazon.com>
Codecov Report
@@ Coverage Diff @@
## feature/flint opensearch-project/sql#1658 +/- ##
================================================
Coverage 97.19% 97.19%
Complexity 4107 4107
================================================
Files 371 371
Lines 10464 10464
Branches 706 706
================================================
Hits 10170 10170
Misses 287 287
Partials 7 7
val indexType = (meta \ "kind").extract[String]
val indexedColumns = (meta \ "indexedColumns").asInstanceOf[JArray]

indexType match {
/**
 * Indexed column name and its Spark SQL type.
 */
val indexedColumn: (String, String)
val columnName: String
val columnType: String
Why not use Spark StructType for columnType if it is a Spark SQL type?
Sure, I may remove this columnType field from _meta and this class after we refactor the Spark SQL type => Flint type mapping.
flint/integ-test/src/test/scala/org/opensearch/flint/spark/FlintSparkSkippingIndexSuite.scala
extends FlintSparkSkippingStrategy {

  override def outputSchema(): Map[String, String] = {
    Map(indexedColumn._1 -> convertToFlintType(indexedColumn._2))
    Map(columnName -> convertToFlintType(columnType))
Does the flintType need to align with the AggregateFunction output type?
Yes, it needs to align. As in the comment above, I may need to use the AggregateFunction output type if columnType is unnecessary. I've noted this down. Thanks!
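To make the alignment question above concrete, here is a minimal sketch of a Spark SQL type => Flint type mapping feeding outputSchema. The object name FlintTypeSketch and every entry in the mapping table are assumptions for illustration only; the actual convertToFlintType in this PR may map types differently.

```scala
// Hypothetical sketch, not the PR's implementation.
object FlintTypeSketch {

  /** Maps a Spark SQL type name to an ASSUMED Flint field type. */
  def convertToFlintType(sparkType: String): String =
    sparkType.toLowerCase match {
      case "string"  => "keyword" // assumed mapping
      case "int"     => "integer" // assumed mapping
      case "long"    => "long"    // assumed mapping
      case "boolean" => "boolean" // assumed mapping
      case other =>
        throw new IllegalArgumentException(s"Unsupported Spark SQL type: $other")
    }

  /** Output schema of a single-column skipping strategy: column name -> Flint type. */
  def outputSchema(columnName: String, columnType: String): Map[String, String] =
    Map(columnName -> convertToFlintType(columnType))
}
```

If the strategy later derives its type from the aggregate function output instead of columnType, only the argument passed to convertToFlintType would need to change in a design like this.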
Description
Please see more details in the doc changes: https://github.com/dai-chen/sql-1/blob/add-partion-index-building/flint/docs/index.md
Flint Metadata Spec
Update metadata with more info (index kind, column name and type). In this way, FlintSparkSkippingIndex can be deserialized out of _meta JSON.
Flint Spark API
Add refreshIndex API with FULL (batch) and INCREMENTAL (streaming) mode.
Partition Index Implementation
Followed the first approach below for now. Need to evaluate the performance later.
1. FIRST_VALUE function to aggregate
2. GROUP BY
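The FIRST_VALUE approach can be illustrated with plain Scala collections rather than Spark. This is only a sketch under stated assumptions: the row shape (SourceRow with year and month partition columns), grouping by source file path, and the helper names buildIndex and filesToScan are all hypothetical and not taken from the PR.

```scala
// Hypothetical sketch of building a partition skipping index in memory.
object PartitionSkippingSketch {

  // One source row: the file it was read from plus its partition values (assumed shape).
  final case class SourceRow(filePath: String, year: Int, month: Int)

  // One index entry per source file.
  final case class IndexEntry(filePath: String, year: Int, month: Int)

  /** Groups rows by file and keeps the first partition values seen for each file. */
  def buildIndex(rows: Seq[SourceRow]): Seq[IndexEntry] =
    rows
      .groupBy(_.filePath) // plays the role of GROUP BY (assumed key: file path)
      .map { case (path, group) =>
        val first = group.head // plays the role of FIRST_VALUE per partition column
        IndexEntry(path, first.year, first.month)
      }
      .toSeq
      .sortBy(_.filePath) // deterministic order for inspection

  /** Uses the index to pick only the files that can contain the requested year. */
  def filesToScan(index: Seq[IndexEntry], year: Int): Seq[String] =
    index.filter(_.year == year).map(_.filePath)
}
```

Taking the first value is sufficient here because every row in a file shares the same partition values, so any row in the group is representative.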
Sample index data:
Issues Resolved
opensearch-project/opensearch-spark#2
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.