
Abstracting source relations for enhanced covering index rewriting #391


dai-chen commented Jun 20, 2024

Description

This PR introduces new abstractions for the covering index query rewriter so that it can match and rewrite different kinds of source table relations. This paves the way for future support of Iceberg table relations.

PR Planned

Changes

Added the new FlintSparkSourceRelationProvider and FlintSparkSourceRelation abstractions. See their Scaladoc for the detailed responsibilities of each. In short:

  • FlintSparkSourceRelationProvider: determines whether a given logical relation can be supported by the Flint optimizer.
  • FlintSparkSourceRelation: provides all the information that query rewriting requires for a specific source relation.
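The split between the two abstractions can be illustrated with a minimal, Spark-free sketch. This is not the Flint code: `Plan`, `FileRelation`, `OtherRelation`, `SourceRelation`, `SourceRelationProvider`, `FileRelationProvider`, and `RelationMatcher` are hypothetical stand-ins, with `Plan` replacing Spark's `LogicalPlan` and columns modeled as plain strings. The matcher asks each registered provider in turn and uses the first one that supports the plan.

```scala
// Hypothetical stand-ins for Spark logical plan nodes (not Spark or Flint types).
sealed trait Plan
final case class FileRelation(table: String, columns: Seq[String]) extends Plan
final case class OtherRelation(description: String) extends Plan

// Role of FlintSparkSourceRelation: everything rewriting needs about one relation.
trait SourceRelation {
  def plan: Plan
  def tableName: String
  def output: Seq[String]
}

// Role of FlintSparkSourceRelationProvider: recognize relations it can support.
trait SourceRelationProvider {
  def name: String
  def isSupported(plan: Plan): Boolean
  def getRelation(plan: Plan): SourceRelation
}

// Hypothetical provider for file-based tables, named after the "[file]" provider in the log.
object FileRelationProvider extends SourceRelationProvider {
  val name: String = "file"

  def isSupported(plan: Plan): Boolean = plan match {
    case _: FileRelation => true
    case _               => false
  }

  def getRelation(plan: Plan): SourceRelation = plan match {
    case r: FileRelation =>
      new SourceRelation {
        val plan: Plan = r
        val tableName: String = r.table
        val output: Seq[String] = r.columns
      }
    case other =>
      throw new IllegalArgumentException(s"Unsupported plan: $other")
  }
}

object RelationMatcher {
  // Ask each provider in order; the first one that supports the plan builds the relation.
  def findRelation(plan: Plan, providers: Seq[SourceRelationProvider]): Option[SourceRelation] =
    providers.find(_.isSupported(plan)).map(_.getRelation(plan))
}
```

With only the file provider registered, a `FileRelation` resolves to a relation for its table, while an `OtherRelation` yields `None`, so a rewriter built this way would leave that plan untouched. Supporting another table format would then mean registering one more provider rather than changing the matching loop.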

ApplyFlintSparkSkippingIndex and FlintSparkValidationHelper.isTableProviderSupported will be refactored on top of these abstractions in a future PR.


Testing

```sql
spark-sql> CREATE INDEX all ON myglue.ds_tables.http_logs
         > (
         >   `@timestamp`,
         >   clientip,
         >   request,
         >   status,
         >   size
         > );
```

```
scala> sc.setLogLevel("INFO")
scala> sql("EXPLAIN SELECT clientip FROM myglue.ds_tables.http_logs WHERE status != 200").show

# Logging explains whether and why the index is applied
24/05/03 17:51:17 INFO FlintSparkSourceRelationProvider: Loaded source relation providers [file]
24/05/03 17:51:17 INFO ApplyFlintSparkCoveringIndex: Provider [file] can match sub plan LogicalRelation
24/05/03 17:51:18 INFO ApplyFlintSparkCoveringIndex: Found covering index [flint_myglue_ds_tables_http_logs_all_index] on table myglue.ds_tables.http_logs
24/05/03 17:51:18 INFO ApplyFlintSparkCoveringIndex:
 Is covering index flint_myglue_ds_tables_http_logs_all_index applicable: true
   Index state: Some(active)
   Index filter condition: None
   Columns required: Set(clientip, status)
   Columns indexed: Set(@timestamp, request, size, clientip, status)
```
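The applicability log above lists three facts: the index state, the index filter condition, and the required versus indexed columns. A minimal sketch, assuming the decision simply combines those three facts (the index is active, has no filter condition, and indexes every column the query needs), could look like this; `CoveringIndexInfo` and `Applicability` are hypothetical names, not Flint classes.

```scala
// Hypothetical model of the facts printed by the applicability log.
final case class CoveringIndexInfo(
    name: String,
    state: Option[String],
    filterCondition: Option[String],
    indexedColumns: Set[String])

object Applicability {
  // Assumed rule: active index, no filter condition (so every row is covered),
  // and every column the query reads is present in the index.
  def isApplicable(index: CoveringIndexInfo, requiredColumns: Set[String]): Boolean =
    index.state.contains("active") &&
      index.filterCondition.isEmpty &&
      requiredColumns.subsetOf(index.indexedColumns)
}
```

For the query in this example, the required columns `clientip` and `status` are a subset of the five indexed columns, so the check returns `true`, consistent with the log. A query that also read a column outside the index would return `false`.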

Issues Resolved

#298

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

dai-chen added 8 commits June 20, 2024 11:20
Signed-off-by: Chen Dai <daichen@amazon.com>
dai-chen added maintenance Code refactoring 0.5 labels Jun 20, 2024
dai-chen self-assigned this Jun 20, 2024
dai-chen marked this pull request as ready for review June 21, 2024 16:48
dai-chen requested a review from penghuo June 25, 2024 16:18
dai-chen merged commit beac01a into opensearch-project:main Jun 26, 2024
4 checks passed
dai-chen deleted the refactor-covering-index-query-rewriter branch June 26, 2024 16:52