Add checkpoint mandatory configuration #92

Merged


dai-chen
Collaborator

@dai-chen dai-chen commented Oct 23, 2023

Description

Add a new Flint configuration, spark.flint.index.checkpoint.mandatory, that makes the checkpoint location mandatory. It is enabled by default, which means every incremental refresh (a CREATE statement with auto_refresh=true) must provide checkpoint_location. Doc: https://github.com/dai-chen/opensearch-spark/blob/add-mandatory-checkpoint-option/docs/index.md#configurations

TODO

Currently, all validation related to the Spark streaming job happens when the job starts. For example, the OS index is created even though the job fails to start because of a missing checkpoint location. For this checkpoint validation, the problem can be solved by building the streaming job early. However, other checks on the table or options may be performed only when the job starts. We need to figure out how to validate early in general. Issue: #65

Example

spark-sql> CREATE SKIPPING INDEX ON ds_tables.http_logs
         > (status VALUE_SET)
         > WITH (
         >   auto_refresh=true
         > );
java.lang.IllegalStateException: Checkpoint location is mandatory for incremental refresh 
if spark.flint.index.checkpoint.mandatory enabled

# Currently the OS index is still created even though the statement above fails, so drop it first. To be improved per the TODO above.
spark-sql> DROP SKIPPING INDEX ON ds_tables.http_logs;

spark-sql> SET spark.flint.index.checkpoint.mandatory=false;
spark.flint.index.checkpoint.mandatory	false

spark-sql> CREATE SKIPPING INDEX ON ds_tables.http_logs
         > (status VALUE_SET)
         > WITH (
         >   auto_refresh=true
         > );

spark-sql> DESC SKIPPING INDEX ON ds_tables.http_logs;
status	int	VALUE_SET
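
For completeness, here is a minimal sketch of the case the default setting expects: the same CREATE statement with checkpoint_location supplied, so the mandatory check should pass. The checkpoint path is a hypothetical placeholder, not a value from this PR, and the sketch assumes the index from the previous step has already been dropped.

```sql
-- Sketch only: the S3 path below is a hypothetical placeholder.
-- Assumes no skipping index currently exists on ds_tables.http_logs.

-- Restore the default behavior described in this PR.
SET spark.flint.index.checkpoint.mandatory=true;

-- Incremental refresh with an explicit checkpoint location.
CREATE SKIPPING INDEX ON ds_tables.http_logs
(status VALUE_SET)
WITH (
  auto_refresh = true,
  checkpoint_location = 's3://my-bucket/checkpoints/http_logs_skipping_index'
);
```

Any location the streaming job can write to should work in place of the placeholder path.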

Issues Resolved

#87

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen added the enhancement New feature or request label Oct 23, 2023
@dai-chen dai-chen self-assigned this Oct 23, 2023
@dai-chen dai-chen merged commit 022b974 into opensearch-project:main Oct 25, 2023
@dai-chen dai-chen deleted the add-mandatory-checkpoint-option branch October 25, 2023 15:31
@dai-chen dai-chen changed the title from "Add checkpoint mandatory option" to "Add checkpoint mandatory configuration" Oct 25, 2023