Extend INCLUDE address[, address+] SELECT #265

Open
7 tasks
skejserjensen opened this issue Dec 5, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@skejserjensen
Contributor

skejserjensen commented Dec 5, 2024

Initial support for INCLUDE address... was added in #266. However, it can be extended in many directions, which are tracked by this issue.

  • Check the metadata of instances with addresses provided in INCLUDE and give a warning if it is a cloud node.
  • Enable use of INCLUDE in sub-queries, maybe by not requiring it to be the first part of a SQL statement.
  • Enable use of INCLUDE in EXPLAIN, maybe by not requiring it to be the first part of a SQL statement.
  • Create a version of the SQL query without the INCLUDE address once during parsing instead of with string operations.
  • Add integration tests that verify correct behavior (i.e., an error is raised) if the connection to an edge node is lost.
  • Order the operations performed across the local instance and the remote instances to minimize query response time.
  • Make remote queries a physical remote data reading operator so reading remote data can be optimized as part of the plan.
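
To make the syntax in the title concrete, here is a minimal sketch of splitting an `INCLUDE address[, address+] SELECT` statement into its addresses and the remaining query using plain string operations. It is not ModelarDB's implementation: the function name `split_include` and the `grpc://` address format used in the doc comment are assumptions made for illustration.

```rust
/// Split `INCLUDE address[, address+] SELECT ...` into the addresses and the
/// query that remains once the INCLUDE clause is removed.
///
/// Hypothetical example input:
/// `INCLUDE grpc://192.168.1.2:9999 SELECT * FROM wind_turbine`
///
/// Returns `None` if the statement does not start with INCLUDE, lists no
/// addresses, or contains no SELECT.
fn split_include(sql: &str) -> Option<(Vec<String>, String)> {
    let trimmed = sql.trim_start();

    // INCLUDE has to be the first part of the statement.
    let keyword = trimmed.get(..7)?;
    if !keyword.eq_ignore_ascii_case("INCLUDE") {
        return None;
    }
    let rest = &trimmed[7..];
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }

    // Naive search for SELECT. ASCII upper-casing keeps byte offsets intact,
    // but an address that contains the text "select" would be split wrongly.
    let select_at = rest.to_ascii_uppercase().find("SELECT")?;

    let addresses: Vec<String> = rest[..select_at]
        .split(',')
        .map(|address| address.trim().to_string())
        .filter(|address| !address.is_empty())
        .collect();
    if addresses.is_empty() {
        return None;
    }

    Some((addresses, rest[select_at..].to_string()))
}
```

The naive keyword search shows why doing this once during parsing is attractive: a tokenizer knows where each keyword and address begins and ends, while string matching can be confused by the contents of an address.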
@skejserjensen
Contributor Author

Extracting the start and end of INCLUDE and the addresses in the parser should now be simpler as the old TokenWithLocation struct in sqlparser 0.52.0 has been replaced with the TokenWithSpan struct in sqlparser 0.53.0. The old struct only contained the start location of the Token while the new struct contains both the start and end location.
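
As a rough sketch of why having both a start and an end location helps, the code below removes or extracts a region of the SQL text given a span. `Location` and `Span` are simplified local stand-ins that mirror the shape of the sqlparser types (1-based line and column), not the library's own definitions. The helpers `byte_offset`, `remove_span`, and `slice_span` are hypothetical, and the end location is assumed to be exclusive.

```rust
/// Simplified stand-in for a source location: 1-based line and column.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Location {
    line: u64,
    column: u64,
}

/// Simplified stand-in for a span; `end` is assumed to be exclusive.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: Location,
    end: Location,
}

/// Convert a location to a byte offset in `sql`. Columns are counted in
/// characters. Returns `None` if the location lies outside `sql`.
fn byte_offset(sql: &str, location: Location) -> Option<usize> {
    let mut line = 1u64;
    let mut column = 1u64;
    for (offset, c) in sql.char_indices() {
        if line == location.line && column == location.column {
            return Some(offset);
        }
        if c == '\n' {
            line += 1;
            column = 1;
        } else {
            column += 1;
        }
    }
    // A location just past the last character maps to the end of the text.
    if line == location.line && column == location.column {
        Some(sql.len())
    } else {
        None
    }
}

/// Return the text covered by `span`, e.g. a single address.
fn slice_span(sql: &str, span: Span) -> Option<&str> {
    let start = byte_offset(sql, span.start)?;
    let end = byte_offset(sql, span.end)?;
    sql.get(start..end)
}

/// Return `sql` with the text covered by `span` removed, e.g. the whole
/// INCLUDE clause from the INCLUDE keyword up to the SELECT keyword.
fn remove_span(sql: &str, span: Span) -> Option<String> {
    let start = byte_offset(sql, span.start)?;
    let end = byte_offset(sql, span.end)?;
    if start > end {
        return None;
    }
    Some(format!("{}{}", &sql[..start], &sql[end..]))
}
```

With only start locations, the end of each address would have to be inferred from the start of the next token. With spans, the clause to remove runs from the start of the INCLUDE token to the start of the SELECT token, and each address can be sliced out directly.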

@skejserjensen
Contributor Author

It may also be worth considering alternative ways to query external and/or remote data, in addition to or as a replacement for INCLUDE. For example, datafusion-cli can directly query Apache Parquet files over the network and allows external tables to be created to simplify such queries. It should be possible to implement similar functionality in ModelarDB by registering one or more TableProviderFactory implementations with Apache DataFusion, as a TableProviderFactory allows tables to be created on the fly. While this would allow data to be read remotely, TableProviderFactory does not seem to support pushing down a part of the query plan. To do this, datafusion-federation from datafusion-contrib could be used. Finally, to support including many different systems in the query, datafusion-table-providers from datafusion-contrib could be used.
