ppl project
table command
#936
base: main
…ete table / view that can later be efficiently queried or stored in OpenSearch MV Signed-off-by: YANGDB <yang.db.dev@gmail.com>
I haven't gone into the implementation details yet. One high-level question: Besides, IMO name |
I'm thinking the same. It's not clear in which cases users would have to run DDL in PPL. I know Kusto supports management commands. Is it because they don't support SQL? https://learn.microsoft.com/en-us/kusto/query/?view=microsoft-fabric#management-commands |
@LantaoJin @dai-chen @ykmr1224 IMO we should review each DDL command to see whether it saves the customer time, effort, or learning, and if it does, we should add it. The goal here is to create a fully functional language that is a one-stop shop, not to force users back to SQL when they need some missing functionality. In addition, other pipeline languages (such as Splunk) do offer DDL commands: |
Hi @LantaoJin |
This syntax was mentioned in PPL vision doc. Sent to you offline. |
Hmm, these search output commands are DML commands IMO. They might be equivalent to |
I agree that the DDL-like PPL commands introduce fundamental capabilities. Before delivering this totally new concept to the PPL syntax, I have several questions:
@LantaoJin I've changed the syntax from |
# Conflicts:
#	docs/ppl-lang/PPL-Example-Commands.md
#	ppl-spark-integration/src/main/antlr4/OpenSearchPPLLexer.g4
Description
Add the project table command to allow materializing queries into a concrete table / view that can later be queried efficiently or stored in an OpenSearch MV.
PPL project command
Overview
Using the project command to materialize a query into a dedicated view:
In some cases it is necessary to construct a projection view (materialized into a view) of the query results.
This projection can later be used as the source for further queries that slice and dice the data. In addition, such tables can be saved into an MV table that is pushed into OpenSearch and used for visualization and more performant queries.
The command can also function as an ETL process, in which the original datasource is transformed and ingested into the projected output view using the PPL transformation and aggregation operators.
Syntax
PROJECT (IF NOT EXISTS)? viewName (USING datasource)? (OPTIONS optionsList)? (PARTITIONED BY partitionColumnNames)? location?
viewName
Specifies a view name, which may be optionally qualified with a database name.
USING datasource
Data Source is the input format used to create the table. Data source can be CSV, TXT, ORC, JDBC, PARQUET, etc.
OPTIONS optionsList
Specifies a set of key-value pairs used to configure the data source. These options vary depending on the chosen data source and may include properties such as file paths, authentication details, format-specific parameters, etc.
PARTITIONED BY
Specifies the columns on which the data should be partitioned. Partitioning splits the data into separate logical divisions based on distinct values of the specified column(s), which can optimize query performance.
location
Specifies the physical location where the view or table data is stored. This could be a path in a distributed file system such as HDFS, S3 object storage, or a local filesystem.
QUERY
The outcome view (viewName) is populated using the data from the select statement.
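The clauses listed above appear to line up with Spark SQL's CREATE TABLE ... AS SELECT form, which is also the shape this description says the command is translated into. As a rough illustration only, the Python sketch below assembles such a statement from the documented clauses. ProjectSpec and to_create_table_sql are hypothetical names invented for this sketch, not part of the PR, and the already-translated SQL query text is assumed to be given.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProjectSpec:
    """Hypothetical container for the clauses of the project command."""
    view_name: str                      # may be qualified, e.g. "db.view"
    query_sql: str                      # SQL the PPL query was translated to (assumed given)
    if_not_exists: bool = False
    datasource: Optional[str] = None    # e.g. "parquet", "csv"
    options: Dict[str, str] = field(default_factory=dict)
    partitioned_by: List[str] = field(default_factory=list)
    location: Optional[str] = None


def _quote(value: str) -> str:
    """Wrap a value in single quotes; refuse values that would need escaping."""
    if "'" in value or "\\" in value:
        raise ValueError(f"unsupported character in literal: {value!r}")
    return f"'{value}'"


def to_create_table_sql(spec: ProjectSpec) -> str:
    """Render the clauses in the order Spark SQL expects for CREATE TABLE ... AS."""
    parts = ["CREATE TABLE"]
    if spec.if_not_exists:
        parts.append("IF NOT EXISTS")
    parts.append(spec.view_name)
    if spec.datasource:
        parts.append(f"USING {spec.datasource}")
    if spec.options:
        opts = ", ".join(f"{_quote(k)} = {_quote(v)}" for k, v in spec.options.items())
        parts.append(f"OPTIONS ({opts})")
    if spec.partitioned_by:
        parts.append(f"PARTITIONED BY ({', '.join(spec.partitioned_by)})")
    if spec.location:
        parts.append(f"LOCATION {_quote(spec.location)}")
    parts.append(f"AS {spec.query_sql}")
    return " ".join(parts)


if __name__ == "__main__":
    # Illustrative values only; the table and column names are made up.
    spec = ProjectSpec(
        view_name="db.age_by_country",
        query_sql="SELECT country, avg(age) AS avg_age FROM people GROUP BY country",
        if_not_exists=True,
        datasource="parquet",
        partitioned_by=["country"],
    )
    print(to_create_table_sql(spec))
```

Keeping the clauses in a small data structure makes the optional parts explicit: anything left unset is simply omitted from the rendered statement, mirroring the `?` markers in the syntax line.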
Usage Guidelines
The project command produces a view based on the resulting rows returned from the query.
Any query can be used in the AS <query> statement, and attention must be paid to the volume and compute cost that such queries may incur.
As a precaution, an explain cost | source = table | ... can be run prior to the project statement to get a better estimate.
Examples:
Effective SQL push-down query
The project command is translated into an equivalent SQL create table <viewName> [Using <datasource>] As <statement>, as shown here:
References
Related Issues
#928
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.