feat: add read task in bigquery component #156
Conversation
It is only for syncing the task progress.
func constructTableColumns(myDataset *bigquery.Dataset, ctx context.Context, compConfig *base.ComponentConfig) ([]TableColumns, error) {
	tableIT := myDataset.Tables(ctx)
	tables := []TableColumns{}
How about we only fetch the table we want to use?
If we still want to fetch multiple tables, we can use a map with table_name as the key.
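The suggestion above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `TableColumns` here is a simplified stand-in for the type used in the component, and `byTableName` is a hypothetical helper name.

```go
package main

import "fmt"

// TableColumns pairs a table name with its column names. This is a
// simplified stand-in for the struct used in the component.
type TableColumns struct {
	TableName string
	Columns   []string
}

// byTableName indexes fetched tables by name so that a single table can be
// looked up directly instead of scanning a slice, as suggested in the review.
func byTableName(tables []TableColumns) map[string]TableColumns {
	m := make(map[string]TableColumns, len(tables))
	for _, t := range tables {
		m[t.TableName] = t
	}
	return m
}

func main() {
	tables := []TableColumns{
		{TableName: "users", Columns: []string{"id", "name"}},
		{TableName: "orders", Columns: []string{"id", "user_id"}},
	}
	idx := byTableName(tables)
	fmt.Println(idx["users"].Columns) // [id name]
}
```

With this index, a read task that only needs one table can do a direct `idx[name]` lookup rather than iterating every table in the dataset.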
@donch1989
I was thinking of taking the table name from the setup section and making it an input.
Then, users can choose the table or tables they want to extract with the smart hint or drag-and-drop menu.
And we can also adjust the columns without saving the pipeline again if users want to change tables.
What do you think about that?
Yeah, sounds good
I have created another ticket and left a TODO memo here.
I will deal with it in the near future.
data/bigquery/v0/insert.go
Outdated
@@ -31,8 +32,10 @@ func insertDataToBigQuery(projectID, datasetID, tableName string, valueSaver Dat
func getDataSaver(input *structpb.Struct, schema bigquery.Schema) (DataSaver, error) {
	inputObj := input.GetFields()["data"].GetStructValue()
	dataMap := map[string]bigquery.Value{}
	transformer := base.InstillDynamicFormatTransformer{}
@chuang8511
I don't think we need to convert the case here. Since the schema is not defined in VDP, let's just keep it the same as it is in BigQuery.
@donch1989
I converted the schema from snake_case or camelCase to kebab-case when we read the schema from BigQuery to save it into the VDP schema.
In this code, we did not modify the schema in BigQuery.
We modify the key used to fetch the data from VDP.
The flow is:
- Get the schema from BigQuery.
- Change the schema from snake_case / camelCase to kebab-case and save it to the VDP schema.
- The user inputs data with the VDP schema, which is kebab-case.
- Before we insert data into BigQuery, we fetch the data with kebab-case keys from the VDP components.
So, in the code you commented on, we actually do not modify the schema in BigQuery.
Please correct me if I misunderstand.
Yeah, I know. But what if BigQuery has two columns, `fooBar` and `foo_bar`, at the same time? The logic will go wrong.
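The collision described above can be demonstrated with a small sketch. `toKebab` is a hypothetical helper illustrating the snake_case/camelCase-to-kebab-case conversion discussed in this thread, not the PR's actual transformer; the point is that two distinct BigQuery column names collapse to the same VDP key.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// toKebab converts snake_case or camelCase identifiers to kebab-case.
// Hypothetical helper sketching the conversion discussed in the review.
func toKebab(s string) string {
	var b strings.Builder
	for i, r := range s {
		switch {
		case r == '_':
			b.WriteRune('-')
		case unicode.IsUpper(r):
			if i > 0 {
				b.WriteRune('-')
			}
			b.WriteRune(unicode.ToLower(r))
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	// Two distinct BigQuery columns collapse to the same VDP key,
	// so the mapping back to BigQuery becomes ambiguous.
	fmt.Println(toKebab("fooBar"))  // foo-bar
	fmt.Println(toKebab("foo_bar")) // foo-bar
}
```

Since the conversion is not injective, there is no safe way to map the kebab-case key back to the original column, which is why keeping the exact BigQuery names avoids the problem.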
@donch1989
So, do you mean that we do not have to transform the BigQuery schema into kebab-case for the VDP schema?
If so, I will revert this part and this code.
Yes, I think we can just use the exact same names as those in BigQuery.
Please also help rebase it.
Force-pushed from 0462093 to f343235
@donch1989 Sorry...
Force-pushed from f1574b4 to 3f82148
Because - we want to read from data components
This commit - add read task to bigquery
🤖 I have created a release *beep* *boop*

## [0.21.0-beta](v0.20.2-beta...v0.21.0-beta) (2024-07-02)

### Features

* add mail component ([#178](#178)) ([04b19d0](04b19d0))
* add read task for gcs ([#155](#155)) ([77fe2fc](77fe2fc))
* add read task in bigquery component ([#156](#156)) ([4d2e7ec](4d2e7ec))
* **anthropic:** add Anthropic component ([#176](#176)) ([030881d](030881d))
* **anthropic:** add UsageHandler functions in anthropic ([#186](#186)) ([ebaa61f](ebaa61f))
* **compogen:** add extra section with --extraContents flag ([#171](#171)) ([391bb98](391bb98))
* **instill:** remove extra-params field ([#188](#188)) ([b17ff73](b17ff73))
* **redis:** simplify the TLS configuration ([#194](#194)) ([0a8baf7](0a8baf7))

### Bug Fixes

* **all:** fix typos ([#174](#174)) ([cb3c2fb](cb3c2fb))
* **compogen:** wrong bracket direction in substitution ([#184](#184)) ([dfe8306](dfe8306))
* expose input and output for anthropic for instill credit ([#190](#190)) ([a36e876](a36e876))
* update doc ([#185](#185)) ([6e6639a](6e6639a))

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).