Skip to content
This repository has been archived by the owner on Jan 9, 2025. It is now read-only.

feat: add read task in bigquery component #156

Merged
merged 15 commits into from
Jun 24, 2024
Merged

feat: add read task in bigquery component #156

merged 15 commits into from
Jun 24, 2024

Conversation

chuang8511
Copy link
Contributor

Because

  • we want to read from data components

This commit

  • add read task to bigquery

Copy link

linear bot commented Jun 10, 2024

@chuang8511
Copy link
Contributor Author

It is only for sync the task progress.
Will change PR status to open later.


func constructTableColumns(myDataset *bigquery.Dataset, ctx context.Context, compConfig *base.ComponentConfig) ([]TableColumns, error) {
tableIT := myDataset.Tables(ctx)
tables := []TableColumns{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we only fetch the table we want to use?

If we still want to fetch multiple tables, we can use a map with table_name as the key.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@donch1989
I was thinking to take the table name from setting parts to input.
Then, the users can choose the table / tables they want to extract with smart hint or drag-and-drop menu.
And, we also can adjust the columns without saving pipeline again if users want to change table.

What do you think about that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sounds good

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created another ticket and leave TODO memo here.
I will deal with it in the near future.

@chuang8511 chuang8511 requested a review from donch1989 June 13, 2024 17:26
@chuang8511 chuang8511 marked this pull request as ready for review June 13, 2024 17:26
@@ -31,8 +32,10 @@ func insertDataToBigQuery(projectID, datasetID, tableName string, valueSaver Dat
func getDataSaver(input *structpb.Struct, schema bigquery.Schema) (DataSaver, error) {
inputObj := input.GetFields()["data"].GetStructValue()
dataMap := map[string]bigquery.Value{}
transformer := base.InstillDynamicFormatTransformer{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chuang8511
I don't think we need to convert the case here. Since the schema is not defined in VDP, let's just keep it the same as it is in BigQuery.

Copy link
Contributor Author

@chuang8511 chuang8511 Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@donch1989
I converted the schema from snake_case or camelCase to kebeb-case when we read the schema from BigQuery to save into VDP schema.

Here is the code

In this code, we did not modify the schema of BigQuery.
We modify the key to fetch the data from VDP.

Flow is like

  1. Get schema from BigQuery
  2. Change schema from snake_case / camelCase to kebab-case and save to VDP schema
  3. user input data with the VDP schema, which is kebab-case.
  4. Before we insert data into BigQuery, we fetch data with kebab-case from VDP components.

So, in the code you comment, we actually do not modify the schema in BigQuery.

Please correct me if I misunderstand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know. But what if in BigQuery, there are two columns fooBar and foo_bar at the same time? The logic will go wrong.

Copy link
Contributor Author

@chuang8511 chuang8511 Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@donch1989
So, do you mean that we do not have to transform BigQuery schema into kebab-case for VDP schema?
If so, I will revert this part & this code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we can just use the exact same names as those in BigQuery.

@donch1989
Copy link
Member

Please also help rebase it as well.

@chuang8511
Copy link
Contributor Author

@donch1989
Sync
Because there is a frontend bug, I have not done the final e2e test.
I will do the test after the Console bug is fixed.
Next time, I will put more time to try hardcode first to skip the bug.
Today, I happen to have a tight schedule, so I could not do it now.

Sorry...

@chuang8511
Copy link
Contributor Author

I have done end-to-end test.
image

@chuang8511 chuang8511 requested a review from donch1989 June 19, 2024 12:02
@donch1989 donch1989 merged commit 4d2e7ec into main Jun 24, 2024
8 checks passed
@donch1989 donch1989 deleted the chunhao/ins-4853 branch June 24, 2024 02:58
namwoam pushed a commit to namwoam/component that referenced this pull request Jun 24, 2024
Because

- we want to read from data components

This commit

- add read task to bigquery
namwoam pushed a commit to namwoam/component that referenced this pull request Jun 24, 2024
Because

- we want to read from data components

This commit

- add read task to bigquery
donch1989 pushed a commit that referenced this pull request Jul 2, 2024
🤖 I have created a release *beep* *boop*
---


##
[0.21.0-beta](v0.20.2-beta...v0.21.0-beta)
(2024-07-02)


### Features

* add mail component
([#178](#178))
([04b19d0](04b19d0))
* add read task for gcs
([#155](#155))
([77fe2fc](77fe2fc))
* add read task in bigquery component
([#156](#156))
([4d2e7ec](4d2e7ec))
* **anthropic:** add Anthropic component
([#176](#176))
([030881d](030881d))
* **anthropic:** add UsageHandler functions in anthropic
([#186](#186))
([ebaa61f](ebaa61f))
* **compogen:** add extra section with --extraContents flag'
([#171](#171))
([391bb98](391bb98))
* **instill:** remove extra-params field
([#188](#188))
([b17ff73](b17ff73))
* **redis:** simplify the TLS configuration
([#194](#194))
([0a8baf7](0a8baf7))


### Bug Fixes

* **all:** fix typos
([#174](#174))
([cb3c2fb](cb3c2fb))
* **compogen:** wrong bracket direction in substitution
([#184](#184))
([dfe8306](dfe8306))
* expose input and output for anthropic for instill credit
([#190](#190))
([a36e876](a36e876))
* update doc
([#185](#185))
([6e6639a](6e6639a))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Status: 👋 Done
Development

Successfully merging this pull request may close these issues.

3 participants