Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-534004: Added database and schema to the queries related to temporary stage #1274

Merged
merged 6 commits into from
Jan 4, 2023

Conversation

jekaterinakletnaja
Copy link
Contributor

@jekaterinakletnaja jekaterinakletnaja commented Oct 4, 2022

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-534004: write_pandas does not use schema provided #1034

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am modifying authorization mechanisms
    • I am adding new credentials
    • I am modifying OCSP code
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    Database and schema parameters are not used in the write_pandas code that is related to temporary stage - it refers only to the stage’s name. Therefore, in case the connection provided to write_pandas has no database and/or schema, execution fails, even if database and/or schema are provided as arguments to the write_pandas itself. Thus, indication of database and schema in the queries related to the temporary stage solves the issue.

@github-actions
Copy link

github-actions bot commented Oct 4, 2022

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@jekaterinakletnaja
Copy link
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@jekaterinakletnaja jekaterinakletnaja changed the title Added database and schema to the queries related to temporary stage SNOW-534004: Added database and schema to the queries related to temporary stage Oct 4, 2022
Copy link
Contributor

@sfc-gh-stan sfc-gh-stan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you please add an integration test in test/integ/pandas/test_pandas_tool.py to verify your change 🙂 ?
In order to use the pytest fixtures we have (e.g. conn_cnx, db_parameters), you need to add a file named parameters.py under test/, which contains the parameters you pass to snowflake.connector.Connect like:

#!/usr/bin/env python
CONNECTION_PARAMETERS = {
    'account': 'accountname',
    'user': 'username',
    'password': 'password',
    'schema': 'schemaname',
    'database': 'databasename',
    'protocol': 'https',
    'host': 'hostname',
    'port': 'portnumber',
}

@sfc-gh-stan
Copy link
Contributor

Hi @jekaterinakletnaja , thank you for looking into the issue and submitting a fix, we really appreciate your contribution 🎉 ! Please address my comments, I will keep monitoring this PR, let's try to merge this change soon.

…ng format() to f-strings where backslashes are not used; adding tests
@jekaterinakletnaja
Copy link
Contributor Author

jekaterinakletnaja commented Oct 12, 2022

Hi @jekaterinakletnaja , thank you for looking into the issue and submitting a fix, we really appreciate your contribution 🎉 ! Please address my comments, I will keep monitoring this PR, let's try to merge this change soon.

Hi @sfc-gh-stan, appreciate your comments and suggestions! I reflected those in the latest commit and added tests.

At the same time, while adding tests and running previous ones related to write_pandas, I’ve run into a few issues that potentially are not related to the changes in this PR, but might be different bugs (however, due to the lack of deep knowledge about the code I might be wrong). Could you please take a look at my comments below and suggest how to approach them the best?

  1. The existing test test_empty_dataframe_write_pandas in snowflake-connector-python/test/integ/pandas/test_pandas_tools.py is failing with
...
    def chunk_helper(lst: T, n: int) -> Iterator[tuple[int, T]]:
        """Helper generator to chunk a sequence efficiently with current index like if enumerate was called on sequence."""
>       for i in range(0, len(lst), n):
E       ValueError: range() arg 3 must not be zero

Indeed, in case a df is empty, chunk_size is 0 according to it’s current definition, but 0 is not accepted as step for range.

  1. There are new tests (added by me in order to test creation and usage of stage_location and file_format_location, for example test_stage_location_building_db_schema). They are failing at this line with message
.0 = <map object at 0x7f9d855177d0>

>       [f"{quote}{c}{quote} {column_type_mapping[c]}" for c in df.columns]
    )
E   KeyError: 'name'

even in case when all assertion related code (for ex., lines 428-443 for test_stage_location_building_db_schema) is commented.
It is because column_type_mapping (which is the result of execution of infer_schema_sql) is an empty list, but we are searching for columns names in it. Do you have a feeling what might be wrong here? Is it expected that column_type_mapping is an empty list?

UPD: I’ve got what was the problem in p.2, my bad. It is solved now by the latest commit. Question in p.1 is still actual though.

@jekaterinakletnaja jekaterinakletnaja requested review from sfc-gh-stan and removed request for sfc-gh-mkeller October 12, 2022 21:23
Copy link
Contributor

@sfc-gh-stan sfc-gh-stan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing my suggestions, the code change looks good to go!
I made a few more comments about the tests, mainly concerned with code duplication/simplicity. Please let me know what you think 🙂 .

@sfc-gh-stan
Copy link
Contributor

sfc-gh-stan commented Oct 13, 2022

Hi @jekaterinakletnaja , thank you for looking into the issue and submitting a fix, we really appreciate your contribution 🎉 ! Please address my comments, I will keep monitoring this PR, let's try to merge this change soon.

Hi @sfc-gh-stan, appreciate your comments and suggestions! I reflected those in the latest commit and added tests.

At the same time, while adding tests and running previous ones related to write_pandas, I’ve run into a few issues that potentially are not related to the changes in this PR, but might be different bugs (however, due to the lack of deep knowledge about the code I might be wrong). Could you please take a look at my comments below and suggest how to approach them the best?

  1. The existing test test_empty_dataframe_write_pandas in snowflake-connector-python/test/integ/pandas/test_pandas_tools.py is failing with
...
    def chunk_helper(lst: T, n: int) -> Iterator[tuple[int, T]]:
        """Helper generator to chunk a sequence efficiently with current index like if enumerate was called on sequence."""
>       for i in range(0, len(lst), n):
E       ValueError: range() arg 3 must not be zero

Indeed, in case a df is empty, chunk_size is 0 according to it’s current definition, but 0 is not accepted as step for range.

  1. There are new tests (added by me in order to test creation and usage of stage_location and file_format_location, for example test_stage_location_building_db_schema). They are failing at this line with message
.0 = <map object at 0x7f9d855177d0>

>       [f"{quote}{c}{quote} {column_type_mapping[c]}" for c in df.columns]
    )
E   KeyError: 'name'

even in case when all assertion related code (for ex., lines 428-443 for test_stage_location_building_db_schema) is commented. It is because column_type_mapping (which is the result of execution of infer_schema_sql) is an empty list, but we are searching for columns names in it. Do you have a feeling what might be wrong here? Is it expected that column_type_mapping is an empty list?

UPD: I’ve got what was the problem in p.2, my bad. It is solved now by the latest commit. Question in p.1 is still actual though.

  1. This was actually a bug we recently fixed with SNOW-664933 Fix write_pandas with empty dataframe #1261. You should be able to resolve the test failure by pulling the latest code. It shouldn't be affected by your changes.
  2. Glad you're able to resolve it on your own 👍 Parsing these queries is tricky, again I think it's completely fine if we only test one query for each of table, stage and file format. For example, we could only test creation, because if copy into, select don't specify the same location, it should result in failure in other tests.

@jekaterinakletnaja
Copy link
Contributor Author

Hey @sfc-gh-stan, I like your idea for tests combo! Please let me know whether latest changes cover it in full.

@sfc-gh-stan sfc-gh-stan mentioned this pull request Oct 17, 2022
@sfc-gh-stan
Copy link
Contributor

Hey @sfc-gh-stan, I like your idea for tests combo! Please let me know whether latest changes cover it in full.

Hi @jekaterinakletnaja , your code LGTM and has passed all the merge gates 🎉 ! Normally this would be merged right away, but since the bug fix is technically a behavior change, it can only be merged into to main in the Python connector's next release in January 2023. I will keep monitoring and make sure that happens.
Again, thank you so much for this excellent contribution! We would like to give you credit in the next release note if you don't mind 😉 ?

@jekaterinakletnaja
Copy link
Contributor Author

Hi @jekaterinakletnaja , your code LGTM and has passed all the merge gates 🎉 ! Normally this would be merged right away, but since the bug fix is technically a behavior change, it can only be merged into to main in the Python connector's next release in January 2023. I will keep monitoring and make sure that happens. Again, thank you so much for this excellent contribution! We would like to give you credit in the next release note if you don't mind 😉 ?

Hi @sfc-gh-stan, thanks for the wonderful news! 🎉
It’s an honour to be in the release notes 😄

Looking forward to January 2023!

@sfc-gh-stan sfc-gh-stan enabled auto-merge (squash) January 4, 2023 16:41
@sfc-gh-stan sfc-gh-stan merged commit bee5dff into snowflakedb:main Jan 4, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Jan 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SNOW-534004: write_pandas does not use schema provided
2 participants