Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process query rows one at a time to reduce memory footprint #15268

Merged
merged 2 commits into from
Jul 17, 2023

Conversation

alopezz
Copy link
Contributor

@alopezz alopezz commented Jul 13, 2023

What does this PR do?

Changes fetchall() with iterating on the cursor, which is equivalent to fetching the results one by one.

Motivation

A support case reported running into increased memory usages when the queries return a large number of row. The snowflake docs indicate that this is the way to go when fetchall() results in memory usage issues.

Additional Notes

  • As far as I know, this should not result in any performance regression, as under the hood fetchall() is using fetchone() in a loop.
  • Reference to what's called when iterating over cursor to see how it's uses fetchone.
  • I haven't benchmarked the difference in memory usage of the change, and I'm instead relying on what's documented and on understanding our code, snowflake_connector's code, and the description of the issue from a support case.
  • I've tested this against a real snowflake instance, and I've made the necessary adjustments to our mock to try to follow the expected behavior as closely as possible.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • PR title must be written as a CHANGELOG entry (see why)
  • Files changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
  • PR must have changelog/ and integration/ labels attached
  • If the PR doesn't need to be tested during QA, please add a qa/skip-qa label.

@alopezz alopezz requested a review from a team as a code owner July 13, 2023 13:45
@github-actions
Copy link

github-actions bot commented Jul 13, 2023

Test Results

  2 files    2 suites   16s ⏱️
57 tests 57 ✔️ 0 💤 0
59 runs  57 ✔️ 2 💤 0

Results for commit 83bd212.

♻️ This comment has been updated with latest results.

@alopezz alopezz force-pushed the alopez/snowflake/fetchone branch from 4157470 to b548207 Compare July 13, 2023 13:55
@codecov
Copy link

codecov bot commented Jul 13, 2023

Codecov Report

Merging #15268 (83bd212) into master (052ce43) will decrease coverage by 0.01%.
The diff coverage is 80.00%.

Flag Coverage Δ
snowflake 96.61% <80.00%> (-0.15%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

FlorentClarret
FlorentClarret previously approved these changes Jul 17, 2023
Copy link
Member

@FlorentClarret FlorentClarret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just left one comment, otherwise LGTM 👍

Comment on lines 59 to 60
def fetchall(self):
return self.__data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to keep this one now we use fetchone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, I think I was keeping it around "just in case" but that's not a good justification, I'll drop that.

@alopezz alopezz merged commit e97c04d into master Jul 17, 2023
@alopezz alopezz deleted the alopez/snowflake/fetchone branch July 17, 2023 13:45
raw_version = self.execute_query_raw("select current_version();")
version = raw_version[0][0]
raw_version = next(self.execute_query_raw("select current_version();"))
version = raw_version[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the second [0] not required here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the next right above. When using fetchall(), raw_version was a list of tuples, and thus we were accessing the first element of the first item in the list. With fetchone(), execute_query_raw is returning an iterator, for which we're taking the first element already with next. Thus, in the new version of the code, raw_version is a tuple from which we're only interested in the first element.

@alopezz alopezz mentioned this pull request Jul 18, 2023
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants