Process query rows one at a time to reduce memory footprint #15268

alopezz · 2023-07-13T13:45:08Z

What does this PR do?

Replaces fetchall() with iteration over the cursor, which is equivalent to fetching the results one at a time.

Motivation

A support case reported increased memory usage when queries return a large number of rows. The Snowflake docs recommend this approach when fetchall() causes memory usage issues.

Additional Notes

As far as I know, this should not result in any performance regression, since under the hood fetchall() calls fetchone() in a loop.
Reference to what's called when iterating over the cursor, to see how it uses fetchone().
I haven't benchmarked the difference in memory usage; instead I'm relying on what's documented and on my understanding of our code, snowflake_connector's code, and the description of the issue from the support case.
I've tested this against a real Snowflake instance, and I've adjusted our mock to follow the expected behavior as closely as possible.
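To illustrate the difference described in the notes above, here is a minimal sketch that uses the standard library's sqlite3 module as a stand-in for the Snowflake connector, on the assumption that both expose the usual DB-API cursor behavior (fetchall() and iteration over the cursor). The table name, column names, and helper functions are made up for the example and are not part of this PR.

```python
import sqlite3

# In-memory database used purely as a stand-in; the real check talks to Snowflake.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (name TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO metrics VALUES (?, ?)",
    [("m%d" % i, i) for i in range(1000)],
)


def total_with_fetchall(connection):
    """Materialize every row in a list before processing (the old pattern)."""
    cursor = connection.execute("SELECT name, value FROM metrics")
    rows = cursor.fetchall()  # the whole result set is held in memory at once
    return sum(value for _, value in rows)


def total_with_iteration(connection):
    """Process rows as the cursor yields them (the pattern this PR moves to)."""
    cursor = connection.execute("SELECT name, value FROM metrics")
    total = 0
    for _, value in cursor:  # the caller never builds a list of all rows
        total += value
    return total
```

Both helpers compute the same result; the second one just avoids building the full list of rows on the caller's side.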

Review checklist (to be filled by reviewers)

Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
PR title must be written as a CHANGELOG entry (see why)
File changes must correspond to the primary purpose of the PR as described in the title (small unrelated changes should have their own PR)
PR must have changelog/ and integration/ labels attached
If the PR doesn't need to be tested during QA, please add a qa/skip-qa label.

github-actions · 2023-07-13T13:50:28Z

Test Results

  2 files   2 suites 16s ⏱️
57 tests 57 ✔️ 0 💤 0 ❌
59 runs 57 ✔️ 2 💤 0 ❌

Results for commit 83bd212.

♻️ This comment has been updated with latest results.

codecov · 2023-07-13T14:00:40Z

Codecov Report

Merging #15268 (83bd212) into master (052ce43) will decrease coverage by 0.01%.
The diff coverage is 80.00%.

Flag	Coverage Δ
snowflake	`96.61% <80.00%> (-0.15%)`	⬇️


FlorentClarret

Just left one comment, otherwise LGTM 👍

FlorentClarret · 2023-07-17T11:28:15Z

snowflake/tests/snowflake_connector_patch/_snowflake_connector_patch/connector.py

    def fetchall(self):
        return self.__data


Do we need to keep this one now that we use fetchone?

Fair point. I was keeping it around "just in case", but that's not a good justification, so I'll drop it.
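For context, a mock cursor that only supports row-by-row access could look roughly like the sketch below. This is not the actual code in connector.py; the class name MockCursor and its attributes are hypothetical and only show the idea of dropping fetchall() while keeping fetchone() and iteration.

```python
class MockCursor:
    """Hypothetical test double: exposes fetchone() and iteration, no fetchall()."""

    def __init__(self, data):
        self._data = list(data)
        self._pos = 0

    def fetchone(self):
        # Follow the DB-API convention of returning None once rows are exhausted.
        if self._pos >= len(self._data):
            return None
        row = self._data[self._pos]
        self._pos += 1
        return row

    def __iter__(self):
        return self

    def __next__(self):
        # Iteration is built on fetchone(), one row per step.
        row = self.fetchone()
        if row is None:
            raise StopIteration
        return row
```

Leaving fetchall() out of the mock means a test fails loudly if the check ever goes back to loading all rows at once.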

hithwen · 2023-07-18T07:30:45Z

snowflake/datadog_checks/snowflake/check.py

-            raw_version = self.execute_query_raw("select current_version();")
-            version = raw_version[0][0]
+            raw_version = next(self.execute_query_raw("select current_version();"))
+            version = raw_version[0]


Why is the second [0] not required here?

Because of the next() call right above. With fetchall(), raw_version was a list of tuples, so we were accessing the first element of the first item in the list. With fetchone(), execute_query_raw returns an iterator, and next() already takes its first element. In the new version of the code, raw_version is therefore a single tuple, and we're only interested in its first element.
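The indexing change can be reproduced in isolation with the sketch below. The two execute_query_raw_* functions are hypothetical stand-ins for the check's method before and after this PR, and the version string is made up.

```python
ROWS = [("7.23.1",)]  # made-up result of "select current_version();"


def execute_query_raw_list(query):
    """Stand-in for the old behavior: returns a list of row tuples."""
    return list(ROWS)


def execute_query_raw_iter(query):
    """Stand-in for the new behavior: yields row tuples one at a time."""
    for row in ROWS:
        yield row


# Old: index into the list first, then into the row tuple.
old_version = execute_query_raw_list("select current_version();")[0][0]

# New: next() already pulls the first row, so a single index is enough.
raw_version = next(execute_query_raw_iter("select current_version();"))
new_version = raw_version[0]
```

Note that next() raises StopIteration if the iterator yields no rows; passing a default, as in next(iterator, None), avoids that if an empty result is possible.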

alopezz requested a review from a team as a code owner July 13, 2023 13:45

alopezz added the changelog/Fixed label Jul 13, 2023

ghost added the integration/snowflake label Jul 13, 2023

alopezz force-pushed the alopez/snowflake/fetchone branch from 4157470 to b548207 Compare July 13, 2023 13:55

FlorentClarret previously approved these changes Jul 17, 2023


alopezz added 2 commits July 17, 2023 14:08

Process query rows one at a time to reduce memory footprint

008eb65

Remove fetchall from mock

83bd212

alopezz dismissed FlorentClarret’s stale review via 83bd212 July 17, 2023 12:13

alopezz force-pushed the alopez/snowflake/fetchone branch from c070420 to 83bd212 Compare July 17, 2023 12:13

FlorentClarret approved these changes Jul 17, 2023


alopezz merged commit e97c04d into master Jul 17, 2023

alopezz deleted the alopez/snowflake/fetchone branch July 17, 2023 13:45

hithwen reviewed Jul 18, 2023


alopezz mentioned this pull request Jul 18, 2023

Bump snowflake version to 4.5.4 #15290

Merged

5 tasks
