Tdl 14376 pagination failure #50

prijendev · 2021-10-28T08:39:35Z

Description of change

Updated the condition used to break out of the pagination loop:
- If a completely blank page is found, stop looping.
Fixed the pagination test case.

Note: A unit test was not feasible for this PR. We replaced only one condition, and the method containing it is very large and calls many other methods. That method is sync, which lives in the sync.py module, and because the method and the module share the same name, we could not mock the other methods. So we skipped it.

Manual QA steps

Verify that pagination does not break if the last 2 records of any page are empty.

Risks

Rollback steps

Revert this branch.


savan-chovatiya · 2021-12-02T10:33:39Z

tests/test_google_sheets_pagination.py

+
+                if stream == "sadsheet-pagination":
+                    # verify the data for the "sadsheet-pagination" stream is free of any duplicates or breaks by checking
+                    # our fake pk value ('id')


Add a detailed comment about the data present in this sheet.

karanpanchal-crest · 2021-12-07T06:22:17Z

tap_google_sheets/sync.py

@@ -514,7 +514,7 @@ def sync(client, config, catalog, state):
                            from_row=from_row,
                            columns=columns,
                            sheet_data_rows=sheet_data_rows)
-                        if row_num < to_row:
+                        if not sheet_data_rows: # If a whole blank page found, then stop looping.


@prijendev Can you please explain what the earlier condition row_num < to_row meant?

Here, to_row is initialized to the minimum of 200 (the max page size) and max_row, and is then increased by 200 on each page until it reaches max_row. Initially, from_row is 2; on each subsequent page it is set to to_row + 1 (201 on the second page). row_num is from_row plus the total number of records returned in the response. The condition checks whether row_num is less than to_row and, if so, sets is_last_row to true. However, the API does not return trailing empty rows in its response.
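A minimal sketch of the page-range arithmetic described above, assuming a page size of 200 and a header in row 1. The function name page_ranges is hypothetical and is not taken from tap_google_sheets/sync.py.

```python
from typing import Iterator, Tuple

PAGE_SIZE = 200  # max page size mentioned above


def page_ranges(max_row: int, page_size: int = PAGE_SIZE) -> Iterator[Tuple[int, int]]:
    """Yield (from_row, to_row) for each page request (hypothetical helper)."""
    from_row = 2  # row 1 holds the header
    to_row = min(page_size, max_row)
    while from_row <= max_row:
        yield from_row, to_row
        from_row = to_row + 1  # next page starts right after the previous one
        to_row = min(to_row + page_size, max_row)  # never request past max_row


print(list(page_ranges(450)))  # [(2, 200), (201, 400), (401, 450)]
```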

For example, suppose rows 199 and 200 are empty and the sheet has 400 rows in total. In the first iteration:

to_row = 200
from_row = 2
row_num = 2 + 197 = 199 (the first row contains the header)
So the condition above is true (199 < 200), the loop breaks, and rows 201 to 400 are never fetched.
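To make the failure mode concrete, here is a simplified simulation of the loop under the assumptions stated in this thread: 200-row pages, a header in row 1, and an API that drops trailing empty rows from a page. It additionally assumes interior empty rows come back as empty lists. fetch_page and synced_rows are hypothetical names; the real loop in tap_google_sheets/sync.py is considerably larger.

```python
from typing import Dict, List

PAGE_SIZE = 200  # max page size mentioned in the review

Sheet = Dict[int, List[str]]  # sheet row number -> cell values; blank rows are absent


def fetch_page(sheet: Sheet, from_row: int, to_row: int) -> List[List[str]]:
    """Model of the API response: rows from_row..to_row, trailing blank rows dropped."""
    page = [sheet.get(row, []) for row in range(from_row, to_row + 1)]
    while page and not page[-1]:
        page.pop()  # the API does not return trailing empty rows
    return page


def synced_rows(sheet: Sheet, max_row: int, stop_on_blank_page: bool) -> List[int]:
    """Return the sheet row numbers emitted by a simplified paging loop."""
    synced: List[int] = []
    from_row, to_row = 2, min(PAGE_SIZE, max_row)  # row 1 holds the header
    while from_row <= max_row:
        sheet_data_rows = fetch_page(sheet, from_row, to_row)
        synced.extend(
            row for row, values in enumerate(sheet_data_rows, start=from_row) if values
        )
        row_num = from_row + len(sheet_data_rows)
        if stop_on_blank_page:
            is_last_page = not sheet_data_rows  # new check from this PR
        else:
            is_last_page = row_num < to_row  # old check
        if is_last_page:
            break
        from_row = to_row + 1
        to_row = min(to_row + PAGE_SIZE, max_row)
    return synced


# 400-row sheet with data everywhere except rows 199 and 200
sheet = {row: [f"value {row}"] for row in range(2, 401) if row not in (199, 200)}
old = synced_rows(sheet, 400, stop_on_blank_page=False)
new = synced_rows(sheet, 400, stop_on_blank_page=True)
print(len(old), old[-1])  # 197 198
print(len(new), new[-1])  # 397 400
```

With the old check the simulated loop stops after row 198; with the new check it also fetches rows 201 to 400.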

KrisPersonal

Please add comments in the code explaining why if row_num < to_row is replaced with the sheet_data_rows check.
For example, if a sheet contains only 200 rows and rows 99 and 100 are empty, then, because this condition was removed, the tap will continue to process the remaining rows, and the new empty sheet_data_rows check will only verify whether the whole sheet is empty. Is that correct?

prijendev · 2021-12-09T05:13:23Z

Added detailed comments in the code.

sheet_data_rows checks for an empty page, not an entirely empty sheet.
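To illustrate that the new check is per page rather than per sheet, here is a small model under the same assumptions as the discussion above (200-row pages, trailing empty rows omitted by the API). fetch_page and stops_on_page are hypothetical names. Under this model, a completely blank page ends the loop even if later rows hold data.

```python
from typing import Dict, List

Sheet = Dict[int, List[str]]  # sheet row number -> cell values; blank rows are absent


def fetch_page(sheet: Sheet, from_row: int, to_row: int) -> List[List[str]]:
    """Model of the API response: rows from_row..to_row, trailing blank rows dropped."""
    page = [sheet.get(row, []) for row in range(from_row, to_row + 1)]
    while page and not page[-1]:
        page.pop()
    return page


def stops_on_page(sheet: Sheet, from_row: int, to_row: int) -> bool:
    """Modeled new check: stop only when the requested page returns no rows at all."""
    sheet_data_rows = fetch_page(sheet, from_row, to_row)
    return not sheet_data_rows


# Data in rows 2-200 and 401-450; rows 201-400 form a completely blank page.
sheet = {row: ["x"] for row in list(range(2, 201)) + list(range(401, 451))}
print(stops_on_page(sheet, 2, 200))    # False: the page has data
print(stops_on_page(sheet, 201, 400))  # True: blank page, although rows 401-450 hold data
```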

tests/test_google_sheets_pagination.py

kspeer825

We are striving to write up defects found by QA in a way that makes reproducing issues as simple as uncommenting/commenting lines marked with BUG. Unless there is an explicit TODO left in the test, or a bullet in the DoD of the card, there shouldn't be any missing test cases. I believe that is the case here, so the test additions can be removed. If you find this is not the case, please raise it with us so we can adjust our bug reporting process.

kspeer825 · 2021-12-09T13:11:37Z

tests/test_google_sheets_pagination.py

@@ -61,6 +61,43 @@ def test_run(self):
                # verify that we can paginate with all fields selected
                self.assertGreater(record_count_by_stream.get(stream, 0), self.API_LIMIT)

+                record_count_sync = record_count_by_stream.get(stream, 0)


No test changes should have been needed here besides adding the failing sheet back to the test. I don't think these test additions add any test coverage. The sheets have been set up so that a specific column of incrementing values is compared against the sdc column; see fake_pk_list above.
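The comparison described above can be sketched as a standalone helper that mirrors the two assertions in the test. This is an illustration, not code from the test file: the helper name check_pagination_records and the record shape are assumptions, and it assumes a sheet without empty rows, where the fake pk runs 1 to last_row - 1 and __sdc_row runs 2 to last_row (as in the Pagination assertions with last_row = 239).

```python
from typing import Dict, List


def check_pagination_records(records: List[Dict[str, int]], last_row: int) -> None:
    """Assert the fake pk and __sdc_row columns have no duplicates or breaks."""
    fake_pk_list = [record["id"] for record in records]
    actual_pk_list = [record["__sdc_row"] for record in records]
    # Incrementing fake pk column: 1, 2, ..., last_row - 1
    assert fake_pk_list == list(range(1, last_row)), "fake pk ('id') has duplicates or breaks"
    # Sheet row numbers: 2, 3, ..., last_row (row 1 is the header)
    assert actual_pk_list == list(range(2, last_row + 1)), "__sdc_row has duplicates or breaks"


records = [{"id": row - 1, "__sdc_row": row} for row in range(2, 240)]
check_pagination_records(records, last_row=239)  # passes silently

try:
    check_pagination_records(records[:197], last_row=239)  # hypothetical truncated sync
except AssertionError as exc:
    print(exc)  # fake pk ('id') has duplicates or breaks
```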

Removed the extra assertion. The current test is written in such a way that it only accounts for the Pagination stream.

tap-google-sheets/tests/test_google_sheets_pagination.py

Lines 80 to 87 in 25136fc

# verify the data for the "Pagination" stream is free of any duplicates or breaks by checking
# our fake pk value ('id')
# THIS ASSERTION CAN BE MADE BECAUSE WE SETUP DATA IN A SPECIFIC WAY. DONT COPY THIS
self.assertEqual(list(range(1, 239)), fake_pk_list)

# verify the data for the "Pagination" stream is free of any duplicates or breaks by checking
# the actual primary key values (__sdc_row)
self.assertEqual(list(range(2, 240)), actual_pk_list)

If we add sadsheet-pagination back, then we also have to write an assertion specific to this sheet.

That comment is misleading; the test is set up to apply to both sheets.

tap-google-sheets/tests/test_google_sheets_pagination.py

Line 38 in 25136fc

testable_streams = {"Pagination", "sadsheet-pagination"}

Additionally, there is a large comment at the top of this test that should be removed.

tap-google-sheets/tests/test_google_sheets_pagination.py

Lines 9 to 13 in 25136fc

# BUG_TDL-14376 | https://jira.talendforge.org/browse/TDL-14376
# Expectation: Tap will pick up next page (200 rows) iff there is a non-null value on that page
# We observed a BUG where the tap does not paginate properly on sheets where the last two rows in a batch
# are empty values. The tap does not capture anything on the subsequent pages when this happens.
#

Removed the large comment at the top of this test. The sadsheet-pagination sheet has 238 rows in total, of which rows 199 and 200 are empty, whereas the Pagination sheet has 239 rows in total with no empty rows.
Because two rows are empty in sadsheet-pagination, we need a separate assertion that excludes rows 199 and 200.
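A minimal sketch of how the expected __sdc_row values for each sheet could be derived. It assumes the row counts above include the header row, as the Pagination assertion range(2, 240) implies for its 239 rows, so sadsheet-pagination data occupies sheet rows 2 through 238. The helper name expected_sdc_rows is hypothetical; it only covers __sdc_row, not the fake pk column.

```python
from typing import Iterable, List


def expected_sdc_rows(last_row: int, empty_rows: Iterable[int]) -> List[int]:
    """Sheet rows expected in __sdc_row: 2..last_row, minus rows known to be empty."""
    skipped = set(empty_rows)
    return [row for row in range(2, last_row + 1) if row not in skipped]


pagination_rows = expected_sdc_rows(239, empty_rows=())
sadsheet_rows = expected_sdc_rows(238, empty_rows=(199, 200))
print(pagination_rows == list(range(2, 240)))  # True
print(len(sadsheet_rows), 198 in sadsheet_rows, 199 in sadsheet_rows)  # 235 True False
```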

kspeer825

Responded to comments

* type of emailaddress corrected (#53)
* Tdl 16079 check best practices (#51)
* Initial commit for best practice update
* updated setup.py and start_date test
* Updated test cases
* Updated start_date test case
* Updated start_date test case
* Updated comment
* Revert back test case changes
* Added new line
* Tdl 14376 pagination failure (#50)
* Initial commit for pagination failer
* Fixed pagination test cases
* Added comments
* Added detail comment into the code
* Removed unnecessary comment
* Removed unnecessary assertion
* Removed extra comment
* added comment for bug (#49)
* TDL-14475 added unsupported feature and unittests (#47)
* added unsupported feature and unittests
* added code comments
* fixed indent
* fixed indentation
* resolved a bug of writing md when 2 consecutive empty headers
* updated the logic for consecutive empty headers
* rsolved comments
* added test case for consecutive empty headers
* added comments
* resolved circleci errors
* resolved comments

Co-authored-by: namrata270998 <namrata.brahmbhatt@crestdatasystems.com>
Co-authored-by: prijendev <prijen.khokhani@crestdatasys.com>

* TDL-14397-Add skipped log when first row is empty (#46)
* added logger message and unittests
* added code comments
* changed the logger message and logic
* resolved comments

Co-authored-by: namrata270998 <namrata.brahmbhatt@crestdatasystems.com>
Co-authored-by: prijendev <prijen.khokhani@crestdatasys.com>

* TDL-16054 added code comments (#52)
* TDL-16054 added code comments
* rsolved comments

Co-authored-by: namrata270998 <namrata.brahmbhatt@crestdatasystems.com>
Co-authored-by: prijendev <prijen.khokhani@crestdatasys.com>

* Tdl 16280 implement request timeout (#54)
* TDL-16280 added request timeout
* TDL-16280: Added factor 3 to add more wait time between 2 calls
* TDL-16280: Updated Connection error as it wasn't defined.
* added backoff for access token
* updated readme
* updated request timeout and added jitter
* added comment for jitter
* added code coverage
* added testcase for connection error
* addd request timeout in config example
* updated the json example
* removed the client initialization outside with

Co-authored-by: dbshah1212 <dhruvin.shah@crestdatasys.com>
Co-authored-by: prijendev <prijen.khokhani@crestdatasys.com>
Co-authored-by: namrata270998 <75604662+namrata270998@users.noreply.github.com>
Co-authored-by: namrata270998 <namrata.brahmbhatt@crestdatasystems.com>
Co-authored-by: dbshah1212 <dhruvin.shah@crestdatasys.com>

prijendev added 3 commits October 26, 2021 18:17

Initial commit for pagination failer

5db95bd

Merge remote-tracking branch 'origin/TDL-15623-Crest-Master' into TDL-14376-Pagination-failure

6cb0d95

Fixed pagination test cases

136c84b

prijendev requested review from dbshah1212 and karanpanchal-crest October 28, 2021 08:39

savan-chovatiya reviewed Dec 2, 2021

Added comments

93185b6

savan-chovatiya approved these changes Dec 2, 2021

karanpanchal-crest reviewed Dec 7, 2021

karanpanchal-crest approved these changes Dec 8, 2021

prijendev requested review from KrisPersonal, kspeer825 and manand31 December 8, 2021 05:41

KrisPersonal requested changes Dec 8, 2021

Added detail comment into the code

ab60957

prijendev requested a review from KrisPersonal December 9, 2021 05:14

kspeer825 reviewed Dec 9, 2021

tests/test_google_sheets_pagination.py

Removed unnecessary comment

6ee91a5

prijendev requested a review from kspeer825 December 9, 2021 13:12

kspeer825 suggested changes Dec 9, 2021

Removed unnecessary assertion

25136fc

prijendev requested a review from kspeer825 December 9, 2021 13:48

kspeer825 suggested changes Dec 9, 2021

Removed extra comment

2cf83bc

prijendev requested a review from kspeer825 December 9, 2021 15:15

KrisPersonal approved these changes Dec 9, 2021

kspeer825 approved these changes Dec 10, 2021

prijendev merged commit a2dce44 into TDL-15623-Crest-Master Dec 13, 2021

prijendev mentioned this pull request Dec 13, 2021

Tdl 15623 crest master #57

Merged

namrata270998 deleted the TDL-14376-Pagination-failure branch June 2, 2022 10:09
