Tdl 14376 pagination failure #50

Merged
merged 8 commits on Dec 13, 2021

@prijendev prijendev commented Oct 28, 2021

Description of change

  • Updated the condition that breaks out of the pagination loop.
    • If an entirely blank page is found, stop looping.
  • Fixed the pagination test case.

Note: A unit test was not feasible for this PR. We replaced a single condition, and the method containing it is very large and calls many other methods. There is also a sync method inside the sync.py module; because the method and the module share the same name, mocking the other methods was not possible. So we skipped it.

Manual QA steps

  • Verify that pagination does not break if the last 2 records of any page are empty.

Risks

Rollback steps

  • revert this branch


if stream == "sadsheet-pagination":
    # verify the data for the "sadsheet-pagination" stream is free of any duplicates or breaks by checking
    # our fake pk value ('id')
Contributor

Add a detailed comment about the data present in this sheet.

Contributor Author

Added

@@ -514,7 +514,7 @@ def sync(client, config, catalog, state):
     from_row=from_row,
     columns=columns,
     sheet_data_rows=sheet_data_rows)
-if row_num < to_row:
+if not sheet_data_rows: # If a whole blank page found, then stop looping.

@prijendev Can you please explain what the earlier behavior row_num < to_row: meant?

Contributor Author

@prijendev prijendev Dec 7, 2021

Here, to_row is initialized to the minimum of 200 (the max page size) and max_row, and then grows by 200 per page until it reaches max_row. from_row starts at 2, and on each following page it is set to to_row + 1 (201 on the second page). row_num is from_row plus the number of records returned in the response. The old condition checked whether row_num was less than to_row and, if so, set is_last_row to true. But the API does not return trailing empty rows in the response.

For example, suppose rows 199 and 200 are empty and the sheet has 400 rows in total. Then, in the first iteration:

to_row = 200
from_row = 2
row_num = 2 + 197 = 199 (the first row contains the header)

So the old condition evaluates to true and breaks the loop, even though the second page still has data.
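
The arithmetic above can be reproduced with a small simulation. This is only a sketch under stated assumptions, not the tap's actual sync code: `build_sheet`, `fetch_page`, and `count_synced` are hypothetical helpers, the loop structure is simplified, and `fetch_page` merely imitates an API that omits trailing empty rows from a page.

```python
PAGE_SIZE = 200  # max page size mentioned in the discussion


def build_sheet(total_rows, empty_rows):
    """Hypothetical sheet: map row number -> value; row 1 is the header and is skipped."""
    return {r: f"row-{r}" for r in range(2, total_rows + 1) if r not in empty_rows}


def fetch_page(sheet, from_row, to_row):
    """Hypothetical API stand-in: trailing empty rows are not returned."""
    page = [sheet.get(r) for r in range(from_row, to_row + 1)]
    while page and page[-1] is None:
        page.pop()
    return page


def count_synced(sheet, max_row, use_old_condition):
    """Count the records synced under the old or the new stop condition."""
    from_row, to_row = 2, min(PAGE_SIZE, max_row)
    synced = 0
    is_last_row = False
    while not is_last_row and from_row <= max_row:
        sheet_data_rows = fetch_page(sheet, from_row, to_row)
        synced += sum(1 for row in sheet_data_rows if row is not None)
        row_num = from_row + len(sheet_data_rows)
        if use_old_condition:
            is_last_row = row_num < to_row     # old: a short page looks like the end
        else:
            is_last_row = not sheet_data_rows  # new: stop only on a fully blank page
        from_row = to_row + 1
        to_row = min(to_row + PAGE_SIZE, max_row)
    return synced


sheet = build_sheet(total_rows=400, empty_rows={199, 200})
print(count_synced(sheet, 400, use_old_condition=True))   # 197
print(count_synced(sheet, 400, use_old_condition=False))  # 397
```

With the old condition the simulated sync stops after the first page (197 records); with the new one it continues to the second page and picks up all 397 non-empty data rows.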

Contributor

@KrisPersonal KrisPersonal left a comment

  1. Please add comments in the code explaining why if row_num < to_row is replaced with a check on sheet_data_rows.
  2. For example, if a sheet contains only 200 rows and rows 99 and 100 are empty, it will continue to process the remaining rows because the old condition was removed, and the new empty sheet_data_rows condition checks whether the whole sheet is empty. Is that correct?

Contributor Author

prijendev commented Dec 9, 2021

  1. Added detailed comments in the code.
  2. The sheet_data_rows check looks for an empty page, not an entirely empty sheet.

@prijendev prijendev requested a review from kspeer825 December 9, 2021 13:12
Contributor

@kspeer825 kspeer825 left a comment

We are striving to write up defects found by QA in a way that makes reproducing issues as simple as uncommenting/commenting lines marked with BUG. Unless there is an explicit TODO left in the test, or a bullet in the DoD of the card, there shouldn't be any missing test cases. I believe that is the case here and the test additions can be removed. If you find this is not the case, please raise this to us so we can adjust our bug reporting process.

@@ -61,6 +61,43 @@ def test_run(self):
# verify that we can paginate with all fields selected
self.assertGreater(record_count_by_stream.get(stream, 0), self.API_LIMIT)

record_count_sync = record_count_by_stream.get(stream, 0)
Contributor

No test changes should have been needed here besides adding the failing sheet back to the test. I don't think these test additions add any test coverage. The sheets have been set up in a way that relies on a specific column with incrementing values used to compare against the sdc column. See fake_pk_list above.

Contributor Author

Removed the extra assertion. The current test is written in a way that accounts only for the Pagination stream.

# verify the data for the "Pagination" stream is free of any duplicates or breaks by checking
# our fake pk value ('id')
# THIS ASSERTION CAN BE MADE BECAUSE WE SETUP DATA IN A SPECIFIC WAY. DONT COPY THIS
self.assertEqual(list(range(1, 239)), fake_pk_list)
# verify the data for the "Pagination" stream is free of any duplicates or breaks by checking
# the actual primary key values (__sdc_row)
self.assertEqual(list(range(2, 240)), actual_pk_list)

If we add sadsheet-pagination back, we also have to write assertions that match that sheet.

Contributor

That comment is misleading; the test is set up to apply to both sheets:

testable_streams = {"Pagination", "sadsheet-pagination"}

Additionally, there is a large comment at the top of this test that should be removed:
# BUG_TDL-14376 | https://jira.talendforge.org/browse/TDL-14376
# Expectation: Tap will pick up next page (200 rows) iff there is a non-null value on that page
# We observed a BUG where the tap does not paginate properly on sheets where the last two rows in a batch
# are empty values. The tap does not capture anything on the subsequent pages when this happens.
#

Contributor Author

@prijendev prijendev Dec 9, 2021

Removed the large comment at the top of this test. The sadsheet-pagination sheet has a total of 238 rows, of which rows 199 and 200 are empty, whereas the Pagination sheet has a total of 239 rows with no empty rows.
So, because two rows are empty in sadsheet-pagination, we need to write a separate assertion that excludes rows 199 and 200.
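
As a rough illustration of that separate assertion, the expected __sdc_row values for sadsheet-pagination could be built as below. This is a sketch, not the test's actual code: the constant names are hypothetical, and it assumes row 1 is the header and that the total of 238 rows includes it.

```python
HEADER_ROW = 1           # assumption: the first row holds the header
TOTAL_ROWS = 238         # from the comment above; assumed to include the header row
EMPTY_ROWS = {199, 200}  # rows reported as empty in sadsheet-pagination

# Every data row after the header except the two empty ones.
expected_sdc_rows = [
    row for row in range(HEADER_ROW + 1, TOTAL_ROWS + 1) if row not in EMPTY_ROWS
]

print(len(expected_sdc_rows))
```

Inside the test, a list like this could be compared against actual_pk_list with self.assertEqual, in the same way the Pagination stream uses list(range(2, 240)).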

@prijendev prijendev requested a review from kspeer825 December 9, 2021 13:48
Contributor

@kspeer825 kspeer825 left a comment

Responded to comments

@prijendev prijendev requested a review from kspeer825 December 9, 2021 15:15
@prijendev prijendev merged commit a2dce44 into TDL-15623-Crest-Master Dec 13, 2021
@prijendev prijendev mentioned this pull request Dec 13, 2021
KrisPersonal pushed a commit that referenced this pull request Dec 13, 2021
* type of emailaddress corrected (#53)

* Tdl 16079 check best practices (#51)

* Initial commit for best practice update

* updated setup.py and start_date test

* Updated test cases

* Updated start_date test case

* Updated start_date test case

* Updated comment

* Revert back test case changes

* Added new line

* Tdl 14376 pagination failure (#50)

* Initial commit for pagination failer

* Fixed pagination test cases

* Added comments

* Added detail comment into the code

* Removed unnecessary comment

* Removed unnecessary assertion

* Removed extra comment

* added comment for bug (#49)

* TDL-14475 added unsupported feature and unittests (#47)

* added unsupported feature and unittests

* added code comments

* fixed indent

* fixed indentation

* resolved a bug of writing md when 2 consecutive empty headers

* updated the logic for consecutive empty headers

* rsolved comments

* added test case for consecutive empty headers

* added comments

* resolved circleci errors

* resolved comments

Co-authored-by: namrata270998 <namrata.brahmbhatt@crestdatasystems.com>
Co-authored-by: prijendev <prijen.khokhani@crestdatasys.com>

* TDL-14397-Add skipped log when first row is empty (#46)

* added logger message and unittests

* added code comments

* changed the logger message and logic

* resolved comments

Co-authored-by: namrata270998 <namrata.brahmbhatt@crestdatasystems.com>
Co-authored-by: prijendev <prijen.khokhani@crestdatasys.com>

* TDL-16054 added code comments (#52)

* TDL-16054 added code comments

* rsolved comments

Co-authored-by: namrata270998 <namrata.brahmbhatt@crestdatasystems.com>
Co-authored-by: prijendev <prijen.khokhani@crestdatasys.com>

* Tdl 16280 implement request timeout (#54)

* TDL-16280 added request timeout

* TDL-16280: Added factor 3 to add more wait time between 2 calls

* TDL-16280: Updated Connection error as it wasn't defined.

* added backoff for access token

* updated readme

* updated request timeout and added jitter

* added comment for jitter

* added code coverage

* added testcase for connection error

* addd request timeout in config example

* updated the json example

* removed the client initialization outside with

Co-authored-by: dbshah1212 <dhruvin.shah@crestdatasys.com>
Co-authored-by: prijendev <prijen.khokhani@crestdatasys.com>

Co-authored-by: namrata270998 <75604662+namrata270998@users.noreply.github.com>
Co-authored-by: namrata270998 <namrata.brahmbhatt@crestdatasystems.com>
Co-authored-by: dbshah1212 <dhruvin.shah@crestdatasys.com>
@namrata270998 namrata270998 deleted the TDL-14376-Pagination-failure branch June 2, 2022 10:09