Issue 35112/relax cats when not primary key #35645

Merged
merged 17 commits into from
Feb 28, 2024

@maxi297 maxi297 commented Feb 27, 2024

What

Addresses https://github.com/airbytehq/airbyte-internal-issues/issues/6377

How

  • Validate that we have at least as many records as expected
  • Remove extra_records
  • Remove extra_fields
  • Remove basic_read.ignored_fields
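
A minimal sketch of the relaxed check described in the first bullet, assuming records are compared as whole JSON objects and that extra records are tolerated. The helper names `_freeze` and `find_missing_and_extra` are hypothetical and are not taken from the connector acceptance test code.

```python
import json
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def _freeze(record: Record) -> str:
    # Serialize with sorted keys so that key order does not affect equality.
    return json.dumps(record, sort_keys=True)


def find_missing_and_extra(expected: List[Record], actual: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Return (missing, extra): expected records absent from actual, and actual records absent from expected."""
    expected_keys = {_freeze(r) for r in expected}
    actual_keys = {_freeze(r) for r in actual}
    missing = [r for r in expected if _freeze(r) not in actual_keys]
    extra = [r for r in actual if _freeze(r) not in expected_keys]
    return missing, extra


# Hypothetical sample data, loosely modeled on the log output in this PR.
expected = [{"id": 1, "valid": False}, {"id": 2, "name": "j4DyXTS7", "valid": True}]
actual = [{"id": 2, "name": "j4DyXTS7", "valid": True}, {"id": 1, "name": "PVdhmjb1", "valid": False}]

missing, extra = find_missing_and_extra(expected, actual)
# In this sketch only `missing` fails the check; `extra` is kept for debugging output.
check_passes = not missing
```

Here the expected record with id 1 has no exact match in the actual records, so it is reported as missing, while the actual record with id 1 is reported as extra.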

Ignored fields

Removing ignored_fields is a bit messy because the ignored_fields declared for basic_read do not always match those declared for full_refresh, and I can't explain the mismatch. For now, I'll only remove basic_read.ignored_fields, but it seems like both tests should follow the same logic.

Script to detect mismatch (to run with cwd in airbyte-integrations/connectors):

import glob
import yaml


def _get_ignored_fields(test_config, test_name):
    """Map each config_path to the ignored_fields declared for the given test."""
    ignored_fields_by_config_path = {}
    # Pass the parsed config explicitly and avoid shadowing the test name with the loop variable.
    tests = test_config.get("acceptance_tests", {}).get(test_name, {}).get("tests", [])
    for test in tests:
        if test.get("ignored_fields"):
            ignored_fields_by_config_path[test.get("config_path", "secrets/config.json")] = test.get("ignored_fields")
    return ignored_fields_by_config_path


mismatching_sources = []
for file_name in glob.glob("source-*/acceptance-test-config.yml"):
    with open(file_name) as file:
        test_config = yaml.safe_load(file)
    basic_read_ignored_fields = _get_ignored_fields(test_config, "basic_read")
    full_refresh_ignored_fields = _get_ignored_fields(test_config, "full_refresh")
    if basic_read_ignored_fields != full_refresh_ignored_fields:
        mismatching_sources.append(file_name)

print(mismatching_sources)
print(len(mismatching_sources))

Output:

['source-linkedin-ads/acceptance-test-config.yml', 'source-shopify/acceptance-test-config.yml', 'source-greenhouse/acceptance-test-config.yml', 'source-tiktok-marketing/acceptance-test-config.yml', 'source-insightly/acceptance-test-config.yml', 'source-facebook-pages/acceptance-test-config.yml', 'source-jira/acceptance-test-config.yml', 'source-coingecko-coins/acceptance-test-config.yml', 'source-klaviyo/acceptance-test-config.yml', 'source-apple-search-ads/acceptance-test-config.yml', 'source-instagram/acceptance-test-config.yml', 'source-square/acceptance-test-config.yml', 'source-gitlab/acceptance-test-config.yml', 'source-amazon-ads/acceptance-test-config.yml', 'source-mixpanel/acceptance-test-config.yml', 'source-coinmarketcap/acceptance-test-config.yml', 'source-amazon-seller-partner/acceptance-test-config.yml', 'source-bing-ads/acceptance-test-config.yml', 'source-freshsales/acceptance-test-config.yml', 'source-google-ads/acceptance-test-config.yml', 'source-sendgrid/acceptance-test-config.yml', 'source-monday/acceptance-test-config.yml', 'source-launchdarkly/acceptance-test-config.yml', 'source-salesforce/acceptance-test-config.yml', 'source-clickup-api/acceptance-test-config.yml', 'source-confluence/acceptance-test-config.yml', 'source-stripe/acceptance-test-config.yml', 'source-slack/acceptance-test-config.yml', 'source-recharge/acceptance-test-config.yml', 'source-google-pagespeed-insights/acceptance-test-config.yml', 'source-paystack/acceptance-test-config.yml', 'source-gcs/acceptance-test-config.yml', 'source-sentry/acceptance-test-config.yml', 'source-airtable/acceptance-test-config.yml', 'source-salesloft/acceptance-test-config.yml', 'source-apify-dataset/acceptance-test-config.yml', 'source-zendesk-talk/acceptance-test-config.yml', 'source-datadog/acceptance-test-config.yml', 'source-pinterest/acceptance-test-config.yml', 'source-hubspot/acceptance-test-config.yml', 'source-github/acceptance-test-config.yml', 
'source-google-search-console/acceptance-test-config.yml', 'source-facebook-marketing/acceptance-test-config.yml', 'source-xero/acceptance-test-config.yml']
44

If we decide to align the comparison of records in basic_read and full_refresh, or just remove it for full_refresh (see this Slack discussion), we could also:

  • remove exclude_fields from make_hashable
  • remove delete_fields

🚨 User Impact 🚨

This means that parts of the acceptance-test-config.yml files are now outdated. A follow-up PR will clean them up, but we want to deliver value with this change in the meantime.
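
As a starting point for that cleanup, here is a hedged sketch that lists the basic_read tests still declaring ignored_fields in an already parsed config. It reuses the config layout from the detection script above; the function name `find_basic_read_ignored_fields`, the sample stream `users`, and the `name` key in the sample entry are assumptions made for illustration.

```python
from typing import Any, Dict, List


def find_basic_read_ignored_fields(test_config: Dict[str, Any]) -> List[str]:
    """Return the config_path of each basic_read test that still declares ignored_fields."""
    basic_read = (test_config.get("acceptance_tests") or {}).get("basic_read") or {}
    return [
        test.get("config_path", "secrets/config.json")
        for test in basic_read.get("tests") or []
        if test.get("ignored_fields")
    ]


# Hypothetical parsed acceptance-test-config.yml content.
sample_config = {
    "acceptance_tests": {
        "basic_read": {
            "tests": [
                {
                    "config_path": "secrets/config.json",
                    "ignored_fields": {
                        "users": [{"name": "updated_at", "bypass_reason": "changes on every sync"}]
                    },
                },
                {"config_path": "secrets/config_oauth.json"},
            ]
        }
    }
}

outdated = find_basic_read_ignored_fields(sample_config)
```

In practice the dictionary would come from `yaml.safe_load`, as in the script above.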

By adding many expected records for source-s3 (up to ID=20), we can trigger this error:

INFO     detailed_logger acceptance_tests_logs/test_core.py__TestBasicRead__test_read[inputs0].txt:test_core.py:1168 Expected to have at least as many records than expected for stream test.
INFO     detailed_logger acceptance_tests_logs/test_core.py__TestBasicRead__test_read[inputs0].txt:test_core.py:1169 missing:
INFO     detailed_logger acceptance_tests_logs/test_core.py__TestBasicRead__test_read[inputs0].txt:conftest.py:344 [
 {
  "id": 10,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 12,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 17,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 19,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 3,
  "name": "asdf",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 1,
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 9,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 15,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 18,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 16,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 20,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 13,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 2,
  "toto": "godo",
  "name": "j4DyXTS7",
  "valid": true,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 14,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 11,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 }
]
INFO     detailed_logger acceptance_tests_logs/test_core.py__TestBasicRead__test_read[inputs0].txt:test_core.py:1171 expected:
INFO     detailed_logger acceptance_tests_logs/test_core.py__TestBasicRead__test_read[inputs0].txt:conftest.py:344 [
 {
  "id": 10,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 6,
  "name": "Le35Wyic",
  "valid": true,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 5,
  "name": "77h4aiMP",
  "valid": true,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 4,
  "name": "1q6jD8Np",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 12,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 17,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 7,
  "name": "xZhh1Kyl",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 19,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 8,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 3,
  "name": "asdf",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 1,
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 9,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 15,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 18,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 16,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 20,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 13,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 2,
  "toto": "godo",
  "name": "j4DyXTS7",
  "valid": true,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 14,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 11,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 }
]
INFO     detailed_logger acceptance_tests_logs/test_core.py__TestBasicRead__test_read[inputs0].txt:test_core.py:1173 actual:
INFO     detailed_logger acceptance_tests_logs/test_core.py__TestBasicRead__test_read[inputs0].txt:conftest.py:344 [
 {
  "id": 6,
  "name": "Le35Wyic",
  "valid": true,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 5,
  "name": "77h4aiMP",
  "valid": true,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 1,
  "name": "PVdhmjb1",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 4,
  "name": "1q6jD8Np",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 3,
  "name": "v0w8fTME",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 2,
  "name": "j4DyXTS7",
  "valid": true,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 7,
  "name": "xZhh1Kyl",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 8,
  "name": "M2t286iJ",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 }
]
INFO     detailed_logger acceptance_tests_logs/test_core.py__TestBasicRead__test_read[inputs0].txt:test_core.py:1175 extra:
INFO     detailed_logger acceptance_tests_logs/test_core.py__TestBasicRead__test_read[inputs0].txt:conftest.py:344 [
 {
  "id": 1,
  "name": "PVdhmjb1",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 3,
  "name": "v0w8fTME",
  "valid": false,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 },
 {
  "id": 2,
  "name": "j4DyXTS7",
  "valid": true,
  "_ab_source_file_last_modified": "2021-07-25T15:33:04.000000Z",
  "_ab_source_file_url": "simple_test.csv"
 }
]

@maxi297 maxi297 changed the base branch from master to issue-35110/relax-cats-when-primary-key February 27, 2024 00:09
@octavia-squidington-iii octavia-squidington-iii added the area/documentation Improvements or additions to documentation label Feb 27, 2024
Base automatically changed from issue-35110/relax-cats-when-primary-key to master February 27, 2024 18:53
@@ -132,11 +115,6 @@ class IgnoredFieldsConfiguration(BaseConfig):
bypass_reason: Optional[str] = Field(default=None, description="Reason why this field is considered ignored.")


ignored_fields: Optional[Mapping[str, List[IgnoredFieldsConfiguration]]] = Field(

@@ -214,7 +191,9 @@ class FullRefreshConfig(BaseConfig):
configured_catalog_path: Optional[str] = configured_catalog_path
timeout_seconds: int = timeout_seconds
deployment_mode: Optional[str] = deployment_mode
ignored_fields: Optional[Mapping[str, List[IgnoredFieldsConfiguration]]] = ignored_fields
ignored_fields: Optional[Mapping[str, List[IgnoredFieldsConfiguration]]] = Field(
Contributor Author

The record comparison between "actual" and "expected_records" differs from the one between the "first read" and the "second read", which is why we are keeping this. It is worth asking whether the two should be aligned, but for now we will assume that is out of scope for this change.

@@ -888,13 +887,13 @@ def _validate_records_structure(records: List[AirbyteRecordMessage], configured_
), f" Record {record} from {record.stream} stream with fields {record_fields} should have some fields mentioned by json schema: {schema_pathes}"

@staticmethod
def _validate_schema(records: List[AirbyteRecordMessage], configured_catalog: ConfiguredAirbyteCatalog, fail_on_extra_columns: Boolean):
Contributor Author

The new field was removed as part of #35556 but was not cleaned up properly

@@ -1130,30 +1124,12 @@ async def test_airbyte_trace_message_on_failure(self, connector_config, inputs:

assert len(error_trace_messages) >= 1, "Connector should emit at least one error trace message"

@staticmethod
def remove_extra_fields(record: Any, spec: Any) -> Any:
Contributor Author

This wasn't used but I don't know when the usage was removed

@maxi297 maxi297 marked this pull request as ready for review February 27, 2024 19:42
@maxi297 maxi297 requested review from a team and removed request for a team February 27, 2024 19:42
@maxi297 maxi297 requested review from alafanechere and brianjlai and removed request for lazebnyi and oustynova February 27, 2024 19:42
@maxi297 maxi297 marked this pull request as draft February 27, 2024 19:42
@maxi297 maxi297 marked this pull request as ready for review February 27, 2024 20:17
@brianjlai brianjlai left a comment

lgtm, just small things and responses to your comments

@maxi297 maxi297 merged commit 357c2d6 into master Feb 28, 2024
27 checks passed
@maxi297 maxi297 deleted the issue-35112/relax-cats-when-not-primary-key branch February 28, 2024 20:35