Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CDK: emit AirbyteTraceMessage with exception trace information #12593

Merged
merged 12 commits into from
May 6, 2022

Conversation

pedroslopez
Copy link
Contributor

@pedroslopez pedroslopez commented May 5, 2022

What

AirbyteTraceMessage has been introduced in the protocol to allow better debugging and attribution of failures when something in the connector goes wrong.

This change starts emitting AirbyteTraceMessages generically whenever an exception is raised. A future PR will make it easier to return better user-friendly messages.

closes #12472

How

  • Use sys.excepthook to register a global handler for uncaught exceptions. This handler is in charge of emitting the generic AirbyteTraceMessages.
  • Introduce an AirbyteTracedException class, which can be used to customize the trace message and provide better error messages.
  • Refactor the logger's secret filtering logic out into an AirbyteSecretHelper so it can be re-used to filter out secrets from trace messages.

Recommended reading order

  1. utils/traced_exception.py
  2. exception_handler.py
  3. tests

@github-actions github-actions bot added the CDK Connector Development Kit label May 5, 2022
Comment on lines 23 to 36
class AirbyteSecretHelper:
_secrets: List[str] = []

@classmethod
def update_secrets(cls, secrets: List[str]):
"""Update the list of secrets to be replaced"""
cls._secrets = secrets

@classmethod
def filter_secrets(cls, string: str) -> str:
"""Filter secrets from a string by replacing them with ****"""
for secret in cls._secrets:
string = string.replace(secret, "****")
return string
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved out the logic from AirbyteLogFormatter so it can be re-used (using this for trace messages now as well)

type=TraceType.ERROR,
emitted_at=now_millis,
error=AirbyteErrorTraceMessage(
message=self.message or "an error occurred with the source",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any suggestions on what this generic default message should be are welcome 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't necessarily source, could also be destination.

How about Something went wrong in the connector. See the logs for more details. ?
Pointing them at the logs seems sensible given that this is always going to be non-specific at this level.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any environment variables or system arguments we can use to know if we are a source or destination? e.g. looking at the docker container name (source-mailchimp)?

Comment on lines 26 to 30
# emit an AirbyteTraceMessage for any exception that gets to this spot
traced_exc = exception_value if exception_type == AirbyteTracedException else AirbyteTracedException.from_exception(exception_value)
message = traced_exc.as_airbyte_message()
message_json = message.json(exclude_unset=True)
print(AirbyteSecretHelper.filter_secrets(message_json))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the main part that is in charge of emitting the AirbyteTraceMessage: If the exception we get is already an AirbyteTracedException (which can be thrown from anywhere with appropriate messages and fields) we use it, if not then we build one from the exception. This is then turned into an AirbyteTraceMessage and filtered for secret values.

@pedroslopez pedroslopez requested a review from Phlair May 5, 2022 03:29
Copy link
Contributor

@Phlair Phlair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Mostly just comments on being pythonic, one open question around the AirbyteLogger which should be resolved before approve I think.

airbyte-cdk/python/airbyte_cdk/exception_handler.py Outdated Show resolved Hide resolved
type=TraceType.ERROR,
emitted_at=now_millis,
error=AirbyteErrorTraceMessage(
message=self.message or "an error occurred with the source",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't necessarily source, could also be destination.

How about Something went wrong in the connector. See the logs for more details. ?
Pointing them at the logs seems sensible given that this is always going to be non-specific at this level.

@CLAassistant
Copy link

CLAassistant commented May 5, 2022

CLA assistant check
All committers have signed the CLA.

super().__init__(internal_message)

def as_airbyte_message(self) -> AirbyteMessage:
now_millis = int(datetime.now().timestamp() * 1000)
Copy link
Contributor

@Phlair Phlair May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a Double in the protocol so should technically be a float here in python, but I'm not sure it would actually matter

Copy link
Contributor Author

@pedroslopez pedroslopez May 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting - the RecordMessage's emitted_at is an int 🤔 good catch!

changed in 65543d1

@pedroslopez pedroslopez marked this pull request as ready for review May 5, 2022 16:11
Copy link
Contributor

@evantahler evantahler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defer to other's Pythonic notes, but 👍 from me! The ergonomic suggestion is non-blocking.

message_json = message.json(exclude_unset=True)
print(filter_secrets(message_json))

sys.excepthook = hook_fn
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool - like Node's process.on('uncaughtException')

airbyte-cdk/python/airbyte_cdk/exception_handler.py Outdated Show resolved Hide resolved
Comment on lines +41 to +42
with pytest.raises(subprocess.CalledProcessError) as err:
subprocess.check_output([sys.executable, "-c", cmd], stderr=subprocess.STDOUT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a neat way to test that a process crashed!

Copy link
Contributor

@Phlair Phlair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice one 👍

@pedroslopez
Copy link
Contributor Author

pedroslopez commented May 6, 2022

/publish-cdk dry-run=true

🕑 https://github.com/airbytehq/airbyte/actions/runs/2283881883
https://github.com/airbytehq/airbyte/actions/runs/2283881883

@pedroslopez pedroslopez force-pushed the pedroslopez/cdk-airbytetracemessage branch from 313b445 to b00fc78 Compare May 6, 2022 20:40
@pedroslopez
Copy link
Contributor Author

pedroslopez commented May 6, 2022

/publish-cdk dry-run=false

🕑 https://github.com/airbytehq/airbyte/actions/runs/2284023169
https://github.com/airbytehq/airbyte/actions/runs/2284023169

@pedroslopez pedroslopez merged commit 73c7fad into master May 6, 2022
@pedroslopez pedroslopez deleted the pedroslopez/cdk-airbytetracemessage branch May 6, 2022 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CDK Connector Development Kit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update the Python CDK to emit AirbyteTraceMessage
4 participants