Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AirbyteTraceMessage to Airbyte protocol #12458

Merged
merged 6 commits into from
May 3, 2022
Merged
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ definitions:
- SPEC
- CONNECTION_STATUS
- CATALOG
- TRACE
log:
description: "log message: any kind of logging you want the platform to know about."
"$ref": "#/definitions/AirbyteLogMessage"
Expand All @@ -43,6 +44,9 @@ definitions:
state:
description: "schema message: the state. Must be the last message produced. The platform uses this information"
"$ref": "#/definitions/AirbyteStateMessage"
trace:
description: "trace message: a message to provide human readable information to users"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is important to not call it human readable. It is really about providing structured information about the connector.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps alternative language might be:

"trace message: a structured message to share information about connector and sync performance with external consumers"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or perhaps

"trace message: a structured message to communicate information about the status and performance of a connector"

I would be happy with either

"$ref": "#/definitions/AirbyteTraceMessage"
AirbyteRecordMessage:
type: object
additionalProperties: true
Expand Down Expand Up @@ -94,6 +98,45 @@ definitions:
message:
description: "the log message"
type: string
AirbyteTraceMessage:
type: object
additionalProperties: true
required:
- type
- emitted_at
properties:
type:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to leave a comment here suggesting that we use the json schema oneOf to implement the different types here instead of having a type property, and a separate property for each trace message type. However, I found this comment in our code base which indicates that jsonschema2pojo does not support oneOf. Since we use that to generate pojos for our platform, this means we can't use that here, so we'll have to stick with this type enum approach.

description: "the type of trace message"
type: string
enum:
- ERROR
emitted_at:
description: "the time in ms that the message was emitted"
type: number
error:
description: "error data from the sync"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description: "error data from the sync"
description: "error trace message: the error object"

Suggesting this description change to use similar wording as the AirbyteMessage's various sub-type descriptions, since we have a similar structure here

"$ref": "#/definitions/AirbyteErrorTraceMessage"
AirbyteErrorTraceMessage:
type: object
additionalProperties: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 on additionalProperties - The thought was that we might take any additional properties and append them to the FailureReason's "metadata" property.

cc @pedroslopez

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recap from discussion on the doc: this will not be the case - additionalProperties is true so that it will be ok if a connector does erroneously set an additional property, and it's easier to make backwards-compatible changes to the protocol later.

No further properties are expected to be included in AirbyteErrorTraceMessage and the connector logs should give additional context.

required:
- message
properties:
message:
description: The human readable message that will be shown to the user in the UI
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove the mention of the UI? the protocol doesn't know anything about the UI.

For the protocol changes, it is important to not mix the what the data is with how it is being used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
description: The human readable message that will be shown to the user in the UI
description: A user-friendly message that indicates the cause of the error

Maybe using the wording user-friendly here is a good middle ground, as it indicates we want this to be understandable by users of the system, without referencing another part of the platform (UI)

type: string
internal_message:
description: The less human readable, more technical message showing the reason for the failure, e.g. the first line of a stack trace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can probably remove the concept of human readability. Also, why do we need this field? The example you're giving mentions the stack grace so I assume we could just read the stack_trace field first line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. Evan and I discussed this in the spec - @evantahler I think you were thinking that it's not necessarily going to be the first line of a stack trace and could be some other kind of message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the conversation again - @evantahler maybe you only wanted this field for AirbyteTraceMessages that are not errors? https://docs.google.com/document/d/1ctrj3Yh_GjtQ93aND-WH3ocqGxsmxyC3jfiarrF6NY0/edit?disco=AAAAYr7ZeRQ

Copy link
Contributor

@evantahler evantahler Apr 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this is backwards thinking, but the inspiration for the message once again comes from our (future) usage of these messages to create FailureReasons, and we've got all 3: message, internal_message, and stack_trace in FailureReasons.

I do think there is value for message, internal_message, and stack_trace. An error like

message: API Key not accepted - check your credentials
internal_message: 403: Access Denied
stack_trace: "403: Access Denied" from connect() on line 101...

could come from a number of different places in the code (really, any time we make an HTTP request). Analyzing these by internal message (which will always be the same) even though they come from different stack traces will be helpful. Grouping by the full stack trace without doing some extraction won't be helpful. The "message" might be decorated better to say "...check your username" vs "... check your apiKey" if we are able to do that.

Also, I think the connector, using the same programming language that generated the stack trace, is the best place to "extract" the most meaningful message from the stack, rather than the platform/airbyte-worker trying to do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on Evan's comment I think internal_message makes sense to keep, but I agree that we should remove less human readable from the description, and probably also remove e.g. the first line of a stack trace. Intead this could more simple like The message of the internal error that caused the failure or something along those lines

type: string
stack_trace:
description: The full stack trace of the error
type: string
failure_type:
description: The type of error
type: string
enum:
- SYSTEM_ERROR
- CONFIG_ERROR
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we will need to convert between the POJO type generated from this json schema to the type generated by the FailureReason json schema, it should be pretty trivial to lowercase these values when performing the conversion.

However, given that there are already other lowercase string enum values in this protocol file (e.g. SyncMode), we might as well make these enum values lowercase as well to have them be consistent with the FailureReason values.

AirbyteConnectionStatus:
description: Airbyte connection status
type: object
Expand Down