-
-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add basic schema and examples for ingest-events [INC-660] #230
Conversation
Adding the schema for ingest-events so we can perform more validations against clients (producers + consumers) on this topic. Specifically, the are 4 known immediate use cases for this: - Exceptions raised in the ingest consumer can be checked against this schema and DLQed if they do not conform - Relay CI can check that messages produced on this topic conform to this schema - Sentry devserver will validate all incoming messages against this schema - Sentry CI can run these examples against consumer code
@@ -0,0 +1,14 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To double check: are there other message types sharing this topic right now? User feedback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Until we have migrated relay, SDK, etc to ingest-feedback-events
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I compared this to the example you added in #227.
event_id and type is there, but you have no project_id or payload in your schema? Is that actually true? How do you know what project the user feedback belongs to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I based my fields on this https://develop.sentry.dev/sdk/event-payloads/#required-attributes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is probably project_id in every event, and I can add that. But don't want to make it required, since I don't think it's required for feedback events. (In fact it turns out the feedback fields themselves are all optional 😅)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link you sent represents the payload that the user sends to Relay though, not the one received on this topic which happens after Relay processing. I assume the project ID is automatically added by Relay and is not optional, as a user feedback that doesn't belong to any project doesn't really make any sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at https://github.com/getsentry/sentry/blob/a1c47440d4399f571fe0161b441811d48148e43c/src/sentry/ingest/consumer/simple_event.py#L42 it seems like a message without a project_id would crash the consumer
"properties": { | ||
"type": { "type": "string" }, | ||
"payload": { | ||
"description": "bytes" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is "description": "bytes"
purely information for the human reader, or do we have a validator that special-cases this for msgpack messages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only for humans, this is not doing any validation
"payload": { | ||
"description": "bytes" | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JSON schema allows additionalProperties by default, right? Because Relay does send more fields: https://github.com/getsentry/relay/blob/87b23e6953cd18a0d591c9d13e43baee0d0d4b95/relay-server/src/services/store.rs#L1268-L1271
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does -- the example included here also has those extra fields and is passing the tests
This PR depends on getsentry/sentry-kafka-schemas#230 It attempts to be somewhat cautious about DLQing and not DLQ anything that is even potentially retriable. This could be tweaked later. If a non-retriable exception is raised, the consumer tries to further determine whether the message is actually bad (or it's some transient retriable error) by validating the message against the schema. The message will be DLQed only if that also fails.
This PR depends on getsentry/sentry-kafka-schemas#230 It attempts to be somewhat cautious about DLQing and not DLQ anything that is even potentially retriable. This could be tweaked later. If a non-retriable exception is raised, the consumer tries to further determine whether the message is actually bad (or it's some transient retriable error) by validating the message against the schema. The message will be DLQed only if that also fails.
Adding the schema for ingest-events so we can perform more validations against clients (producers + consumers) on this topic.
Specifically, the are 4 known immediate use cases for this:
This is based on https://github.com/getsentry/relay/blob/19230c0deabd41096830e46fbec821999dd51ee3/relay-server/src/services/store.rs#L1246-L1261