-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test(project): Check the relay behavior for invalid states #1767
Conversation
Add integration test for project config: * to check the behavior of the relay when we get the invalid project from upstream - the invalid state is saved in the ProjectCache and will be using till it expires - the inconming events are dropped because of the invalid state * to make sure we recover when the project expires and the new one we get is valid one - once the old state expired the new one is saved in the ProjectCache - the events can be sent again and they are properly processed
time.sleep(0.5) | ||
assert {str(e) for _, e in mini_sentry.test_failures} == { | ||
f"Relay sent us event: error fetching project state {public_key}: missing field `type`", | ||
"Relay sent us event: dropped envelope: invalid data (project_state)", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably out of scope for this PR, but why do we drop events here? Shouldn't relay keep events queued for a while, until it can either get a valid project config or some timeout expires?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because in
relay/relay-server/src/actors/processor.rs
Line 2142 in 9808ea3
let mut state = self.prepare_state(message)?; |
prepare_state
function we are unable to find the project id and immediately reject the envelope relay/relay-server/src/actors/processor.rs
Line 1206 in 9808ea3
envelope_context.reject(Outcome::Invalid(DiscardReason::Internal)); |
The project id wasn't set, because we had an empty cache and then we got broken project state, where we could not parse the config, so we set the state to errored
, with basically all fields in the struct empty by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I guess that makes sense. @jan-auer is this behavior we would like to change in the future? That is, if a project does not have a previous entry and receives a broken one, leave its envelopes in the queue until the project config is repaired or we reach a timeout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed #1787 to keep track of this. Note that it's not the EnvelopeProcessor, but the project cache where the envelopes get dropped.
data = get_response(relay, packed, signature) | ||
assert {str(e) for _, e in mini_sentry.test_failures} == { | ||
f"Relay sent us event: error fetching project state {public_key}: missing field `type`", | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add another sleep here that exceeds the grace period, and then assert that after that period, the event is dropped?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's possible yeah, it's will just increase the wait time in integration tests.
I will lower the grace period then and will try to test grace period for expiration as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor note about shaving off a few seconds, but this looks good to me!
@@ -419,3 +419,16 @@ def test_cached_project_config(mini_sentry, relay): | |||
|
|||
assert data["configs"][public_key]["projectId"] == project_key | |||
assert not data["configs"][public_key]["disabled"] | |||
|
|||
# Wait till grace period expires as well and we should be dropping events now. | |||
time.sleep(5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we shorten this sleep
by settings project_expiry
to 1 and project_grace_period
to 2? Or would that make the test flaky?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's why I gave it longer time here, since it depends on how fast you computer is, it can make this test flaky, which I would like to avoid.
Once we port those integration tests to Rust we could make this work and look much more elegant.
Add integration test for project config:
And the second test makes sure to check that relay uses internal cache and re-uses stale state if the new one is invalid.
ref: #1758
#skip-changelog