Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Safe-navigate to session flusher #2396

Merged
merged 1 commit into from
Dec 5, 2024

Conversation

ixti
Copy link
Contributor

@ixti ixti commented Sep 2, 2024

Sentry.session_flusher is nil-able. And under some circumstances (which I still can't figure out) in specs it's nil causing our spec suite randomly (but as of recent, pretty all the time) fail.

Example failure

     Failure/Error: post(path, env:, headers:, xhr: true)
     
     NoMethodError:
       undefined method `add_session' for nil
     # /home/ixti/.gem/ruby/3.3.4/gems/sentry-ruby-5.19.0/lib/sentry/hub.rb:251:in `end_session'
     # /home/ixti/.gem/ruby/3.3.4/gems/sentry-ruby-5.19.0/lib/sentry/hub.rb:260:in `with_session_tracking'
     # /home/ixti/.gem/ruby/3.3.4/gems/sentry-ruby-5.19.0/lib/sentry-ruby.rb:411:in `with_session_tracking'
     # /home/ixti/.gem/ruby/3.3.4/gems/sentry-ruby-5.19.0/lib/sentry/rack/capture_exceptions.rb:21:in `block in call'
     # /home/ixti/.gem/ruby/3.3.4/gems/sentry-ruby-5.19.0/lib/sentry/hub.rb:59:in `with_scope'
     # /home/ixti/.gem/ruby/3.3.4/gems/sentry-ruby-5.19.0/lib/sentry-ruby.rb:391:in `with_scope'
     # /home/ixti/.gem/ruby/3.3.4/gems/sentry-ruby-5.19.0/lib/sentry/rack/capture_exceptions.rb:20:in `call'
     # /home/ixti/.gem/ruby/3.3.4/gems/rails_semantic_logger-4.17.0/lib/rails_semantic_logger/rack/logger.rb:45:in `call_app'
     # /home/ixti/.gem/ruby/3.3.4/gems/rails_semantic_logger-4.17.0/lib/rails_semantic_logger/rack/logger.rb:26:in `block in call'
     # /home/ixti/.gem/ruby/3.3.4/bundler/gems/semantic_logger-093c0258b053/lib/semantic_logger/base.rb:190:in `block in tagged'
     # /home/ixti/.gem/ruby/3.3.4/bundler/gems/semantic_logger-093c0258b053/lib/semantic_logger/semantic_logger.rb:395:in `named_tagged'
     # /home/ixti/.gem/ruby/3.3.4/bundler/gems/semantic_logger-093c0258b053/lib/semantic_logger/base.rb:197:in `tagged'
     # /home/ixti/.gem/ruby/3.3.4/gems/rails_semantic_logger-4.17.0/lib/rails_semantic_logger/rack/logger.rb:26:in `call'
     # /home/ixti/.gem/ruby/3.3.4/gems/rack-3.1.7/lib/rack/runtime.rb:24:in `call'
     # /home/ixti/.gem/ruby/3.3.4/gems/rack-3.1.7/lib/rack/sendfile.rb:114:in `call'
     # /home/ixti/.gem/ruby/3.3.4/gems/rack-cors-2.0.2/lib/rack/cors.rb:102:in `call'
     # /home/ixti/.gem/ruby/3.3.4/gems/railties-7.2.1/lib/rails/engine.rb:535:in `call'
     # /home/ixti/.gem/ruby/3.3.4/gems/rack-test-2.1.0/lib/rack/test.rb:360:in `process_request'
     # /home/ixti/.gem/ruby/3.3.4/gems/rack-test-2.1.0/lib/rack/test.rb:153:in `request'
     # ./spec/support/helpers/protobufable_test_helper.rb:76:in `rpc'
     # ./spec/requests/asgard/auth_controller_spec.rb:220:in `block (4 levels) in <top (required)>'
     # /home/ixti/.gem/ruby/3.3.4/gems/webmock-3.23.1/lib/webmock/rspec.rb:39:in `block (2 levels) in <top (required)>'

RSpec Config (Sentry related)

require "sentry/test_helper"

RSpec.configure do |config|
  config.include Sentry::TestHelper
  config.before { setup_sentry_test }
  config.after { teardown_sentry_test }
end

Workaround

As a workaround, we just disabled session tracking:

Sentry.init do |config|
  # ...

  # See: https://github.com/getsentry/sentry-ruby/pull/2337
  config.excluded_exceptions << "Sidekiq::JobRetry::Handled"

  # See: https://github.com/getsentry/sentry-ruby/pull/2396
  config.auto_session_tracking = false

  # ...
end

Resolves: #2378

@ixti
Copy link
Contributor Author

ixti commented Sep 2, 2024

We tried and it still fails:

require "sentry/test_helper"

RSpec.configure do |config|
  config.include Sentry::TestHelper

  config.before do
    setup_sentry_test { _1.auto_session_tracking = true }
  end

  config.after { teardown_sentry_test }
end

@st0012
Copy link
Collaborator

st0012 commented Sep 15, 2024

Thanks for reporting the issue. It does look very puzzling as the only method that could nullify the session flusher is Sentry.close. Can you do a quick search to see if it's called anywhere in the test suite or codebase?

@MrSerth
Copy link
Contributor

MrSerth commented Oct 10, 2024

I am experiencing the same issue on production and hence reported #2378. Currently, the issue appears rather frequent for us, but not for each request:

Bildschirmfoto 2024-10-10 um 20 15 26

Is there any chance I could help to get this merged and released? Without a fix, each occurrence ends up with a 500 Internal Server Error and hides actually errors. I would rather miss a session tracing than having those errors.

In our app, we never call Sentry.close ourselves.

@ixti
Copy link
Contributor Author

ixti commented Oct 18, 2024

Thanks for reporting the issue. It does look very puzzling as the only method that could nullify the session flusher is Sentry.close. Can you do a quick search to see if it's called anywhere in the test suite or codebase?

We don't call Sentry.close anywhere.

@solnic
Copy link
Collaborator

solnic commented Oct 18, 2024

Could you grab sentry from solnic/make-closing-thread-safe branch and see if the crash is gone?

@MrSerth
Copy link
Contributor

MrSerth commented Oct 19, 2024

Could you grab sentry from solnic/make-closing-thread-safe branch and see if the crash is gone?

I tried the branch, but since we modified the codebase (where this issue normally occurred) a lot recently, I wasn't able to confirm the fix is working. Sorry for not being able to help out here. Maybe it's still valid to merge this PR using the safe navigation operator?

@ixti
Copy link
Contributor Author

ixti commented Oct 27, 2024

We also took a workaround way by disabling session tracking completely. And even if we go back, the bug is pretty sporadic, to being able to tell if the patch in solnic/make-closing-thread-safe fixes the bug or no.

@solnic
Copy link
Collaborator

solnic commented Oct 28, 2024

Thanks folks. Would it be possible for you to disable your workarounds after we release a new version and see if it helped. The proposed fix here comes with a risk that it will silence actual failures where the session_flusher should really be there. It's kinda hard to tell which option would be worse but given that the fix addresses an edge case by making a change that may result in breaking more common cases (an actual misconfiguration of Sentry), then it's probably better not to apply the fix.

@MrSerth
Copy link
Contributor

MrSerth commented Oct 28, 2024

As written above, I already tested the branch and partially reverted some of the code changes I considered to have the most impact on this issue.

While I understand your concern, this issue (and especially the one discussed in #2428) may hide actually bugs from our code base as Sentry customers -- that happened at least for us. We would normally trust in the Sentry error reporting, but when this bug occurs, we lack any notification and insights on actual defects of our software. Hence, from our perspective, it is far less critical to miss some sessions than to miss the actual bug that occurred.

@solnic
Copy link
Collaborator

solnic commented Oct 28, 2024

As written above, I already tested the branch and partially reverted some of the code changes I considered to have the most impact on this issue.

Gotcha, thank you!

While I understand your concern, this issue (and especially the one discussed in #2428) may hide actually bugs from our code base as Sentry customers -- that happened at least for us. We would normally trust in the Sentry error reporting, but when this bug occurs, we lack any notification and insights on actual defects of our software. Hence, from our perspective, it is far less critical to miss some sessions than to miss the actual bug that occurred.

Alright, given that Hub is going away eventually (see #1495 (comment)) we can just rely on the workarounds here and in your PR for the time being. Thanks for all your input and help with this btw!

@@ -248,7 +248,7 @@ def end_session

return unless session
session.close
Sentry.session_flusher.add_session(session)
Sentry.session_flusher&.add_session(session)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment pointing to the related issue so that it's clear why we're doing this? 🙏🏻

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do so shortly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@ixti
Copy link
Contributor Author

ixti commented Oct 29, 2024

Would it be possible for you to disable your workarounds after we release a new version and see if it helped.

Sure, we will give it a try!

@ixti ixti force-pushed the fix-end-session-on-sessionless-hub branch from 666db39 to 7bf8c9e Compare November 7, 2024 18:41
@ixti ixti requested a review from solnic November 13, 2024 01:22
@Vagab
Copy link
Contributor

Vagab commented Nov 13, 2024

hey guys. So we've been running into NoMethodError add_session for nil for quite some time now, but the errors were intermittent.
Recently we were able to reproduce it with one seed and 3 specs. I haven't tried to reproduce it in a blank app/script yet, but the general problem I'm seeing is that if we're running setup_sentry_test before any request spec is ran then for every request spec that follows we'll get this error. If any request spec runs before the spec that does setup_sentry_test then we don't get the error at all. Which explains why it was intermittent for us. But it also tells me that the issue is not with threads, I think(cc @solnic), since it will always happen if the specs are ran in the order I mentioned above.
I have no idea if there's any connection between when sentry is initialized and request specs, or if the connection is unique to our project.
As a workaround, this:

before do
  setup_sentry_test { |it| it.auto_session_tracking = false }
end

works, but meh.
I should also mention that the issue persists on master branch.
I will try to create a simple reproduction and/or a fix, but just wanted to provide some info in case it can help anyone 👍

Let me know if I can be of any further assistance here

edit: another bit of info. It seems that if we remove config.enabled_environments = %w[production] from sentry.rb then all is well somehow 🤔

edit2: 2 specs are enough too

@solnic
Copy link
Collaborator

solnic commented Nov 14, 2024

@Vegab oh these are great insights, thank you! I'll look into this again

Copy link
Collaborator

@solnic solnic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK let's get this workaround merged. I could not reproduce this so we still don't really know what the culprit is. I've been running a sample test suite with auto_session_tracking for hours now and it just keeps passing 🙃 These type of issues will go away eventually once Hub situation is solved.

Copy link

codecov bot commented Nov 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.17%. Comparing base (ceba0da) to head (7bf8c9e).
Report is 12 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2396      +/-   ##
==========================================
+ Coverage   98.13%   98.17%   +0.04%     
==========================================
  Files         126      128       +2     
  Lines        4761     4829      +68     
==========================================
+ Hits         4672     4741      +69     
+ Misses         89       88       -1     
Components Coverage Δ
sentry-ruby 98.57% <100.00%> (+0.05%) ⬆️
sentry-rails 97.08% <ø> (+0.02%) ⬆️
sentry-sidekiq 96.96% <ø> (ø)
sentry-resque 92.85% <ø> (ø)
sentry-delayed_job 95.65% <ø> (ø)
sentry-opentelemetry 99.31% <ø> (ø)
Files with missing lines Coverage Δ
sentry-ruby/lib/sentry/hub.rb 98.87% <100.00%> (ø)

... and 6 files with indirect coverage changes

@solnic solnic merged commit 3d2e5e6 into getsentry:master Dec 5, 2024
136 of 141 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NoMethodError: undefined method add_session for nil
5 participants