Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rfc: SDKs report file I/O on the main thread #40

Merged
merged 21 commits into from
Dec 15, 2022

Conversation

philipphofmann
Copy link
Member

@philipphofmann philipphofmann commented Nov 28, 2022

This RFC aims to revisit the decision that Ingest creates file I/O on the main thread performance issues and not the SDKs.

Rendered RFC

The aim of this RFC is to revisit the decision that
Ingest creates performance issues, not SDKs.
@philipphofmann philipphofmann marked this pull request as draft November 28, 2022 17:39
text/0039-sdk-performance-issue-creation.md Outdated Show resolved Hide resolved
text/0039-sdk-performance-issue-creation.md Outdated Show resolved Hide resolved
text/0039-sdk-performance-issue-creation.md Outdated Show resolved Hide resolved
text/0039-sdk-performance-issue-creation.md Outdated Show resolved Hide resolved
Co-authored-by: Adam McKerlie <adammckerlie@gmail.com>
- Address some of Adam's comments
text/0039-sdk-performance-issue-creation.md Outdated Show resolved Hide resolved
text/0039-sdk-performance-issue-creation.md Outdated Show resolved Hide resolved
text/0039-sdk-performance-issue-creation.md Outdated Show resolved Hide resolved
text/0039-sdk-performance-issue-creation.md Outdated Show resolved Hide resolved
text/0039-sdk-performance-issue-creation.md Outdated Show resolved Hide resolved
2. Double dipping quotas, sending the transaction and the error created within that transaction.
3. Not able to use dynamic thresholds and configurations, code changes would be required to update settings.
4. Stack traces are inherently expensive to process
5. Detection is mixed between ingest and SDK

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My gut feeling is detection is going to happen in many different ways over time (starting as soon as we want to detect something that doesn't fit the span detector model) - this might be inevitable (and therefore not a large concern).


1. Users have to enable performance monitoring.
2. A transaction is required to be running while a performance issue is happening.
3. Transactions are high volume which means we may send too many stack traces, so we can't have stacktraces only the span that caused the issue. Which makes it harder to identify the root cause.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have stacktraces at all in transactions, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The protocol would support it, though, as transactions are events.

@philipphofmann philipphofmann changed the title rfc: SDK Perfomance Issue Creation rfc: SDKs report file I/O on the main thread Nov 29, 2022
@philipphofmann
Copy link
Member Author

I changed the RFC quite a bit by adding an extra option. I have a hard time finding pros for option 2. @mjq-sentry, as you advocated for that (#40 (comment)), could you please point out some pros?

@philipphofmann philipphofmann marked this pull request as ready for review November 29, 2022 16:05

### Cons <a name="option-2-cons"></a>

1. Need for per-SDK rollout.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case is that just 2 SDKs (Android and Cocoa) or would it be more? (don't know if e.g. react-native would need its own implementation as well)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That applies to every mobile or desktop SDK. As far as I know, RN, Flutter, and .NET would also need implementations.

@silent1mezzo
Copy link
Contributor

So, there are a few things that were outlined in the original decision that never made it to long-term storage that I think is necessary for this discussion.

  1. Performance issues differ from error issues in the sense that they don't cause an exception or otherwise stop the code from running and have a perceived performance impact on end-users.
  2. Stack traces aren't available on most of our supported platforms, and even when it is, its generally too expensive to run against transactions. It's also not required since we can show spans to point people to the issue.
  3. It's not possible to scale performance issues across the SDKs we support. Having it in Ingest (and eventually Relay) means there's a single point to build these detectors.

While we are thinking of FIOMT primarily on mobile it's also possible on desktop applications and javascript (node). Building the detector at the Ingest layer means it could detect any SDK that sends a span with blocked_main_thread: true as outlined in #36.

@philipphofmann
Copy link
Member Author

Thanks, @silent1mezzo, I added your input to the RFC with 75962ad

@silent1mezzo
Copy link
Contributor

silent1mezzo commented Dec 2, 2022

After reading through the RFC and discussion with a number of engineers the best option for File I/O on the main thread performance issues is Option 3: Ingest reports FIOMT as performance issues.

Option 1 was ruled out as this is not an error issue as per the definition I mentioned.

Option 3 was chosen over Option 2 because we can easily detect it via spans (PR is already complete) and the problem exists across multiple SDKs (iOS, Android, RN, Flutter, .NET and possibly more). There's also a likely future where FIOMT is detected via profiles and running detection in the ingestion pipeline will help combine Performance and Profiling issues.

@philipphofmann
Copy link
Member Author

@silent1mezzo, please review the documented decision and either give feedback or approve this PR. Thanks 🙏

Copy link
Contributor

@silent1mezzo silent1mezzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good

@philipphofmann philipphofmann merged commit 8acac5f into main Dec 15, 2022
@philipphofmann philipphofmann deleted the rfc/sdk-perfomance-issues branch December 15, 2022 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants