Add ITTAPI hooks to jl_mutex_wait #49434
@topolarity you might want to take a look at this for tracy integration
@pchintalapudi can you split this into two PRs?
@pchintalapudi Can you also share some of the screenshots we looked at earlier?
The profiling events have been split into #49448.
Thanks for this work! The bulk of the changes look great to me, although I did request a bit of re-factoring to better align with

Nice to see some example data already too 🙂

Out of curiosity/ignorance, are any of these mutexes considerably "hot" or are they rare enough that we essentially don't need to worry about overhead?

P.S. Can you think of any significant locks that this change leaves out? I believe I/O has non-jl_mutex_t locks, or am I mistaken?
src/threading.c (outdated)
#ifdef USE_ITTAPI
#define profile_lock_init(lock, name) __itt_sync_create(lock, "jl_mutex_t", name, __itt_attr_mutex)
Would it be possible to move these definitions into timing.c/.h?
Ideally this would follow the conventions established there to provide backend-specific implementations (in particular, it should be possible to turn on multiple back-end implementations at once)
I'm also OK if these stay local to threading.c, but I would like to try to support multiple back-ends being enabled at once
I believe @vchuravy preferred them as macros here specifically to avoid them showing up in the profile.
Would a static inline function fix that problem?
Yeah... Let's go with the static_inline
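For illustration, here is a minimal sketch of what a static inline hook that fans out to several back-ends at once could look like. Everything in it is an assumption for the sake of the example: the DEMO_BACKEND_A/DEMO_BACKEND_B switches, the demo_* names, and the counters that stand in for real back-end calls (such as the __itt_sync_create call above) are hypothetical and are not the code that landed in this PR.

```c
/* Hypothetical build switches; the real flags (e.g. USE_ITTAPI) may differ. */
#define DEMO_BACKEND_A 1
#define DEMO_BACKEND_B 1

/* Counters standing in for real back-end calls so the sketch is runnable. */
static int demo_events_a = 0;
static int demo_events_b = 0;

/* One hook, several back-ends: each enabled back-end adds its own statement,
 * so turning on more than one at a time needs no changes at the call sites.
 * Being static inline, the hook is a candidate for inlining, which is the
 * reason given above for preferring it over a regular function. */
static inline void demo_profile_lock_init(void *lock, const char *name)
{
    (void)lock;
    (void)name;
#if DEMO_BACKEND_A
    demo_events_a++; /* stand-in for the first back-end's "lock created" call */
#endif
#if DEMO_BACKEND_B
    demo_events_b++; /* stand-in for the second back-end's "lock created" call */
#endif
}
```

With both switches set to 1, a single call to demo_profile_lock_init notifies both stand-in back-ends; setting either switch to 0 compiles that back-end out without touching callers.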
I/O and task switching have uv_mutex_t locks, but VTune at least picks those up automatically by tapping pthread_mutex_wait.
WRT hot locks, this workload was an intentionally bad one for the codegen lock, and I've seen the method's write lock get heavily contended when type inference runs in parallel trying to infer the same method with different arguments.
I think I'm less worried about contention (which I'd want the profiler to show) and more worried about hot locks that are uncontended, where the profiler overhead might be significant relative to the lock / critical section
I can't think of a case where we acquire and release a lock in a hot loop, but if there was such a case I think it would be better to fix the acquire and release in the user code? We could also just insert a single trial of the cmpxchg that doesn't run the profiler to skip out on the uncontended case as well.
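A rough sketch of the "single trial of the cmpxchg" idea, using C11 atomics. The demo_mutex_t type, the demo_* functions, and the demo_profiled_waits counter (a stand-in for the profiler notification) are all hypothetical; this is not how jl_mutex_wait is actually written, and a real lock would yield or park instead of spinning.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock: owner == 0 means unlocked, otherwise the owner's id. */
typedef struct {
    atomic_int owner;
} demo_mutex_t;

/* Counts how often the profiled (contended) path was taken. */
static int demo_profiled_waits = 0;

/* One compare-and-swap attempt; `self` must be nonzero. */
static inline bool demo_try_lock(demo_mutex_t *m, int self)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &m->owner, &expected, self,
        memory_order_acquire, memory_order_relaxed);
}

static inline void demo_lock(demo_mutex_t *m, int self)
{
    /* Fast path: a single unprofiled attempt covers the uncontended case. */
    if (demo_try_lock(m, self))
        return;
    /* Slow path: only now do we pay for profiler events. */
    demo_profiled_waits++; /* stand-in for an "about to wait" notification */
    while (!demo_try_lock(m, self)) {
        /* spin; a real implementation would yield or block here */
    }
    /* a matching "acquired" notification would go here */
}

static inline void demo_unlock(demo_mutex_t *m)
{
    atomic_store_explicit(&m->owner, 0, memory_order_release);
}
```

The trade-off is that uncontended acquisitions become invisible to the profiler, which fits the concern above: contention still shows up, while hot uncontended locks pay only for the one compare-and-swap.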
Yeah, that's a nice idea - I think there's a lot of effective tricks we could do to catch exceptional events for very hot locks. If our existing locks aren't that hot though, let's just record all the events for now. If the overhead becomes suspect later, we can go back and try to use some tricks to reduce it.
All good on my end 👍
This allows VTune to identify our jl_mutex_t locks as mutexes.