-
Notifications
You must be signed in to change notification settings - Fork 6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Fix the segfault from Opencensus upon shutdown #36906
Conversation
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
@@ -146,7 +146,6 @@ CoreWorkerProcessImpl::CoreWorkerProcessImpl(const CoreWorkerOptions &options) | |||
|
|||
CoreWorkerProcessImpl::~CoreWorkerProcessImpl() { | |||
RAY_LOG(INFO) << "Destructing CoreWorkerProcessImpl. pid: " << getpid(); | |||
RAY_LOG(DEBUG) << "Stats stop in core worker."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we added the equivalent logs to stats::Shutdown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we do benchmarks on it?
And looks like from the stacktrace, the SIGSEV is on the request object? Might be an opencensus bug?
We are sending a request actually, and the problem is from protobuf, so it is unlikely related to Opencensus. It should be something shutdown ordering issue. I am not sure if my PR will actually fix the following issue, but client is not protected properly, and we should fix it anyway. |
Seems like client_ in the metrics exporter is not lock-protected, but it is accessed by 2 thread (by a metrics export thread + task execution thread when ExportNow is called). I added a lock. I am not 100% sure if this will fix the issue because I was unable to reproduce the issue, but we can ask the user to try after merging this PR.
I am a bit concerned of the potential perf impact from the locks? |
@rickyyx the report code happens in a separate thread, and the main thread is only locked upon exit, so it should be okay. Let me know if you have other parts that could be problematic... |
@rkooo567 is there a cherry-pick PR? Thanks |
) Seems like client_ in the metrics exporter is not lock-protected, but it is accessed by 2 thread (by a metrics export thread + task execution thread when ExportNow is called). I added a lock. I am not 100% sure if this will fix the issue because I was unable to reproduce the issue, but we can ask the user to try after merging this PR.
created here; #37311 |
Seems like client_ in the metrics exporter is not lock-protected, but it is accessed by 2 thread (by a metrics export thread + task execution thread when ExportNow is called). I added a lock. I am not 100% sure if this will fix the issue because I was unable to reproduce the issue, but we can ask the user to try after merging this PR.
) Seems like client_ in the metrics exporter is not lock-protected, but it is accessed by 2 thread (by a metrics export thread + task execution thread when ExportNow is called). I added a lock. I am not 100% sure if this will fix the issue because I was unable to reproduce the issue, but we can ask the user to try after merging this PR. Signed-off-by: e428265 <arvind.chandramouli@lmco.com>
Why are these changes needed?
Seems like client_ in the metrics exporter is not lock-protected, but it is accessed by 2 thread (by a metrics export thread + task execution thread when ExportNow is called). I added a lock. I am not 100% sure if this will fix the issue because I was unable to reproduce the issue, but we can ask the user to try after merging this PR.
Related issue number
Fixes #36885
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.