Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[QNN EP] Fix multithread sync bug in ETW callback #23156

Merged
merged 1 commit into from
Dec 23, 2024

Conversation

adrianlizarraga
Copy link
Contributor

@adrianlizarraga adrianlizarraga commented Dec 19, 2024

Description

Fixes crash in QNN dlls when an ETW callback tries to change the QNN log level. This is caused by a function that does not lock a mutex before modifying the QNN log level.

Motivation and Context

An ETW callback into QNN EP leads to a crash within QNN SDK dlls. It happens approximately 1 out of 3 full QNN unit tests runs.

The cause is a multithreading synchronization bug in QNN EP. We're not always locking a mutex when ETW calls QNN EP to notify of ETW config change.

There are two branches in the QNN EP callback function that try to update the QNN log handle. One branch correctly locks a mutex, but other does not lock it at all. This causes crashes within QNN dlls.

The fix is to lock the mutex in both paths.

@adrianlizarraga adrianlizarraga marked this pull request as ready for review December 19, 2024 20:50
@adrianlizarraga
Copy link
Contributor Author

Hi @ivberg, would appreciate your review for this ETW bug fix for QNN EP if able.

@adrianlizarraga adrianlizarraga merged commit 81cd6ea into main Dec 23, 2024
96 checks passed
@adrianlizarraga adrianlizarraga deleted the adrianl/qnn-etw-log-callback-thread-sync branch December 23, 2024 18:02
tarekziade pushed a commit to tarekziade/onnxruntime that referenced this pull request Jan 10, 2025
### Description

Fixes crash in QNN dlls when an ETW callback tries to change the QNN log
level. This is caused by a function that does not lock a mutex before
modifying the QNN log level.

### Motivation and Context
An ETW callback into QNN EP leads to a crash within QNN SDK dlls. It
happens approximately 1 out of 3 full QNN unit tests runs.

The cause is a multithreading synchronization bug in QNN EP. We're not
always locking a mutex when ETW calls QNN EP to notify of ETW config
change.
 
There are two branches in the QNN EP callback function that try to
update the QNN log handle. One branch correctly locks a mutex, but other
does not lock it at all. This causes crashes within QNN dlls.
- Does not lock mutex:
[onnxruntime/onnxruntime/core/providers/qnn/qnn_execution_provider.cc at
main ·
microsoft/onnxruntime](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/qnn/qnn_execution_provider.cc#L426)
- Locks mutex:
[onnxruntime/onnxruntime/core/providers/qnn/qnn_execution_provider.cc at
main ·
microsoft/onnxruntime](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/qnn/qnn_execution_provider.cc#L442)

The fix is to lock the mutex in both paths.
@adrianlizarraga adrianlizarraga added the ep:QNN issues related to QNN exeution provider label Feb 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ep:QNN issues related to QNN exeution provider
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants