-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QRY_PROFILE_LIST_MUTEX blocking still a problem #3210
Comments
So if I understand you correctly, the SET LOCK TIMEOUT in the query isn't being honored by Microsoft SQL Server. What steps would you like me to take? |
Also, we notice that while a SET LOCK_TIMEOUT 1000 was added to sp_BlitzWho, it was not added to sp_BlitzFirst, which also calls sys.dm_exec_query_statistics_xml in several places. (While there is a SET LOCK_TIMEOUT 1000 statement, and also a SET LOCK_TIMEOUT 0 (no wait at all), these are set after the execution of the sys.dm_exec_query_statistics_xml). |
OK, can you try adding it to your version, where you think it's needed, and if that alleviates the problem, check in a pull request or let me know which specific query you're talking about? |
sp_BlitzFirst line 2295 and line 2346. IF EXISTS (SELECT * FROM sys.all_objects WHERE name = 'dm_exec_query_statistics_xml') |
Sure, cool, go ahead and try adding it to your version, where you think it's needed, and if that alleviates the problem? Thanks! |
Did you add it to your version, and did it alleviate the problem? |
I'm on vacation,I'll let you know in a few weeks. Thanks for checking back.
…On Sat, Jan 21, 2023 at 8:15 AM Brent Ozar ***@***.***> wrote:
Did you add it to your version, and did it alleviate the problem?
—
Reply to this email directly, view it on GitHub
<#3210 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/APNFWATGMYTKIHI54YESSWTWTQRSHANCNFSM6AAAAAATVZWYOQ>
.
You are receiving this because you authored the thread.Message ID:
<BrentOzarULTD/SQL-Server-First-Responder-Kit/issues/3210/1399303831@
github.com>
|
OK, going to close this for now. |
SET LOCK_TIMEOUT 1000 when APPLY sys.dm_exec_query_statistics_xml to prevent mutex issues. Set GitHub BrentOzarULTD#3210
Brent, we have had this code in production for 3 weeks with no further mutex issues. |
Thanks for the pull request! Merged into the dev branch, will be in the next release with credit to you in the release notes. |
Version of the script
8.11
What is the current behavior?
We are still seeing the issues described in this closed bug report:
#2907
This is an occurance of blocking locks where sp_Blitz is not the lead blocker, but one of the processes being blocked. In this case, the lead blocker was the FormattedAddress function spid 160. It blocked our Saturday job to do index maintenance, spid 57. Spid 57 then blocked other processes including sp_Blitz spid 104. You can see that spid 160 is waiting on QRY_PROFILE_LIST_MUTEX.
To address the issue, the normal first action is to kill the lead blocker. "KILL 160" did not do anything. Since we have seen this before, we knew that killing sp_Blitz does get the jobs unstuck. So "KILL 104" released the blocking locks and got the processes unstuck.
When this lock occurs, it represents a serious system issue. All Sql Agent jobs stop operating. We had attempted to address the issue by checking for sp_Blitz job running for more than 10 minutes. However, since all jobs stopped executing once this block took place, it did not work around the issue.
If the current behavior is a bug, please provide the steps to reproduce.
Reproducing this issue is not easy. We have sp_BlitzFirst running every 15 minutes on about 90 servers, and we see this error occur roughly once per week. The consequence of the error is large so when it happens it is a crisis.
What is the expected behavior?
It is expected that the lock timeout added in the original issue would close the connection and free the mutex if the timeout expires. That is apparently not happening.
Which versions of SQL Server and which OS are affected by this issue? Did this work in previous versions of our procedures?
SQL Server Standard (and Enterprise) 2019, Windows Server 2016.
The problem has occurred sporatically since installing the sp_Blitz job about 3 months back.
The text was updated successfully, but these errors were encountered: