Large amounts of warnings from FASTER.KV #383
This is coming from within the Dictionary when the FASTER session is adding a pending IO context to it. Could two threads be using the session at the same time?
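The question above is whether a single FASTER session was used by two threads at once. Its pending-IO bookkeeping lives in an ordinary Dictionary, so concurrent use could corrupt it. A common way to rule that out is to lease each session exclusively from a pool. Below is a minimal Python sketch of that pattern. The `Session` class, `SessionPool` helper and `issue_read` method are hypothetical stand-ins. This is not Netherite's or FASTER's actual (C#) code, and it is not necessarily what the eventual fix does.

```python
import queue
import threading
from contextlib import contextmanager


class Session:
    """Hypothetical stand-in for a FASTER client session (not the real API)."""

    def __init__(self, session_id):
        self.session_id = session_id
        # Models per-session state (like the pending-IO Dictionary) that
        # must never be touched by two threads at the same time.
        self.pending_io = {}
        self.reads = 0
        self.in_use = False  # True while some thread holds the session

    def issue_read(self, key):
        # Register a pending IO context, then "complete" it.
        self.pending_io[key] = f"context-for-{key}"
        self.reads += 1
        del self.pending_io[key]


class SessionPool:
    """Hypothetical pool that hands each session to at most one thread."""

    def __init__(self, size):
        self._free = queue.Queue()  # queue.Queue is internally locked
        for i in range(size):
            self._free.put(Session(i))

    @contextmanager
    def lease(self):
        session = self._free.get()  # blocks until a session is free
        if session.in_use:
            raise RuntimeError("session handed to two threads at once")
        session.in_use = True
        try:
            yield session
        finally:
            session.in_use = False
            self._free.put(session)  # return it only after we are done


def worker(pool, keys, errors):
    for key in keys:
        try:
            with pool.lease() as session:
                session.issue_read(key)
        except Exception as exc:  # record failures instead of dying silently
            errors.append(exc)


pool = SessionPool(size=4)
errors = []
threads = [
    threading.Thread(
        target=worker,
        args=(pool, range(t * 1000, (t + 1) * 1000), errors),
    )
    for t in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("errors:", len(errors))
```

Because `queue.Queue.get` removes the session from the pool, no other thread can obtain it until `lease` puts it back in the `finally` block, even if the caller raises. Eight threads therefore share four sessions without ever overlapping on one.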
We're just using Netherite as a backend. We have no knowledge of how it uses FASTER, or of what we could be doing at the user level that would cause this. We'd appreciate any hints, thanks.
Whoops, I meant that for @sebastianburckhardt
Hi @davidmrdavid, not to give you more work, but we see this one a lot, and at warning level. Unless we hide "warning" logs from Netherite, it will inflate our logs considerably. Furthermore, it is "perhaps" something that should cause concern ;) Thank you.
The job is the job :). @TedHartMS: regarding this:
The calling code on our end seems to be here (it matches Eric's stack trace above): durabletask-netherite/src/DurableTask.Netherite/StorageLayer/Faster/FasterKV.cs, lines 50 to 71 in f8b5634
From what I can see, the sessions are obtained from a
Update: this is becoming more critical, as we are getting something like 50k messages related to this every 30 minutes in production. @davidmrdavid, could this possibly be fixed by the upgrade of FASTER core?
Hi @ericleigh007: I synced with @TedHartMS and we may have an explanation and a fix for this error. The details are here: #397. In short, this may be a Netherite bug: an issue with how it calls FASTER. Please see the details in the PR, which I'll try to get reviewed soon.
@ericleigh007: I'll wait for the CI to pass in #397 and, if it does, I'll look into issuing you a private release. I suspect the change I've made in that PR is correct, but possibly not as performant (which may be convenient in your case if you're having throttling issues?). In any case, if we notice a significant perf drop, I have ideas for how to address it without affecting correctness, though the PR will be more complex (which is fine). I'll keep you posted.
@ericleigh007: I have published the PR above as a private package here:
https://durabletaskframework.visualstudio.com/Durable%20Task%20Framework%20CI/_artifacts/feed/durabletask/NuGet/Microsoft.Azure.DurableTask.Netherite/overview/1.5.2-clientQueryFix.1
To install it, you'll need to add the following key to your nuget.config:
I suspect the PR may not be as performant as the original implementation (though we think that one suffered from race conditions). Would you be willing to give it a try and let us know whether this error at least goes away? From there, if performance is a concern, I can refactor the PR into something more complex that should bring back any lost perf.
In our production environment, we still get the IndexOutOfRange exception. However, in the smaller environment where I can run tests, we have not been able to reproduce that one. At that smaller scale, we only get the ObjectDisposedException. The package changes in the smaller environment are shown below:
BEFORE: 25k traces with severity level warning
I have decided to send you the traces by email, as they contain internal information such as the names of our resources.
Thanks @ericleigh007. So, if I understand correctly, without the private package you get ~25k warnings and with it you get ~7k. But given the difference in traffic between the two environments, we can't be certain that the reduction in warnings is due to the private package. Do I have that right? Also, is the
Hey @ericleigh007, I took a look at those
@ericleigh007: here's a new package:
https://durabletaskframework.visualstudio.com/Durable%20Task%20Framework%20CI/_artifacts/feed/durabletask/NuGet/Microsoft.Azure.DurableTask.Netherite/overview/1.5.2-clientQueryFix.2
Same instructions as before, but the version is now
We have transitioned our production program to Netherite, and for the most part things are going well, although we have nowhere near as much experience with it as with the standard backend.
In some cases, we see a good number of errors and warnings. The following seem to be generated by FASTER, not by Netherite itself.
Over a short period, we received these with Part02, Part05, Part00, and Part13 as the prefix, and then the errors were not seen again.
What could be causing this?
Thank you.