Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

telemetry: Add mutex to avoid push during recalc and other races #1845

Merged
merged 1 commit into from
Sep 30, 2018

Conversation

Spudz76
Copy link
Contributor

@Spudz76 Spudz76 commented Sep 22, 2018

Add simple locking to telemetry class, since both push and calc methods modify the data buckets and pointers, this avoids race conditions and bucket corruption.

Hoping this solves occasional "missing stats" or inaccuracy that could not be tracked down to anything but "thread timing mystery". It does seem to standardize my hashrates as shown which I take to mean it's more accurate.

Regardless it doesn't seem to hurt.

@Spudz76
Copy link
Contributor Author

Spudz76 commented Sep 24, 2018

Bisecting but this patch might hang Windows+CUDA exiting after benchmark. That or Windows+CUDA no longer exits after benchmark mode? It does in OpenCL or just CPU modes.

As far as I understand the mutex should destruct just fine in any state, so I'm not sure why this would hang the main executor loop. But also there seems to be some locking inside executor so this may be superfluous. I was previously unsure if reporting events were locked vs the incoming pushes in all cases.

Also stats are accurate (match pool shown rate exactly), running this on every one of my current XMR miners (all CPUs). I do believe they are more stable and accurate.

@psychocrypt
Copy link
Collaborator

Bisecting but this patch might hang Windows+CUDA exiting after benchmark. That or Windows+CUDA no longer exits after benchmark mode?

Does this means this PR can currently not used together with CUDA?

IF so we need to fix that first.

@psychocrypt psychocrypt self-requested a review September 24, 2018 06:40
@psychocrypt psychocrypt self-assigned this Sep 24, 2018
@Spudz76
Copy link
Contributor Author

Spudz76 commented Sep 24, 2018

Confirmed my build does not exit even without this patch, so it is GOOD.

Reasonably sure there are several missing win_exit() calls in the benchmark section causing this other behavior. But does not particularly answer how it exits fine to prompt (no press key) under the other backends... anyway it's definitely not this.

@Spudz76
Copy link
Contributor Author

Spudz76 commented Sep 24, 2018

example of rock solid hashrates from a rig running for the last day or so:

HASHRATE REPORT - CPU
| ID |    10s |    60s |    15m | ID |    10s |    60s |    15m |
|  0 |   17.7 |   17.7 |   17.7 |  1 |   17.7 |   17.7 |   17.7 |
|  2 |   17.7 |   17.7 |   17.7 |  3 |   17.7 |   17.7 |   17.7 |
|  4 |   17.7 |   17.7 |   17.7 |  5 |   17.7 |   17.7 |   17.7 |
|  6 |   17.7 |   17.7 |   17.7 |  7 |   17.7 |   17.7 |   17.7 |
Totals (CPU):   141.8  141.8  141.7 H/s
-----------------------------------------------------------------
Totals (ALL):    141.8  141.8  141.7 H/s
Highest:   142.0 H/s
-----------------------------------------------------------------

This same rig without this mutex always had variances everywhere (+-0.7 per thread)
Pool side 6h avg hashrate is the same perhaps better but unknown if that smoothness is due to luck variances. However the pool side rate matches much closer to the report while before it showed a bit high in the report (especially the 10s column) but lower on the pool (15m was much closer to reality).

HASHRATE REPORT - CPU
| ID |    10s |    60s |    15m | ID |    10s |    60s |    15m |
|  0 |   17.8 |   17.8 |   17.5 |  1 |   17.9 |   17.9 |   17.0 |
|  2 |   17.9 |   17.9 |   17.0 |  3 |   17.9 |   17.9 |   17.0 |
|  4 |   17.9 |   17.8 |   17.3 |  5 |   17.9 |   17.9 |   17.3 |
|  6 |   17.9 |   17.9 |   17.4 |  7 |   17.9 |   17.9 |   17.3 |
Totals (CPU):   143.0  142.9  137.7 H/s
-----------------------------------------------------------------
Totals (ALL):    143.0  142.9  137.7 H/s
Highest:   143.1 H/s
-----------------------------------------------------------------

non-AES cpu

@psychocrypt psychocrypt merged commit 952d244 into fireice-uk:dev Sep 30, 2018
psychocrypt added a commit to psychocrypt/xmr-stak that referenced this pull request Oct 15, 2018
With fireice-uk#1845 a race condition during the telemetry update is solved.
The problem is that the used mutex is blocking all threads from updating the metrics during
the statistics are calculated.

- introduce a mutex per miner thread
gnagel pushed a commit to gnagel/xmr-stak that referenced this pull request Mar 23, 2019
With fireice-uk#1845 a race condition during the telemetry update is solved.
The problem is that the used mutex is blocking all threads from updating the metrics during
the statistics are calculated.

- introduce a mutex per miner thread
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants