release 2.7.0 #2117
Merged
Port the OpenCL optimized division to CUDA. Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
Use only half of the AES matrix and compute the other half in place. This PR increases the possible occupancy.
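The "half AES matrix" idea can be sketched in plain C++. This is only an illustration under assumptions: it relies on the usual T-table property that each further table is the first one rotated by another 8 bits, and it assumes that "half" means two of the four tables are stored. The names `make_demo_t0`, `FullTables`, `HalfTables`, `column_full` and `column_half` are hypothetical and the table contents are dummy values, not the real AES table.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Table = std::array<std::uint32_t, 256>;

// Rotate a 32-bit word left by n bits (0 < n < 32).
constexpr std::uint32_t rotl32(std::uint32_t v, unsigned n) {
    return (v << n) | (v >> (32u - n));
}

// Hypothetical stand-in for the first AES T-table. Real code would derive it
// from the S-box; dummy values are enough to show the rotation property.
inline Table make_demo_t0() {
    Table t{};
    std::uint32_t s = 0x12345678u;
    for (auto& e : t) {
        s = s * 1664525u + 1013904223u;  // simple LCG, values are arbitrary
        e = s;
    }
    return t;
}

struct FullTables { Table t0, t1, t2, t3; };  // 4 KiB of lookup data
struct HalfTables { Table t0, t1; };          // 2 KiB of lookup data

// Build all four tables: table k is table 0 rotated left by 8*k bits.
inline FullTables make_full(const Table& t0) {
    FullTables f{};
    for (std::size_t i = 0; i < 256; ++i) {
        f.t0[i] = t0[i];
        f.t1[i] = rotl32(t0[i], 8);
        f.t2[i] = rotl32(t0[i], 16);
        f.t3[i] = rotl32(t0[i], 24);
    }
    return f;
}

// Store only the first two tables.
inline HalfTables make_half(const Table& t0) {
    HalfTables h{};
    for (std::size_t i = 0; i < 256; ++i) {
        h.t0[i] = t0[i];
        h.t1[i] = rotl32(t0[i], 8);
    }
    return h;
}

// One output column using four stored tables.
inline std::uint32_t column_full(const FullTables& T, std::uint8_t b0,
                                 std::uint8_t b1, std::uint8_t b2, std::uint8_t b3) {
    return T.t0[b0] ^ T.t1[b1] ^ T.t2[b2] ^ T.t3[b3];
}

// Same column using two stored tables; the other two are computed in place
// by rotating the stored entries by 16 bits.
inline std::uint32_t column_half(const HalfTables& T, std::uint8_t b0,
                                 std::uint8_t b1, std::uint8_t b2, std::uint8_t b3) {
    return T.t0[b0] ^ T.t1[b1] ^ rotl32(T.t0[b2], 16) ^ rotl32(T.t1[b3], 16);
}
```

The trade-off is two extra rotations per column in exchange for half the table storage, which is what allows more threads to be resident at once.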
port optimizations from OpenCL. Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
Reduce local memory footprint to increase the occupancy. Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
- remove useless `clFinish`
- avoid downloading the number of threads for skein & co; always start as many threads as in all other kernels and terminate the surplus threads
CUDA: optimize cn-heavy div
- change a few 64-bit variables into 32-bit
- provide type-qualified defines
optimize cryptonight_heavy diff
CUDA: optimize cn-v8 div
OpenCL: reduce local mem footprint
…otprint CUDA: reduce cn-v8 shared mem footprint
- optimize division
OpenCL: optimize cn-heavy div
small optimization for non cryptonight_v8 algorithms
OpenCL reduce API overhead
AMD: use more 32bit operations
OpenCL: optimize cn-v8 div
OpenCL: cnv8 optimization
Add new striding index where the memory is chunked by the size of the work group (worksize). Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
Use `mul24` to speedup the scratchpad index calculation. Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
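A rough sketch of what a striding index chunked by the work group size could look like, written as host-side C++ for illustration. The layout, the function name `chunked_index` and its parameters are assumptions and are not taken from the kernel source. `mul24` here is a small emulation of the OpenCL built-in for unsigned operands that fit in 24 bits.

```cpp
#include <cstdint>

// Emulates OpenCL's mul24 for unsigned operands: only the low 24 bits of
// each input take part in the multiplication, the result is truncated to
// 32 bits.
inline std::uint32_t mul24(std::uint32_t a, std::uint32_t b) {
    return (a & 0xFFFFFFu) * (b & 0xFFFFFFu);
}

// Hypothetical layout: all scratchpads of one work group share a chunk of
// worksize * elems_per_thread elements. Inside the chunk, element `elem` of
// every thread is stored side by side, so neighbouring threads touch
// neighbouring memory.
// Precondition: worksize > 0, and worksize and elems_per_thread fit in 24 bits.
inline std::uint64_t chunked_index(std::uint32_t gid, std::uint32_t elem,
                                   std::uint32_t worksize,
                                   std::uint32_t elems_per_thread) {
    const std::uint32_t group = gid / worksize;  // which chunk
    const std::uint32_t local = gid % worksize;  // position inside the work group
    const std::uint64_t chunk_base =
        static_cast<std::uint64_t>(group) * mul24(worksize, elems_per_thread);
    return chunk_base + mul24(elem, worksize) + local;
}
```

With `worksize = 4` and `elems_per_thread = 8`, thread 5 accessing element 2 lands at `32 + 8 + 1 = 41`. On GPUs where a 24-bit multiply is cheaper than a full 32-bit one, `mul24` can shorten the index calculation as long as the operands stay in range.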
OpenCL: add strided_index 3
…Index1 OpenCL: optimize strided index 1
If two threads are using the same GPU device, the start time of each hash round is optimized based on the average time needed to calculate a bunch of hashes. This way of optimizing the hash rate was first introduced by @SChernykh. This implementation is based on the implementation in xmrig but differs in the details.
- introduce a new config option `interleave`
- implement thread interleaving
- `monero`
  - remove fork from cn-v7 to cn-v8
  - remove dev pool fork from cn-v7 to cn-v8
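The interleaving idea can be sketched as a small helper that decides how long a thread should wait before starting its next hash round. This is a simplified model, not the miner's actual logic: the function `interleave_wait_ms`, its parameters, and the interpretation of the `interleave` value as a percentage of the ideal gap between round starts are all assumptions made for illustration.

```cpp
#include <algorithm>

// Hypothetical helper: returns the time in milliseconds a thread should wait
// before starting its next round so that threads sharing a GPU do not start
// at the same moment.
//   avg_round_ms         running average of the time one hash round takes
//   since_other_start_ms time elapsed since another thread on the same
//                        device started its round
//   threads_per_device   number of threads sharing the device
//   interleave_percent   assumed meaning: share of the ideal gap that must
//                        have passed before the next round may start
inline double interleave_wait_ms(double avg_round_ms,
                                 double since_other_start_ms,
                                 unsigned threads_per_device,
                                 double interleave_percent) {
    if (threads_per_device < 2 || avg_round_ms <= 0.0) {
        return 0.0;  // nothing to interleave with
    }
    // Evenly spaced starts would be avg_round_ms / threads apart.
    const double ideal_gap = avg_round_ms / threads_per_device;
    const double min_gap = ideal_gap * interleave_percent / 100.0;
    return std::max(0.0, min_gap - since_other_start_ms);
}
```

For example, with an average round time of 100 ms, two threads and a value of 100, a thread that is about to start 10 ms after the other one would wait another 40 ms in this model.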
OpenCL: thread interleaving
update currencies
Disable compatibility mode if intensity is a multiple of worksize. In that case, an enabled compatibility mode would only slow down the miner.
Use an optimized reciprocal calculation without a lookup table for OpenCL compilers other than clang (ROCm). Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
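A minimal sketch of that check, assuming compatibility mode exists to guard the kernel against surplus threads when the intensity does not fill whole work groups. The helper names `needs_comp_mode` and `padded_global_size` are hypothetical.

```cpp
#include <cstddef>

// The guard is only needed when the requested intensity is not a multiple of
// the work group size. Precondition: worksize > 0.
inline bool needs_comp_mode(bool comp_mode_enabled, std::size_t intensity,
                            std::size_t worksize) {
    return comp_mode_enabled && (intensity % worksize != 0);
}

// Global size rounded up to the next multiple of worksize. Threads with an
// index >= intensity are the ones the guard has to terminate.
inline std::size_t padded_global_size(std::size_t intensity, std::size_t worksize) {
    return ((intensity + worksize - 1) / worksize) * worksize;
}
```

With `intensity = 1000` and `worksize = 8` no thread is surplus, so the guard can be compiled out. With `intensity = 1001` the launch is padded to 1008 threads and seven of them must exit early.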
OpenCL: comp mode optimization
Please add Cryptonight-Superfast
OpenCL: optimize reciprocal calculation
- add image to describe interleave better
- add tuning description
Due to a wrong implementation, clamp was not working.
The auto config now generates two threads per GPU by default for AMD devices.
- subtract the 128 MiB safety margin only from the maximum available GPU memory, not from the memory available for one alloc call
- extend the memory documentation in amd.txt
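The memory rule described above could look roughly like the following. This is a sketch under assumptions: the function name `auto_intensity`, its parameters, and the final rounding down to a multiple of the work group size are illustrative and not taken from the auto config code.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical intensity estimate for one thread.
//   max_gpu_mem      total memory reported for the device, in bytes
//   max_alloc        largest size a single allocation may have, in bytes
//   scratchpad_bytes memory needed per hash
//   threads          threads per GPU (two by default after this change)
//   worksize         work group size
// Preconditions: scratchpad_bytes, threads and worksize are non-zero.
inline std::uint64_t auto_intensity(std::uint64_t max_gpu_mem,
                                    std::uint64_t max_alloc,
                                    std::uint64_t scratchpad_bytes,
                                    std::uint64_t threads,
                                    std::uint64_t worksize) {
    const std::uint64_t safety = 128ull * 1024ull * 1024ull;  // 128 MiB
    // The safety margin is taken from the total device memory only.
    const std::uint64_t usable = max_gpu_mem > safety ? max_gpu_mem - safety : 0;
    // Each thread gets an equal share, limited by the single-allocation cap.
    const std::uint64_t per_thread = std::min(usable / threads, max_alloc);
    const std::uint64_t intensity = per_thread / scratchpad_bytes;
    return (intensity / worksize) * worksize;  // assumed rounding
}
```

For a device with 8 GiB of memory, a 4 GiB allocation limit, 2 MiB scratchpads, two threads and a worksize of 8, this model yields an intensity of 2016 per thread.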
fix clamp implementation
OpenCL: auto config two threads per GPU
add interleave documentation
- fix broken compile: change `ULL` to `UL` because `UL` is defined as 64-bit
- fix memory copy to shared memory via vload8 (it somehow created a wrong access)
Add an option to brute force intensity settings and lock in at the intensity with the highest hashrate.
- update documentation of the `interleave` option to mention the side effect with `auto-tune`
- disable `interleave` auto adjustment if `auto-tune` is enabled
- jconf: add `auto-tune` as optional option
NVIDIA uses clang as its device compiler, so the reciprocal optimization was disabled with #2104.
- re-enable optimized reciprocal calculation
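The brute-force search can be sketched as a loop over candidate intensities that keeps the one with the best measured hashrate. The names `TuneResult` and `brute_force_intensity`, the search range and the step size are assumptions for illustration. How the miner actually measures and steps through intensities is not described here.

```cpp
#include <cstddef>
#include <functional>

struct TuneResult {
    std::size_t intensity;  // best intensity found
    double hashrate;        // hashrate measured at that intensity
};

// Tries every intensity in [lo, hi] with the given step and locks in the one
// with the highest measured hashrate. `measure` is a caller-supplied callback
// that runs a few rounds at the given intensity and returns the hashrate.
inline TuneResult brute_force_intensity(
    std::size_t lo, std::size_t hi, std::size_t step,
    const std::function<double(std::size_t)>& measure) {
    TuneResult best{lo, -1.0};
    if (step == 0) {
        step = 1;  // avoid an endless loop
    }
    for (std::size_t i = lo; i <= hi; i += step) {
        const double h = measure(i);
        if (h > best.hashrate) {
            best = TuneResult{i, h};
        }
    }
    return best;
}
```

Because each measurement changes how long a hash round takes, a search like this interferes with any timing-based adjustment running in parallel, which is consistent with disabling the `interleave` auto adjustment while `auto-tune` is active.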
Cleanup missing change from #2101
OpenCL: fix NVIDIA
OpenCL: auto tuning option
remove usage of cn_v7 if cn_v8 is enabled
The default value for interleave was wrongly set to 50. Remove the value and take the default from the default constructor instead of side-channeling it through the JSON parser.
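The pattern of keeping the default in the constructor rather than in the parser can be sketched like this. `ThreadConfig`, `ParsedObject` and `read_thread_config` are hypothetical stand-ins, and the value 40 is only a placeholder, not necessarily the project's default.

```cpp
#include <map>
#include <string>

struct ThreadConfig {
    // The default lives in exactly one place.
    // 40 is an illustrative placeholder value.
    int interleave = 40;
};

// Hypothetical stand-in for a parsed JSON object: key -> integer value.
using ParsedObject = std::map<std::string, int>;

// The parser starts from a default-constructed config and only overrides the
// fields that are present, so it never needs to know the default itself.
inline ThreadConfig read_thread_config(const ParsedObject& obj) {
    ThreadConfig cfg;
    const auto it = obj.find("interleave");
    if (it != obj.end()) {
        cfg.interleave = it->second;
    }
    return cfg;
}
```

With this layout a second, diverging default cannot creep into the parsing code, which is the kind of mismatch the commit describes.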
fix default interleave value
increase version to 2.7.0
fireice-uk approved these changes on Dec 3, 2018
gnagel pushed a commit to gnagel/xmr-stak that referenced this pull request on Mar 23, 2019: release 2.7.0
Release 2.7.0 changes
Changelog: