
release 2.7.0 #2117

Merged
merged 49 commits into master on Dec 3, 2018

Conversation

psychocrypt (Collaborator) commented Dec 3, 2018

Release 2.7.0 changes

Changelog:

psychocrypt and others added 30 commits November 19, 2018 20:23
port OpenCl optimized division to CUDA

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
Use only half of the AES matrix and compute the other half in place.
This increases the possible occupancy.
port optimizations from OpenCL.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
Reduce local memory foot print to increase the occupancy.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
- remove useless `clFinish`
- avoid downloading the number of threads for skein & co; always start as many threads as in all other kernels (terminate the useless threads)
- change a few 64-bit variables into 32-bit ones
- provide type-qualified defines
optimize cryptonight_heavy diff
CUDA: reduce cn-v8 shared mem footprint
- optimize division
small optimization for non cryptonight_v8 algorithms
Add new striding index where the memory is chunked by the size of the work group (worksize).

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
Use `mul24` to speedup the scratchpad index calculation.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
If two threads use the same GPU device, the start time of each hash round is optimized based on the average time needed to calculate a bunch of hashes.

This way of optimizing the hash rate was first introduced by @SChernykh. The implementation is based on the one in xmrig but differs in the details.

- introduce a new config option `interleave`
- implement thread interleaving
- `monero` - remove fork from cn-v7 to cn-v8
- remove dev pool fork from cn-v7 to cn-8
Disable compatibility mode if the intensity is a multiple of the worksize. In that case an enabled compatibility mode would only slow down the miner.
Use an optimized reciprocal calculation without a lookup table for non-clang (ROCm) OpenCL.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
fireice-uk and others added 19 commits December 1, 2018 19:23
Please add Cryptonight-Superfast
- add image to describe interleave better
- add tuning description
Due to a wrong implementation, clamp was not working.
The auto config now generates two threads per GPU by default for AMD devices.

- subtract the 128 MiB safety margin only from the maximum available GPU memory, not from the memory available for a single alloc call
- extend the memory documentation in amd.txt
OpenCL: auto config two threads per GPU
- fix broken compile: change the used `ULL` to `UL` because `UL` is defined as 64-bit
- fix the memory copy to shared memory via vload8 (somehow it created wrong accesses)
Add an option to brute force intensity settings and lock in at the intensity with the highest hashrate.

- update the documentation of the `interleave` option to mention the side effect with `auto-tune`
- disable the `interleave` auto adjustment if `auto-tune` is enabled
- jconf: add `auto-tune` as an optional option
NVIDIA uses clang as the device compiler, so the reciprocal optimizations were disabled with #2104.

- re-enable the optimized reciprocal calculation
remove usage of cn_v7 if cn_v8 is enabled
The default value for interleave was wrongly set to 50.

Remove the value and take the default from the default constructor instead of side-channeling it through the JSON parser.
@fireice-uk fireice-uk merged commit 70b8193 into master Dec 3, 2018
gnagel pushed a commit to gnagel/xmr-stak that referenced this pull request Mar 23, 2019
4 participants