
release 2.7.0 #2117

Merged
merged 49 commits into master on Dec 3, 2018

Conversation

psychocrypt (Collaborator) commented Dec 3, 2018

Release 2.7.0 changes

Changelog:

psychocrypt and others added 30 commits November 19, 2018 20:23
port OpenCl optimized division to CUDA

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
Use only half of the AES matrix and compute the other half in place.
This increases the possible occupancy.
port optimizations from OpenCL.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
Reduce local memory foot print to increase the occupancy.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
- remove useless `clFinish`
- avoid downloading the number of threads for skein & co; always start as many threads as in all other kernels (terminate the useless threads)
- change a few 64-bit variables into 32-bit ones
- provide type-qualified defines
optimize cryptonight_heavy diff
CUDA: reduce cn-v8 shared mem footprint
- optimize division
small optimization for non cryptonight_v8 algorithms
Add new striding index where the memory is chunked by the size of the work group (worksize).

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
Use `mul24` to speedup the scratchpad index calculation.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
If two threads use the same GPU device, the start time of each hash round is optimized based on the average time needed to calculate a bunch of hashes.

This way of optimizing the hash rate was first introduced by @SChernykh. The implementation is based on the one in xmrig but differs in the details.

- introduce a new config option `interleave`
- implement thread interleaving
- `monero` - remove fork from cn-v7 to cn-v8
- remove dev pool fork from cn-v7 to cn-8
Disable compatibility mode if the intensity is a multiple of the worksize. In that case an enabled compatibility mode would only slow down the miner.
Use an optimized reciprocal calculation without a lookup table for non-clang (ROCm) OpenCL.

Co-authored-by: SChernykh <sergey.v.chernykh@gmail.com>
fireice-uk and others added 19 commits December 1, 2018 19:23
Please add Cryptonight-Superfast
- add image to describe interleave better
- add tuning description
Due to a wrong implementation, clamp was not working.
The auto config now generates two threads per GPU by default for AMD devices.

- subtract the 128 MiB safety margin only from the maximum available GPU memory, not from the memory available for a single alloc call
- extend the memory documentation in amd.txt
OpenCL: auto config two threads per GPU
- fix broken compile: change the used `ULL` to `UL` because `UL` is defined as 64-bit
- fix the memory copy to shared memory via vload8 (somehow it created wrong accesses)
Add an option to brute force intensity settings and lock in at the intensity with the highest hashrate.

- update the documentation of the `interleave` option to mention the side effect with `auto-tune`
- disable the `interleave` auto adjustment if `auto-tune` is enabled
- jconf: add `auto-tune` as an optional option
NVIDIA uses clang as the device compiler, so the reciprocal optimizations were disabled with #2104.

- re-enable the optimized reciprocal calculation
remove usage of cn_v7 if cn_v8 is enabled
The default value for interleave was wrongly set to 50.

Remove the value and take the default from the default constructor instead of side-channeling it through the JSON parser.
@fireice-uk fireice-uk merged commit 70b8193 into master Dec 3, 2018
gnagel pushed a commit to gnagel/xmr-stak that referenced this pull request Mar 23, 2019
4 participants