This repository has been archived by the owner on Jan 10, 2023. It is now read-only.

Consolidate reduction ops #306

Merged
merged 1 commit into from
Dec 6, 2018

Conversation

ringgaard
Contributor

I have consolidated the code generation of reduction ops, i.e. add/mul/min/max of all elements in a vector register. These are now implemented as macro instructions. This also fixes a few bugs, so all myelin tests now pass on all supported CPUs (SSE, AVX, AVX2, AVX512).
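For readers unfamiliar with the term, a reduction op here folds all lanes of one vector register into a single value. The following is only a portable sketch of that idea, not the code generator from this PR: `ReduceLanes` is a hypothetical helper, the register is simulated with a `std::array`, and the halving strategy is an assumption about how such a reduction is commonly done.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <functional>

// Hypothetical helper, not part of myelin: reduces all N lanes of a
// simulated vector register with a binary op in log2(N) folding steps.
template <typename T, std::size_t N, typename Op>
T ReduceLanes(std::array<T, N> v, Op op) {
  static_assert(N > 0 && (N & (N - 1)) == 0,
                "lane count must be a power of two");
  // Each step combines the upper half of the active lanes into the lower
  // half, similar to a shuffle/extract followed by a vertical op on real
  // hardware. After the last step the result sits in lane 0.
  for (std::size_t width = N / 2; width > 0; width /= 2) {
    for (std::size_t i = 0; i < width; ++i) {
      v[i] = op(v[i], v[i + width]);
    }
  }
  return v[0];
}
```

With an 8-lane float register holding 1 through 8, `ReduceLanes(v, std::plus<float>())` gives the add reduction and `ReduceLanes(v, std::multiplies<float>())` the mul reduction; min and max work the same way by passing a lambda that wraps `std::min` or `std::max`. Sharing one folding loop across all four ops is the same kind of consolidation described above, although the actual macro instructions emit machine code rather than C++.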

Bonus:

  • Better error reporting for unsupported kernels and types
  • Deleted legacy kernels for AVXFltTanh, AVXFltExp, AVXFltSigmoid
  • A few optimizations for singleton MatMuls
  • Support for running a single myelin test or testing a reduced set of combinations

@ringgaard ringgaard self-assigned this Dec 5, 2018
Contributor

@anders-sandholm anders-sandholm left a comment

Less code. More awesomeness.
Congrats on passing all myelin tests on all supported CPUs!

@ringgaard ringgaard merged commit d51bac6 into google:master Dec 6, 2018
@ringgaard ringgaard deleted the reduc branch December 6, 2018 11:15

2 participants