Vectorized (SIMD) Numerical Schemes #1022

Merged: 112 commits, Sep 30, 2020
82b9edf
add basic simd type
pcarruscag Jun 10, 2020
058a5b8
begin prototype of simd numerics
pcarruscag Jun 14, 2020
2d2961a
Merge branch 'feature_quasi_newton_adjoint' into feature_simd_numerics
pcarruscag Jun 17, 2020
1bfea72
Merge branch 'iteration_class' into feature_simd_numerics
pcarruscag Jun 17, 2020
e41c6f6
optimize least squares gradients when periodic comms are not needed
pcarruscag Jun 17, 2020
07d325f
use CRTP for static polymorphism
pcarruscag Jun 17, 2020
8eeac65
fix search/replace mistakes
pcarruscag Jun 17, 2020
c24cd1c
fix LS gradients preacc
pcarruscag Jun 17, 2020
20a7edc
add iterators to C2DContainer, fix compiler errors
pcarruscag Jun 18, 2020
b72ac57
Merge branch 'feature_quasi_newton_adjoint' into feature_simd_numerics
pcarruscag Jun 18, 2020
d43c6ad
add SIMD set methods to CSysVector and CSysMatrix, fix 1000 compilati…
pcarruscag Jun 18, 2020
ff0ea0b
codefactor
pcarruscag Jun 18, 2020
7e47d1a
make new numerics compatible with non-SIMD types (for AD)
pcarruscag Jun 19, 2020
06956b6
fetching edge nodes needs gather due to coloring, add C3DContainerDec…
pcarruscag Jun 20, 2020
112bf04
improving and cleaning re-orientation checks
pcarruscag Jun 20, 2020
b2db9ba
optimize least squares gradients when periodic comms are not needed
pcarruscag Jun 17, 2020
a711554
fix LS gradients preacc
pcarruscag Jun 17, 2020
d905213
Merge branch 'cleanup_orientation_checks' into feature_simd_numerics
pcarruscag Jun 20, 2020
80b9453
Merge branch 'cleanup_orientation_checks' into feature_simd_numerics
pcarruscag Jun 21, 2020
c2b7049
use scale factor in vector and matrix updates as a mask to handle "re…
pcarruscag Jun 22, 2020
88a6c33
template mechanism for static decorator pattern
pcarruscag Jun 22, 2020
5fedf08
small LS cleanups and comments
pcarruscag Jun 22, 2020
6054e08
small LS cleanups and comments
pcarruscag Jun 22, 2020
4d88298
Merge branch 'cleanup_orientation_checks' into feature_simd_numerics
pcarruscag Jun 22, 2020
691df95
fix UB, 0*Nan is Nan
pcarruscag Jun 22, 2020
4f42705
need explicit SIMD directive for gcc to vectorize
pcarruscag Jun 24, 2020
31bf94a
fix a couple bugs, replace iterators by "bulk get" methods (gather is…
pcarruscag Jun 24, 2020
e01268b
hack to run simd numerics for simple Euler+Roe problems
pcarruscag Jun 24, 2020
22cd824
fix a bug, wrong vector used for reduction strategy
pcarruscag Jun 24, 2020
1c8c66d
viscous numerics implementation
pcarruscag Jun 26, 2020
97328c9
Merge branch 'feature_quasi_newton_adjoint' into feature_simd_numerics
pcarruscag Jun 26, 2020
0c52649
reduce duplication between 2D and 3D containers
pcarruscag Jun 26, 2020
6078a64
Merge branch 'feature_quasi_newton_adjoint' into feature_simd_numerics
pcarruscag Jun 26, 2020
ac0853f
Merge branch 'feature_quasi_newton_adjoint' into feature_simd_numerics
pcarruscag Jun 26, 2020
f9f743b
i/j update in CSysVector, fix a bug in viscous flux jacobian
pcarruscag Jun 27, 2020
a4b0228
AVX specialization for Array of 4 doubles
pcarruscag Jun 27, 2020
20cd051
small manual optim of some functions, remove get prefix from functions
pcarruscag Jun 27, 2020
2b9e5bb
make a directory for container types
pcarruscag Jun 27, 2020
debea7a
implement low dissip Roe and QCR in viscous fluxes
pcarruscag Jun 28, 2020
3cded55
split files, factory method
pcarruscag Jun 29, 2020
4c26d72
Merge remote-tracking branch 'upstream/develop' into feature_simd_num…
pcarruscag Jun 29, 2020
23c2a93
fix overload resolution issues in gcc 5
pcarruscag Jun 30, 2020
0dec543
Merge branch 'small_least_squares_improvement' into feature_simd_nume…
pcarruscag Jul 2, 2020
0a060b0
Merge branch 'cleanup_orientation_checks' into feature_simd_numerics
pcarruscag Jul 2, 2020
2d2ae09
optimize gradients and limiters for 2D 3D, reduce Rmatrix storage ove…
pcarruscag Jul 2, 2020
5820eb3
fix bug, clean a few lines
pcarruscag Jul 2, 2020
bdc61d3
Merge remote-tracking branch 'upstream/develop' into feature_simd_num…
pcarruscag Jul 2, 2020
446c042
fix ILU allocation logic for DEF/DOT
pcarruscag Jul 3, 2020
cc3ac7d
assume contiguous edges when loading iPoint jPoint
pcarruscag Jul 3, 2020
835f903
a little range for syntax sugar
pcarruscag Jul 3, 2020
c3d125a
small bug GetPoints -> GetEdges
pcarruscag Jul 4, 2020
acc8035
little cleanup
pcarruscag Jul 5, 2020
4783798
cleanup MKL JIT if def's
pcarruscag Jul 6, 2020
de0eb13
automatic preaccumulation when gathering variables
pcarruscag Jul 6, 2020
03f3a9d
Merge remote-tracking branch 'upstream/develop' into feature_simd_num…
pcarruscag Jul 6, 2020
12b80c6
option to use vectorization
pcarruscag Jul 6, 2020
9f71420
clean up of mixed precision ifdefs, replaced by sfinae
pcarruscag Jul 7, 2020
af01f5a
gradient unit tests
pcarruscag Jul 7, 2020
56a2706
missing file in previous commit
pcarruscag Jul 7, 2020
0d931b3
Merge remote-tracking branch 'upstream/develop' into feature_simd_num…
pcarruscag Jul 7, 2020
891fa3d
change some tests to use vectorization, tweak LS grads a bit more for…
pcarruscag Jul 7, 2020
a6d0b5f
one AD testcase
pcarruscag Jul 8, 2020
c7b65ec
Merge remote-tracking branch 'upstream/develop' into feature_simd_num…
pcarruscag Jul 9, 2020
68cb364
update tests
pcarruscag Jul 9, 2020
f23f4b6
avx512 specialization, general case using expression templates
pcarruscag Jul 10, 2020
f8a973a
use expression templates in CSysVector
pcarruscag Jul 10, 2020
540e58a
fix mem leak, format code
pcarruscag Jul 11, 2020
b1a10d8
Merge remote-tracking branch 'upstream/develop' into feature_simd_num…
pcarruscag Jul 11, 2020
9be871e
allow default construction of CSysVector in parallel
pcarruscag Jul 12, 2020
b0dfd9c
use the simd type also for AD
pcarruscag Jul 14, 2020
20d616d
fix directdiff build
pcarruscag Jul 14, 2020
248acea
fix icc compilation issue
pcarruscag Jul 17, 2020
9ae8974
cleanup enable_if syntax, SSE SIMD specialization, make vector expres…
pcarruscag Jul 19, 2020
c36c722
fix unit test build
pcarruscag Jul 19, 2020
7183a09
Merge remote-tracking branch 'upstream/develop' into feature_simd_num…
pcarruscag Jul 19, 2020
8cd90f8
change a name
pcarruscag Jul 20, 2020
26bafaf
Merge remote-tracking branch 'jblueh/codi_medi_update' into feature_s…
pcarruscag Jul 20, 2020
80f33c0
vectorized central schemes, cleanup static polymorphism mechanism
pcarruscag Jul 20, 2020
4e42b7c
optimize muscl logic to allow data reuse when computing viscous fluxe…
pcarruscag Jul 21, 2020
2f43aff
Merge branch 'fix_jst_ke' into feature_simd_numerics
pcarruscag Jul 21, 2020
a81a76d
Merge branch 'cleanup_flow_solver_duplication' into feature_simd_nume…
pcarruscag Jul 21, 2020
9932c05
implement logical SIMD operations
pcarruscag Jul 22, 2020
5f1af74
Merge remote-tracking branch 'upstream/develop' into feature_simd_num…
pcarruscag Jul 22, 2020
33e5445
unnecessary cast
pcarruscag Jul 22, 2020
230b047
move some methods to FVMBase
pcarruscag Jul 23, 2020
0491d4b
Merge remote-tracking branch 'upstream/develop' into feature_simd_num…
pcarruscag Jul 23, 2020
5554134
Merge remote-tracking branch 'upstream/develop' into feature_simd_num…
pcarruscag Jul 24, 2020
5a5778b
fix issue with min/max expressions, unit tests for SIMD-type
pcarruscag Jul 24, 2020
f01c820
fix AD build
pcarruscag Jul 24, 2020
1623055
JST scheme with matrix dissipation
pcarruscag Jul 27, 2020
b288045
add config options
pcarruscag Jul 27, 2020
3743af9
fix bug with instantiation of Lax
pcarruscag Jul 27, 2020
0031e79
Merge branch 'feature_simd_numerics' into feature_jst_matrix
pcarruscag Jul 27, 2020
826e330
Merge branch 'develop' into feature_simd_numerics
pcarruscag Aug 3, 2020
d4602a7
fix leak introduced in #877
pcarruscag Aug 4, 2020
87b2d94
Merge branch 'feature_simd_numerics' into feature_jst_matrix
pcarruscag Aug 4, 2020
f57f4b5
add topology outputs
pcarruscag Aug 7, 2020
4ea0628
Merge remote-tracking branch 'upstream/develop' into feature_simd_num…
pcarruscag Aug 7, 2020
6621d1d
fix clang issues
pcarruscag Aug 8, 2020
1f8bddb
Merge branch 'feature_simd_numerics' into feature_jst_matrix
pcarruscag Aug 8, 2020
4ced288
Merge remote-tracking branch 'upstream/develop' into feature_simd_num…
pcarruscag Aug 11, 2020
2efb124
fix clang debug AD build issue
pcarruscag Aug 11, 2020
beb8688
re update testcases after merge with develop
pcarruscag Aug 11, 2020
10b56e9
Merge remote-tracking branch 'upstream/develop' into feature_simd_num…
pcarruscag Sep 4, 2020
94cd99d
Merge branch 'feature_simd_numerics' into feature_jst_matrix
pcarruscag Sep 4, 2020
c98b530
Merge branch 'develop' into feature_simd_numerics
pcarruscag Sep 5, 2020
802d3ef
Merge branch 'develop' into feature_simd_numerics
pcarruscag Sep 7, 2020
1414f3f
Merge branch 'feature_simd_numerics' into feature_jst_matrix
pcarruscag Sep 8, 2020
99988b5
Merge remote-tracking branch 'upstream/develop' into feature_jst_matrix
pcarruscag Sep 27, 2020
d43a50b
address PR comments, fix iDim==3 issues
pcarruscag Sep 27, 2020
bd8d88f
Merge remote-tracking branch 'upstream/develop' into feature_simd_num…
pcarruscag Sep 27, 2020
6ea6116
update config_template
pcarruscag Sep 29, 2020
212 changes: 212 additions & 0 deletions Common/include/parallelization/vectorization.hpp
@@ -0,0 +1,212 @@
/*!
* \file vectorization.hpp
* \brief Implementation of a portable SIMD type.
* \author P. Gomes
* \version 7.0.5 "Blackbird"
*
* SU2 Project Website: https://su2code.github.io
*
* The SU2 Project is maintained by the SU2 Foundation
* (http://su2foundation.org)
*
* Copyright 2012-2020, SU2 Contributors (cf. AUTHORS.md)
*
* SU2 is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public
* License as published by the Free Software Foundation; either
* version 2.1 of the License, or (at your option) any later version.
*
* SU2 is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
* Lesser General Public License for more details.
*
* You should have received a copy of the GNU Lesser General Public
* License along with SU2. If not, see <http://www.gnu.org/licenses/>.
*/

#pragma once

#include "../basic_types/datatype_structure.hpp"
#include "../omp_structure.hpp"
#include <initializer_list>
#include <cmath>
#include <algorithm>

namespace simd {

/*--- Detect preferred SIMD size (bytes). ---*/
#if defined(__AVX512F__)
constexpr size_t SIMD_SIZE = 64;
#elif defined(__AVX__)
constexpr size_t SIMD_SIZE = 32;
#elif defined(__SSE__)
constexpr size_t SIMD_SIZE = 16;
#else
constexpr size_t SIMD_SIZE = 8;
#endif

/*!
* \brief Convert the SIMD size (bytes) to a length (number of elements).
*/
template<class T>
constexpr size_t simdLen() { return SIMD_SIZE / sizeof(T); }
template<>
constexpr size_t simdLen<su2double>() { return SIMD_SIZE / sizeof(passivedouble); }

/*!
* \class Array
* \brief A simple SIMD type relying mostly on implicit vectorization, i.e. done by
* the compiler, as opposed to explicit (done via intrinsics or inline assembly).
*/
template<class Scalar_t, size_t N = simdLen<Scalar_t>()>
class Array {
#define FOREACH SU2_OMP_SIMD for(size_t k=0; k<N; ++k)

public:
using Scalar = Scalar_t;
enum : size_t {Size = N};
enum : size_t {Align = SIMD_SIZE};

private:
alignas(Align) Scalar x_[N];

template<class T>
FORCEINLINE void bcast(T x) { FOREACH x_[k] = x; }

public:
/*--- Constructors ---*/

FORCEINLINE Array() = default;

// broadcast
FORCEINLINE Array(Scalar x) { bcast(x); }
FORCEINLINE Array& operator= (Scalar x) { bcast(x); return *this; }

// initialize with given values
FORCEINLINE Array(std::initializer_list<Scalar> vals) {
auto it = vals.begin(); FOREACH { x_[k] = *it; ++it; }
}

// load
FORCEINLINE Array(const Scalar* ptr) { load(ptr); }

// gather
template<class T>
FORCEINLINE Array(const Scalar* base_ptr, const T& offsets) { gather(base_ptr,offsets); }

// copy / assign
template<class T>
FORCEINLINE Array(const Array<T,N>& other) { FOREACH x_[k] = other[k]; }

template<class T>
FORCEINLINE Array& operator= (const Array<T,N>& other) { FOREACH x_[k] = other[k]; return *this; }

/*--- Accessors. ---*/

FORCEINLINE Scalar& operator[] (size_t k) { return x_[k]; }

FORCEINLINE const Scalar& operator[] (size_t k) const { return x_[k]; }

FORCEINLINE void load(const Scalar* ptr) { FOREACH x_[k] = ptr[k]; }

FORCEINLINE void store(Scalar* ptr) const { FOREACH ptr[k] = x_[k]; }

template<class T>
FORCEINLINE void gather(const Scalar* base_ptr, const T& offsets) { FOREACH x_[k] = base_ptr[offsets[k]]; }

template<class T>
FORCEINLINE void scatter(Scalar* base_ptr, const T& offsets) const { FOREACH base_ptr[offsets[k]] = x_[k]; }

/*--- Compound math operators, "this" is not returned because it generates poor assembly. ---*/

#define MAKE_COMPOUND(OP)\
FORCEINLINE void operator OP (Scalar x) { FOREACH x_[k] OP x; }\
FORCEINLINE void operator OP (const Array& other) { FOREACH x_[k] OP other.x_[k]; }
MAKE_COMPOUND(+=)
MAKE_COMPOUND(-=)
MAKE_COMPOUND(*=)
MAKE_COMPOUND(/=)
#undef MAKE_COMPOUND

/*--- Reductions. ---*/

FORCEINLINE Scalar sum() const { Scalar s(0); FOREACH s += x_[k]; return s; }

FORCEINLINE Scalar dot(const Array& other) const { return (*this * other).sum(); }

};
#undef FOREACH
#define FOREACH SU2_OMP_SIMD for(size_t k=0; k<T::Size; ++k)

/*--- Math, logical, and relational operators, with arrays and scalars. ---*/

#define MAKE_OPERATOR(OP)\
template<class T>\
FORCEINLINE T operator OP (const T& a, const T& b) {\
T res; FOREACH res[k] = a[k] OP b[k]; return res;\
}\
template<class T>\
FORCEINLINE T operator OP (const T& a, typename T::Scalar b) {\
T res; FOREACH res[k] = a[k] OP b; return res;\
}\
template<class T>\
FORCEINLINE T operator OP (typename T::Scalar b, const T& a) {\
T res; FOREACH res[k] = b OP a[k]; return res;\
}

MAKE_OPERATOR(+)
MAKE_OPERATOR(-)
MAKE_OPERATOR(*)
MAKE_OPERATOR(/)
MAKE_OPERATOR(==)
MAKE_OPERATOR(!=)
MAKE_OPERATOR(>)
MAKE_OPERATOR(<=)
MAKE_OPERATOR(<)
MAKE_OPERATOR(>=)
MAKE_OPERATOR(&)
MAKE_OPERATOR(|)

#undef MAKE_OPERATOR

/*--- Functions of one argument, first macro param is the name of
* the created function, IMPL is the scalar implementation. ---*/

#define MAKE_UNARY_FUN(NAME,IMPL)\
template<class T>\
FORCEINLINE T NAME(const T& x) {\
T res; FOREACH res[k] = IMPL(x[k]); return res;\
}

MAKE_UNARY_FUN(sqrt,::sqrt)

#undef MAKE_UNARY_FUN

/*--- Functions of two arguments, with arrays and scalars. ---*/

#define MAKE_BINARY_FUN(NAME,IMPL)\
template<class T>\
FORCEINLINE T NAME(const T& a, const T& b) {\
T res; FOREACH res[k] = IMPL(a[k], b[k]); return res;\
}\
template<class T>\
FORCEINLINE T NAME(const T& a, typename T::Scalar b) {\
T res; FOREACH res[k] = IMPL(a[k], b); return res;\
}\
template<class T>\
FORCEINLINE T NAME(typename T::Scalar b, const T& a) {\
T res; FOREACH res[k] = IMPL(b, a[k]); return res;\
}\

MAKE_BINARY_FUN(max,std::max)
MAKE_BINARY_FUN(min,std::min)
MAKE_BINARY_FUN(pow,::pow)

#undef MAKE_BINARY_FUN

#undef FOREACH
} // namespace

using su2vdouble = simd::Array<su2double>;
using su2vulong = simd::Array<unsigned long>;
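
The header above can be hard to read in isolation, so here is a minimal, self-contained sketch of how a type like `simd::Array` might be used to process a batch of four edges at once. It is not the SU2 class: `MiniArray` and `edgeProductSum` are hypothetical names invented for this illustration. The SU2-specific pieces (`su2double`, `FORCEINLINE`, `SU2_OMP_SIMD`, the `alignas` storage) are left out so that it compiles on its own. The size check in the initializer-list constructor is also an addition of this sketch; the constructor in the diff reads one value per lane without checking the list length.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical stand-in for simd::Array<Scalar, N>; plain loops are used and
// vectorization is left entirely to the compiler.
template <class Scalar, std::size_t N>
class MiniArray {
  Scalar x_[N];

 public:
  MiniArray() = default;

  // Broadcast: every lane receives the same scalar.
  MiniArray(Scalar x) {
    for (std::size_t k = 0; k < N; ++k) x_[k] = x;
  }

  // One value per lane. The size check is an assumption of this sketch.
  MiniArray(std::initializer_list<Scalar> vals) {
    assert(vals.size() == N);
    auto it = vals.begin();
    for (std::size_t k = 0; k < N; ++k) x_[k] = *it++;
  }

  Scalar& operator[](std::size_t k) { return x_[k]; }
  const Scalar& operator[](std::size_t k) const { return x_[k]; }

  // Gather: lane k reads base_ptr[offsets[k]] (non-contiguous load).
  template <class Index>
  void gather(const Scalar* base_ptr, const Index& offsets) {
    for (std::size_t k = 0; k < N; ++k) x_[k] = base_ptr[offsets[k]];
  }

  // Horizontal reduction over all lanes.
  Scalar sum() const {
    Scalar s(0);
    for (std::size_t k = 0; k < N; ++k) s += x_[k];
    return s;
  }
};

// Lane-wise product, mirroring what MAKE_OPERATOR(*) generates in the diff.
template <class Scalar, std::size_t N>
MiniArray<Scalar, N> operator*(const MiniArray<Scalar, N>& a,
                               const MiniArray<Scalar, N>& b) {
  MiniArray<Scalar, N> res;
  for (std::size_t k = 0; k < N; ++k) res[k] = a[k] * b[k];
  return res;
}

// Sum of field[i] * field[j] over four edges (i, j), handled as one batch.
inline double edgeProductSum(const double* field,
                             const unsigned long (&iPoint)[4],
                             const unsigned long (&jPoint)[4]) {
  MiniArray<double, 4> vi, vj;
  vi.gather(field, iPoint);  // values at the first node of each edge
  vj.gather(field, jPoint);  // values at the second node of each edge
  return (vi * vj).sum();
}
```

Calling `edgeProductSum` with a nodal field and the two index arrays of four edges returns the sum of the four per-edge products. The gather step matters because the end points of consecutive edges are generally not contiguous in memory, which appears to be the reason the diff provides `gather` and `scatter` alongside the contiguous `load` and `store`.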