Fix Metal accuracy problem caused by <dtype>3 vectors usage
Using float3 as an example:
Using the float3 data type to load data concurrently into a dense array shared
between all threads in a Metal threadgroup can lead to a data race between threads.
The float3 data type has a size and alignment of 16 bytes, while the kernel assumes
that it copies 12 bytes to arbitrary, unaligned locations.
Using the packed_float3 data type solves the issue.
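
The size mismatch described above can be illustrated on the host with a small C++ sketch. The structs `Float3Like` and `PackedFloat3Like` and the helpers `OverrunBytes` and `NeighbourStoresOverlap` are hypothetical stand-ins introduced only for this illustration. They mimic the 16-byte and 12-byte layouts mentioned in the message and are not Metal's own type definitions or part of this commit.

```cpp
#include <cstddef>

// Hypothetical host-side stand-ins (assumption: they only mimic the layouts
// described in the commit message, they are not the Metal types themselves).
struct alignas(16) Float3Like {  // like float3: padded to 16 bytes
  float x, y, z;
};
struct PackedFloat3Like {        // like packed_float3: exactly three floats
  float x, y, z;
};

static_assert(sizeof(Float3Like) == 16 && alignof(Float3Like) == 16,
              "aligned 3-vector occupies four floats");
static_assert(sizeof(PackedFloat3Like) == 12 && alignof(PackedFloat3Like) == 4,
              "packed 3-vector occupies three floats");

// Number of bytes by which a store of `store_size` bytes overruns the
// 12-byte slot (three floats) that a thread owns in a dense float array.
constexpr std::size_t OverrunBytes(std::size_t store_size) {
  constexpr std::size_t kSlot = 3 * sizeof(float);
  return store_size > kSlot ? store_size - kSlot : 0;
}

// True when stores issued by neighbouring threads would touch the same bytes,
// which is the data-race condition the commit message describes.
constexpr bool NeighbourStoresOverlap(std::size_t store_size) {
  return OverrunBytes(store_size) > 0;
}
```

With these stand-ins, a 16-byte store overruns its 12-byte slot by 4 bytes and so overlaps the next thread's slot, while a 12-byte store does not.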
elvin-n committed Apr 13, 2021
1 parent f38ae65 commit b483dda
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions src/target/source/codegen_metal.cc
@@ -178,6 +178,17 @@ void CodeGenMetal::PrintType(DataType t, std::ostream& os) { // NOLINT(*)
}
bool fail = false;
if (t.is_float()) {
// Take care with the sizes and alignment of half3/float3: the TIR representation might not
// be aware of the Metal half3/float3 details and can treat them as just three elements,
// while the sizes and alignments of half3/float3 are one element larger (half3: 8 bytes,
// float3: 16 bytes).
// Example of a problematic pattern: threads concurrently filling a packed threadgroup array
// with float3 elements can cause a data race and wrong data in the threadgroup shared array.
// packed_half3/packed_float3 are exactly the data types that hold 3 elements with
// per-element alignment.
if (lanes == 3) {
os << "packed_";
}
switch (t.bits()) {
case 16:
os << "half";
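
As a rough illustration of the effect of the added prefix, the following standalone sketch builds a Metal float type name from a bit width and a lane count. `MetalFloatTypeName` is a hypothetical helper written for this example. It is not the TVM `PrintType` implementation, and the handling of the lane suffix and of unsupported widths is an assumption, since the hunk above does not show that part of the function.

```cpp
#include <string>

// Hypothetical, simplified helper (not TVM code): returns the Metal type name
// for a float vector, or an empty string for combinations this sketch does
// not handle.
std::string MetalFloatTypeName(int bits, int lanes) {
  std::string name;
  // Mirrors the added lines of the diff: 3-lane vectors get the packed_ prefix
  // so that the emitted type has per-element alignment and no padding element.
  if (lanes == 3) {
    name += "packed_";
  }
  switch (bits) {
    case 16:
      name += "half";
      break;
    case 32:
      name += "float";
      break;
    default:
      return "";  // assumption: other widths are out of scope here
  }
  // Assumption: scalar types carry no suffix, vectors of 2 to 4 lanes append
  // the lane count.
  if (lanes == 1) {
    return name;
  }
  if (lanes >= 2 && lanes <= 4) {
    return name + std::to_string(lanes);
  }
  return "";
}
```

Under these assumptions, a 32-bit type with three lanes yields `packed_float3`, while two-lane and four-lane types keep their ordinary names such as `float2` and `half4`.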