forked from KhronosGroup/webcl-validator
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathISSUES.txt
219 lines (171 loc) · 8.47 KB
/
ISSUES.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
This document describes issues in Clang and OpenCL drivers that we
have encountered while developing the WebCL Validator.
OpenCL Drivers
==============
This section describes OpenCL driver related compiler and run-time
issues. Refer to TESTING.txt for more detailed test equipment setups.
OpenCL Compiler Differences
---------------------------
We have seen some differences in OpenCL drivers. These aren't
necessarily bugs, the differences just illustrate how different
vendors interpret the OpenCL specification either more strictly or
more loosely. Each of the cases below are accepted by some drivers,
but rejected by at least one driver.
Issue C1: Intel and AMD compilers support "__local int array[]" as
function parameter. This is not required by the OpenCL
specification and causes compilation errors on POCL. On POCL
one needs to use "__local int *array" instead.
Workaround: The validator could transform "array[]" to "*array".
Issue C2: POCL compiler supports "size_t" type as kernel
parameter. This is forbidden by the OpenCL specification and
causes compilation errors with AMD and Intel.
Workaround: The validator uses "ulong" when it needs to generate size
parameters for memory objects. The validator could
transform "size_t" to "ulong" for other kernel parameters.
Issue C3: Intel and POCL compilers allow vectors to be indexed with
constant and non-constant integers. This is not required by
the OpenCL specification and causes compilation errors on
AMD. On AMD one needs to use "v.x" instead of "v[0]" to
access the first element of a vector.
Workaround: The validator could transform "v[0]" to "v.x". The
validator could emit an error message for "v[i]".
Issue C4: Intel and POCL compilers allow vector fields to be accessed
with a vector pointer. This is not required by the OpenCL
specification and causes compilation errors on AMD. On AMD
one needs to use "(*p).x" instead of "p->x" to access the
first element of a vector pointer.
Workaround: The validator could transform "p->x" to "(*p).x".
Issue C5: POCL and AMD compilers support "event_t" as a kernel
parameter. This is forbidden by the OpenCL specification and
causes compilation errors on Intel.
Workaround: The validator could emit and error message for "event_t".
OpenCL Compiler Bugs
--------------------
Issue B1: AMD compiler segfaults if array is indexed with
"index[array]" instead of "array[index]".
Workaround: The validator transforms both forms to "*(array + index)".
Issue B2: AMD compiler segfaults if structure initialization refers to
itself:
__constant struct { int v; __constant int *p; } c = { 0, &c.v };
Workaround: ?
Issue B3: AMD compiler segfaults if kernel contains nothing but a
single barrier call:
__kernel void dummy(int d) { barrier(CLK_LOCAL_MEM_FENCE); }
Workaround: It's unlikely that anyone wants to use this kind of kernel
in practice.
Issue B4: POCL compiler requires kernel called 'main' to return int
instead of void. This is a general problem with Clang based
tools, since OpenCL support is not complete in Clang.
Workaround: ?
Issue B5: POCL compiler may crash when structure pointers are cast to
character pointers in function call arguments. This is
required in check-empty-memory test where address space
limits are passed to local memory initialization
routine. AMD and Intel compilers work fine.
Workaround: Sometimes casting indirectly via pointers to larger types
may help, but not always. For example, type* -> char* may
crash but type* -> int* -> char* may work.
Issue B6: Intel compiler crashes when the validated version of test
case "remove-variables.cl" is compiled.
Workaround: ?
Issue B7: AMD compiler crashes when the validated versions of test
cases "short-kernel-id.cl", "remove-variables.cl" and
"relocate-variables.cl" are compiled.
Workaround: ?
Issue B8: Apple i5 CPU driver compilation crashes if local memory
zeroing macro is replaced with following:
#define WCL_LOCAL_RANGE_INIT(begin, end) \
for(__local char* cur = (__local char*)begin; cur != (__local char*)end; cur++) { \
*cur = _WCL_FILLCHAR; \
}
Workaround: Zeroing is written differently.
ISSUE B9: Intel compiler crashes on image typedefs:
typedef __read_only image2d_t my_image2d_t;
Workaround: It's not necessary to use image typedefs.
OpenCL Driver Bugs
------------------
We have also found nasty run-time bugs in OpenCL drivers. For this
reason we suggest that Intel or POCL driver is used for now as the
binary drivers of some hardware vendors seem to cause a lot of
problems.
Issue D1: POCL 0.8 driver doesn't support image3d_t.
Workaround: The validator already rejects image3d_t use because it's
not supported in WebCL. This is only a minor issue in our
test cases.
Issue D2: AMD driver compiles and executes our radix-sort test
case. However, the results are incorrect because certain
vector variable contents become corrupted when barriers are
used. At least Intel and POCL drivers produce correct
results.
Workaround: ?
Issue D3: Apple CPU driver wasn't able to run radix-sort test case at
all when unsortedValues_ buffer is created with
CL_MEM_USE_HOST_PTR. Calling clEnqueueNDRangeKernel() fails
stating CL_INVALID_WORK_GROUP_SIZE. If the same buffer is
created with CL_MEM_COPY_HOST_PTR, then running the kernel
causes CL_OUT_OF_RESOURCES error.
Workaround: ?
Issue D4: AMD driver crashes when test cases "private_mem_zeroing.cl",
"radixsort/output.cl" and "reading_invalid_addresses" are
run.
Workaround: ?
Clang
=====
This section describes limitations of Clang regarding OpenCL C
code. We should try to upstream relevant fixes at some point.
Builtin Types
-------------
Not yet fixed locally:
Trunk of Clang supports OpenCL image and event types. Support isn't
present in 3.2. We should import these changes from upstream.
Image Access Qualifiers
-----------------------
Fixed locally, but not in upstream:
Clang parser supports __read_only, __write_only and __read_write image
access qualifiers. However, these qualifiers aren't associated with
types of parameter variable declarations and they are thrown away when
AST is built.
Optional Function Qualifiers
----------------------------
Not yet fixed locally:
The __attribute__((vec_type_hint(<type>))) has been defined but isn't
currently handled properly.
Printing of AST
---------------
These fixes can be abandoned. We don't need them anymore since we have
decided to use Clang's rewriting mechanism instead of printing the AST
with Clang's AST node printers.
Fixed locally, not in upstream:
Image access qualifiers aren't printed, because they are missing from
the AST.
Not yet fixed locally:
The __attribute__((vec_type_hint(<type>))) qualifier isn't printed
because it's missing from the AST.
Fixed locally, not in upstream:
Function qualifiers are printed after function parameters instead of
printing them before the return type.
Fixed locally, not in upstream:
The __kernel qualifier is printed as
__attribute__((opencl_kernel_function)).
Preprocessing
-------------
Fixed locally, not in upstream:
Clang supports a bunch of preprosessor callbacks. This mechanism needs
to be extended with OpenCL extension pragmas.
Array Indexing
--------------
Fixed in validator tool, not in upstream:
Clang doesn't handle indexing of arrays with large integer constant
expressions (>= 2^63) properly.
Static vector initialisers are not working properly in clang 3.2
----------------------------------------------------------------
http://clang-developers.42468.n3.nabble.com/getSourceRange-for-InitListExpr-td4033036.html
Also opened bug report: http://llvm.org/bugs/show_bug.cgi?id=16560
For now the bug can be circumvented by having vector initializations
wrapped inside parentheses: (float4)(0,0,0,0) -> ((float4)(0,0,0,0)).
Vector element access doesn't preserve address space
----------------------------------------------------
If we have "__global float4 *vec", the type of "(*vec).x" has correct
address space but "vec->x" doesn't. This can be worked around by
determining the correct address space indirectly from the type of
"vec" or by transforming "vec->x" to "(*vec).x".