Should dilated pooling be supported #180

Open
fujunwei opened this issue Jun 8, 2021 · 7 comments


fujunwei commented Jun 8, 2021

OpenVINO doesn't support dilated pooling because its pooling ops have no attribute to specify dilations, for example MaxPool.

For DML, only MAX_POOLING2 supports dilated pooling, via an additional constant array Dilations; AVERAGE_POOLING and LP_POOLING still don't support it.

System APIs have only limited capability to support dilated pooling. Should it be defined in the WebNN spec?

@wchao1115 (Collaborator)

This is already supported. Please take a look at pool2d operations in the current spec.

@huningxin (Contributor)

I suppose @fujunwei's question is whether the WebNN spec should drop dilated pooling, given WebNN-native implementation feedback on OpenVINO and DirectML. According to the following table, only DirectML's max pooling supports dilations.

| Native API | Average Pooling | Max Pooling | L2 Pooling |
| --- | --- | --- | --- |
| DirectML | No dilations (DML_AVERAGE_POOLING_OPERATOR_DESC) | Supports dilations (DML_MAX_POOLING2_OPERATOR_DESC) | No dilations (DML_LP_POOLING_OPERATOR_DESC) |
| NNAPI | No dilations (ANEURALNETWORKS_AVERAGE_POOL_2D) | No dilations (ANEURALNETWORKS_MAX_POOL_2D) | No dilations (ANEURALNETWORKS_L2_POOL_2D) |
| OpenVINO | No dilations (AvgPool) | No dilations (MaxPool) | No L2 pooling op |

@huningxin reopened this Jul 12, 2021

wchao1115 (Collaborator) commented Jul 13, 2021

TensorFlow, PyTorch, and ONNX all support dilated pooling, as the feature is used in real models, e.g. the one described in this paper, with TensorFlow supporting it on both AVG and MAX pooling.

https://www.tensorflow.org/api_docs/python/tf/nn/pool
https://github.com/onnx/onnx/blob/master/docs/Operators.md#MaxPool
https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html#torch.nn.MaxPool2d

It makes sense since dilation has been supported in convolution for a long time. A follow-up pooling operation might as well support it too.

A general question here is what to do if the underlying platform doesn't support a particular feature. I think in such a case, the graph builder should probably fail fast at graph construction time to give the user a chance to either fail or recover from it.
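
A minimal sketch of what that fail-fast probe could look like from a framework's side, assuming a backend that rejects unsupported dilations at build time (the operand descriptor fields and the error behavior here are illustrative, not normative):

```js
// Hypothetical probe: attempt to build a dilated maxPool2d and treat a
// construction-time exception as "not supported on this backend".
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);
const x = builder.input('x', {dataType: 'float32', shape: [1, 1, 8, 8]});

let dilatedMaxPoolSupported = true;
try {
  builder.maxPool2d(x, {windowDimensions: [2, 2], dilations: [2, 2]});
} catch (e) {
  dilatedMaxPoolSupported = false;  // fail fast; the caller can recover or bail out
}
```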

This situation, hopefully, won't be common since WebNN is most likely just a backend of a framework used by the user; if the framework doesn't support this feature to begin with, it isn't likely to cause the WebNN backend to fail on it.

@huningxin (Contributor)

@wchao1115, thanks for the pointers; they are helpful.

> A general question here is what to do if the underlying platform doesn't support a particular feature. I think in such a case, the graph builder should probably fail fast at graph construction time to give the user a chance to either fail or recover from it.

I understand that a WebNN implementation should support what the spec defines; otherwise it would not pass the conformance tests. Given that dilated pooling is not widely available in native platform ML APIs yet, should we exclude it for now and support it in the future?


wchao1115 (Collaborator) commented Jul 14, 2021

My concern is that a WebNN backend may be limited in its support for this feature relative to the frameworks that do support it, like the major frameworks I cited above; i.e., if we remove this feature from the spec, there will be no way to implement TensorFlow, PyTorch, or ONNX dilated pooling, for example, through a WebNN backend.

FWIW, the way we approach conformance in DirectML is that there are two classes of features: required and optional. A required feature must be implemented to be fully conformant, but the implementation need not be native, i.e. it may be emulated. This is the implementation strategy I suggested for batchNormalization in #187, particularly because the operation can in fact be emulated as a composite of other existing operations.
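
As a side note, a rough sketch of that batchNormalization emulation as a composite of existing builder ops (x, mean, variance, scale, and bias are placeholder operands; per-channel broadcasting details are glossed over):

```js
// batchNormalization(x) ~= (x - mean) / sqrt(variance + epsilon) * scale + bias,
// expressed with element-wise ops the spec already defines.
const epsilon = builder.constant(
    {dataType: 'float32', shape: [1]}, new Float32Array([1e-5]));
const normalized = builder.div(
    builder.sub(x, mean),
    builder.sqrt(builder.add(variance, epsilon)));
const output = builder.add(builder.mul(normalized, scale), bias);
```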

An optional feature, however, is something that is either not possible or not easily emulated, and also not required in all scenarios. This type of feature requires a capability flag so that the caller can independently probe whether it's actually supported by the implementation. I think of dilated pooling as being in this category.

We could make dilated pooling an optional feature with a capability flag exposed in the context, e.g. context.isDilatedPoolingSupported. This way, a framework that does support this feature can detect whether it is actually implemented in the backend and fail accordingly if not; for a framework that does not support it, it's a non-issue.
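
A sketch of how a framework might consume such a flag; isDilatedPoolingSupported is the hypothetical name from the example above, not an attribute in the current spec:

```js
const context = await navigator.ml.createContext();
if (context.isDilatedPoolingSupported) {
  // Build pool2d ops with dilations directly on this backend.
} else {
  // Fall back to an emulation path, or report the model as unsupported.
}
```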

@huningxin (Contributor)

According to the current Chromium prototype, dilated pooling is not widely supported by the targeted backends.

Because dilated pooling ops cannot be easily emulated, we should probably consider either removing them from the spec or making them a detectable feature in opSupportLimits.
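
If it were made detectable, the probe might look roughly like this; opSupportLimits() exists on MLContext, but the per-op dilations entry shown here is hypothetical:

```js
const limits = context.opSupportLimits();
// Hypothetical capability entry; not part of the current opSupportLimits() shape.
const dilatedMaxPoolOk = Boolean(limits.maxPool2d && limits.maxPool2d.dilations);
```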

Thoughts?

/cc @fdwr @philloooo @inexorabletash


fdwr (Collaborator) commented Jul 29, 2024

> Because dilated pooling ops cannot be easily emulated.

Convolving, reducing, and pooling a tensor are very similar operations, so much so that if you can rearrange one problem into any of the others, it's likely emulatable.

| pooling | reducing | convolving |
| --- | --- | --- |
| averagePool | reduceAverage | conv2d(...) / windowElementCount |
| l2Pool | reduceL2 | sqrt(conv2d(sqr()...)) |
| maxPool | reduceMax | N/A |
- Pool averages: it's equivalent to calling conv2d with a uniformly weighted filter. So for dilations > 1 and a 2x2 pooling window, use a convolution filter of all 1's [[1, 1], [1, 1]] and divide the convolution output by 4. For a less precise answer but fewer operations, prefold the factor into the filter as [[0.25, 0.25], [0.25, 0.25]] (filter = div(expand(scalarConstantOne, windowSize), windowElementCount)); see the sketch at the end of this comment.
- Another way to pool averages, via reduction (for backends capable of strides), is to temporarily reshape to a higher-dimensional space and use overlapping strides for the windows. For example, for older DirectML (before FL6.2, which takes dilations directly), you can pad (when padding is specified), adjust the tensor description from a 4D tensor to a 6D tensor with overlapped strides for the trailing window dimensions, and call averagePool's sibling reduceAverage. I verified locally that this returns the same result, albeit more slowly than with direct dilations (happy to share the DxDispatch data file). If the API cannot take explicit tensor strides directly, there are other potential options like PyTorch's unfold/as_strided or TF's ExtractImagePatches ¹ to project the input into a temporary form that average reduction can work on. Though, just using conv2d should be faster.
- Pool Lebesgue 2-norms: like above, call sqrt(conv2d(pow(input, 2), filterOfOnes, {dilations...})).
- Pool maximums: this one is tougher. DML < FL6.2 should be able to achieve it with the reshape stride trick above, calling PADDING (when needed) and REDUCE_FUNCTION_MAX. TF could use ExtractImagePatches, assuming it's in the TFLite list ¹, but I'm not seeing a matching counterpart for CoreML ² 🤔. Maybe SlidingWindows followed by ReduceMax? Can CoreML create tensors with explicit strides to then use ReduceMax?

¹ For TF, should I be using this list or this list or another, to know which ops Chromium may call?
² For CoreML, should I be using this list or this list?
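
To make the conv2d route concrete, here is a rough WebNN sketch of the dilated average and L2 pooling emulations described above, assuming a single-channel NCHW input and a 2x2 window; multi-channel inputs would need a depthwise convolution (groups equal to the channel count) with a per-channel filter:

```js
const context = await navigator.ml.createContext();
const builder = new MLGraphBuilder(context);
const input = builder.input('input', {dataType: 'float32', shape: [1, 1, 8, 8]});

// Dilated average pooling: conv2d with a uniform filter, with the
// divide-by-windowElementCount prefolded into the weights (0.25 for a 2x2 window).
const avgFilter = builder.constant(
    {dataType: 'float32', shape: [1, 1, 2, 2]},
    new Float32Array([0.25, 0.25, 0.25, 0.25]));
const dilatedAvgPool = builder.conv2d(input, avgFilter, {dilations: [2, 2]});

// Dilated L2 pooling: sqrt of the dilated sum of squares, i.e.
// sqrt(conv2d(x * x, onesFilter, {dilations})).
const onesFilter = builder.constant(
    {dataType: 'float32', shape: [1, 1, 2, 2]},
    new Float32Array([1, 1, 1, 1]));
const dilatedL2Pool = builder.sqrt(
    builder.conv2d(builder.mul(input, input), onesFilter, {dilations: [2, 2]}));
```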
