
Rewrite lerp operator to use TensorIterator and support compile-time vectorization. #22038

Merged

Conversation

@VitalyFedyunin (Contributor) commented Jun 20, 2019

Gets the benefit of compile-time vectorization and multi-threading; a short script for reproducing the single- vs. multi-threaded comparison follows the benchmarks below.

Before:

In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
2.29 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

After:

In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
452 µs ± 1.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

After, with multi-threading:

In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
167 µs ± 48.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
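
As a concrete way to see what the numbers above measure, the sketch below (not part of this PR, just an illustration) checks that torch.lerp matches the linear-interpolation formula start + weight * (end - start) and times the call with one thread versus the default thread count, which is the difference between the plain "After" run and the multi-threaded "After" run:

```python
# Illustrative sketch, not part of the PR: verify the lerp formula and compare
# single-threaded vs. multi-threaded timings of torch.lerp.
import timeit

import torch

x = torch.randn(1000000)
y = torch.randn(1000000)
w = 0.7

# torch.lerp(start, end, weight) computes start + weight * (end - start).
assert torch.allclose(torch.lerp(x, y, w), x + w * (y - x))

def bench(label, number=1000):
    # Average seconds per call, reported in microseconds.
    seconds = timeit.timeit(lambda: torch.lerp(x, y, w), number=number) / number
    print(f"{label}: {seconds * 1e6:.1f} µs per call")

default_threads = torch.get_num_threads()

torch.set_num_threads(1)                # vectorized kernel, single thread
bench("1 thread")

torch.set_num_threads(default_threads)  # let the CPU kernel parallelize
bench(f"{default_threads} threads")
```

Exact timings depend on the machine and thread count; the point is the relative gap between the single-threaded and multi-threaded runs.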

@pytorchbot added the module: cpu (CPU specific problem, e.g., perf, algorithm) and module: operators labels Jun 20, 2019
@cpuhrsch (Contributor) commented:

@pytorchbot retest this please

@VitalyFedyunin requested a review from cpuhrsch June 21, 2019 15:06

@facebook-github-bot (Contributor) left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang (Contributor) left a comment

Thanks!

@cpuhrsch (Contributor) left a comment

For bonus points we'd add this to more benchmark infra.

@facebook-github-bot (Contributor) commented:

@VitalyFedyunin merged this pull request in fe580e8.

iotamudelta pushed a commit to ROCm/pytorch that referenced this pull request Jun 21, 2019
…vectorization. (pytorch#22038)

Pull Request resolved: pytorch#22038

Differential Revision: D15941468

Pulled By: VitalyFedyunin

fbshipit-source-id: fa8a5126187df4e6c849452e035b00b22be25739
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 21, 2019
…vectorization. (#22038)

Pull Request resolved: pytorch/pytorch#22038

Labels: Merged, module: cpu (CPU specific problem, e.g., perf, algorithm)

6 participants