
Rewrite lerp operator to use TensorIterator and support compile-time vectorization. #22038

Merged

Conversation

@VitalyFedyunin (Contributor) commented Jun 20, 2019

Gets the benefit of compile-time vectorization and multi-threading; a short script for reproducing the single- vs. multi-threaded comparison follows the benchmarks below.

Before:

In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
2.29 ms ± 23.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

After:

In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
452 µs ± 1.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

After, with multi-threading:

In [1]: import torch
In [2]: x = torch.randn(1000000)
In [3]: y = torch.randn(1000000)
In [4]: w = 0.7
In [5]: timeit torch.lerp(x, y, w)
167 µs ± 48.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
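
As a concrete way to see what the numbers above measure, the sketch below (not part of this PR, just an illustration) checks that torch.lerp matches the linear-interpolation formula start + weight * (end - start) and times the call with one thread versus the default thread count, which is the difference between the plain "After" run and the multi-threaded "After" run:

```python
# Illustrative sketch, not part of the PR: verify the lerp formula and compare
# single-threaded vs. multi-threaded timings of torch.lerp.
import timeit

import torch

x = torch.randn(1000000)
y = torch.randn(1000000)
w = 0.7

# torch.lerp(start, end, weight) computes start + weight * (end - start).
assert torch.allclose(torch.lerp(x, y, w), x + w * (y - x))

def bench(label, number=1000):
    # Average seconds per call, reported in microseconds.
    seconds = timeit.timeit(lambda: torch.lerp(x, y, w), number=number) / number
    print(f"{label}: {seconds * 1e6:.1f} µs per call")

default_threads = torch.get_num_threads()

torch.set_num_threads(1)                # vectorized kernel, single thread
bench("1 thread")

torch.set_num_threads(default_threads)  # let the CPU kernel parallelize
bench(f"{default_threads} threads")
```

Exact timings depend on the machine and thread count; the point is the relative gap between the single-threaded and multi-threaded runs.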

@pytorchbot added the module: cpu (CPU specific problem, e.g., perf, algorithm) and module: operators labels Jun 20, 2019
@cpuhrsch (Contributor) commented:

@pytorchbot retest this please

@VitalyFedyunin requested a review from cpuhrsch June 21, 2019 15:06

@facebook-github-bot (Contributor) left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang (Contributor) left a comment

Thanks!

@cpuhrsch (Contributor) left a comment

For bonus points we'd add this to more benchmark infra.

@facebook-github-bot (Contributor) commented:

@VitalyFedyunin merged this pull request in fe580e8.

iotamudelta pushed a commit to ROCm/pytorch that referenced this pull request Jun 21, 2019
…vectorization. (pytorch#22038)

Pull Request resolved: pytorch#22038

Differential Revision: D15941468

Pulled By: VitalyFedyunin

fbshipit-source-id: fa8a5126187df4e6c849452e035b00b22be25739
zdevito pushed a commit to zdevito/ATen that referenced this pull request Jun 21, 2019
…vectorization. (#22038)

Pull Request resolved: pytorch/pytorch#22038

Labels: Merged, module: cpu (CPU specific problem, e.g., perf, algorithm)

6 participants