Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable 64-bit vectorization #8452

Merged
merged 2 commits into from
Sep 23, 2014
Merged

Enable 64-bit vectorization #8452

merged 2 commits into from
Sep 23, 2014

Conversation

ArchRobison
Copy link
Contributor

Yes, the title is correct despite the innocent appearance of the diff. The patch represents jl_array_t more accurately in LLVM. I implemented only the data and length fields since there was no clear benefit for representing the rest of the fields.

Julia's pointer casting was confusing stride computations in LLVM's vectorizer, so it mistook unit-stride load/store as high cost gather/scatter. I had been wondering why the vectorizer was pricing a vector load at 51 clock cycles! I don't know why the problem did not exist for Float32, but it seemed some LLVM transform was pushing casts around differently when the size of the array element type matched sizeof(jl_value_t*). So maybe title is wrong for 32-bit targets :-)

Though we could try to get LLVM fixed, I think it's better for the code generator to "say what it means" and generate code closer to what Clang generates, since Clang is a prime client of LLVM.

With LLVM 3.3, the 64-bit versions of the benchmarks in test/perf/simd ran 1.6x to 3.9x faster for me with the patch than without, using an Intel 4th Generation Core i7 processor ("Haswell").

@quinnj
Copy link
Member

quinnj commented Sep 23, 2014

Sweet!

@JeffBezanson
Copy link
Member

Awesome! cc @JuliaBackports

I wonder if this is related to the performance difference reported in JuliaLang/LinearAlgebra.jl#141 ?

@ViralBShah
Copy link
Member

Amazing!

#endif
};
Type* jl_array_llvmt =
StructType::create(getGlobalContext(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be jl_LLVMContext.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. In earlier lines, I see jl_LLVMContext is used sometimes, and getGlobalContext is used other times. What's the difference? Or is it just house-cleaning that hasn't been done yet?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently there's no functional difference because we only use one context. I imagine with the threading work progressing we may at some point have more than one, so jl_LLVMContext is preferred.

@timholy
Copy link
Member

timholy commented Sep 23, 2014

🍰

Keno added a commit that referenced this pull request Sep 23, 2014
@Keno Keno merged commit fa362dc into JuliaLang:master Sep 23, 2014
@ivarne
Copy link
Member

ivarne commented Sep 27, 2014

I can't solve the conflict that arises when cherry-picking this PR to release-0.3 with proper confidence.

If it is wanted, someone with a better understanding of the c++ files needs to do git cherry-pick dc2bca9 and git cherry-pick 4039a800.

@ArchRobison
Copy link
Contributor Author

I'll take a look.

ArchRobison pushed a commit that referenced this pull request Sep 29, 2014
@ArchRobison
Copy link
Contributor Author

Pushed into release-0.3 as commit 84e90c5c19afa24ec7ed4e7b21340d4b8ae7ae85

@ArchRobison ArchRobison deleted the adr/array branch December 9, 2014 22:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants