Specialized codegen for opaque closure calls #49337

Merged
merged 7 commits into from
May 2, 2023


@Keno (Member) commented Apr 12, 2023

Benchmark:

using Base.Experimental: @opaque
f() = @opaque (x::Float64)->x+1.0
vec = [f() for i = 1:10_000];
g((x,f),) = f(Float64(x))

Before:

julia> @time mapreduce(g, +, enumerate(vec))
  0.001928 seconds (30.00 k allocations: 781.297 KiB)
5.0015e7

After:

julia> @time mapreduce(g, +, enumerate(vec))
  0.000085 seconds (3 allocations: 48 bytes)
5.0015e7
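
A note on reproducing these numbers: a first call under `@time` also includes compilation, so it helps to warm up before measuring. The following is a minimal sketch, assuming a Julia build that includes this change. The names `make_oc`, `call_pair`, and `closures` are illustrative (renamed from `f`, `g`, and `vec` to avoid shadowing `Base.vec`), and the allocation figure will vary by version and by how the call is scoped.

```julia
using Base.Experimental: @opaque

# Illustrative names, not part of the PR.
make_oc() = @opaque (x::Float64) -> x + 1.0
closures = [make_oc() for _ in 1:10_000]
call_pair((i, oc),) = oc(Float64(i))

# Warm up so that compilation is excluded from the measurement.
mapreduce(call_pair, +, enumerate(closures))

total = mapreduce(call_pair, +, enumerate(closures))
bytes = @allocated mapreduce(call_pair, +, enumerate(closures))
println("sum = ", total, ", bytes allocated = ", bytes)
```

The sum should match the `5.0015e7` reported above, since it is the sum of `i + 1.0` for `i` from 1 to 10,000.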

@aviatesk (Member)

Nice improvements!
I tried out this pull request for a while and found this error:

julia> cls = x::Float64 -> x + 1.0
#3 (generic function with 1 method)

julia> opc = Base.Experimental.@opaque x::Float64 -> x + 1.0
(::Float64)::Any->◌

julia> callcls(cls, x) = cls(x)
callcls (generic function with 1 method)

julia> callcls(cls, 1.0)
2.0

julia> callcls(opc, 1.0)

[15898] signal (11.2): Segmentation fault: 11
in expression starting at REPL[5]:1
unknown function (ip: 0x0)
Allocations: 636559 (Pool: 635390; Big: 1169); GC: 1
[1]    15898 segmentation fault  ./usr/bin/julia

@chriselrod (Contributor) commented Apr 13, 2023

I can reproduce aviatesk's report above:

julia> using Base.Experimental: @opaque

julia> fo = @opaque (x::Float64)->(x+1.0)
(::Float64)::Any->◌

julia> f = (x::Float64)->(x+1.0)
#4 (generic function with 1 method)

julia> g(f,x) = f(x)
g (generic function with 1 method)

julia> g(f, 5.0)
6.0

julia> g(fo, 5.0)

[13851] signal (11.1): Segmentation fault
in expression starting at REPL[6]:1
unknown function (ip: (nil))
Allocations: 2000414 (Pool: 1998635; Big: 1779); GC: 4
segmentation fault (core dumped)

OpaqueClosures have long been touted as a potential FunctionWrappers replacement.
Two issues currently:

  1. segmentation fault
  2. what's the syntax for specifying the return type? Is that reasonable to infer from a concrete argument type?

EDIT:
I do see that

julia> f()
(::Float64)::Float64->◌

so, when @opaque is used inside f, it does infer the return type.
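
The inferred return type can also be read off the closure's type, since `Core.OpaqueClosure{A, R}` carries the argument tuple type `A` and the return type `R`. A minimal sketch, with `make_oc` as an illustrative stand-in for `f`; the printed type depends on the Julia version and on where the closure is created, so no output is shown:

```julia
using Base.Experimental: @opaque

# Illustrative helper name; mirrors `f` from the benchmark.
make_oc() = @opaque (x::Float64) -> x + 1.0

oc = make_oc()
# Core.OpaqueClosure{A, R}: A is the argument tuple type, R the return type.
println(typeof(oc))
println(oc(1.0))
```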

@Keno (Member, Author) commented Apr 13, 2023

  • segmentation fault

I did say WIP right in the pull request title ;)

  • what's the syntax for specifying the return type? Is that reasonable to infer from a concrete argument type?

Currently, the macro infers the return type, but the underlying mechanism has the capability to assert it. We could add a function that restricts the return type, but for the time being, if you need to assert it, I would use a return type declaration (which converts rather than asserts, but should be good enough).
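
The difference between converting and asserting can be sketched with ordinary functions. This example deliberately does not use `@opaque`; `declared` and `asserted` are illustrative names chosen to isolate the two behaviors:

```julia
# Return type declaration: the result is passed through `convert`.
declared(x)::Float64 = x + 1

# Type assertion: the result must already be a Float64.
asserted(x) = (x + 1)::Float64

println(declared(1))      # the Int result 2 is converted to 2.0
println(asserted(1.0))    # already a Float64, so the assertion passes

# An Int result fails the assertion with a TypeError.
err = try
    asserted(1)
catch e
    e
end
println(err isa TypeError)
```

A declaration therefore accepts any result that can be converted to the declared type, whereas an assertion rejects anything that is not already of that type.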

@Keno Keno changed the title WIP: Specialized codegen for opaque closure calls Specialized codegen for opaque closure calls Apr 28, 2023
@Keno (Member, Author) commented Apr 28, 2023

Updated. I think this is complete now. Could maybe use another test or two for the tricky cases, but I think the implementation is pretty much there.
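
One candidate for such a test is the crash reported earlier in this thread, where an opaque closure is passed as an argument to a generic function. A minimal sketch of what that check could look like, assuming the standard `Test` library; this is not taken from the PR's actual test suite:

```julia
using Test
using Base.Experimental: @opaque

# Generic caller that receives the callable as an argument.
callcls(cls, x) = cls(x)

cls = (x::Float64) -> x + 1.0
opc = @opaque (x::Float64) -> x + 1.0

@testset "opaque closure through a generic caller" begin
    @test callcls(cls, 1.0) == 2.0
    @test callcls(opc, 1.0) == 2.0
end
```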

@aviatesk (Member)

analyzegc is complaining

ANALYZE src/clang-tidy-opaque_closure
/cache/build/default-amdci4-5/julialang/julia-master/src/opaque_closure.c:61:19: error: Implicit Atomic seq_cst synchronization [concurrency-implicit-atomics,-warnings-as-errors]
specptr = ci->specptr.fptr;
^
/cache/build/default-amdci4-5/julialang/julia-master/src/opaque_closure.c:98:19: error: Implicit Atomic seq_cst synchronization [concurrency-implicit-atomics,-warnings-as-errors]
specptr = ci->specptr.fptr;
^
make: *** [Makefile:498: clang-tidy-opaque_closure] Error 1

@@ -235,6 +235,8 @@ typedef jl_value_t *(*jl_fptr_sparam_t)(jl_value_t*, jl_value_t**, uint32_t, jl_
extern jl_call_t jl_fptr_interpret_call;
JL_DLLEXPORT extern jl_callptr_t jl_fptr_interpret_call_addr;

JL_DLLEXPORT extern jl_callptr_t jl_f_opaque_closure_call_addr;
Member:

Does this need support in all of the other places we declare "custom" calling conventions? (e.g. jl_invoke_api, staticdata.c) It is probably quite awkward currently to add new ones unfortunately.

Member Author:

I'm not sure. The only place this goes is the codeinstance invoke in the builtin mt, but that already has a codeinstance in it with this exact same ->invoke, so how does that work now?

Member:

Yeah, that does seem unclear. The magic strings in decls.functionObject do have weird handling in various places though, so I don't know if this needs it too.

Comment on lines +104 to +118
oc->invoke = invoke;
oc->specptr = specptr;
Member:

This seems to assume that specptr is always specsig, but sometimes codegen may put a different token into invoke, in which case the object stored into specptr might not be the requisite object. It seems like we might need two fields: one for the closure data given by invoke, and one for the guaranteed specsig (assuming specsig is valid), which might have been allocated by emit_cfunc_invalidate as a reverse trampoline.

Member Author:

The code above is supposed to check for those special cases and rewrite them into something that works for OC. This does not have the exact same semantics as the corresponding codeinstance fields.

JeffBezanson and others added 5 commits April 29, 2023 03:44
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
@Keno Keno merged commit 70ebadb into master May 2, 2023
@Keno Keno deleted the jb/OCcall branch May 2, 2023 01:48
@vtjnash (Member) commented May 2, 2023

This commit message seems to be completely wrong after the squash merge. Please try to be accurate; otherwise bisecting, backporting, and blaming become confusing, because we end up discussing why we are working with a commit described as "WIP codegen for opaque closure calls", and WIP should not be on master.

@ChrisRackauckas (Member)

What's the final performance here vs FunctionWrappers?

@oscardssmith (Member)

I'm seeing opaque closures run ~15% faster than FunctionWrappers (assuming this is a valid benchmark):

julia> using Base.Experimental: @opaque
julia> import FunctionWrappers: FunctionWrapper

julia> f() = @opaque (x::Float64)->x+1.0
f (generic function with 1 method)

julia> vec = [f() for i = 1:10_000];

julia> g((x,f),) = f(Float64(x))
g (generic function with 1 method)

julia> @time mapreduce(g, +, enumerate(vec))
  0.042754 seconds (70.03 k allocations: 4.816 MiB, 99.60% compilation time)
5.0015e7

julia> vec = [f() for i = 1:10_000_000];

julia> @time mapreduce(g, +, enumerate(vec))
  0.051242 seconds (3 allocations: 48 bytes)
5.0000015e13

julia> f() = FunctionWrapper{Float64, Tuple{Float64,}}(x->x+1.0)
f (generic function with 1 method)

julia> vec = [f() for i = 1:10_000_000];

julia> @time mapreduce(g, +, enumerate(vec))
  0.140326 seconds (95.18 k allocations: 6.408 MiB, 56.31% compilation time)
5.0000015e13

julia> @time mapreduce(g, +, enumerate(vec))
  0.060598 seconds (3 allocations: 48 bytes)
5.0000015e13
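
For reference, the ~15% figure can be recomputed from the two warm (compilation-free) timings in the transcript above. This is only arithmetic on single `@time` samples, so it is a rough indication rather than a rigorous benchmark; the variable names are illustrative:

```julia
# Warm timings copied from the transcript above, in seconds.
t_oc = 0.051242   # opaque closures, 10_000_000 elements
t_fw = 0.060598   # FunctionWrappers, second run

time_saved_pct = round(100 * (t_fw - t_oc) / t_fw; digits = 1)
speedup = round(t_fw / t_oc; digits = 2)
println("time saved: ", time_saved_pct, "%, speedup: ", speedup, "x")
```

This gives about 15.4% less wall time, or equivalently a speedup of about 1.18x.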


7 participants