
Make Dagger.finish_stream() propagate downstream #579

Closed
wants to merge 56 commits into from

Conversation

JamesWrigley
Collaborator

Previously a streaming task calling `Dagger.finish_stream()` would only stop the caller, but now it will also stop all downstream tasks. This is done by:

  • Getting the output handler tasks to close their `RemoteChannel` when exiting.
  • Making the input handler tasks close their buffers when the `RemoteChannel` is closed.
  • Exiting `stream!()` when an input buffer is closed.

One question: what exactly are the `DropBuffer` tests for? Are they just for testing `DropBuffer` itself (its usefulness is not clear to me), or for testing that streaming tasks throw exceptions when `fetch()`'d after cancelling? For now I've disabled those tests, because with the new behaviour a task stopping for any reason will also stop its downstream tasks. So currently in the tests the `x` task will reach `max_evals` and finish, which also causes `A` to finish; everything stops gracefully, `A` is never cancelled, and so it never throws an exception.
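The close-propagation described above can be sketched with plain `Channel`s. This is a simplified, hypothetical stand-in for Dagger's `RemoteChannel`-backed stream buffers; the function names here are illustrative and not part of Dagger's API:

```julia
# Upstream side: the output handler closes its channel when it exits,
# signalling downstream that no more values will arrive.
function output_handler(out::Channel, values)
    try
        for v in values
            put!(out, v)
        end
    finally
        close(out)
    end
end

# Middle: the input handler forwards values until the upstream channel
# is closed, then closes the downstream task's input buffer in turn.
function input_handler(in::Channel, buffer::Channel)
    try
        for v in in          # iteration ends when `in` is closed and drained
            put!(buffer, v)
        end
    finally
        close(buffer)
    end
end

# Downstream: a `stream!()`-like loop exits once its input buffer is closed.
function stream_loop(buffer::Channel)
    results = Any[]
    for v in buffer
        push!(results, v)
    end
    return results
end

out = Channel(10); buf = Channel(10)
t1 = @async output_handler(out, 1:3)
t2 = @async input_handler(out, buf)
stream_loop(buf)  # returns Any[1, 2, 3] once the upstream finishes
```

Iterating a `Channel` terminates cleanly when the channel is closed and drained, which is what makes the "stop" propagate from stage to stage without explicit cancellation messages.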

jpsamaroo and others added 30 commits November 15, 2024 17:03
return_type() is kinda broken in v1.10, see:
JuliaLang/julia#52385

In any case Base.promote_op() is the official public API for this operation so
we should use it anyway.
This allows us to handle all the other kinds of task specs.
This should get the docs building again.
Because it doesn't actually do anything now.
Using `myid()` with `workers()` meant that when the context was initialized with
a single worker the processor list would be: `[OSProc(1), OSProc(1)]`. `procs()`
will always include PID 1 and any other workers, which is what we want.
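For illustration, the duplicated-PID behaviour on a single-process setup can be seen with just the `Distributed` stdlib (a toy check, not code from this PR):

```julia
using Distributed

# With no extra workers, the driver is the only process:
@show procs()              # [1] — always includes PID 1
@show workers()            # [1] when no workers have been added
@show [myid(); workers()]  # [1, 1] — the duplicated entry described above
```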
This is a bit nicer than commenting/uncommenting a line in the code.
Otherwise it may spin (see comments for details). Also refactored it into a
while-loop instead of using a @goto.
This is useful for testing and benchmarking.
This is currently necessary for the streaming branch; we'll have to change this
later, but it's good to have CI working for now.
This works by converting the output buffers into a safely-serializable
container and sending that to the new node.
This makes them display as if they were running under the original task.
This makes the tests a little easier to control.
@JamesWrigley JamesWrigley self-assigned this Nov 16, 2024
- Added some whitespace.
- Deleted the unused `rand_finite()` methods.
- Allow passing the `timeout` to `test_finishes()`
- Fix bug in one of the tests where we weren't waiting for all the tasks to
  finish, which would occasionally cause test failures because of the race
  condition.
Previously a streaming task calling `Dagger.finish_stream()` would only stop the
caller, but now it will also stop all downstream tasks. This is done by:
- Getting the output handler tasks to close their `RemoteChannel` when exiting.
- Making the input handler tasks close their buffers when the `RemoteChannel` is
  closed.
- Exiting `stream!()` when an input buffer is closed.
`unwrap_nested_exception()` now supports `DTaskFailedException` so we can match
against the real exceptions thrown.
@jpsamaroo
Member

I'm cherry-picking portions of this PR into #463. I'll also be implementing a waitany-based DAG teardown feature that should accomplish this PR's intention while also propagating upstream.

Thanks for this!

@jpsamaroo jpsamaroo force-pushed the jps/stream2 branch 2 times, most recently from a0b4a0c to d259f57 Compare December 4, 2024 00:31
@JamesWrigley
Collaborator Author

Sounds good :) I'll close this then.
