Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: streaming output of CAR contents #69

Merged
merged 4 commits into from
Feb 8, 2023
Merged

feat: streaming output of CAR contents #69

merged 4 commits into from
Feb 8, 2023

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Feb 6, 2023

Ref: ipld/go-car#363

I've ditched the blockstore interface for CAR usage (CLI and http) in favour of the new go-ipld-prime/storage style interface I've implemented over in go-car (link above).

For HTTP we have the following:

  • We set up our initial linksystem with a new thing I've introduced here called StreamingStore that does lazily initialisation. It sets up a read/write CAR as a temporary file only once a get/has/put operation is called. So a no-candidates retrieval won't even touch a file.
  • StreamingStore has a primary read/write CAR storage so graphsync can do its thing, full put/get/has functionality. It also has a second write-only CAR that it attaches as soon as a put is first called. This is done by invoking a callback to get an io.Writer, so we have a chance to intercept the point where we start receiving blocks. We give it the HTTP response object and it sets up a CARv1 streaming writer to it and every Put() called on the StreamingStore is written to both the read/write and the write-only CAR interfaces. It'll block until both sub-Put() operations complete so we have appropriate backpressure as required by disk or network, but we also get full error handling of course (!).
  • There's an error callback from StreamingStore so we can handle errors directly rather than them all having to bubble up through the main retrieval call. This is mainly useful for logging but it's also useful (I think) for early-error returns to the client. (We need some HTTP testing to figure out if this is even worthwhile, it may be that the lassie retrieve error is enough; there are some internal error possibilities in StreamingStore but they may bubble up through the storage interfaces).
  • Returning from the HTTP handler function ends the connection, so I believe that in the case of an error, as long as everything shuts down properly, the client should get a disconnect, regardless of where it happens. Likewise the reverse should be true. The client should be able to end prematurely and the context cancellation should flow through all of this.
  • I couldn't use Lassie#Fetch(), sorry, we just need too much control over this so rather than messing with that nice API I introduced a Lassie#Retrieve(). The main headache is wanting the retrieval ID before it happens, but once you make that an argument for Fetch() then there's not much left! Probably something we need to discuss.

@codecov-commenter
Copy link

codecov-commenter commented Feb 6, 2023

Codecov Report

Merging #69 (fff9334) into main (cb1eb2f) will increase coverage by 1.53%.
The diff coverage is 36.69%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #69      +/-   ##
==========================================
+ Coverage   42.83%   44.37%   +1.53%     
==========================================
  Files          28       29       +1     
  Lines        2423     2544     +121     
==========================================
+ Hits         1038     1129      +91     
- Misses       1350     1369      +19     
- Partials       35       46      +11     
Impacted Files Coverage Δ
cmd/lassie/fetch.go 0.00% <0.00%> (ø)
cmd/lassie/internal/wrappedstore.go 0.00% <0.00%> (ø)
pkg/lassie/lassie.go 0.00% <0.00%> (ø)
server/http/ipfs.go 0.00% <0.00%> (ø)
internal/streamingstore/streamingstore.go 71.65% <71.65%> (ø)

Copy link
Collaborator

@hannahhoward hannahhoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I missing anything cause of being tired, but all this seems reasonable. As does the go-car PR. This PR does a few weird code moves, but it looks like it does them for good reason.

internal/streamingstore/streamingstore.go Outdated Show resolved Hide resolved
server/http/ipfs.go Show resolved Hide resolved
@@ -0,0 +1,204 @@
package streamingstore
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if server is migrating to pkg, feels like this should be pkg/internal rather than top level internal

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

server/http/ipfs.go:11:2: use of internal package github.com/filecoin-project/lassie/pkg/internal/streamingstore not allowed (compile)

will have to wait till server is moved since only packages under pkg can use it once it's in there.

cmd/lassie/fetch.go Show resolved Hide resolved
@rvagg rvagg force-pushed the rvagg/streaming-car branch from 49546c8 to ddbdb78 Compare February 8, 2023 05:08
@rvagg
Copy link
Member Author

rvagg commented Feb 8, 2023

rebased to master

somehow a bunch of other dependencies got upgraded in this, I don't recall doing that but it doesn't seem bad

@rvagg rvagg force-pushed the rvagg/streaming-car branch from ddbdb78 to 657dc28 Compare February 8, 2023 05:12
@rvagg rvagg merged commit 5ce14b7 into main Feb 8, 2023
@rvagg rvagg deleted the rvagg/streaming-car branch February 8, 2023 06:11
@rvagg rvagg mentioned this pull request Feb 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants