S3 multipart only showing files in bucket after all upload #4326

Closed
2 tasks done
tadejsv opened this issue Feb 21, 2023 · 9 comments · Fixed by #4542
Comments

@tadejsv

tadejsv commented Feb 21, 2023

Initial checklist

  • I understand this is a bug report and questions should be posted in the Community Forum
  • I searched issues and couldn’t find anything (or linked relevant results below)

Link to runnable example

No response

Steps to reproduce

When I upload multiple files using S3 multipart upload, they only show up in S3 after they have all uploaded, even if Uppy reports that individual files have already finished uploading.

Expected behavior

I would want to see the uploaded files appear in S3 as they finish uploading.

Actual behavior

Files only appear in S3 after they have all finished uploading.

@aduh95
Contributor

aduh95 commented Mar 13, 2023

What version of @uppy/aws-s3-multipart are you using? How big are the files you are trying to upload? What do you mean by "files showing up in S3", how are you testing that?

@tadejsv
Author

tadejsv commented Mar 13, 2023

@aduh95 I am using @uppy/aws-s3-multipart version 3.1.1. Files are ~500MB each. I am testing whether files show up in S3 using aws s3 ls, but any method using the AWS API will give you the same result.

Also, I did some further tests - files start showing in S3 in batches (e.g. after first X are uploaded, they all show up), not necessarily when they all upload (so I am seeing nothing till (let's say) 6 files upload - then they suddenly all appear, then nothing new till the next batch of 6 files uploads, etc.)

@Murderlon
Member

I think this is because our priority queue works with files, but a file can consist of many requests, and we prioritise the upload requests higher than the finish ones. I don't think this is a problem. Other than noticing it, do you actually have a problem with this?

@tadejsv
Author

tadejsv commented Mar 14, 2023

@Murderlon Sorry, not sure we are talking about the same thing: the problem I have is that even after Uppy reports that a file is uploaded, it only shows up in S3 after a batch of files has been uploaded. So the first uploaded file will only show up after the next 5 have also finished uploading.

Well, my issue with this is that I want to monitor the upload process on the backend and kick off some processing once a file finishes uploading, and this delay introduces noise into that process. Nothing serious, but I think if this can be fixed easily, it should be.
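
For readers who want to reproduce the backend side of this, here is a minimal sketch of a poller that fires a callback the first time each object key becomes visible. It is not part of Uppy or the AWS SDK: `watchForNewObjects`, `listKeys`, `onNewObject`, `intervalMs` and `maxPolls` are all hypothetical names, and `listKeys` is assumed to be any async function that returns the keys currently listed in the bucket. Since an S3 multipart object is only listed once its CompleteMultipartUpload request has succeeded, a poller like this observes exactly the delay described above.

```javascript
// Hypothetical helper, not an Uppy or AWS API.
// `listKeys` is injected so the sketch stays independent of any SDK: it is
// assumed to be an async function resolving to an array of object keys.
async function watchForNewObjects({
  listKeys,
  onNewObject,
  intervalMs = 5000,
  maxPolls = Infinity,
}) {
  const seen = new Set()
  for (let poll = 0; poll < maxPolls; poll++) {
    for (const key of await listKeys()) {
      if (!seen.has(key)) {
        // First time this key is listed: hand it to the processing callback.
        seen.add(key)
        await onNewObject(key)
      }
    }
    // Wait before the next poll, except after the final one.
    if (poll + 1 < maxPolls) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs))
    }
  }
  return seen
}
```

In practice `listKeys` could wrap whatever listing call the backend already uses, and `onNewObject` would enqueue the processing job for that file.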

@tadejsv
Author

tadejsv commented Mar 14, 2023

With "finish" request, are you referring to CompleteMultipartUpload? If so, I think it would make sense to give this request priority over further upload requests.

@Murderlon
Member

I see! I think we are talking about the same thing. Imagine a queue of two files: one file can have all of its bytes uploaded, but since we prioritise upload requests higher, the other file's upload requests will come to the front of the queue, meaning the finish request (which is a separate request, required by the S3 spec) will come later.

I agree that there could be some improvement there. What do you think @aduh95?
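
The effect described here can be illustrated with a toy model. The sketch below is not Uppy's actual queue: `simulate` and its parameters are hypothetical, every request is assumed to take exactly one tick, and `limit` stands in for the number of concurrent requests. It only shows how the relative priority of part uploads and finish requests changes when each file becomes visible.

```javascript
// Toy model (assumption: each request occupies one slot for exactly one tick).
// Returns, per file, the tick at which its "complete" request ran.
function simulate({ files, partsPerFile, limit, priorities }) {
  let seq = 0
  const pending = []
  const remaining = new Map()
  const enqueue = (file, kind) => pending.push({ file, kind, seq: seq++ })

  // All part uploads for all files are queued up front, in file order.
  for (const file of files) {
    remaining.set(file, partsPerFile)
    for (let i = 0; i < partsPerFile; i++) enqueue(file, 'part')
  }

  const visible = []
  for (let tick = 1; pending.length > 0; tick++) {
    // Highest priority first; ties are served in insertion order.
    pending.sort(
      (a, b) => priorities[b.kind] - priorities[a.kind] || a.seq - b.seq,
    )
    const running = pending.splice(0, limit)
    for (const req of running) {
      if (req.kind === 'complete') {
        visible.push({ file: req.file, tick })
      } else {
        remaining.set(req.file, remaining.get(req.file) - 1)
        // The finish request is only queued once the last part is done.
        if (remaining.get(req.file) === 0) enqueue(req.file, 'complete')
      }
    }
  }
  return visible
}

const base = { files: ['a', 'b', 'c'], partsPerFile: 2, limit: 2 }
const uploadsFirst = simulate({ ...base, priorities: { part: 1, complete: 0 } })
const finishFirst = simulate({ ...base, priorities: { part: 0, complete: 1 } })
console.log(uploadsFirst[0]) // { file: 'a', tick: 4 }
console.log(finishFirst[0]) // { file: 'a', tick: 2 }
```

With part uploads ranked higher, file `a` has all of its parts uploaded after tick 1 but is only completed at tick 4, once every other part has drained from the queue. With finish requests ranked higher, it is completed at tick 2.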

@tadejsv
Author

tadejsv commented Mar 14, 2023

Okay, good! Just to make my complaint a bit more concrete: the sooner the file shows up in S3, the sooner I can do some processing on it. If I have to wait for all files to upload, this causes the whole workflow to complete more slowly - especially if I have some capacity constraints for the backend processing job, as now the files all queue up at once.

I think from your perspective giving priority to finish requests shouldn't be an issue. It's a request with almost no payload, so unless someone is uploading thousands of files it shouldn't really slow down the uploads.

@nmbrgts

nmbrgts commented Apr 12, 2023

I just wanted to chime in and try to give this ticket a nudge 🙂

This can be a pretty unfortunate bug to run into under the right conditions. The impact on upload speed is really felt for larger batches, since uploads that are basically done can end up taking significantly longer depending on the total size of the remaining batch.

This also extends to any uploads that are started with existing uploads in flight -- which can feel very counterintuitive. For example: If a user adds a new upload before an existing upload would end, that existing upload is now waiting for the new upload to finish. This scenario can feel pretty frustrating haha

Is there anything that I can do to help move this ticket forward?

I tried messing with the priorities defined in packages/@uppy/aws-s3-multipart/src/index.js#L54-L78 but that didn't seem to help. So it seems like this might not be an issue with prioritization... Could it be an issue with when completeMultipartUpload() is added to the queue? ... Turns out it does work and it's a very small change. Maybe you'll see a PR for this 🙂

@aduh95
Contributor

aduh95 commented May 25, 2023

@nmbrgts could you share the changes you made? Was it a matter of assigning some priority for the completeMultipartUpload call?
