Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handler: support range requests for GET /upload #1048

Closed
wants to merge 1 commit into from

Conversation

markspolakovs
Copy link

@markspolakovs markspolakovs commented Dec 14, 2023

Where the store supports it, use http.ServeContent to serve the contents of the upload, thereby adding support for range requests:

The main benefit of ServeContent over io.Copy is that it handles Range requests properly, sets the MIME type, and handles If-Match, If-Unmodified-Since, If-None-Match, If-Modified-Since, and If-Range requests.

This helps optimise downloading large files directly from tusd.

@Acconut
Copy link
Member

Acconut commented Dec 15, 2023

Thank you for this PR! Do you know which stores return an io.ReadSeeker? It's probably only the file store.

@markspolakovs
Copy link
Author

Do you know which stores return an io.ReadSeeker? It's probably only the file store.

From a cursory look that appears to be the case - though I imagine that in theory we could wrap e.g. s3store with a ReadSeeker that uses range requests. For now though this would only work for the file store, the Range header would be ignored for all the others.

@@ -646,6 +646,7 @@ func (handler *UnroutedHandler) HeadFile(w http.ResponseWriter, r *http.Request)
} else {
resp.Header["Upload-Length"] = strconv.FormatInt(info.Size, 10)
resp.Header["Content-Length"] = strconv.FormatInt(info.Size, 10)
resp.Header["Accept-Ranges"] = "bytes"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to check if the underlying store returns a ReadSeeker before advertising this

@Acconut
Copy link
Member

Acconut commented Dec 21, 2023

though I imagine that in theory we could wrap e.g. s3store with a ReadSeeker that uses range requests.

Yes, that would be possible, but likely not very efficient. From briefly looking through the code behind http.ServeContent, it's apparent that the function performs multiple seek operations to determine the file size, content type, and reading the requested range. Each of these seek operations would result in one range request to S3. So a single range request to tusd can trigger multiple requests to S3.

A better approach would be to pass the range request directly to the datastore, so it can fetch the requested range from the underlying storage provider. In tusd, we enabled storages to implement such optional features through additional interfaces. For example, a storage can implement the TerminaterDataStore to provide cancellable uploads:

type TerminatableUpload interface {
// Terminate an upload so any further requests to the upload resource will
// return the ErrNotFound error.
Terminate(ctx context.Context) error
}
// TerminaterDataStore is the interface which must be implemented by DataStores
// if they want to receive DELETE requests using the Handler. If this interface
// is not implemented, no request handler for this method is attached.
type TerminaterDataStore interface {
AsTerminatableUpload(upload Upload) TerminatableUpload
}

We could define a similar interface for fetching ranges. For the s3store, the implementation would then send a range request to S3. For the filestore, it could just seek to the desired position and read from there. The handler can then also detect whether the storage supports range requests at all and set the Accept-Ranges header accordingly.

This approach would require more effort to get it working for your use case, but also allows this feature to be provided by other storages in the future. Maybe the interface could even be extended to also allow handling the If-* headers.

What do you think?

@markspolakovs
Copy link
Author

I like that approach - it's a lot more effort, but it'd mean that we can extend the support to the other stores (I didn't realise that that's how ServeContent operates...).

Out of interest, I did some searching and found #432 where you were against implementing Range support in tusd - have you changed your mind on this? For context, my use case for adding this was to optimise another internal service downloading files from tusd - in the past when it failed it'd need to restart from the beginning. (I've worked around this by having this service read directly from minio, but it does mean I lose the flexibility of moving between backing stores and keeping tusd as a "generic" API).

In any case, we should probably move this conversation to an issue to sketch out API design if you're in favour of this in principle - closing this PR for now.

@Acconut
Copy link
Member

Acconut commented Jan 8, 2024

Out of interest, I did some searching and found #432 where you were against implementing Range support in tusd - have you changed your mind on this?

Yes, I did change my opinion here. We have recently been involved in the HTTP working group with the goal of making a IETF standard for resumable uploads. In the end, we hope that resumable uploads are one component in the HTTP toolbox that interacts well with the existing HTTP mechanism. Since then I am more interested in making sure that tusd works better with concepts, such as range requests.

In any case, we should probably move this conversation to an issue to sketch out API design if you're in favour of this in principle - closing this PR for now.

Great idea, feel free to open an issue if you are still interested in working on this :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants