
Daemon closes response stream prematurely when reaching MaxBlocks limit #329

Closed
bajtos opened this issue Jun 23, 2023 · 5 comments

@bajtos
Contributor

bajtos commented Jun 23, 2023

Steps to reproduce:

  1. Start the daemon with a MaxBlocks limit.

    ❯ go run ./cmd/lassie -- daemon --port 3000 --maxblocks 10
    
  2. Retrieve content that has more than 10 blocks. For example, the full XKCD Archives.

    ❯ curl 'http://127.0.0.1:3000/ipfs/QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm' > slice.car
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100  168k    0  168k    0     0   183k      0 --:--:-- --:--:-- --:--:--  184k
    curl: (18) transfer closed with outstanding read data remaining
    
  3. As you can see, curl reports an error. I get a similar error when making the HTTP request with Rust's ureq crate (a minimal Go sketch illustrating the same failure follows after these steps):

    cannot read response body: Custom { kind: InvalidInput, error: DecoderError }
    

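For completeness, here is a minimal Go client sketch showing how the same premature close typically surfaces when reading the body programmatically. The URL and CID are copied from the steps above; the exact error text depends on the HTTP client:

package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    // Assumes the lassie daemon from step 1 is running locally with --maxblocks 10.
    url := "http://127.0.0.1:3000/ipfs/QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm"

    resp, err := http.Get(url)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // The response headers arrive with a 200 status; the failure only shows up
    // while reading the body, because the daemon aborts the chunked stream mid-way.
    body, readErr := io.ReadAll(resp.Body)
    fmt.Printf("status: %s, bytes read: %d, read error: %v\n", resp.Status, len(body), readErr)
    // Expected: readErr is non-nil (an "unexpected EOF"-style error), the Go
    // analogue of curl's "(18) transfer closed with outstanding read data remaining".
}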
What is the expected behaviour in this case?

@rvagg
Member

rvagg commented Jun 23, 2023

What would you like the behaviour to be? If we close without an error, how would you know whether the DAG really is only 10 blocks or has more, without doing the traversal locally? I guess it's certainly an approach we could take if that were important.

"max blocks" was always a bit of a hack, I don't believe it's used by Saturn anymore. The main problem is that because we do depth-first, the blocks you get are unlikely to be all that useful; you might be better off making a cancellation decision on the consumer side.

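To illustrate the consumer-side alternative, here is a rough sketch, not Lassie code: it assumes go-car/v2's BlockReader API and the local daemon from the original report running without --maxblocks, and the client itself decides when it has seen enough blocks and simply stops reading:

package main

import (
    "fmt"
    "net/http"

    car "github.com/ipld/go-car/v2"
)

func main() {
    // Client-side block limit, instead of asking the daemon to truncate for us.
    const maxBlocks = 10

    resp, err := http.Get("http://127.0.0.1:3000/ipfs/QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // Stream the CAR response block by block.
    br, err := car.NewBlockReader(resp.Body)
    if err != nil {
        panic(err)
    }
    for i := 0; i < maxBlocks; i++ {
        blk, err := br.Next()
        if err != nil {
            break // io.EOF when the whole DAG fits within the limit
        }
        fmt.Println("got block", blk.Cid())
    }
    // Returning here closes resp.Body (deferred above), which cancels the
    // remainder of the transfer from the client side.
}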
The reason it errors is that in the couple of places where we implement it, we do it as a LinkBudget on a traversal, which will exhaust and error: https://github.com/ipld/go-ipld-prime/blame/499e9b9c202423ae1dcbe52b4320637e82915317/traversal/walk.go#L292
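For context, a rough sketch of how that budget looks at the go-ipld-prime level. This is simplified: link-system and selector wiring are omitted, and exact field placement may vary between versions:

package main

import (
    "fmt"
    "math"

    "github.com/ipld/go-ipld-prime/traversal"
)

func main() {
    // Roughly how a "max blocks" limit maps onto a go-ipld-prime traversal:
    // the walk carries a Budget, and every link loaded decrements LinkBudget.
    prog := traversal.Progress{
        // Cfg (link system, prototype chooser, etc.) omitted in this sketch;
        // lassie wires that up from its own link system.
        Budget: &traversal.Budget{
            LinkBudget: 10,            // e.g. lassie daemon --maxblocks 10
            NodeBudget: math.MaxInt64, // effectively unlimited nodes
        },
    }
    fmt.Printf("link budget: %d\n", prog.Budget.LinkBudget)

    // When LinkBudget reaches zero mid-walk, the traversal returns an
    // ErrBudgetExceeded instead of finishing cleanly, and that error is what
    // ends up aborting the HTTP response stream. A caller that wanted a
    // "clean" truncation could detect that specific error and stop writing
    // instead of failing.
}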

We could potentially catch that; I'm just not sure whether it should be considered clean or not. Maybe if you're explicitly opting in to it, then it should be? It'd be interesting to know the use case here.

@willscott
Contributor

Follow-up block requests from Saturn are still translated into requests with max blocks set, FWIW.

@bajtos
Contributor Author

bajtos commented Jun 26, 2023

Thank you for explaining the reasoning that led to the current implementation. As I thought more about this, I came to see how the current solution makes sense. It would be nice if the documentation described this behaviour so that users know it's intentional and expected.

I discovered today that a similar (or the same) error is reported when the retrieval request times out after the response headers were sent.

Lassie args:

❯ lassie daemon --port 3000 --global-timeout 20s
Lassie daemon listening on address 127.0.0.1:3000
Hit CTRL-C to stop the daemon

Retrieval timing out:

❯ curl 'http://127.0.0.1:3000/ipfs/bafybeibwng6645cvuszkwgb7w2bqybryo5haefdqebvzikhb7hd4dc3exe?protocols=bitswap&providers=%2Fdns4%2Felastic.dag.house%2Ftcp%2F443%2Fwss%2Fp2p%2FQmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC' > slice.car
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 12.4M    0 12.4M    0     0   638k      0 --:--:--  0:00:20 --:--:-- 1734k
curl: (18) transfer closed with outstanding read data remaining

When I run Lassie with the default config:

❯ curl 'http://127.0.0.1:3000/ipfs/bafybeibwng6645cvuszkwgb7w2bqybryo5haefdqebvzikhb7hd4dc3exe?protocols=bitswap&providers=%2Fdns4%2Felastic.dag.house%2Ftcp%2F443%2Fwss%2Fp2p%2FQmQzqxhK82kAmKvARFZSkUVS6fo9sySaiogAnx5EnZ6ZmC' > slice.car
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  231M    0  231M    0     0  2150k      0 --:--:--  0:01:50 --:--:-- 2731k

@bajtos
Contributor Author

bajtos commented Jun 26, 2023

> It'd be interesting to know the use case here.

I opened a new GH issue, as I feel the topic is different enough:

> If we close without an error, how would you know whether the DAG really is only 10 blocks or has more, without doing the traversal locally? I guess it's certainly an approach we could take if that were important.
>
> We could potentially catch that; I'm just not sure whether it should be considered clean or not. Maybe if you're explicitly opting in to it, then it should be?

Yeah, I think this depends on how smart the clients must be. Your current approach is great because the client can be dumb; it does not need to understand the CAR format. The client can simply treat the response as an opaque stream and trust Lassie to verify correctness and completeness.

One option that comes to mind is to leverage HTTP/2 trailers. It won't work for HTTP/1.1 clients. Also, developers writing the client code must take care to check the trailer to see whether there was any error, since the response status code will indicate success. IMO, it's safer to indicate errors via trailers only when clients explicitly ask for this behaviour.
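To make the trailer idea concrete, here is a minimal net/http sketch of how a server could report a mid-stream error this way. The X-Stream-Error name is made up for illustration and is not an existing Lassie header:

package main

import (
    "fmt"
    "log"
    "net/http"
)

func handler(w http.ResponseWriter, r *http.Request) {
    // Announce the trailer before any body bytes are written.
    w.Header().Set("Trailer", "X-Stream-Error")
    w.Header().Set("Content-Type", "application/vnd.ipld.car")
    w.WriteHeader(http.StatusOK)

    // ... stream CAR blocks here; suppose we stop early because a block
    // limit or timeout was hit ...
    fmt.Fprint(w, "partial CAR bytes would go here")

    // Setting the announced header after the body has been written makes
    // net/http send it as an HTTP trailer. Clients that opted in can check it;
    // everyone else still sees a 200 status and a truncated body.
    w.Header().Set("X-Stream-Error", "max blocks limit reached")
}

func main() {
    http.HandleFunc("/ipfs/", handler)
    log.Fatal(http.ListenAndServe("127.0.0.1:3001", nil))
}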

For us (Zinnia & Filecoin Station), we are fine with the current version that aborts the response in a way that triggers a client-side reading error.

@rvagg
Member

rvagg commented Jun 27, 2023

> For us (Zinnia & Filecoin Station), we are fine with the current version that aborts the response in a way that triggers a client-side reading error.

OK, so I'll close this for now, but for anyone else reading this who has a different use case for which the dirty abort is not satisfactory, we could relitigate this and come up with a solution - perhaps a --no-dirty flag or something, used in conjunction with max-blocks or a timeout. There's a clean(ish) path to that for max-blocks by putting that value into the Etag, or even the default filename, i.e. a request that includes max-blocks is a different thing from the non-max request. For timeouts it's a bit different, since that depends on conditions outside of our control.
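For what it's worth, a tiny, purely hypothetical sketch of the Etag idea, where the limited and unlimited requests are treated as different resources (none of these names exist in Lassie today):

package main

import "fmt"

// etagFor is a hypothetical helper: fold the max-blocks limit into the Etag
// so that "CID with --maxblocks 10" and "CID, unlimited" never collide.
func etagFor(rootCid string, maxBlocks uint64) string {
    if maxBlocks == 0 {
        return fmt.Sprintf("%q", rootCid+".car")
    }
    return fmt.Sprintf("%q", fmt.Sprintf("%s.car.limit-%d", rootCid, maxBlocks))
}

func main() {
    fmt.Println(etagFor("QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm", 0))
    fmt.Println(etagFor("QmdmQXB2mzChmMeKY47C43LxUdg1NDJ5MWcKMKxDu7RgQm", 10))
}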

rvagg closed this as completed Jun 27, 2023