From 2ccf5015d52be0508e2df1b04aef0328d3dca701 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miroslav=20Bajto=C5=A1?= Date: Wed, 4 Dec 2024 14:56:16 +0100 Subject: [PATCH 1/8] FRC: Retrieval Checking Requirements MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Miroslav Bajtoš --- FRCs/frc-retrieval-checking-requirements.md | 259 ++++++++++++++++++++ 1 file changed, 259 insertions(+) create mode 100644 FRCs/frc-retrieval-checking-requirements.md diff --git a/FRCs/frc-retrieval-checking-requirements.md b/FRCs/frc-retrieval-checking-requirements.md new file mode 100644 index 00000000..9fb18c07 --- /dev/null +++ b/FRCs/frc-retrieval-checking-requirements.md @@ -0,0 +1,259 @@ +--- +fip: "" +title: Retrieval Checking Requirements +author: "Miroslav Bajtoš (@bajtos)" +discussions-to: https://github.com/filecoin-project/FIPs/discussions/1086 +status: Draft +type: FRC +created: 2024-12-02 +# spec-sections: +# - +# - +# requires (*optional): +# replaces (*optional): +--- + + + +# FIP-Number: Retrieval Checking Requirements + +## Simple Summary + + + +In order to make Filecoin a usable data storage offering, we need the content to be retrievable. It's difficult to improve what you don't measure, and therefore, we need to measure quality of retrieval service provided by each storage provider. To allow 3rd-party networks like [Spark](https://filspark.com) to sample active deals from the on-chain activity and check whether the SP is serving retrievals for the stored content, we need SPs to meet the following requirements: + +1. Link on-chain MinerId and IPNI provider identity ([spec](#link-on-chain-minerid-and-ipni-provider-identity)). +2. Provide retrieval service using the [IPFS Trustless HTTP Gateway protocol](https://specs.ipfs.tech/http-gateways/trustless-gateway/). +3. Advertise retrievals to IPNI. +4. 
In IPNI advertisements, construct the `ContextID` field from `(PieceCID, PieceSize)` ([spec](#construct-ipni-contextid-from-piececid-piecesize)) + +Meeting these requirements needs support in software implementations like Boost, Curio & Venus Droplet but potentially also updates in settings configured by the individual SPs. + +## Abstract + + + +When we set out to build [Spark](https://filspark.com), a protocol for testing whether _payload_ of Filecoin deals can be retrieved back, we designed it based on how [Boost](https://github.com/filecoin-project/boost) worked at that time (mid-2023). Soon after FIL+ allocator compliance started to use Spark retrieval success score (Spark RSR) in mid-2024, we learned that [Venus](https://github.com/filecoin-project/venus) [Droplet](https://github.com/ipfs-force-community/droplet), an alternative miner software, is implemented slightly differently and requires tweaks to support Spark. Things evolved quite a bit since then. We need to overhaul most of the Spark protocol to support Direct Data Onboarding deals. We will need all miner software projects (Boost, Curio, Venus) to accommodate the new requirements imposed by the upcoming Spark v2 release. + +This FRC has the following goals: +1. Document the retrieval process based on IPFS/IPLD. +2. Specify what Spark needs from miner software. +3. Collaborate with the community to tweak the requirements to work well for all parties involved. +4. Let this spec and the building blocks like [IPNI Reverse Index](https://github.com/filecoin-project/devgrants/issues/1781) empower other builders to design & implement their own retrieval-checking networks as alternatives to Spark. + +## Change Motivation + + +At the moment, the retrieval process for downloading (public) data stored in Filecoin deals is lacking specification and there is very little documentation for SPs on how to correctly configure their operation to provide a good retrieval service. 
+ +The current architecture of Filecoin components does not expose enough data to enable independent 3rd-party networks to sample all data stored in Filecoin deals and check the quality of retrieval service provided by storage providers for data they are persisting. + +Our motivation is to close these gaps by documenting the current IPFS/IPLD-based retrieval process and the additional requirements needed by checker networks to measure retrieval-related service level indicators. + +> [!IMPORTANT] +> We fully acknowledge that the current IPFS/IPLD-based retrieval process may not be sufficient to support all kinds of retrieval clients. For example, warm-storage/CDN offerings may prefer to retrieve a range of bytes in a given Piece CID instead. +> +> Documenting alternative retrieval processes and the requirements for checking service level indicators of such alternatives is out of scope of this FRC. + +## Specification + + +### Retrieval Process + +Let's say we have a public dataset stored on Filecoin, packaged as UnixFS archive with CID `bafybei(...)` and stored on Filecoin in a piece with `PieceCID=baga...` and some `PieceSize`. + +The scope of this document is to support the following retrieval process: + +1. A client wanting to download the dataset identified by CID `bafybei(...)` queries an IPNI instance like [cid.contact](https://cid.contact) to find the nodes providing retrievals service for this dataset. + +2. The client picks a retrieval provider that supports the [IPFS Trustless HTTP Gateway protocol](https://specs.ipfs.tech/http-gateways/trustless-gateway/). + +3. The client requests the content for CID `bafybei(...)` at the URL (multiaddr) specified by the [IPNI provider result](https://github.com/ipni/specs/blob/12482e4e1bd92a7c6c079bf23f2533a4ddb9e363/IPNI.md#json-find-response) of the selected provider. 
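Step 3 of the retrieval process above can be sketched in a few lines. This is a minimal illustration, not part of the specification. The helper names `multiaddr_to_base_url` and `block_request` are hypothetical, and the parser is assumed to handle only the simple five-component multiaddr shape that appears in this document's examples (`/dns/<host>/tcp/<port>/https`, `/ip4/<ip>/tcp/<port>/http`). The request shape (`/ipfs/{cid}?format=raw` with `Accept: application/vnd.ipld.raw`) follows the IPFS Trustless HTTP Gateway specification.

```python
def multiaddr_to_base_url(multiaddr: str) -> str:
    """Convert a simple HTTP(S) multiaddr to a base URL.

    Assumption: only the shape /<dns|dns4|dns6|ip4|ip6>/<host>/tcp/<port>/<http|https>
    is supported. Real multiaddrs can be more varied.
    """
    parts = multiaddr.strip("/").split("/")
    if len(parts) != 5 or parts[2] != "tcp":
        raise ValueError(f"unsupported multiaddr: {multiaddr}")
    host_proto, host, _, port, scheme = parts
    if host_proto not in ("dns", "dns4", "dns6", "ip4", "ip6"):
        raise ValueError(f"unsupported host protocol: {host_proto}")
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported transport: {scheme}")
    if host_proto == "ip6":
        host = f"[{host}]"  # IPv6 literals are bracketed in URLs
    default_port = {"http": "80", "https": "443"}[scheme]
    netloc = host if port == default_port else f"{host}:{port}"
    return f"{scheme}://{netloc}"


def block_request(multiaddr: str, cid: str) -> tuple[str, dict[str, str]]:
    """Build the URL and headers for fetching a single raw block by CID."""
    url = f"{multiaddr_to_base_url(multiaddr)}/ipfs/{cid}?format=raw"
    headers = {"Accept": "application/vnd.ipld.raw"}
    return url, headers


url, headers = block_request(
    "/dns/frisbii.fly.dev/tcp/443/https",
    "bafyreiat37cquzhzjf7bzgkap4oabq3dkd3yubpue7uhghmkugm5wetcoa",
)
print(url)
```

A client would issue a `GET` for the returned URL with the returned headers, then hash the response body and compare the digest with the multihash in the requested CID before trusting the bytes.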
+ +Example IPNI `ProviderResult` describing retrieval provider offering IPFS Trustless HTTP Gateway retrievals: + +```json +{ + "MultihashResults": [{ + "Multihash": "EiAT38UKZPlJfhyZQH8cAMNjUPeKBfQn6HMdiqGZ2xJicA==", + "ProviderResults": [{ + "ContextID": "ZnJpc2JpaQ==", + "Metadata": "oBIA", + "Provider": { + "ID": "12D3KooWC8gXxg9LoJ9h3hy3jzBkEAxamyHEQJKtRmAuBuvoMzpr", + "Addrs": [ + "/dns/frisbii.fly.dev/tcp/443/https" + ] + } + }] + }] +} + +``` + +### Retrieval Requirements + +1. Whenever a deal is activated, the SP MUST advertise all IPFS/IPLD payload block CIDs found in the Piece to IPNI. See the [IPNI Specification](https://github.com/ipni/specs/blob/main/IPNI.md) and [IPNI HTTP Provider](https://github.com/ipni/specs/blob/main/IPNI_HTTP_PROVIDER.md) for technical details. + +2. Whenever the SP stops storing a Piece (e.g. because the last deal for the Piece has expired or was slashed), the SP SHOULD advertise removal of all payload block CIDs included in this Piece. + +3. The SP MUST provide retrieval of the IPFS/IPLD payload blocks via the [IPFS Trustless HTTP Gateway protocol](https://specs.ipfs.tech/http-gateways/trustless-gateway/). + +### Retrieval Checking Requirements + +In addition to the above [retrieval requirements](#retrieval-requirements), SPs are asked to meet the following: + +#### Link on-chain MinerId and IPNI provider identity + +Storage providers are required to use the same libp2p peer ID for their blockchain identity as returned by `Filecoin.StateMinerInfo` and for the index provider identity used when communicating with IPNI instances like [cid.contact](https://cid.contact). + +In particular, the value in the IPNI CID query response field `MultihashResults[].ProviderResults[].Provider.ID` must match the value of the `StateMinerInfo` response field `PeerId`. + +> [!NOTE] +> This is open to extension in the future; we can support more than one form of linking +index providers to Filecoin miners. See e.g. 
[ipni/spec#33](https://github.com/ipni/specs/issues/33). + +**Example: miner `f01611097`** + +MinerInfo state: + +```json5 +{ + // (...) + "PeerId": "12D3KooWPNbkEgjdBNeaCGpsgCrPRETe4uBZf1ShFXStobdN18ys", + // (...) +} +``` + +IPNI provider status ([query](https://cid.contact/providers/12D3KooWPNbkEgjdBNeaCGpsgCrPRETe4uBZf1ShFXStobdN18ys)): + +```json5 +{ + // (...) + "Publisher": { + "ID": "12D3KooWPNbkEgjdBNeaCGpsgCrPRETe4uBZf1ShFXStobdN18ys", + "Addrs": [ + "/ip4/76.219.232.45/tcp/24887/http" + ] + }, + // (...) +} +``` + +Example CID query response for IPFS/IPLD payload block stored by this miner ([query](https://cid.contact/cid/bafyreiat37cquzhzjf7bzgkap4oabq3dkd3yubpue7uhghmkugm5wetcoa)): + +```json5 +{ + "MultihashResults": [ + { + "Multihash": "EiAT38UKZPlJfhyZQH8cAMNjUPeKBfQn6HMdiqGZ2xJicA==", + "ProviderResults": [ + // (...) + { + "ContextID": "AXESIFVcxmAvWdc3BbQUKlYcp2Z2DuO2w5Fo4jmIC8IbMX00", + "Metadata": "oBIA", + "Provider": { + "ID": "12D3KooWPNbkEgjdBNeaCGpsgCrPRETe4uBZf1ShFXStobdN18ys", + "Addrs": [ + "/dns/cesginc.com/tcp/443/https" + ] + } + } + ] + } + ] +} +``` + +#### Construct IPNI `ContextID` from `(PieceCID, PieceSize)` + +The advertisements for IPNI must deterministically construct the `ContextID` field from the public deal metadata - the tuple `(PieceCID, PieceSize)` - as follows: + +- Use DAG-CBOR encoding ([DAG-CBOR spec](https://ipld.io/specs/codecs/dag-cbor/spec/)) +- The piece information is serialised as an array with two items: + 1. The first item is the piece size represented as `uint64` + 2. The second item is the piece CID represented as a custom tag `42` +- In places where the ContextID is represented as a string, convert the CBOR bytes to string using the hex encoding. 
+ _Note: the Go module https://github.com/ipni/go-libipni handles this conversion automatically._ + +A reference implementation of this serialization algorithm in Go is maintained in [https://github.com/filecoin-project/go-state-types/](https://github.com/filecoin-project/go-state-types/blob/32f613e4d4450b09da3c81982dd6d7dba9c6f6f2/abi/cbor_gen.go#L23-L48). + +**Example** + +Input: + +```json5 +{ + "PieceCID": "baga6ea4seaqpyzrxp423g6akmu3i2dnd7ymgf37z7m3nwhkbntt3stbocbroqdq", + "PieceSize": 34359738368 // 32 GiB +} +``` + +Output - ContextID (hex-encoded, split into two lines for readability): + +``` +821B0000000800000000D82A5828000181E203922020FC66377F35 +B3780A65368D0DA3FE1862EFF9FB36DB1D416CE7B94C2E1062E80E +``` + +Annotated version as produced by https://cbor.me: + +``` +82 # array(2) + 1B 0000000800000000 # unsigned(34359738368) + D8 2A # tag(42) + 58 28 # bytes(40) + 000181E203922020FC66377F35B3780A65368D0DA3FE1862EFF9FB36DB1D416CE7B94C2E1062E80E +``` + +## Design Rationale + + +**_TBD_** + +## Backwards Compatibility + + +[Retrieval Requirements](#retrieval-requirements) document the current status minus Graphsync and Bitswap protocols. + +[Retrieval Checking Requirements](#retrieval-checking-requirements) introduce the following breaking changes: miner software must construct IPNI `ContextID` values in a specific way. Because ContextIDs are scoped per piece (not per deal), miner software must de-duplicate advertisements for deals storing the same piece. + +## Test Cases + + + +Not applicable, but see the examples in [Specification](#specification). + +## Security Considerations + + +_TODO: add more details._ + +We trust SPs to honestly advertise Piece payload blocks to IPNI. Attack vector: a malicious SP can always advertise the same payload block for all pieces persisted. + +Free-rider problem when a piece is stored with more than one SP. 
+Attack vector: When a piece is stored with SP1 and SP2, then SP1 can advertise retrievals with metadata pointing to SP2's multiaddr. + +## Incentive Considerations + + +_TBD_ + +## Product Considerations + + +_TBD_ + +## Implementation + + +_TBD_ + +## TODO + + +_TBD_ + +## Copyright +Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). From 85ee7bf22bded949a95908be2d41aab9cd565ca4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miroslav=20Bajto=C5=A1?= Date: Wed, 4 Dec 2024 14:58:41 +0100 Subject: [PATCH 2/8] README: add a link to the new RFC MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Miroslav Bajtoš --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index f9077d2c..6009ffe0 100644 --- a/README.md +++ b/README.md @@ -134,3 +134,4 @@ This improvement protocol helps achieve that objective for all members of the Fi | [0096](https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0096.md) | Convert fundraising remainder address(es) to keyless account actor(s) | FIP | @Fatman13 | Draft | | [0097](https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0097.md) | Add Support for EIP-1153 (Transient Storage) in the FEVM | FIP | Michael Seiler (@snissn), Steven Allen (@stebalien) | Draft | | [0098](https://github.com/filecoin-project/FIPs/blob/master/FIPS/fip-0098.md) | Simplify termination fee calculation to a fixed percentage of initial pledge | FIP | Jonathan Schwartz (@Schwartz10), Alex North (@anorth), Jim Pick (@jimpick) | Draft | +| [XXXX](https://github.com/filecoin-project/FIPs/blob/master/FRCs/frc-retrieval-checking-requirements.md) | Retrieval Checking Requirements | FRC | Miroslav Bajtoš (@bajtos) | Draft | From 483fa94a7efd8f5c723ca188612f188fbbc6d3b5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miroslav=20Bajto=C5=A1?= Date: Fri, 6 Dec 2024 10:17:19 +0100 Subject: [PATCH 3/8] improve compatibility section, add 
incentives MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Miroslav Bajtoš --- FRCs/frc-retrieval-checking-requirements.md | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/FRCs/frc-retrieval-checking-requirements.md b/FRCs/frc-retrieval-checking-requirements.md index 9fb18c07..1b914107 100644 --- a/FRCs/frc-retrieval-checking-requirements.md +++ b/FRCs/frc-retrieval-checking-requirements.md @@ -215,9 +215,17 @@ Annotated version as produced by https://cbor.me: ## Backwards Compatibility -[Retrieval Requirements](#retrieval-requirements) document the current status minus Graphsync and Bitswap protocols. +[Retrieval Requirements](#retrieval-requirements) document the current status and remove Graphsync and Bitswap protocols. Existing miner operations need to enable/configure [IPFS Trustless HTTP Gateway protocol](https://specs.ipfs.tech/http-gateways/trustless-gateway/) retrievals to meet the new requirements. -[Retrieval Checking Requirements](#retrieval-checking-requirements) introduce the following breaking changes: miner software must construct IPNI `ContextID` values in a specific way. Because ContextIDs are scoped per piece (not per deal), miner software must de-duplicate advertisements for deals storing the same piece. +|Miner Software|Supports HTTP retrievals|Notes +|-|:-:|-| +|Boost|✅| Manual setup required: [docs](https://boost.filecoin.io/retrieving-data-from-filecoin/http-retrieval#payload-retrievals-car-and-raw). +|Curio|✅| ? +|Venus Droplet| ? | ? + +[Retrieval Checking Requirements](#retrieval-checking-requirements) introduce the following breaking changes: +- Miner software must construct IPNI `ContextID` values in a specific way. +- Because such ContextIDs are scoped per piece (not per deal), miner software must de-duplicate advertisements for deals storing the same piece. 
## Test Cases @@ -238,7 +246,11 @@ Attack vector: When a piece is stored with SP1 and SP2, then SP1 can advertise r ## Incentive Considerations -_TBD_ +Reliable retrieval (data availability) is a necessary condition for Filecoin to reach product-market fit. We need tools to measure and report service-level indicators related to data availability (retrieval success rate, time to first byte, and so on) to allow storage deal clients to understand the quality of service offered by different SPs (and Filecoin in general). + +The data produced by retrieval checker networks like Spark can be integrated into existing and new incentive mechanisms like FIL+, Filecoin Web Services, paid storage deals, and more. + +In mid-2024, the FIL+ allocator compliance process started to require SPs to meet a certain threshold in the Spark Retrieval Success Rate score. Since then, we have seen a steady increase in the retrieval success rate as measured by Spark. In May 2024, less than 2% of retrieval requests performed by the Spark network succeeded. In early December 2024, more than 15% of retrievals succeeded. The number of SPs that are serving payload retrievals has increased from 60 in June 2024 to more than 200 in early December 2024. 
## Product Considerations From fe10040057235a1ecd2b467b4ace9c9ee5a0ffb8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miroslav=20Bajto=C5=A1?= Date: Wed, 11 Dec 2024 09:39:50 +0100 Subject: [PATCH 4/8] finish the document MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Miroslav Bajtoš --- FRCs/frc-retrieval-checking-requirements.md | 81 +++++++++++++++++++-- 1 file changed, 74 insertions(+), 7 deletions(-) diff --git a/FRCs/frc-retrieval-checking-requirements.md b/FRCs/frc-retrieval-checking-requirements.md index 1b914107..15736565 100644 --- a/FRCs/frc-retrieval-checking-requirements.md +++ b/FRCs/frc-retrieval-checking-requirements.md @@ -21,7 +21,7 @@ created: 2024-12-02 -In order to make Filecoin a usable data storage offering, we need the content to be retrievable. It's difficult to improve what you don't measure, and therefore, we need to measure quality of retrieval service provided by each storage provider. To allow 3rd-party networks like [Spark](https://filspark.com) to sample active deals from the on-chain activity and check whether the SP is serving retrievals for the stored content, we need SPs to meet the following requirements: +To make Filecoin a usable data storage offering, we need the content to be retrievable. It's difficult to improve what you don't measure; therefore, we need to measure the quality of retrieval service provided by each storage provider. To allow 3rd-party networks like [Spark](https://filspark.com) to sample active deals from the on-chain activity and check whether the SP is serving retrievals for the stored content, we need SPs to meet the following requirements: 1. Link on-chain MinerId and IPNI provider identity ([spec](#link-on-chain-minerid-and-ipni-provider-identity)). 2. Provide retrieval service using the [IPFS Trustless HTTP Gateway protocol](https://specs.ipfs.tech/http-gateways/trustless-gateway/). 
@@ -210,7 +210,52 @@ Annotated version as produced by https://cbor.me: ## Design Rationale -**_TBD_** +The major challenge of sampling data stored on Filecoin is how to discover the payload CIDs. By +convention, clients making StorageMarket (`f05`) deals should put the payload root CID into the +`DealProposal.Label` field. After the introduction of direct data onboarding that's optimized for +low gas fees, there is no longer such a field. + +Secondly, the information about the root CID is not sufficient. It allows retrieval checkers to fetch either +only the root IPLD block or the entire content. Downloading the entire content is impractical +for light clients and puts too much load on the storage/retrieval provider. We want checkers to sample the data, not perform stress testing of SPs. + +To meet the above requirements, we need a solution for sampling from payload blocks in a given deal. + +The natural option is to scan the piece bytes to find individual CARv1 blocks and extract payload +block CIDs. This requires the checker node to fetch the piece and implement a CAR scanner. Storage +providers are already scanning pieces for payload CIDs to be advertised to IPNI; running another +scan in every retrieval checker node is redundant and wasteful. It also makes it more involved to +implement alternate retrieval checker networks. + +IPNI already contains a list of payload CIDs contained in every advertised Filecoin deal; therefore, +we propose leveraging this existing infrastructure. + +- Storage Providers keep advertising payload CIDs to IPNI as they do now. +- IPNI implements reverse index lookup allowing retrieval checkers to sample CIDs in a given deal. +- Retrieval checkers can easily get a sample of payload CIDs and then retrieve only the selected CID(s). +- Retrieval checking does not add any extra network bandwidth overhead to SPs beyond the actual retrieval. 
+ +### Alternatives considered + +Assuming there is a limit on the size of any CAR block in a Piece, it's possible to sample one payload block using the following algorithm: + +1. Let's assume the maximum CAR block size is 4 MB and we have the deal's `PieceCID` and `PieceSize`. +2. Pick a random offset $o$ in the piece so that $0 <= o <= PieceSize - 2*4 MB$. +3. Send an HTTP range-retrieval request to retrieve the bytes in the range `(o, o+2*4MB)`. +4. Parse the bytes to find a sequence that looks like a CARv1 block header. +5. Extract the CID from the CARv1 block header. +6. Hash the block's payload bytes and verify that the digest matches the CID. + +The reasons why we rejected this approach: + + 1. It's inefficient. + + 1. Each retrieval check requires two requests - one to download a ~8 MB chunk of a piece, the second one to download the payload block found in that chunk. + + 1. Spark typically repeats every retrieval check 40-100 times. Scanning a CAR byte range 40-100 times does not bring enough value to justify the network bandwidth & CPU cost. + + 1. It's not clear how retrieval checkers can discover the address where the SP serves piece retrievals. + ## Backwards Compatibility @@ -220,8 +265,8 @@ Annotated version as produced by https://cbor.me: |Miner Software|Supports HTTP retrievals|Notes |-|:-:|-| |Boost|✅| Manual setup required: [docs](https://boost.filecoin.io/retrieving-data-from-filecoin/http-retrieval#payload-retrievals-car-and-raw). -|Curio|✅| ? -|Venus Droplet| ? | ? +|Curio|✅| TODO: OOTB or manual setup? +|Venus Droplet| ? | TODO: OOTB or manual setup? [Retrieval Checking Requirements](#retrieval-checking-requirements) introduce the following breaking changes: - Miner software must construct IPNI `ContextID` values in a specific way. 
@@ -255,17 +300,39 @@ In mid-2024, the FIL+ allocator compliance process started to require SPs to mee ## Product Considerations -_TBD_ +To make Filecoin a usable data storage offering, we need the content to be retrievable. It's difficult to improve what you don't measure; therefore, we need to measure the quality of retrieval service provided by each storage provider. + +This FRC enables retrieval checks based on payload sampling with support for all kinds of storage deals (f05, direct data onboarding, etc.). + +The service-level indicators produced by retrieval checker networks can be integrated into incentive mechanisms like FIL+ or paid storage deals to drive improvements in the availability and reliability of retrieval service offered by individual SPs and Filecoin as a whole. + ## Implementation -_TBD_ +### Storage Provider Software + +|Requirement|Boost|Curio|Venus +|-|:-:|:-:|:-: +|Advertises payload retrieval to IPNI|✅|✅|✅ +|Trustless HTTP GW retrievals|✅|✅|?| +|Link on-chain MinerId and IPNI provider identity|✅|❌|✅ +|Construct IPNI ContextID from (PieceCID, PieceSize)|❌|✅|❌ + +### IPNI Reverse Index + +Status: design phase + +### Spark Retrieval Checkers + +Status: not started yet ## TODO -_TBD_ +How do we want to mitigate the following attack vectors? +- We trust SPs to honestly advertise Piece payload blocks to IPNI. +- Free-rider problem when a piece is stored with more than one SP. ## Copyright Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). 
From f758e5b0cc5f778ec0fcd8d6d4def7bb4fbd93c7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miroslav=20Bajto=C5=A1?= Date: Wed, 11 Dec 2024 09:47:06 +0100 Subject: [PATCH 5/8] add link to Spark v2 milestone issue MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Miroslav Bajtoš --- FRCs/frc-retrieval-checking-requirements.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/FRCs/frc-retrieval-checking-requirements.md b/FRCs/frc-retrieval-checking-requirements.md index 15736565..c6619cb9 100644 --- a/FRCs/frc-retrieval-checking-requirements.md +++ b/FRCs/frc-retrieval-checking-requirements.md @@ -325,7 +325,8 @@ Status: design phase ### Spark Retrieval Checkers -Status: not started yet +- Status: design phase +- Progress tracking: https://github.com/space-meridian/roadmap/issues/115 ## TODO From 71ed7d401abcedde4b69394dda260731d247938c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miroslav=20Bajto=C5=A1?= Date: Thu, 12 Dec 2024 17:23:50 +0100 Subject: [PATCH 6/8] updates MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Miroslav Bajtoš --- FRCs/frc-retrieval-checking-requirements.md | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/FRCs/frc-retrieval-checking-requirements.md b/FRCs/frc-retrieval-checking-requirements.md index c6619cb9..9b9142c3 100644 --- a/FRCs/frc-retrieval-checking-requirements.md +++ b/FRCs/frc-retrieval-checking-requirements.md @@ -265,7 +265,7 @@ The reasons why we rejected this approach: |Miner Software|Supports HTTP retrievals|Notes |-|:-:|-| |Boost|✅| Manual setup required: [docs](https://boost.filecoin.io/retrieving-data-from-filecoin/http-retrieval#payload-retrievals-car-and-raw). -|Curio|✅| TODO: OOTB or manual setup? +|Curio|✅| Works out of the box |Venus Droplet| ? | TODO: OOTB or manual setup? 
[Retrieval Checking Requirements](#retrieval-checking-requirements) introduce the following breaking changes: @@ -281,13 +281,23 @@ Not applicable, but see the examples in [Specification](#specification). ## Security Considerations -_TODO: add more details._ + We trust SPs to honestly advertise Piece payload blocks to IPNI. Attack vector: a malicious SP can always advertise the same payload block for all pieces persisted. +TODO: describe our plan to mitigate this risk. + Free-rider problem when a piece is stored with more than one SP. Attack vector: When a piece is stored with SP1 and SP2, then SP1 can advertise retrievals with metadata pointing to SP2's multiaddr. +We don't view this as a problem. Spark is testing that the provider is able to serve the content +from a deal on behalf of the network. IPFS and Filecoin are based on content addressing, which is +about the network’s ability to serve content, not about the ability to fetch it from a specific +location. However, clients need to know at least which node to ask for the hot copy. This is what we +can get from IPNI. What's more, this leaves room for SPs to try to save costs on hot storage: +they can cooperate with other SPs to guarantee that at least one hot copy is available nearby that +can be served back to the client. + ## Incentive Considerations @@ -321,7 +331,8 @@ The service-level indicators produced by retrieval checker networks can be integ ### IPNI Reverse Index -Status: design phase +- Status: design phase +- Progress tracking: https://github.com/ipni/roadmap/issues/1 ### Spark Retrieval Checkers @@ -331,9 +342,8 @@ Status: design phase ## TODO -How do we want to mitigate the following attack vectors? +How do we want to mitigate the following attack vector(s): - We trust SPs to honestly advertise Piece payload blocks to IPNI. -- Free-rider problem when a piece is stored with more than one SP. 
## Copyright Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/). From 3e9ecd9cb3aca9074d9ca4a1d7eef3f85b9ceda8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miroslav=20Bajto=C5=A1?= Date: Thu, 12 Dec 2024 17:36:53 +0100 Subject: [PATCH 7/8] add IPNI support to compat table MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Miroslav Bajtoš --- FRCs/frc-retrieval-checking-requirements.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/FRCs/frc-retrieval-checking-requirements.md b/FRCs/frc-retrieval-checking-requirements.md index 9b9142c3..aaeb0054 100644 --- a/FRCs/frc-retrieval-checking-requirements.md +++ b/FRCs/frc-retrieval-checking-requirements.md @@ -262,11 +262,11 @@ The reasons why we rejected this approach: [Retrieval Requirements](#retrieval-requirements) document the current status and remove Graphsync and Bitswap protocols. Existing miner operations need to enable/configure [IPFS Trustless HTTP Gateway protocol](https://specs.ipfs.tech/http-gateways/trustless-gateway/) retrievals to meet the new requirements. -|Miner Software|Supports HTTP retrievals|Notes -|-|:-:|-| -|Boost|✅| Manual setup required: [docs](https://boost.filecoin.io/retrieving-data-from-filecoin/http-retrieval#payload-retrievals-car-and-raw). -|Curio|✅| Works out of the box -|Venus Droplet| ? | TODO: OOTB or manual setup? +|Miner Software|Advertises payload to IPNI|Supports HTTP retrievals|Notes +|-|:-:|:-:|-| +|Boost|✅|✅| Manual setup required: [docs](https://boost.filecoin.io/retrieving-data-from-filecoin/http-retrieval#payload-retrievals-car-and-raw). +|Curio|✅|✅| Works out of the box +|Venus Droplet| ✅ | ? | TODO: OOTB or manual setup? [Retrieval Checking Requirements](#retrieval-checking-requirements) introduce the following breaking changes: - Miner software must construct IPNI `ContextID` values in a specific way. 
From a91383ef18da97b65092f922027e24bd3ce3b877 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Miroslav=20Bajto=C5=A1?= Date: Wed, 18 Dec 2024 13:54:13 +0100 Subject: [PATCH 8/8] add mitigation for SPs not advertising all blocks MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Signed-off-by: Miroslav Bajtoš --- FRCs/frc-retrieval-checking-requirements.md | 62 +++++++++++++++++++-- 1 file changed, 57 insertions(+), 5 deletions(-) diff --git a/FRCs/frc-retrieval-checking-requirements.md b/FRCs/frc-retrieval-checking-requirements.md index aaeb0054..c3661009 100644 --- a/FRCs/frc-retrieval-checking-requirements.md +++ b/FRCs/frc-retrieval-checking-requirements.md @@ -89,7 +89,6 @@ Example IPNI `ProviderResult` describing retrieval provider offering IPFS Trustl }] }] } - ``` ### Retrieval Requirements @@ -100,6 +99,24 @@ Example IPNI `ProviderResult` describing retrieval provider offering IPFS Trustl 3. The SP MUST provide retrieval of the IPFS/IPLD payload blocks via the [IPFS Trustless HTTP Gateway protocol](https://specs.ipfs.tech/http-gateways/trustless-gateway/). + +### Retrieval Checking Process + +The on-chain state does not provide any information about payload stored in a deal. Observers like +retrieval checking networks can see only miner (provider) ID, `PieceCID` and `PieceSize`. Retrieval +checkers need a way to sample payload block in a given deal. In this FRC, we assume the following algorithm leveraging IPNI: + +1. Start with a tuple `(MinerID, PieceCID, PieceSize)` identifying an active storage deal (sector). +2. Map `MinerID` to IPNI index `ProviderID`. +3. Map `(PieceCID, PieceSize)` to IPNI `ContextID` value. +4. Query IPNI reverse index for a sample of payload blocks advertised by `ProviderID` with +`ContextID` (see the [proposed API +spec](https://github.com/ipni/xedni/blob/526f90f5a6001cb50b52e6376f8877163f8018af/openapi.yaml)). +5. Pick a payload block CID to retrieve. +6. 
Query IPNI to obtain retrieval providers for the selected payload CID. +7. Find the record advertised by the SP under test using the `Provider.ID` field. +8. Request the payload CID at the advertised address. + ### Retrieval Checking Requirements In addition to the above [retrieval requirements](#retrieval-requirements), SPs are asked to meet the following: @@ -281,11 +298,47 @@ Not applicable, but see the examples in [Specification](#specification). ## Security Considerations - +(1) We trust SPs to honestly advertise Piece payload blocks to IPNI. Attack vector: a malicious SP can always advertise the same payload block for all pieces persisted. -TODO: describe our plan to mitigate this risk. +We see this as a broader problem with the lack of trust & verifiability in the IPNI architecture. +How can the network verify that parties storing content (Filecoin SPs, IPFS nodes) are honestly advertising +all content? + +To flag Filecoin SPs that are not advertising the entire deal payload, we can build a small network of validator nodes performing the following algorithm: + +1. Let's assume the maximum CAR block size is 4 MB and we have the deal's `PieceCID` and `PieceSize`. + + _(E.g. validators can walk IPNI advertisement chains and extract piece information by parsing `ContextID`.)_ + +2. Pick a random offset $o$ in the piece so that $0 <= o <= PieceSize - (2*4 MB)$. + +3. Send an HTTP range-retrieval request to retrieve the bytes in the range $[o, o+(2*4MB)]$. + + _We need the server to return an inclusion proof up to the PieceCID root._ + +4. Parse the bytes to find a sequence that looks like a CARv1 block header. + +5. Extract the CID from the CARv1 block header. + +6. Hash the block's payload bytes and verify that the digest matches the CID. + +7. Verify that the discovered payload CID was advertised to IPNI. + +8. 
If the payload CID was not advertised, then submit a report flagging the SP and include the following information: + - Provider identification, PieceCID, PieceSize, + - Byte range requested + - Server response (the inclusion proof) + - Offset of the CARv1 block header and the payload CID discovered + +9. Collect and aggregate these records to produce a reputation score for each SP. + +10. Implement incentives penalizing non-compliant SPs. In the extreme, IPNI can even de-list +non-compliant index providers. + + +(2) Free-rider problem when a piece is stored with more than one SP. Attack vector: When a piece is stored with SP1 and SP2, then SP1 can advertise retrievals with metadata pointing to SP2's multiaddr. @@ -342,8 +395,7 @@ The service-level indicators produced by retrieval checker networks can be integ ## TODO -How do we want to mitigate the following attack vector(s): -- We trust SPs to honestly advertise Piece payload blocks to IPNI. +- Find out whether Venus Droplet supports retrieval using [IPFS Trustless HTTP Gateway protocol](https://specs.ipfs.tech/http-gateways/trustless-gateway/). ## Copyright Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).