IPIP-0402: Partial CAR Support on Trustless Gateways

Related Issues
ipfs/specs/issues/348
ipfs/kubo/issues/8769
History
Commit History
Feedback
GitHub ipfs/specs (inspect source, open issue)

1. Summary

Add path and partial CAR response support to the [trustless-gateway].

2. Motivation

[trustless-gateway] solves the verifiability problem in HTTP contexts, and allows for inexpensive retrieval and caching in a way that is not tied to a specific service, library or IPFS implementation.

This IPIP improves the performance related to trustless HTTP gateways by adding additional capabilities to requests for CAR files.

The goal is to enable a client capable of translating/decoding CAR files to make a single request to a trustless gateway that in most case allows them to render the same output generated via a request to a trusted gateway (or, if not in a single request, as few requests as possible).

Save round-trips, allow more efficient resume and parallel downloads.

3. Detailed design

The solution is to allow the [trustless-gateway] to support partial responses by:

Details are in [trustless-gateway].

4. Design rationale

The approach here is pragmatic: we add a minimum set of predefined CAR export scopes based on real world needs, as a product of the rough consensus across key stakeholders: gateway operators, Project Saturn, Project Rhea (ipfs.io and dweb.link), light clients such as Capyloon and IPFS in Chromium, and gateway implementation in boxo/gateway library (Go).

Terse rationale for each feature:

4.1 User benefit

4.2 Compatibility

4.2.1 CAR responses with blocks for parent path segments

In order to serve CAR requests for content paths other than just a CID root in a trustless manner, we are requiring the gateway to return intermediate blocks from the CID root to the path terminus as part of the returned CAR file.

HTTP Gateway implementations are currently only returning blocks starting at the end of the content path, which means an implementation of this IPIP will introduce additional blocks required for verifying.

As long the client was written in a trustless manner, and follows ring and was discarding unexpected blocks, this will be a backward-compatible change.

4.2.2 CAR format with entity-bytes and dag-scope parameters

These parameters are opt-in, which means no breaking changes.

Gateways ignore unknown URL parameters. A client sending them to a gateway that does not implement this IPIP will get all blocks for the requested DAG.

4.2.3 CAR roots and determinism

As of 2023-06-20, the behavior of the roots CAR field remains an unresolved item within the CARv1 specification:

Regarding the roots property of the Header block:

  • The current Go implementation assumes at least one CID when creating a CAR
  • The current Go implementation requires at least one CID when reading a CAR
  • The current JavaScript implementation allows for zero or more roots
  • Current usage of the CAR format in Filecoin requires exactly one CID

[..]

It is unresolved how the roots array should be constrained. It is recommended that only a single root CID be used in this version of the CAR format.

A work-around for use-cases where the inclusion of a root CID is difficult but needing to be safely within the "at least one" recommendation is to use an empty CID: \x01\x55\x00\x00 (zero-length "identity" multihash with "raw" codec). Since current implementations for this version of the CAR specification don't check for the existence of root CIDs (see Root CID block existence), this will be safe as far as CAR implementations are concerned. However, there is no guarantee that applications that use CAR files will correctly consume (ignore) this empty root CID.

Due to the inconsistent and non-deterministic nature of CAR implementations, the gateway specification faces limitations in providing specific recommendations. Nevertheless, it is crucial for implementations to refrain from making implicit assumptions based on the legacy behavior of individual CAR implementations.

Due to this, gateway specification changes introduced in this IPIP clarify that:

  • The CAR roots behavior is out of scope and flags that clients MAY ignore it.
  • CAR determinism is not present by default, responses may differ across requests and gateways.
  • Opt-in determinism is possible, but standardized signaling mechanism does not exist until we have IPIP-412 or similar.

4.3 Security

This IPIP allows clients to narrow down the amount of data returned as a CAR, and introduces a need for defensive programming when the feature set of the remote gateway is unknown.

To avoid denial of service, and resource starvation, clients should probe if the gateway supports features described in this IPIP before requesting data, to avoid fetching big DAGs when only a small subset of blocks is expected.

Following the robustness principle, invalid, duplicate or unexpected blocks should be discarded.

4.4 Alternatives

Below are alternate designs that were considered, but decided to be out of scope for this IPIP.

4.4.1 Arbitrary IPLD Selectors

Passing arbitrary selectors was rejected due to the implementation complexity, risks, and weak value proposition, as discussed during IPFS Thing 2022

4.4.2 Additional "Web" Scope

A request for /ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki/?format=car&dag-scope=entity returns all blocks required for enumeration of the big HAMT /wiki directory, and then an additional request for index.html needs to be issued.

Website hosting use case could be made more efficient if gateway returned a CAR with index.html instead of all blocks for directory enumeration. The server already did the work: it knows the entity is a directory, already parsed it, it knows it has child entity named index.html, and everyone would pay a lower cost due to lower number of blocks being returned in a single round-trip, instead of two.

Rhea/Saturn projects requested this to be out of scope for now, but this "web" entity scope could be added in the future, as a follow-up optimization IPIP.

4.4.3 Requesting specific DAG depth

Blindly requesting specific DAG depth did not translate to any type of requests web gateways like ipfs.io or one in Brave browser have to handle.

It is impossible to know if some entity on a sub-path is a file or a directory, without sending a probe for the root block, which introduces one round-trip overhead per entity.

This problem is not present in the case of dag-scope=entity, which shifts the decision to the server, and allows for fetching unknown UnixFS entity with a single request.

5. Test fixtures

Relevant tests were added to gateway-conformance test suite in #56 and #85. A detailed list of compliance checks for dag-scope and entity-bytes can be found in v0.2.0/trustless_gateway_car_test.go or later.

Below are CIDs, CARs, and short summary of each fixture.

5.1 Testing pathing

The main utility of this scope is saving round-trips when retrieving a specific entity as a member of a bigger DAG.

To test, request a small file that fits in a single block from a sub-path. The returned CAR MUST include both the block with the file data and all blocks necessary for traversing from the root CID to the terminating element (all parents, root CID and a subdirectory below it).

Sample fixtures:

  • bafybeietjm63oynimmv5yyqay33nui4y4wx6u3peezwetxgiwvfmelutzu from subdir-with-two-single-block-files.car for testing /ipfs/dag-pb-cid/subdir/ascii.txt?format=car (UnixFS file in a subdirectory)

  • bafybeidbclfqleg2uojchspzd4bob56dqetqjsj27gy2cq3klkkgxtpn4i from single-layer-hamt-with-multi-block-files.car for testing /ipfs/dag-pb-hamt-cid/686.txt?format=car (UnixFS file on a path within HAMT-sharded parent directory)

  • bafybeia264q44a3kmfc2otctzu4egp2k235o3t7mslz2yjraymp4nv6asi from dir-with-dag-cbor-with-links.car for testing /ipfs/dag-cbor-cid/document?format=car (UnixFS file on a path with DAG-CBOR root CID)

5.2 Testing dag-scope=block

The main utility of this scope is resolving content paths. This means a CAR response with blocks related to path traversal, and the root block of the terminating entity.

To test real world use, request UnixFS file or a directory from a sub-path. The returned CAR MUST include blocks required for path traversal and ONLY the root block of the terminating entity.

Sample fixtures:

  • bafybeietjm63oynimmv5yyqay33nui4y4wx6u3peezwetxgiwvfmelutzu from subdir-with-two-single-block-files.car for testing

    • /ipfs/dag-pb-cid/subdir/ascii.txt?format=car&dag-scope=block (UnixFS file in a subdirectory)
    • /ipfs/dag-pb-cid?format=car&dag-scope=block (UnixFS directory)
  • bafybeidbclfqleg2uojchspzd4bob56dqetqjsj27gy2cq3klkkgxtpn4i from single-layer-hamt-with-multi-block-files.car for testing:

    • /ipfs/dag-pb-hamt-cid/1.txt?format=car&dag-scope=block (UnixFS multi-block file on a path within HAMT-sharded parent directory)
    • /ipfs/dag-pb-hamt-cid?format=car&dag-scope=block (UnixFS HAMT-sharded directory)

5.3 Testing dag-scope=entity

The main utility of this scope is retrieving all blocks related to a meaningful IPLD entity. Currently, the most popular entity types are:

Sample fixtures:

  • bafybeidh6k2vzukelqtrjsmd4p52cpmltd2ufqrdtdg6yigi73in672fwu from subdir-with-mixed-block-files.car for testing:

    • /ipfs/dag-pb-cid/subdir/multiblock.txt?format=car&dag-scope=entity (UnixFS multi-block file in a subdirectory)
    • /ipfs/dag-pb-cid/subdir?format=car&dag-scope=entity (UnixFS directory)
  • bafybeidbclfqleg2uojchspzd4bob56dqetqjsj27gy2cq3klkkgxtpn4i from single-layer-hamt-with-multi-block-files.car for testing:

    • /ipfs/dag-pb-hamt-cid/1.txt?format=car&dag-scope=entity (UnixFS multi-block file on a path within HAMT-sharded parent directory, returned blocks MUST be enough to deserialize the file)
    • /ipfs/dag-pb-hamt-cid?format=car&dag-scope=entity (UnixFS HAMT-sharded directory, response MUST include the minimal set of blocks required for enumeration of directory contents, and no blocks that belong to child entities)
  • bafybeia264q44a3kmfc2otctzu4egp2k235o3t7mslz2yjraymp4nv6asi from dir-with-dag-cbor-with-links.car for testing /ipfs/dag-cbor-cid/document?format=car&dag-scope=entity (DAG-CBOR document with IPLD Links must return all necessary blocks to verify the path, the document itself, but not the content behind any of the child entity IPLD Links)

5.4 Testing dag-scope=all

This is the implicit default used when dag-scope is not present, and used in the context of deserialized UnixFS TAR responses from [ipip-0288].

Sample fixtures:

  • bafybeidh6k2vzukelqtrjsmd4p52cpmltd2ufqrdtdg6yigi73in672fwu from subdir-with-mixed-block-files.car for testing:
    • /ipfs/dag-pb-cid/subdir?format=car&dag-scope=all (path parents and the entire UnixFS subdirectory, returned recursively)
    • /ipfs/dag-pb-cid/subdir/multiblock.txt?format=car&dag-scope=all (path parents and the entire UnixFS multi-block file)

5.5 Testing entity-bytes=from:to

This type of CAR response is used for facilitating HTTP Range Requests and byte seek within bigger entities.

Properly testing this type of response requires synthetic DAG that is only partially retrievable. This ensures systems that perform internal caching won't pass the test due to the entire DAG being precached, or fetched in full.

Use of the below fixture is highly recommended:

  • bafybeidh6k2vzukelqtrjsmd4p52cpmltd2ufqrdtdg6yigi73in672fwu from subdir-with-mixed-block-files.car for testing:

    • /ipfs/dag-pb-cid/subdir/multiblock.txt?format=car&dag-scope=entity&entity-bytes=0:* (path blocks and all the blocks for the multi-block UnixFS file)
    • multiblock.txt?format=car&dag-scope=entity&entity-bytes=512:1023 (path blocks and all the blocks for the the range request within multi-block UnixFS file)
    • multiblock.txt?format=car&dag-scope=entity&entity-bytes=512:-256 (path blocks and all the blocks for the the range request within multi-block UnixFS file)
    • /ipfs/dag-pb-cid/subdir?format=car&dag-scope=entity&entity-bytes=0:* (path blocks and all the blocks to enumerate UnixFS directory)
  • QmYhmPjhFjYFyaoiuNzYv8WGavpSRDwdHWe5B4M5du5Rtk from file-3k-and-3-blocks-missing-block.car for testing:

    • /ipfs/dag-pb-cid?format=car&dag-scope=entity&entity-bytes=0:1000 (only the blocks needed to fulfill the request, MUST succeed despite the fact that a block after the range is not retrievable)
    • /ipfs/dag-pb-cid?format=car&dag-scope=entity&entity-bytes=2200:* (only the blocks needed to fulfill the request, MUST succeed despite the fact that a block before the range is not retrievable)

A. References

[ipip-0288]
IPIP-0288: TAR Response Format on HTTP Gateways. Henrique Dias; Marcin Rataj. 2022-06-10. URL: https://specs.ipfs.tech/ipips/ipip-0288/
[rfc2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119
[trustless-gateway]
Trustless Gateway Specification. Marcin Rataj; Henrique Dias. 2024-04-17. URL: https://specs.ipfs.tech/http-gateways/trustless-gateway/

B. Acknowledgments

We gratefully acknowledge the following individuals for their valuable contributions, ranging from minor suggestions to major insights, which have shaped and improved this specification.

Editors
Hannah Howard GitHub
Adin Schmahmann GitHub
Rod Vagg GitHub
Marcin Rataj GitHub