IPIP-0412: Signaling Block Order in CARs on HTTP Gateways

Related Issues
ipfs/specs/issues/348
ipfs/specs/pull/330
ipfs/specs/pull/402
ipfs/specs/pull/412

1. Summary

Adds support for additional, optional content type parameters that allow the client and server to signal or negotiate a specific block order in the returned CAR.

2. Motivation

We want to make it easier to build light clients for IPFS, and we want them to have a low memory footprint on arbitrarily sized files. The main pain point preventing this is that the block order in CAR responses is not specified.

Without a specified order, a client has to keep some kind of reference to previously seen blocks, either on disk or in memory, for two reasons:

  1. Blocks can arrive out of order, so the moment a block is received may not match the moment it is consumed (its data is read and returned to the consumer), and the block has to be held until it is needed.

  2. A block referenced multiple times in the DAG may be sent only once and then reused for every reference. This is handy when you plan to cache blocks on disk, but not at all when you want to process a stream with a use-and-forget policy.

What we really want is for the gateway to help us a bit and give us blocks in a useful order.

The existing Trustless Gateway specification does not provide a mechanism for negotiating the order of blocks in CAR responses.

This IPIP aims to improve the status quo.

3. Detailed design

The CAR content type (application/vnd.ipld.car) already supports a version parameter, which allows the gateway to indicate which CAR flavor is returned with the response.

The proposed solution introduces two new parameters for the content type headers in HTTP requests and responses: order and dups.

The order parameter allows the client to indicate its preference for a specific block order in the CAR response, and the dups parameter specifies whether duplicate blocks are allowed in the response.

A client SHOULD send an Accept HTTP header to leverage content type negotiation, as described in Section 12.5.1 of [rfc9110], to get the preferred response type.

More details can be found in Section 5 (CAR Responses) of [trustless-gateway].
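A minimal client-side sketch of composing such an Accept header. The parameter values (dfs, unk for order; y, n for dups) are the ones defined in [trustless-gateway]; the helper name car_accept_header and the lower-weighted fallback to a plain CAR are illustrative assumptions, not something this IPIP prescribes. The default asks for DFS order with duplicates included (dups=y), which suits the use-and-forget streaming described in the Motivation.

```python
def car_accept_header(order: str = "dfs", dups: str = "y",
                      version: int = 1, fallback_q: float = 0.5) -> str:
    """Build an Accept header that prefers an ordered CAR (hypothetical helper).

    The first media range requests a specific block order and duplicate
    policy; the second, lower-weighted one still accepts a CAR with the
    gateway's default (possibly unspecified) block order.
    """
    preferred = f"application/vnd.ipld.car; version={version}; order={order}; dups={dups}"
    fallback = f"application/vnd.ipld.car; version={version}; q={fallback_q}"
    return f"{preferred}, {fallback}"


print(car_accept_header())
```

Keeping the fallback entry means a gateway that cannot honor the requested order can still answer with a CAR instead of refusing the request; the client then learns what it actually received from the response Content-Type.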

4. Design rationale

The proposed specification change aims to address the limitations of the existing Trustless Gateway specification by introducing a mechanism for negotiating the block order in CAR responses.

By allowing clients to indicate their preferred block order, responses become deterministic, so Trustless Gateways can cache CAR responses for popular content, resulting in improved performance and reduced network load. Clients benefit from more efficient data handling by deserializing blocks as they arrive.

We reuse existing HTTP content type negotiation and the CAR content type, which already has the optional version parameter.

4.1 User benefit

The proposed specification change brings several benefits to end users:

  1. Improved Performance: Gateways can decide on their implicit default ordering and cache CAR responses for popular content. In turn, clients can benefit from a strong ETag on ordered (deterministic) responses. This reduces the response time for subsequent requests, resulting in faster content retrieval for users.

  2. Reduced Memory Usage: Clients no longer need to buffer the entire CAR response in memory until the deserialization of the requested entity is finished. With the ability to deserialize blocks as they arrive, users can conserve memory resources, especially when dealing with large CAR responses.

  3. Efficient Data Handling: By discarding blocks as soon as the CID is validated and data is deserialized, clients can efficiently process the data in real-time. This is particularly useful for light clients, IoT devices, mobile web browsers, and other streaming applications where immediate access to the data is required.

  4. Customizable Ordering: Clients can indicate their preferred block order in the Accept header, allowing them to prioritize specific ordering strategies that align with their use cases. This flexibility enhances the user experience and empowers users to optimize content retrieval according to their needs.
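To illustrate the use-and-forget processing described above, here is a toy sketch of a client consuming a stream that arrives in DFS order with duplicates included (order=dfs; dups=y). It is deliberately simplified and hypothetical: a "CID" is just the SHA-256 hex digest of the block bytes, and a block is JSON with data and links fields, whereas real clients decode CIDs, multihashes, and dag-pb/UnixFS. The client keeps only a stack of expected CIDs, verifies each block against the next expected CID, emits its payload, and drops the block.

```python
import hashlib
import json


def toy_cid(block: bytes) -> str:
    # Simplification: a real CID also encodes the codec and multihash type;
    # here it is just the SHA-256 hex digest of the block bytes.
    return hashlib.sha256(block).hexdigest()


def make_block(data: str, links=()) -> bytes:
    # Toy block encoding (stands in for dag-pb/UnixFS): JSON with a payload
    # and the toy CIDs of its children, in traversal order.
    return json.dumps({"data": data, "links": list(links)}).encode()


def stream_dfs(root_cid: str, blocks):
    """Consume blocks arriving in DFS order with duplicates included
    (order=dfs; dups=y), yielding each payload once its block is verified.

    Memory is bounded by the stack of pending CIDs, not by the number of
    blocks: every block is checked, its payload emitted, then dropped.
    """
    pending = [root_cid]  # CIDs still expected; the next one is on top
    for block in blocks:
        if not pending:
            raise ValueError("extra block after the traversal finished")
        expected = pending.pop()
        actual = toy_cid(block)
        if actual != expected:
            # A known order lets the client reject a bad block immediately.
            raise ValueError(f"expected block {expected[:8]}, got {actual[:8]}")
        node = json.loads(block)
        # Push children reversed so the first link is visited next (pre-order DFS).
        pending.extend(reversed(node["links"]))
        if node["data"]:
            yield node["data"]
    if pending:
        raise ValueError("stream ended before the traversal finished")


# A root linking to leaf A, leaf B, then leaf A again (a duplicate reference).
leaf_a = make_block("ab")
leaf_b = make_block("cd")
root = make_block("", [toy_cid(leaf_a), toy_cid(leaf_b), toy_cid(leaf_a)])
print("".join(stream_dfs(toy_cid(root), [root, leaf_a, leaf_b, leaf_a])))  # abcdab
```

The duplicate leaf A is sent a second time, so the client never needs to remember it. With dups=n the second reference would not be resent, and this stateless consumer would have to be extended with a block cache.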

4.2 Compatibility

The proposed specification change is backward compatible with existing client and server implementations.

Trustless Gateways that do not support the negotiation of block order in CAR responses will continue to function as before, providing their existing default behavior, and clients will be able to detect this by inspecting the Content-Type header present in the HTTP response.

Clients that do not send the Accept header or do not recognize the order and dups parameters in the Content-Type header will receive and process CAR responses as they did before: buffering/caching all blocks until done with the final deserialization.
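The detection rule above can be sketched on the client side as follows: the client inspects the response Content-Type and switches to use-and-forget streaming only when the gateway explicitly confirms order=dfs and dups=y. Anything else, including a bare application/vnd.ipld.car from a gateway that ignores the new parameters, means falling back to buffering. The function names are hypothetical, and the parser is simplified (it does not handle quoted parameter values).

```python
def parse_media_type(header: str):
    """Split a Content-Type value into (media type, parameters dict).

    Simplified parser: assumes no quoted parameter values, which is enough
    for the CAR parameters discussed here. Parameter names are lowercased
    because they are case-insensitive in HTTP.
    """
    media_type, *raw_params = [part.strip() for part in header.split(";")]
    params = {}
    for raw in raw_params:
        if "=" in raw:
            key, value = raw.split("=", 1)
            params[key.strip().lower()] = value.strip()
    return media_type.lower(), params


def can_stream_and_forget(content_type: str) -> bool:
    """True only if the response declares DFS order with duplicates included."""
    media_type, params = parse_media_type(content_type)
    if media_type != "application/vnd.ipld.car":
        return False
    # A gateway that ignored the negotiation omits these parameters, so the
    # client falls back to buffering/caching blocks as before.
    return params.get("order") == "dfs" and params.get("dups") == "y"
```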

Existing implementations can choose to adopt the new specification and implement support for the negotiation of block order incrementally. This allows for a smooth transition and ensures compatibility with both new and old clients.

4.3 Security

The proposed specification change does not introduce any negative security implications beyond those already present in the existing Trustless Gateway specification. It focuses on enhancing performance and data handling without affecting the underlying security model of IPFS.

Light clients with support for the order and dups CAR content type parameters will be able to detect malicious responses sooner, reducing the risk of memory-based DoS attacks from malicious gateways.

4.4 Alternatives

Several alternative approaches were considered before arriving at the proposed solution:

  1. Implicit Server-Side Configuration: Instead of negotiating the block order in the CAR response, the Trustless Gateway could have a server-side configuration that specifies the default order. However, this approach would limit flexibility for clients, requiring them to have prior knowledge of the order supported by each gateway.

  2. Fixed Block Order: Another option was to enforce a fixed block order in CAR responses. However, this approach would not cater to the varying needs and preferences of different clients and use cases, and it is not backward compatible with existing Trustless Gateways, which return CAR responses with a weak ETag and unspecified block order.

  3. Separate X- HTTP Header: Introducing a separate HTTP header was rejected because we try to use HTTP semantics where possible: gateways already use HTTP content type negotiation for the CAR version, and reusing it saves a few bytes in each round-trip. Also, [rfc6648] advises against the use of X- and similar constructs in new protocols.

  4. Single Preset Parameter: We decided against a single pack preset with predefined behavior, in favor of separate parameters for order and duplicates (dups), because presets invite ambiguity and future problems when adding more determinism to responses. For instance, if we were to include a new behavior like foo=y|n alongside an existing preset like pack=orderdfs+dupsy, it would either require a separate parameter anyway or force a new version of every preset (e.g., orderdfs+dupsy+fooy and orderdfs+dupsy+foon). Maintaining and deploying such changes across a decentralized ecosystem, where gateways may run different software, becomes more complex. In contrast, separate parameters for each behavior are easier to maintain and deploy across gateways running varying software.
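The growth described in item 4 is easy to quantify. A small sketch, assuming two values each for order (dfs, unk) and dups (y, n) plus the hypothetical foo=y|n from the example: a single preset parameter needs one name per combination, so each new binary behavior doubles the preset list, while separate parameters grow by one key.

```python
from itertools import product


def preset_names(params: dict) -> list:
    """Every preset name a single `pack` parameter would need to cover."""
    keys = list(params)
    return [
        "+".join(f"{key}{value}" for key, value in zip(keys, combo))
        for combo in product(*(params[key] for key in keys))
    ]


today = {"order": ["dfs", "unk"], "dups": ["y", "n"]}
future = {**today, "foo": ["y", "n"]}  # "foo" is the hypothetical new behavior

print(len(preset_names(today)), len(preset_names(future)))  # 4 8
print(len(today), len(future))  # separate parameters: 2 keys, then 3 keys
```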

The proposed solution of negotiating the block order through headers is future-proof, allows for flexibility, interoperability, and customization while maintaining compatibility with existing implementations.

5. Test fixtures

Implementation compliance can be determined by testing the negotiation process between clients and Trustless Gateways using various combinations of order and dups parameters.
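The negotiation being tested can be illustrated with a toy gateway and a few request/response pairs. This is a sketch under stated assumptions, not the gateway-conformance suite: the set of supported combinations, the default of order=unk; dups=y, and the choice to fall back to that default when no requested combination is available all belong to an imagined gateway. In every case, the response Content-Type states what was actually returned.

```python
CAR = "application/vnd.ipld.car"
# Hypothetical gateway: the (order, dups) pairs it can produce, and its default.
SUPPORTED = {("dfs", "y"), ("dfs", "n"), ("unk", "y")}
DEFAULT = ("unk", "y")


def parse_media_range(entry: str):
    """Split one Accept entry into (media type, params); no quoted values."""
    media_type, *raw = [part.strip() for part in entry.split(";")]
    params = {}
    for item in raw:
        if "=" in item:
            key, value = item.split("=", 1)
            params[key.strip().lower()] = value.strip()
    return media_type.lower(), params


def negotiate(accept: str) -> str:
    """Pick the Content-Type this toy gateway returns for a CAR request."""
    candidates = []
    for entry in accept.split(","):
        media_type, params = parse_media_range(entry)
        if media_type == CAR:
            q = float(params.pop("q", "1"))
            candidates.append((q, params))
    # Highest weight first; list.sort is stable, so ties keep the client's order.
    candidates.sort(key=lambda c: c[0], reverse=True)
    chosen = DEFAULT
    for q, params in candidates:
        # A missing parameter means "no preference", so use the gateway default.
        wanted = (params.get("order", DEFAULT[0]), params.get("dups", DEFAULT[1]))
        if q > 0 and wanted in SUPPORTED:
            chosen = wanted
            break
    order, dups = chosen
    # The response always states the order and dups actually used.
    return f"{CAR}; version=1; order={order}; dups={dups}"


if __name__ == "__main__":
    for accept in [
        "application/vnd.ipld.car; version=1; order=dfs; dups=n",
        "application/vnd.ipld.car; version=1; order=unk; dups=n",
        "application/vnd.ipld.car; version=1; order=unk; dups=n, "
        "application/vnd.ipld.car; version=1; order=dfs; dups=y; q=0.5",
        "application/vnd.ipld.car",
    ]:
        print(f"{accept!r} -> {negotiate(accept)}")
```

A compliance check then asserts, for each Accept value, that the returned Content-Type matches the combination the gateway claims to support, and that a client reading only that header can tell whether its preference was honored.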

Relevant tests were added to the gateway-conformance test suite in #87 and include the fixture below.

A. References

[rfc2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119
[rfc6648]
Deprecating the "X-" Prefix and Similar Constructs in Application Protocols. P. Saint-Andre; D. Crocker; M. Nottingham. IETF. June 2012. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc6648
[rfc9110]
HTTP Semantics. R. Fielding, Ed.; M. Nottingham, Ed.; J. Reschke, Ed. IETF. June 2022. Internet Standard. URL: https://httpwg.org/specs/rfc9110.html
[trustless-gateway]
Trustless Gateway Specification. Marcin Rataj; Henrique Dias. 2024-04-17. URL: https://specs.ipfs.tech/http-gateways/trustless-gateway/

B. Acknowledgments

We gratefully acknowledge the following individuals for their valuable contributions, ranging from minor suggestions to major insights, which have shaped and improved this specification.

Editors
Marcin Rataj (Protocol Labs) GitHub
Jorropo (Protocol Labs) GitHub