From 9e87679c39da003e9af3e57dbc1b504d3b4f0a3b Mon Sep 17 00:00:00 2001
From: Scott Reimers
Date: Mon, 23 Mar 2026 16:50:40 -0400
Subject: [PATCH] Design doc: erasure-coded CDN replication section (planned)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds Section 18b documenting the planned erasure-coded shard layer for
public post auto-replication. 3-of-10 scheme where CDN nodes hold
sub-threshold shards that are mathematically unreconstructable alone.
Re-replication via chunk-pull only — no shard ever reconstructs the full
content. Connects to existing CDN tree, encryption, and ReplicationRequest
infrastructure.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 website/design.html | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/website/design.html b/website/design.html
index edfa8a1..8c4f6a5 100644
--- a/website/design.html
+++ b/website/design.html
@@ -93,6 +93,7 @@
 16. Keep-Alive Sessions
 17. Content Propagation
 18. Files & Storage
+18b. Erasure-Coded CDN Replication
 19. Sync Protocol
 20. Encryption
 21. Delete Propagation
@@ -953,6 +954,57 @@ FAILURE: C → B → A: AnchorProbeResult { reachable: false }

18b. Erasure-Coded CDN Replication (Planned)


Problem


The existing CDN hosting tree (Section 18) replicates full blob copies to downstream peers. This works well when the replicating node chose to pull the content — a follow relationship or explicit action establishes user consent. But the ReplicationRequest (0xE1) protocol also pushes content to infrastructure nodes that never chose to host it. A node holding a full copy of content it never reviewed faces potential liability for that content.


Encryption does not solve this for public posts: the content is plaintext by definition. A different mechanism is needed that makes it technically impossible for a CDN node to possess reconstructable content.


Approach: sub-threshold erasure shards


Instead of replicating full blobs, public post auto-replication distributes erasure-coded shards using a 3-of-10 scheme (k=3, n=10). Each shard is one third the size of the original data, and reconstruction requires cooperation from any 3 of the 10 shard holders. A single shard is mathematically meaningless on its own — not encrypted content where the full payload exists behind a key, but genuinely incomplete data that cannot be reconstructed alone.
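
A minimal sketch of the shard construction under this scheme, assuming a standard Reed-Solomon library (the reed-solomon-erasure crate here, which the design does not mandate); the zero-padding of the last data shard and any pre-transform that keeps a lone data shard from reading as a plaintext fragment are likewise assumptions outside this sketch:

    use reed_solomon_erasure::galois_8::ReedSolomon;

    // k = 3 shards suffice to reconstruct; n = 10 shards exist in total.
    const DATA_SHARDS: usize = 3;
    const PARITY_SHARDS: usize = 7; // n - k

    /// Split a public post blob into 10 shards, any 3 of which reconstruct it.
    fn shard_blob(blob: &[u8]) -> Result<Vec<Vec<u8>>, reed_solomon_erasure::Error> {
        // Each shard is ~1/3 of the original size (last data shard zero-padded).
        let shard_len = ((blob.len() + DATA_SHARDS - 1) / DATA_SHARDS).max(1);
        let mut shards: Vec<Vec<u8>> =
            vec![vec![0u8; shard_len]; DATA_SHARDS + PARITY_SHARDS];
        for (i, piece) in blob.chunks(shard_len).enumerate() {
            shards[i][..piece.len()].copy_from_slice(piece);
        }
        // Fill the 7 parity shards from the 3 data shards.
        ReedSolomon::new(DATA_SHARDS, PARITY_SHARDS)?.encode(&mut shards)?;
        Ok(shards)
    }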


Where sharding applies


The existing storage tiers each have their own liability story. Sharding only fills the gap for public auto-replication:

Tier                     | Storage                              | Defense
-------------------------|--------------------------------------|------------------------------------------------
Author’s node            | Full copy                            | Publisher responsibility (content originator)
Pulled content (follows) | Full copy                            | User consent — explicit follow relationship
Private auto-replication | Encrypted (CEK envelope, Section 20) | Replicating nodes are provably not keyring recipients — existing encryption architecture handles this
Public auto-replication  | Erasure-coded shards                 | Sub-threshold shard — reconstruction impossible from any single holder

Shard assignment


Slot assignment is deterministic from the PostId via DHT-style hashing, carried in the existing BlobHeader metadata — no additional discovery round required. Each node enforces single-slot acceptance: it only accepts shard push offers for its assigned slot, rejecting others. This prevents a bad actor from accumulating multiple shards toward the reconstruction threshold. Slot assignment is acceptance policy, not exclusivity — transient duplicate holders for the same slot are harmless and add redundancy.
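
A sketch of the assignment and acceptance check. The design only says the assignment is deterministic, DHT-style hashing over the PostId; the byte-fold below stands in for whatever fixed hash the DHT actually uses, and the 32-byte id widths are assumptions:

    const TOTAL_SLOTS: u64 = 10;

    /// Deterministic slot for a (post, node) pair. Illustrative only: a real
    /// node would derive this from a fixed cryptographic hash / DHT key
    /// distance, not from this simple byte fold.
    fn assigned_slot(post_id: &[u8; 32], node_id: &[u8; 32]) -> u8 {
        let mix = post_id
            .iter()
            .zip(node_id.iter())
            .fold(0u64, |acc, (a, b)| acc.wrapping_mul(31).wrapping_add((a ^ b) as u64));
        (mix % TOTAL_SLOTS) as u8
    }

    /// Single-slot acceptance: a shard push offer is accepted only for this
    /// node's own slot, so no node can collect shards toward the k = 3
    /// reconstruction threshold.
    fn accept_shard_offer(post_id: &[u8; 32], my_node_id: &[u8; 32], offered_slot: u8) -> bool {
        offered_slot == assigned_slot(post_id, my_node_id)
    }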


Health monitoring

  • ≥5 live slots: healthy, no action
  • 4 live slots: trigger background re-replication, targeting the longest-dark slot first
  • <3 live slots: content at risk (requires catastrophic loss of 8+ nodes simultaneously)
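
Those thresholds reduce to a small policy function; a sketch follows (the behaviour at exactly 3 live slots is not spelled out above, so folding it into the re-replication case is an assumption):

    /// Shard-chain health, judged from the number of live slots out of n = 10.
    enum ChainHealth {
        Healthy,     // >= 5 live slots: no action
        Rereplicate, // 4 live slots: background re-replication, longest-dark slot first
        AtRisk,      // < 3 live slots: the k = 3 reconstruction threshold is in danger
    }

    fn chain_health(live_slots: u8) -> ChainHealth {
        match live_slots {
            n if n >= 5 => ChainHealth::Healthy,
            3 | 4 => ChainHealth::Rereplicate, // exactly 3 is unstated; treated like 4 here
            _ => ChainHealth::AtRisk,
        }
    }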

Interaction with full copies


As content gains followers, the follow graph naturally absorbs redundancy through full-copy pull sync. The shard layer can back off:

  • 2+ full copies in mesh: equivalent to ≥4 live shards → shard chain deprioritizes, may decay
  • 1 full copy: shard chain reformation trigger
  • 0 full copies: shard chain is sole redundancy, maintain aggressively

This means popular content automatically shifts from CDN shard infrastructure to the social follow graph. The shard layer only works hard for content nobody has explicitly chosen to keep — exactly the content with the highest liability exposure.
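
Expressed the same way as the health policy, the back-off rule is a direct mapping from full-copy count to shard-layer behaviour (type and function names are illustrative):

    /// How hard the shard layer works, given how many full copies the follow
    /// graph already holds.
    enum ShardLayerMode {
        Decay,      // 2+ full copies: redundancy covered, shard chain may decay
        Reform,     // exactly 1 full copy: trigger shard chain reformation
        Aggressive, // 0 full copies: shard chain is the sole redundancy
    }

    fn shard_layer_mode(full_copies_in_mesh: u32) -> ShardLayerMode {
        match full_copies_in_mesh {
            0 => ShardLayerMode::Aggressive,
            1 => ShardLayerMode::Reform,
            _ => ShardLayerMode::Decay,
        }
    }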


Re-replication


When a slot goes dark, a new shard holder is assigned via the DHT. The new holder determines which chunks belong to its slot, requests exactly those chunks from the live shard holders that have them, and refuses anything outside its assigned slot. No shard holder ever reconstructs the full content — each node only ever possesses its own slot’s chunks. The author’s node can go offline permanently once mesh replication is established.
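
A sketch of the pull-side filtering. The design does not fix how chunks map to slots, so the round-robin chunk_index % 10 mapping below is purely illustrative:

    /// Chunks a newly assigned holder needs for its slot. The chunk-to-slot
    /// rule is not specified here; round-robin over the 10 slots is assumed
    /// purely for illustration.
    fn chunks_for_slot(total_chunks: u32, slot: u8) -> Vec<u32> {
        (0..total_chunks).filter(|i| (i % 10) as u8 == slot).collect()
    }

    /// Acceptance filter applied to every incoming chunk: anything outside the
    /// node's own slot is refused, so the pulling node never holds enough of
    /// the post to reconstruct it.
    fn accept_incoming_chunk(chunk_index: u32, my_slot: u8) -> bool {
        (chunk_index % 10) as u8 == my_slot
    }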


Replication window


Active shard push replication applies the same 72-hour window as full-copy replication (Section 19): only posts less than 72 hours old are actively pushed to shard holders. Beyond 72 hours, the shard chain relies on natural decay and pull-based replication only.


Exception — share link re-promotion: When a share link is added to a post’s BlobHeader, the 72-hour active replication window resets from that event. This ensures that content being actively shared gets CDN-prioritized delivery regardless of original post age. The re-promotion window is 72 hours from the share link addition, not from the original post timestamp.
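
A sketch of the window check, folding in the share-link re-promotion; the exact timestamp fields on the post and BlobHeader are assumptions:

    use std::time::{Duration, SystemTime};

    /// Active push window: 72 hours, restarted by a share-link addition.
    const ACTIVE_PUSH_WINDOW: Duration = Duration::from_secs(72 * 60 * 60);

    /// Whether a post should still be actively pushed to shard holders.
    /// `share_link_added_at` is the most recent share-link addition recorded
    /// in the BlobHeader, if any (field name assumed for illustration).
    fn in_active_push_window(
        post_created_at: SystemTime,
        share_link_added_at: Option<SystemTime>,
        now: SystemTime,
    ) -> bool {
        // Re-promotion: the window restarts from the share-link event,
        // not from the original post timestamp.
        let window_start = share_link_added_at.unwrap_or(post_created_at);
        match now.duration_since(window_start) {
            Ok(age) => age < ACTIVE_PUSH_WINDOW,
            Err(_) => true, // window start lies in the future; treat as fresh
        }
    }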


Implementation path


Extends the existing ReplicationRequest/ReplicationResponse (0xE1/0xE2) protocol. Shard slot metadata fits in the existing BlobHeader. The CDN hosting tree, downstream registration, and eviction scoring (Section 18) continue to work unchanged for full copies — sharding is an additional layer for the auto-replication path only.
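
Purely as a sketch of where that metadata could live, the fields below are invented for illustration; the design states only that shard slot metadata fits in the existing BlobHeader and rides the existing 0xE1/0xE2 exchange:

    /// Hypothetical shard-slot metadata carried alongside the existing
    /// BlobHeader for auto-replicated public posts. Every field name and
    /// width here is an assumption, not part of the current wire format.
    struct ShardSlotInfo {
        post_id: [u8; 32], // content the shard belongs to
        slot: u8,          // this holder's assigned slot, 0..=9
        k: u8,             // reconstruction threshold (3)
        n: u8,             // total slots (10)
        shard_len: u32,    // bytes per shard (~1/3 of the blob)
    }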


19. Sync Protocol