Design doc: erasure-coded CDN replication section (planned)
Adds Section 18b documenting the planned erasure-coded shard layer for public post auto-replication. 3-of-10 scheme where CDN nodes hold sub-threshold shards that are mathematically unreconstructable alone. Re-replication via chunk-pull only — no shard ever reconstructs the full content. Connects to existing CDN tree, encryption, and ReplicationRequest infrastructure. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
926e0c1509
commit
4e18dbd1b9
1 changed file with 48 additions and 0 deletions
@@ -93,6 +93,7 @@
<a href="#keep-alive">16. Keep-Alive Sessions</a>
<a href="#content">17. Content Propagation</a>
<a href="#files">18. Files & Storage</a>
<a href="#erasure-cdn">18b. Erasure-Coded CDN Replication</a>
<a href="#sync">19. Sync Protocol</a>
<a href="#encryption">20. Encryption</a>
<a href="#deletes">21. Delete Propagation</a>
@@ -953,6 +954,53 @@ FAILURE: C → B → A: AnchorProbeResult { reachable: false }</code></pre>
</table>
</section>

<!-- 18b. Erasure-Coded CDN Replication (Future) -->
<section id="erasure-cdn">
<h2>18b. Erasure-Coded CDN Replication <span class="badge badge-planned">Planned</span></h2>

<h3>Problem</h3>
<p>The existing CDN hosting tree (Section 18) replicates full blob copies to downstream peers. This works well when the replicating node chose to pull the content — a follow relationship or explicit action establishes user consent. But the <code>ReplicationRequest</code> (0xE1) protocol also pushes content to infrastructure nodes that never chose to host it. A node holding a full copy of content it never reviewed faces potential liability for that content.</p>
<p>Encryption does not solve this for public posts: the content is plaintext by definition. A different mechanism is needed that makes it <strong>technically impossible</strong> for a CDN node to possess reconstructable content.</p>

<h3>Approach: sub-threshold erasure shards</h3>
<p>Instead of replicating full blobs, public post auto-replication distributes erasure-coded shards using a <strong>3-of-10</strong> scheme (k=3, n=10). Each shard is one-third the size of the original content, and reconstruction requires cooperation from any 3 of the 10 shard holders. Provided the encoding is non-systematic (no shard carries a verbatim slice of the plaintext), a single shard is mathematically meaningless on its own — not encrypted content where the full payload exists behind a key, but genuinely incomplete data that cannot be reconstructed alone.</p>
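<p>The threshold property can be sketched with Lagrange interpolation over the prime field GF(257); a production implementation would use Reed–Solomon over GF(256), so the field choice, padding scheme, and function names below are illustrative assumptions only:</p>

```python
# Toy k-of-n erasure code over GF(257): each group of k data bytes becomes the
# coefficients of a degree-(k-1) polynomial, and shard s stores its evaluation
# at x = s + 1. Any k shards interpolate the coefficients back; fewer cannot.
P = 257  # prime modulus (a real system would use Reed-Solomon over GF(256))

def encode(data: bytes, k: int = 3, n: int = 10) -> list[list[int]]:
    if len(data) % k:
        data += b"\x00" * (k - len(data) % k)   # pad to a multiple of k
    shards: list[list[int]] = [[] for _ in range(n)]
    for i in range(0, len(data), k):
        coeffs = data[i:i + k]                  # low-to-high coefficients
        for x in range(1, n + 1):
            y = 0
            for c in reversed(coeffs):          # Horner evaluation at x
                y = (y * x + c) % P
            shards[x - 1].append(y)
    return shards

def decode(slots: list[int], shards: list[list[int]], k: int = 3) -> bytes:
    xs = [s + 1 for s in slots[:k]]             # slot s was evaluated at x = s + 1
    out = bytearray()
    for j in range(len(shards[0])):
        ys = [shards[i][j] for i in range(k)]
        # Lagrange-interpolate the full coefficient vector (the data bytes)
        coeffs = [0] * k
        for i in range(k):
            basis, denom = [1], 1               # expand prod_{m != i} (x - x_m)
            for m in range(k):
                if m == i:
                    continue
                nxt = [0] * (len(basis) + 1)
                for t, b in enumerate(basis):
                    nxt[t] = (nxt[t] - xs[m] * b) % P
                    nxt[t + 1] = (nxt[t + 1] + b) % P
                basis = nxt
                denom = (denom * (xs[i] - xs[m])) % P
            inv = pow(denom, P - 2, P)          # modular inverse (Fermat)
            for t in range(k):
                coeffs[t] = (coeffs[t] + ys[i] * basis[t] * inv) % P
        out.extend(coeffs)
    return bytes(out)
```

<p>Any 3 of the 10 shards recover the coefficients exactly; with only 2, one coefficient per group remains information-theoretically undetermined, so no single holder (or pair of holders) can reconstruct the content.</p>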

<h3>Where sharding applies</h3>
<p>The existing storage tiers each have their own liability story. Sharding only fills the gap for public auto-replication:</p>
<table>
<tr><th>Tier</th><th>Storage</th><th>Defense</th></tr>
<tr><td>Author’s node</td><td>Full copy</td><td>Publisher responsibility (content originator)</td></tr>
<tr><td>Pulled content (follows)</td><td>Full copy</td><td>User consent — explicit follow relationship</td></tr>
<tr><td>Private auto-replication</td><td>Encrypted (CEK envelope, Section 20)</td><td>Replicating nodes are provably not keyring recipients — existing encryption architecture handles this</td></tr>
<tr><td>Public auto-replication</td><td><strong>Erasure-coded shards</strong></td><td>Sub-threshold shard — reconstruction impossible from any single holder</td></tr>
</table>

<h3>Shard assignment</h3>
<p>Slot assignment is deterministic from the PostId via DHT-style hashing, carried in the existing BlobHeader metadata — no additional discovery round required. Each node enforces <strong>single-slot acceptance</strong>: it only accepts shard push offers for its assigned slot, rejecting others. This prevents a bad actor from accumulating multiple shards toward the reconstruction threshold. Slot assignment is acceptance policy, not exclusivity — transient duplicate holders for the same slot are harmless and add redundancy.</p>
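<p>The acceptance rule reduces to a pure function of the two identifiers; the hash construction below is an assumption for illustration, not the actual BlobHeader wire format:</p>

```python
import hashlib

N_SLOTS = 10  # n from the 3-of-10 scheme

def assigned_slot(post_id: bytes, node_id: bytes) -> int:
    """Deterministic DHT-style slot for (post, node); hypothetical mapping."""
    digest = hashlib.sha256(post_id + node_id).digest()
    return int.from_bytes(digest[:8], "big") % N_SLOTS

def accept_shard_offer(post_id: bytes, node_id: bytes, offered_slot: int) -> bool:
    # Single-slot acceptance: a node takes shard pushes only for its own slot,
    # so no single node can accumulate k = 3 distinct slots and reconstruct.
    return offered_slot == assigned_slot(post_id, node_id)
```

<p>Because the mapping is deterministic, any peer can verify that a holder is claiming its correct slot without an extra discovery round.</p>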
<h3>Health monitoring</h3>
<ul>
<li><strong>≥5 live slots:</strong> healthy, no action</li>
<li><strong>4 live slots:</strong> trigger background re-replication, targeting the longest-dark slot first</li>
<li><strong>&lt;3 live slots:</strong> content at risk (requires catastrophic loss of 8+ nodes simultaneously)</li>
</ul>
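<p>The thresholds above collapse into a small policy function. Treating exactly 3 live slots the same as 4 is an assumption here, since the list only specifies the ≥5, 4, and below-3 cases:</p>

```python
def shard_health_action(live_slots: int) -> str:
    """Map a live-slot count to a monitoring response (3-of-10 scheme)."""
    if live_slots >= 5:
        return "healthy"        # no action required
    if live_slots >= 3:         # 3 is assumed to escalate like 4 (not specified)
        return "re-replicate"   # background pull, longest-dark slot first
    return "at-risk"            # below k = 3: shards alone cannot recover
```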
<h3>Interaction with full copies</h3>
<p>As content gains followers, the follow graph naturally absorbs redundancy through full-copy pull sync. The shard layer can back off:</p>
<ul>
<li><strong>2+ full copies in mesh:</strong> equivalent to ≥4 live shards → shard chain deprioritizes, may decay</li>
<li><strong>1 full copy:</strong> shard chain reformation trigger</li>
<li><strong>0 full copies:</strong> shard chain is sole redundancy, maintain aggressively</li>
</ul>
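<p>The back-off rules above amount to a three-way decision on the full-copy count (a sketch; the mode names are illustrative, not protocol states):</p>

```python
def shard_layer_mode(full_copies: int) -> str:
    """How aggressively to maintain the shard chain, given mesh full copies."""
    if full_copies >= 2:
        return "decay"      # follow graph carries redundancy; shards may lapse
    if full_copies == 1:
        return "reform"     # single point of failure: rebuild the shard chain
    return "maintain"       # shards are the sole redundancy, hold aggressively
```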
<p>This means popular content automatically shifts from CDN shard infrastructure to the social follow graph. The shard layer only works hard for content nobody has explicitly chosen to keep — exactly the content with the highest liability exposure.</p>

<h3>Re-replication</h3>
<p>When a slot goes dark, a new shard holder is assigned via DHT. The new holder determines which chunks belong to its slot and <strong>pulls only those chunks</strong> from the live shard holders that have them. No shard holder ever reconstructs the full content — each node only ever possesses its own slot’s chunks. The pulling node identifies what it needs, requests those specific chunks, and aggressively refuses anything outside its assigned slot. The author’s node can go offline permanently once mesh replication is established.</p>
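<p>A sketch of the chunk-pull loop, assuming each peer advertises (slot, chunk) pairs it can serve and a <code>fetch</code> callback returns chunk bytes; both interfaces are hypothetical stand-ins for the real transfer protocol:</p>

```python
def pull_slot_chunks(my_slot, inventory, fetch):
    """Pull only our assigned slot's chunks; refuse everything outside it.

    inventory: peer -> iterable of (slot, chunk_index) pairs the peer serves
    fetch(peer, slot, chunk_index) -> bytes
    """
    got = {}
    for peer, items in inventory.items():
        for slot, idx in items:
            if slot != my_slot:
                continue  # out-of-slot chunk: never request or store it
            if idx not in got:
                got[idx] = fetch(peer, slot, idx)
    return got  # chunk_index -> bytes, drawn from any live duplicate holders
```

<p>At no point does the puller hold chunks from more than one slot, so even a fully re-replicated node remains below the k = 3 reconstruction threshold.</p>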
<h3>Implementation path</h3>
<p>Extends the existing <code>ReplicationRequest</code>/<code>ReplicationResponse</code> (0xE1/0xE2) protocol. Shard slot metadata fits in the existing BlobHeader. The CDN hosting tree, downstream registration, and eviction scoring (Section 18) continue to work unchanged for full copies — sharding is an additional layer for the auto-replication path only.</p>
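<p>One possible shape for the shard metadata riding in the extended request; the field names and defaults below are assumptions, not the actual BlobHeader or 0xE1 layout:</p>

```python
from dataclasses import dataclass

MSG_REPLICATION_REQUEST = 0xE1   # existing message type (Section 18)
MSG_REPLICATION_RESPONSE = 0xE2

@dataclass(frozen=True)
class ShardOffer:
    """Hypothetical shard fields attached to a ReplicationRequest push."""
    post_id: bytes
    slot: int        # 0..9: which of the n = 10 shard slots is offered
    k: int = 3       # reconstruction threshold
    n: int = 10      # total slots
```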
</section>

<!-- 19. Sync Protocol -->
<section id="sync">
<h2>19. Sync Protocol</h2>