distsoc Discovery Protocol v2 — Design Review

Three-layer architecture: L1 Peer Discovery, L2 File Storage + Content Routing, L3 Social Routing.

101 persistent QUIC connections (81 social + 20 wide). Single ALPN. 2-min diffs. ~350K 3-hop map.

1. Architecture Overview
1.1 The Three Layers
| Layer | Purpose | Map contents | Update mechanism |
| --- | --- | --- | --- |
| L1 Peer Discovery | Find any node's address | NodeIds + hop distance + addresses (1-hop only); ~350K entries | 2-min diffs (1-hop forwarding), worm lookup |
| L2 File Storage | Content-addressed storage + author update propagation | node:postid + media + author_recent_posts (256 KB max) | File replication, piggybacked updates, staleness pulls |
| L3 Social Routing | Direct routes to follows / audience | Cached routes to socially-connected nodes | Push (audience), pull (follows), route validation |
Design Principle

Layer 1 answers "where is this node?" Layer 2 answers "where is this content?" Layer 3 answers "how do I reach the people I care about?" Each layer operates independently but they reinforce each other — a file layer hit can bypass a Layer 1 worm entirely.

1.2 Follow vs Audience
| | Follow | Audience |
| --- | --- | --- |
| Initiation | Unilateral — no request needed | Requires request + author approval |
| Delivery | Pull only — follower pulls updates | Push — author pushes via push worm |
| Author awareness | Author does not know | Author knows (approved the request) |
| Latency | Minutes (pull cycle or file-chain propagation) | Seconds (direct push) |
| Resource cost | Follower bears cost | Author bears cost |
| Scale | Unlimited followers (pull is distributed) | Author pushes to approved list |
Key Distinction

Follows are private and passive — the author never learns who follows them. Content reaches followers via the file layer (author_recent_posts propagates through stored files) or via periodic pull. Audience is consented and active — the author pushes in real-time.

1.3 Connection Model — 101 Persistent QUIC
┌──────────────────────────────────┐
│         This Node (101 conns)    │
├──────────────────────────────────┤
│  81 Social Peers                 │
│  ├─ Mutual follows               │
│  ├─ Audience (granted)           │
│  ├─ Users we follow (online)     │
│  ├─ Recent sync partners         │
│  └─ (evicted by priority)        │
│                                  │
│  20 Wide Peers                   │
│  ├─ Diversity-maximizing         │
│  ├─ Re-evaluated every 10 min    │
│  └─ At least 2 must be anchors   │
└──────────────────────────────────┘

Mobile: 10 social + 5 wide = 15 connections.

All connections use a single ALPN (distsoc/2) with multiplexed message types. One TLS handshake per peer. QUIC keep-alive every 20 seconds.

| Resource | Desktop (101) | Mobile (15) |
| --- | --- | --- |
| Memory (connection state) | ~1.5 MB | ~250 KB |
| Keep-alive bandwidth | ~22 MB/day | ~3.2 MB/day |
| CPU | Negligible | Negligible |
1.4 Unified Protocol — Single ALPN

All communication over one ALPN distsoc/2. Message types via 1-byte header per QUIC stream:

Layer 1: Peer Discovery
  0x01  RoutingDiff          2-min gossip diff
  0x02  InitialMapSync       Full map exchange on new connection
  0x10  WormRequest          Forwarded search
  0x11  WormQuery            Fan-out to peers (local check)
  0x12  WormResponse         Results to originator
  0x20  AddressRequest       Resolve NodeId → address
  0x21  AddressResponse

Layer 2: File / Content
  0x30  FileRequest          Request a post by PostId
  0x31  FileResponse
  0x32  AuthorUpdateRequest  Request fresh author_recent_posts
  0x33  AuthorUpdateResponse
  0x34  AuthorUpdatePush     Push updated author_recent_posts
  0x35  PostNotification     Real-time new post notification

Layer 3: Social
  0x40  PullSyncRequest      Follower requests posts since seq N
  0x41  PullSyncResponse
  0x42  PushPost             Audience push delivery
  0x43  AudienceRequest      Request to join audience
  0x44  AudienceResponse

General
  0x50  ProfileUpdate
  0x51  DeleteRecord
  0x52  VisibilityUpdate
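With the 1-byte header, stream dispatch reduces to a single match on the first byte. A minimal Rust sketch — the `Layer` enum and its grouping are illustrative assumptions, not the shipped API; only the numeric codes come from the table above:

```rust
// Illustrative dispatch of the 1-byte message-type header that prefixes
// each QUIC stream. Codes mirror the table above; the enum is hypothetical.
#[derive(Debug, PartialEq)]
pub enum Layer {
    PeerDiscovery, // 0x01-0x21 range
    FileContent,   // 0x30-0x35
    Social,        // 0x40-0x44
    General,       // 0x50-0x52
    Unknown,
}

/// Map the first byte of a stream to the layer that should handle it.
pub fn dispatch(header: u8) -> Layer {
    match header {
        0x01 | 0x02 | 0x10..=0x12 | 0x20 | 0x21 => Layer::PeerDiscovery,
        0x30..=0x35 => Layer::FileContent,
        0x40..=0x44 => Layer::Social,
        0x50..=0x52 => Layer::General,
        _ => Layer::Unknown,
    }
}
```

Adding a new message type is then a one-line change to the match, which is the "easy to add new message types" property claimed below.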
Why Single ALPN

Previous design used 4 ALPNs (sync/6, addr/1, gossip/1, worm/1) requiring separate connections. Single ALPN means one TLS handshake per peer, connection reuse for all message types, simpler accept loop, easy to add new message types.

2. Peer Discovery (Layer 1) — 3-Hop Map + Worm
2.1 The 3-Hop Discovery Map
┌─────────────────────────────────────────────────────────┐
│ Hop 1: 101 direct peers                                 │
│   Stored: NodeId + SocketAddr + is_anchor + is_wide     │
│   Source: Direct QUIC connection observation             │
│                                                         │
│ Hop 2: ~5,500 unique nodes                              │
│   Stored: NodeId + reporter_peer_id + is_anchor         │
│   Source: Peers' 1-hop diffs (their direct connections) │
│                                                         │
│ Hop 3: ~350,000 unique nodes                            │
│   Stored: NodeId only                                   │
│   Source: Peers' 2-hop diffs (their derived knowledge)  │
└─────────────────────────────────────────────────────────┘

Storage: ~11.6 MB (101×96 B + 5,500×66 B + 350K×32 B)
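The footprint can be sanity-checked from the per-hop entry sizes in the figure (a back-of-envelope helper; the entry sizes are taken as given):

```rust
// Back-of-envelope check of the 3-hop map footprint: hop-1 entries carry
// address + flags (~96 B), hop-2 entries carry a reporter reference (~66 B),
// hop-3 entries are NodeId only (32 B). Sizes come from the figure above.
pub fn map_storage_bytes(hop1: u64, hop2: u64, hop3: u64) -> u64 {
    hop1 * 96 + hop2 * 66 + hop3 * 32
}
```

101×96 B + 5,500×66 B + 350,000×32 B = 11,572,696 bytes, i.e. on the order of 11-12 MB, dominated almost entirely by the hop-3 NodeIds.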

Why ~350K with 20 Wide Peers

| Source (2-hop) | Raw | After dedup |
| --- | --- | --- |
| Social intra-cluster (81 × ~50) | 4,050 | ~118 (rest of our cluster) |
| Social inter-cluster (81 × ~51) | 4,131 | ~3,500 |
| Wide intra-cluster (20 × ~50) | 1,000 | ~1,000 |
| Wide inter-cluster (20 × ~51) | 1,020 | ~1,000 |
| Total 2-hop | | ~5,500 |

3-hop: Each of ~5,500 2-hop nodes has ~100 connections not yet counted. Wide-wide-wide paths contribute ~80K+ unique nodes from completely different parts of the graph. Total after dedup: ~350,000.

Wide Peer Multiplier

Without dedicated wide peers, 101 random social connections in a clustered graph reach ~150-200K. With 20 wide peers: ~350K. The wide peers cascade diversity — their wide peers escape their neighborhoods, and so on through 3 levels.

2.2 Diff-Based Gossip (2-min cycles)
Every 2 minutes, each node sends a diff to each of its 100 other peers:

  RoutingDiff {
    hop1_changes: [Added/Removed/AddressChanged],  // our direct observations
    hop2_changes: [Added/Removed],                  // derived from received diffs
    seq: u64,
  }

How Diffs Propagate (1-hop forwarding only)

T=0     Node X goes offline. X's 101 direct peers detect the connection drop.
T=0-2m  X's peers include "X removed" in their next 1-hop diff.
        → X's peers' neighbors learn "X gone from 2-hop".
T=2-4m  Those neighbors include "X removed" in their 2-hop diff.
        → Nodes 3 hops from X learn "X gone from 3-hop".
T=4-6m  Further propagation for deeper views.

3-hop propagation: ~6 min worst case, ~3 min average.

No amplification: You don't re-forward received diffs. You compute your own view's changes and report those. Each change is re-derived at every hop.
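The no-amplification rule can be sketched as a set diff between a node's previous and current views — each node reports only changes it derived itself (NodeIds shown as `u64` for brevity; real NodeIds are 32-byte keys):

```rust
use std::collections::BTreeSet;

// Sketch of the "no amplification" rule: a node never re-forwards a
// received diff. It diffs its own previous and current views and reports
// only those changes. Types are illustrative stand-ins.
pub fn derive_diff(prev: &BTreeSet<u64>, curr: &BTreeSet<u64>) -> (Vec<u64>, Vec<u64>) {
    let added = curr.difference(prev).copied().collect();
    let removed = prev.difference(curr).copied().collect();
    (added, removed)
}
```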

Bandwidth

| Churn rate | Diff size/peer | Per day (Layer 1) |
| --- | --- | --- |
| 1% hourly (low) | ~200 bytes | ~50 MB |
| 5% hourly (mobile-heavy) | ~700 bytes | ~122 MB |

Previous design: ~318 MB/day. This is 2.5-6x better.

2.3 Worm Lookup with Fan-Out

Used when target NodeId is not in local 3-hop map.

Worm arrives at node A looking for targets [T1, T2, T3]:

Step 1: LOCAL CHECK
  A checks its own 3-hop map (~350K entries). O(1) per target.
  Found T2 → send WormResponse directly to originator.

Step 2: FAN-OUT CHECK (parallel, 500 ms timeout)
  A sends WormQuery to all 100 peers. Each peer checks its ~350K map. O(1) per target.
  Peer P7 finds T1 → resolves address → WormResponse to originator.

Step 3: FORWARD remaining targets
  Select best forwarding peer (wide, not visited, not queried).
  Forward WormRequest { ttl: ttl-1 }. That peer repeats Steps 1-3.
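One worm hop reduces to partitioning targets into found vs. still-to-forward. A simplified sketch that ignores timeouts and address resolution (NodeIds as `u64` stand-ins):

```rust
use std::collections::HashSet;

// Illustrative single worm hop: check the local 3-hop map (Step 1), then
// the fan-out results from peers (Step 2), and return the targets that
// remain for forwarding with ttl - 1 (Step 3).
pub fn worm_hop(
    local_map: &HashSet<u64>,
    peer_maps: &[HashSet<u64>],
    targets: &[u64],
) -> (Vec<u64>, Vec<u64>) {
    let mut found = Vec::new();
    let mut remaining = Vec::new();
    for &t in targets {
        if local_map.contains(&t) || peer_maps.iter().any(|m| m.contains(&t)) {
            found.push(t);
        } else {
            remaining.push(t); // forwarded to the next hop
        }
    }
    (found, remaining)
}
```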

Coverage Per Hop

| Component | Entries checked |
| --- | --- |
| Local 3-hop map | ~350,000 |
| 100 peers' maps (fan-out) | 100 × ~350,000 = ~35,000,000 |
| After overlap dedup | ~25,000,000 (1.25% of 2B) |

With iterative routing (each hop guided toward target):

  • TTL=3: ~75M entries — finds most socially-proximate targets
  • TTL=5: ~125M entries — finds virtually any reachable target
  • Expected resolution: 3-5 hops, 1.5-2.5 seconds
vs Previous Design

Previous worm checked only local map (~2M per hop with 2-hop tables). Fan-out to 100 peers gives 12x more coverage per hop.

2.4 Address Resolution Chain
1. DIRECT       — T in 1-hop → have address                    (instant)
2. 2-HOP REF    — T in 2-hop → ask reporter for address        (1 RTT)
3. 3-HOP REF    — T in 3-hop → ask peers who's closer → chain  (2 RTT)
4. WORM         — T not in map → worm search                   (1.5-5 sec)
5. ANCHOR        — Worm fails → profile anchor or bootstrap     (1-5 sec)
| Tier | Nodes covered | % of 2B |
| --- | --- | --- |
| 1-hop | 101 | 0.000005% |
| 2-hop | ~5,500 | 0.00028% |
| 3-hop | ~350,000 | 0.018% |
| Worm (1 hop) | ~25,000,000 | 1.25% |
| Worm (5 hops) | ~125,000,000 | 6.25% |
3. File Storage + Content Routing (Layer 2)
3.1 Core Concept — Files Carry Their Own Routing

Every stored file (post + media) carries a small metadata blob: author_recent_posts (max 256 KB, author-signed). This blob lists the author's recent post IDs.

Key Insight

If you have any file by author X, you passively know X's recent posts. You can then request specific posts from any peer who has them — you don't need to find author X.

This creates a natural CDN: popular authors' post updates propagate through the file storage network as each copy of their files carries the latest post list.

StoredFile {
  post: Post,                          // content-addressed, immutable
  post_id: PostId,                     // blake3(content)
  media_blobs: Vec<MediaBlob>,

  author_recent_posts: AuthorRecentPosts {
    author_id: NodeId,
    posts: Vec<RecentPostEntry>,       // newest first
    updated_at: u64,                   // ms timestamp
    signature: Signature,              // author signs this blob
    // Max 256 KB total → ~8,000 recent post entries
  }
}
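The "~8,000 recent post entries" comment follows directly from the 256 KB cap and 32-byte PostIds (ignoring per-entry framing and the signature overhead):

```rust
// Capacity check for the author_recent_posts cap: 256 KiB divided by
// 32-byte PostId entries. Framing/signature overhead is ignored, so this
// is an upper bound on entry count.
pub fn max_entries(blob_cap_bytes: usize, entry_bytes: usize) -> usize {
    blob_cap_bytes / entry_bytes
}
```

256 × 1024 / 32 = 8,192, i.e. the "~8,000" figure in the struct comment.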
3.2 Update Propagation — Three Paths
Author A publishes a new post.

Path 1: DIRECT PUSH (audience, seconds)
  A pushes the new post to audience members via push worm.
  Recipients update their stored copies of A's files with the new author_recent_posts blob.

Path 2: FILE-CHAIN PROPAGATION (followers, <12 min typical)
  A's 101 persistent peers receive the update.
  When ANY peer accesses a file by A, it sees the fresh author_recent_posts and can request the new post.
  Propagates naturally as files are accessed/synced.

Path 3: STALENESS PULL (>1 hour fallback)
  If author_recent_posts.updated_at is older than 1 hour, the holder triggers an update pull:
  - Check Layer 3 social route to the author
  - Check other peers who hold the author's files
  - Worm request for the latest author_recent_posts
Result

Popular authors' updates reach most file holders within minutes. Unpopular authors' updates reach followers within 1 hour (staleness pull).

3.3 Popular Author Scale (1M Audience)
Author A has 1,000,000 audience members and posts a new photo.

Layer 1: A has 101 persistent connections. PostNotification sent to all.
  → 101 audience members get it instantly.
  → 101 copies of the updated author_recent_posts now exist.

Layer 2: Those 101 peers hold files by A, and each has ~100 further peers.
  → T+2m: 101 × 100 ≈ 10,000 peers see the fresh author_recent_posts
  → T+4m: 10,000 × 100 ≈ 1,000,000 peers reached
  → Natural file-chain propagation covers the full audience.

No destination declared. The file layer IS the CDN. The author does 101 pushes; O(log N) hops reach everyone.
vs Previous Design

Previous design: author splits audience into chunks of 10, tries to push to each chunk leader. 1M audience = 100K chunks = author's machine saturated for hours.

New design: author does 101 pushes total. File layer handles the rest via natural propagation. O(1) work for the author, O(log N) time to reach everyone.

3.4 Requesting Posts via File Layer
You see author A has new post P in author_recent_posts.
You don't have post P stored locally.

1. Check if any persistent peer has P:
   → Fan-out WormQuery to 100 peers (they check local storage)
   → Any peer with the file can serve it

2. Request posts newest-to-oldest:
   → Prioritize catching up on recent content
   → Older posts can wait or be skipped

3. No need to contact author A at all.
File Authority Chain

Each node caches a route back to the author for each file they hold. When author_recent_posts is stale (>1 hour), follow the authority chain hop-by-hop toward the author. Each hop may have a fresher copy — you don't need to reach the author, just a fresher copy.

3.5 File Keep Priority

Formula

priority = pin_bonus + (relationship × heart_recency × post_age / (peer_copies + 1))

Scoring Tables

Relationship (to file's author)

| Relationship | Score |
| --- | --- |
| Self (our own content) | ∞ (never evicted) |
| We are audience of author | 10 |
| We follow author | 8 |
| Author has >10 hearts from network | 5 |
| Author has >3 hearts | 3 |
| Author has >2 hearts | 2 |
| No relationship | 1 |

| Time window | Heart recency score | Post age score |
| --- | --- | --- |
| < 72 hours | 100 | 100 |
| 3-14 days | 50 | 50 |
| 14-45 days | 25 | 25 |
| 45-90 days | 12 | 12 |
| 90-365 days | 6 | 6 |
| 1-3 years | 3 | 3 |
| 4-10 years | 1 | 1 |

Peer Copies: Divides priority. More copies nearby = lower urgency to keep ours. 0 copies → full priority. 10 copies → 1/11 priority.

Pin: 99,999 bonus. Even pins compete when storage is full.

Examples

YOUR OWN post:
  ∞ → never evicted

Audience author, yesterday, hearted today, 0 copies:
  0 + (10 × 100 × 100 / 1) = 100,000 → very high

Followed author, last week, hearted 2 days ago, 2 copies:
  0 + (8 × 100 × 50 / 3) = 13,333 → high

Popular stranger (>10 hearts), yesterday, 20 copies:
  0 + (5 × 100 × 100 / 21) = 2,381 → moderate

Random, 6 months old, 3 hearts, 8 copies:
  0 + (3 × 6 × 6 / 9) = 12 → very low

Unknown, no hearts, old, many copies:
  0 + (1 × 1 × 1 / 11) = 0.09 → first evicted
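The formula and the worked examples above can be checked directly. In this sketch the ∞ score for own content is left to the caller; only the numeric path is shown:

```rust
// The keep-priority formula from 3.5:
//   priority = pin_bonus + relationship × heart_recency × post_age / (peer_copies + 1)
// Own content (∞) is handled by the caller; pin_bonus is 99,999 for pins.
pub fn keep_priority(
    pin_bonus: f64,
    relationship: f64,
    heart_recency: f64,
    post_age: f64,
    peer_copies: u32,
) -> f64 {
    pin_bonus + relationship * heart_recency * post_age / (peer_copies as f64 + 1.0)
}
```

Plugging in the example inputs reproduces the scores above: the audience-author case gives 100,000, the six-month-old random post gives 12, and the unknown many-copies case lands near 0.09 (first evicted).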
Storage Budget

Default: 10 GB. At 256 KB avg: ~40K files. At 1 MB avg (with media): ~10K files. The formula ensures: own posts always kept, audience/follow prioritized, rare content preserved, old/common/unrelated content evicted first.

4. Social Routing (Layer 3)
4.1 Purpose — Cached Routes to People You Care About

Layer 3 is a personal routing cache for follows and audience. It stores recently-working routes so you can push/pull content without going through the Layer 1 worm every time.

SocialRoute {
  target: NodeId,
  relationship: Follow | Audience | Mutual,
  last_route: Vec<NodeId>,          // path that worked
  last_success: u64,
  address_hint: Option<SocketAddr>, // if direct worked
}
4.2 Follow Pull Path
Follower F wants updates from author A:

1. Is A a persistent peer? (Layer 1, 1-hop)
   → Yes: content flows in real-time. Done.
2. Check the social route cache (Layer 3)
   → Have a recent route? Follow it to A.
   → Pull author_recent_posts + new posts.
   → Update the route cache.
3. Check the file layer (Layer 2)
   → Have any of A's files? Check author_recent_posts freshness.
   → If <1 hour old: up to date. Request missing posts via worm.
   → If >1 hour old: follow the file authority chain for fresher data.
4. Fall back to the Layer 1 worm.

Typical: step 1 or 2 (fast, no worm needed).
4.3 Audience Push Path
Author A creates a post and has approved audience members.

1. Audience members who are persistent peers (1-hop):
   → Push PostNotification on the persistent connection. Instant.
2. Audience members with social routes (Layer 3):
   → Follow the cached route. Push the post via push worm.
   → Update the route cache on success.
3. Audience members with no cached route:
   → Layer 1 worm to find the address.
   → Push the post. Cache the route for next time.

For audiences >101: the post is also pushed to the file layer, and the file storage network handles further propagation. No destination declared — the CDN effect takes over.
4.4 Route Maintenance
  • On successful push/pull: update route + address hint
  • On failure: clear route, fall back to Layer 1
  • Every 30 min: validate routes for top-priority follows/audience
  • Routes older than 2 hours without verification → stale
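The 2-hour staleness rule above is a timestamp comparison against the route's last success (field name borrowed from the SocialRoute struct in 4.1; the helper itself is illustrative):

```rust
// Route staleness check from 4.4: a route unverified for more than
// 2 hours is considered stale and falls back to Layer 1.
pub const STALE_AFTER_MS: u64 = 2 * 60 * 60 * 1000;

pub fn is_stale(last_success_ms: u64, now_ms: u64) -> bool {
    now_ms.saturating_sub(last_success_ms) > STALE_AFTER_MS
}
```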
5. How It All Fits Together — Lifecycles
5.1 Public Post — From Creation to Feed
T=0   Author A creates post P. PostId = blake3(content). Stores in local DB.
      Updates own author_recent_posts.

      Persistent peers (Layer 1):
        → PostNotification on all 101 connections.
        → All persistent peers have P + updated author_recent_posts.

      Audience push (Layer 3):
        → Persistent audience members: already done.
        → Social-route audience: push via cached route.
        → No route: worm to find, then push.

T<2m  Peers' peers see the updated author_recent_posts (Layer 2):
        → File-chain propagation begins.

T<12m File layer reaches most file holders (Layer 2):
        → Anyone accessing any file by A sees the new post listed.
        → Can request P from any peer who has it.

T=60m Pull cycle for distant followers (Layer 3 fallback):
        → Follower checks author_recent_posts → sees P → requests it.
5.2 Encrypted Post (DM / Circle)
T=0 Author A creates a post with VisibilityIntent::Direct([R]).

1. Generate a random CEK
2. Encrypt content with ChaCha20-Poly1305
3. Wrap the CEK per-recipient via X25519 DH
4. PostId = blake3(encrypted_content)

Push to recipient R:
  → R is a persistent peer? Push directly.
  → Social route? Push via route.
  → Otherwise: worm to find R.

author_recent_posts is updated (includes PostId + VisibilityHint::Encrypted). Peers see there's a new post but can't read it without the wrapped key.

On receipt, R:
  → DH to derive the shared secret
  → Unwrap the CEK
  → Decrypt the content
5.3 Discovering a New User to Follow
User has author X's NodeId (from out-of-band sharing).

1. Layer 1 map: X in 3-hop? (~350K entries)
   → If yes: resolve address via referral chain. Connect. Done.
2. Worm search (Layer 1): fan-out to 100 peers. ~25M entries checked per hop, 3-5 hops.
   → Found? Connect to X. Pull profile + recent posts. Done.
3. File layer (Layer 2): does anyone we know have X's files?
   → Check if any peer has author_recent_posts for X.
   → If yes: get X's recent posts without finding X directly.
4. Anchor fallback: contact bootstrap anchors.
5. Once connected to X (or X's file holders):
   → Cache a social route (Layer 3) for future pulls.
   → Store X's files → future updates arrive via the file layer.
Three layers cooperate

Layer 1 finds the person. Layer 2 lets you get their content even without finding them. Layer 3 remembers the route for next time. Each layer is a fallback for the others.

5.4 Popular Author with 1M Audience
Author A: 1,000,000 audience members. Posts a photo.

T=0s   A pushes to 101 persistent peers. Author's work: 101 sends. Done.
T=0-2s 101 audience members have the post + fresh author_recent_posts.
T=2m   101 × 100 ≈ 10,000 peers see the updated author_recent_posts via file-chain propagation (Layer 2).
T=4m   10,000 × 100 ≈ 1,000,000 peers reached. Full audience covered.

Total author effort: 101 pushes. Total time to full coverage: ~4 minutes.
Total bandwidth (author): 101 × post_size.
No audience member list transmitted. No destination declared.
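The timeline's reach numbers come from multiplying the author's 101 direct pushes by a ~100× fan-out per 2-minute round — an idealized model that ignores overlap between peer sets:

```rust
// Fan-out arithmetic for the 1M-audience timeline: reach after k rounds
// is direct_pushes × fanout^k. Purely illustrative; real propagation
// overlaps and saturates, so this is an upper bound per round.
pub fn reach_after_rounds(direct_pushes: u64, fanout: u64, rounds: u32) -> u64 {
    direct_pushes * fanout.pow(rounds)
}
```

One round gives 10,100 (~the 10,000 quoted at T=2m); two rounds give 1,010,000, covering the full 1M audience by T=4m.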
The File Layer IS the CDN

Popular content replicates because many peers have the author's files. The author_recent_posts blob travels with every file copy. The author doesn't need to know or manage the delivery — the storage network handles it.

6. Bandwidth & Resource Budget
6.1 Per-Node Summary
| Metric | Desktop (101 conns) | Mobile (15 conns) |
| --- | --- | --- |
| Layer 1 map | ~350K entries, ~11 MB | ~15K entries, ~500 KB |
| Layer 2 files | 10K-40K files, ~10 GB | 1K-5K files, ~1 GB |
| Layer 3 routes | ~200-500 entries, ~50 KB | same |
| Layer 1 bandwidth | ~50 MB/day | ~8 MB/day |
| Layer 2 bandwidth | ~50-100 MB/day (varies) | ~10-30 MB/day |
| Layer 3 bandwidth | ~10-20 MB/day | ~5-10 MB/day |
| Total bandwidth | ~110-170 MB/day | ~23-48 MB/day |
| Worm coverage/hop | ~25M (1.25%) | ~2M (0.1%) |
| Worm hops to find any target | 3-5 | 5-8 (or anchor) |
6.2 Comparison to Previous Design
| Aspect | Phase F (current code) | Protocol v2 (this spec) |
| --- | --- | --- |
| Connections | Ephemeral (connect/sync/disconnect) | 101 persistent |
| ALPNs | 4 | 1 |
| Gossip | Full peer list each time | 2-min diffs, 1-hop forward |
| Map depth | 2-hop (~5K) | 3-hop (~350K) |
| Content delivery | Pull-only (60 min) | 3 layers: push + pull + file propagation |
| File storage | Not managed | Priority-based with keep formula |
| Worm coverage/hop | ~2M | ~25M |
| Daily bandwidth | ~318 MB | ~110-170 MB |
| Popular author scale | Author pushes to all (O(N) work) | File layer propagates (O(log N) time) |
| First-contact latency | 10-30 seconds | 1-5 seconds |
6.3 Anchor Node Costs

A well-connected anchor (listed by 10,000 users as profile anchor):

| Activity | Cost |
| --- | --- |
| Persistent connections (~200) | ~30 MB RAM |
| Gossip diffs (200 peers) | ~10 MB/day |
| Map storage (larger map) | ~50 MB disk |
| Worm forwarding (~100/hr) | ~5 MB/day |
| Address lookups (~500/hr) | ~2 MB/day |
| Content relay | ~100 MB/day |
| Total | ~170 MB/day, ~80 MB RAM |

A $5/month VPS handles this comfortably.

7. Bootstrap — Entering the Network
New node, first launch:

1. Read bootstrap anchors from anchors.json (shipped with the app)
2. Connect to 1-2 bootstrap anchors
3. Exchange Layer 1 maps (InitialMapSync) — learn ~350K NodeIds
4. Begin wide peer selection from learned nodes
5. Connect to 20 wide peers
6. Fill social peer slots based on the follow list
7. Worm search for followed users not yet found
8. Within ~10 minutes: fully operational with 101 connections
Lightweight Bootstrap (Future)

New nodes don't need full map exchange. "I'm new, give me 200 diverse peers" (~15 KB response). Connect to received peers, build maps via normal gossip. Reduces anchor load from ~11 MB to ~15 KB per new node.

8. Implementation Order

Phase 1: Foundation

  1. Single ALPN (distsoc/2) with message type multiplexing
  2. Persistent connection manager (81 social + 20 wide slots)
  3. discovery_map table (Layer 1)
  4. 1-hop map population from persistent connections

Phase 2: Layer 1 Gossip + Worm

  1. RoutingDiff and 2-min gossip cycle
  2. 2-hop + 3-hop derivation from diffs
  3. Wide peer diversity scoring
  4. Worm v2 with fan-out
  5. Address resolution chain

Phase 3: Layer 2 File Storage

  1. stored_files + author_recent_posts tables
  2. File keep priority calculation + eviction
  3. author_recent_posts update propagation
  4. File authority chain routing
  5. Post request via file layer (fetch from any holder)

Phase 4: Layer 3 Social Routing

  1. social_routes table
  2. Follow pull via cached routes
  3. Audience push via cached routes + worm fallback
  4. Route maintenance

Phase 5: Integration + Optimization

  1. Popular author file-chain propagation
  2. Lazy 3-hop streaming on connection
  3. Mobile mode (15 conns, smaller maps)
  4. Delta sync for content (sequence numbers)
  5. Bloom filter caching (optional)
9. Open Questions & Decisions Needed

Q1: Peer Eviction Policy

When all 81 social slots are full and a higher-priority peer comes online, which peer gets dropped? Need to prevent thrashing (repeatedly connecting/disconnecting borderline peers).

Q2: author_recent_posts Authenticity

The blob is author-signed, but a malicious peer could serve a stale (valid but old) blob. Include sequence numbers? If you see seq 50 from one peer and seq 45 from another, seq 45 is stale.

Q3: Peer Copy Counting for Keep Priority

How do we learn peer_copies? Passively from worm responses and file requests. Exact counting isn't needed — order-of-magnitude is sufficient.

Q4: File Layer Bandwidth

If every file carries 256 KB of author_recent_posts, that's substantial overhead on small posts. Compact format (just PostIds at 32 bytes each = ~8000 entries per 256 KB) or fetch separately on demand?

Q5: Storage Quotas / 3x Hosting Rule

Design spec mentions 3x hosting quota. How does this interact with the keep priority formula? Quota sets overall budget, priority decides what fills it?

Q6: Global Lookup for Isolated Nodes

Worms + anchors handle most cases. For truly isolated nodes (~0.01% of lookups that fail), do we need a structured DHT layer? Or is anchor fallback sufficient at scale?

10. Glossary
| Term | Definition |
| --- | --- |
| NodeId | ed25519 public key (32 bytes, 64 hex chars). Permanent identity. |
| Connect string | NodeId@host:port. Enough to establish first contact. |
| Anchor | Node with a stable public address. Network entry point + relay. |
| Wide peer | One of 20 peers selected for maximum graph diversity. |
| Social peer | One of 81 peers selected by social relationship priority. |
| 3-hop map | (L1) ~350K NodeIds reachable within 3 hops of this node. |
| Worm | (L1) Bounded-depth search with fan-out. ~25M entries checked per hop. |
| author_recent_posts | (L2) 256 KB signed blob listing the author's recent posts. Travels with every stored file. |
| File authority chain | (L2) Cached route back to a file's author for freshness updates. |
| Keep priority | (L2) Score determining which files to keep vs evict when storage is limited. |
| Social route | (L3) Cached working path to a followed user or audience member. |
| Follow | Unilateral pull-only. Author doesn't know. Follower bears cost. |
| Audience | Consented push. Author knows + approves. Author pushes in real-time. |
| CEK | Content Encryption Key. Random per-post, ChaCha20-Poly1305. |
| RoutingDiff | (L1) 2-min gossip message. 1-hop + 2-hop changes only. |
| Push worm | Worm that delivers content (not just searches). Used for audience push. |
| Heart | User endorsement of a post. Affects file keep priority. |
| Peer copies | Number of copies of a file within 3-hop range. More copies = lower keep priority. |