v0.3.1 — 2026-03-13

Design Document

This is the canonical technical reference for ItsGoin. It describes the vision, the architecture, and the current state of every subsystem — with full implementation detail. This document is versioned; each update records what changed.

Changelog

v0.3.1 (2026-03-13): Share links + QUIC proxy + content search. Share link format: itsgoin.net/p/<postid_hex>/<author_nodeid_hex> — simple, no host encoding needed. itsgoin.net web handler acts as QUIC proxy: receives browser request, searches the network for the post, fetches it on-demand via PostFetch (0xD4/0xD5), renders HTML, serves to browser. No permanent storage of fetched content. Extended worm search — WormQuery now carries optional post_id and blob_id fields for unified node/post/blob search. Each peer checks local storage, CDN downstream tree (up to 100 hosts per post), and blob store. WormResponse gains post_holder and blob_holder fields. Nova fan-out pattern — burst peers include one N2 wide referral; referred peer does its own 101-burst, reaching ~10K nodes with ~202 relay hops. PostFetch (0xD4/0xD5) — lightweight single-post retrieval after worm finds a holder, much lighter than full PullSync. itsgoin.net node deployed as anchor + web handler (--web 8080). “Unavailable” page with honest network model explanation + install CTA. Universal Links / App Links planned for native app interception. | Engagement sync — pull sync now fetches reactions, comments, and policies via BlobHeaderRequest/Response after every sync. Profile push fix — profile updates now sent to all connected mesh peers (not just audience). Auto-sync on follow — following a peer triggers immediate post pull + engagement fetch. Popover UI — notifications settings, network diagnostics, and message threads now open as popovers. Notification settings — per-key settings table in SQLite, configurable message/post/nearby notifications with JS Notification API. Tiered DM polling — smart message refresh based on conversation recency. Reaction display — posts show top 5 most popular emoji + total response count. UI cleanup — removed Suggested Peers and Find Nearby sections, placeholder text changed to “How’s it goin?”, clickable node IDs in activity log.

v0.3.0 (2026-03-12): Full rename distsoc → ItsGoin. ALPN, crypto contexts, data paths, Android package ID all changed. Clean break — incompatible with prior versions.

v0.2.11 (2026-03-12): Engagement system — reactions (public + private encrypted via X25519 DH + ChaCha20-Poly1305), inline comments with ed25519 signatures, author-controlled comment/react policies (audience-only, public, none), blocklist enforcement. CDN tree for all posts — new post_downstream table (keyed by PostId, max 100 peers) gives every post a propagation tree; PostDownstreamRegister (0xD3) sent when any peer stores a post. 4 new wire messages: BlobHeaderDiff (0xD0) for incremental engagement propagation, BlobHeaderRequest/Response (0xD1/0xD2), PostDownstreamRegister (0xD3). 6 new SQLite tables, 9 new IPC commands. Thread splitting — headers exceeding 16KB auto-split oldest comments into linked thread posts. Frontend: emoji picker, reaction pills, comment threads, policy selects in compose area.

v0.2.10 (2026-03-12): Per-family NAT classification — IPv4 and IPv6 public reachability now detected independently. Previously, a public IPv6 address incorrectly set has_public_v4=true, causing nodes behind IPv4 NAT to skip hole punching. STUN now always runs (unless --bind) so IPv6-only anchors correctly classify their IPv4 NAT. Anchor advertised address fallback — anchors without --bind or UPnP now advertise their first public bound address (e.g. IPv6 SLAAC), so peers store them in known_anchors for preferential reconnection. Bootstrap anchor deprioritization — startup connection sequence now tries discovered (non-bootstrap) anchors first, falling back to hardcoded bootstrap anchors only when no discovered anchor is reachable. Reduces load on bootstrap infrastructure as the network grows.

v0.2.9 (2026-03-12): ConnectionManager actor redesign — replaced single Arc<Mutex<ConnectionManager>> with two-layer actor pattern: ConnHandle (cheap-to-clone command sender) + ConnectionActor (dedicated tokio task, owns state, processes commands via mpsc/oneshot channels). Eliminated lock contention from 14 code paths that previously held the mutex during network I/O (up to 15s for QUIC connects). All network.rs and node.rs callers now use ConnHandle (~60 call sites migrated). I/O-heavy functions extracted as standalone: broadcast_diff, push_circle_profile, push_visibility, pull_from_peer, send_relay_introduce, send_anchor_register, request_anchor_referrals. Public conn_mgr() accessor removed — Arc<Mutex> is now an internal implementation detail of the actor.

v0.2.8 (2026-03-11): NAT filter probe (0xC6/0xC7) — anchor probes node’s filtering type by attempting QUIC connect from a different source port; address-restricted (Open) vs port-restricted determined in 2s, eliminating unnecessary scanning for most connections. Role-based NAT traversal — EIM nodes punch every 2s (stable port visible to peer scanner), EDM/Unknown nodes walk outward at ~100 ports/sec (opening firewall entries for peer punches to land). Steady scan replaces burst tiers (was 37K tasks, now ~20 in-flight). IPv4 vs IPv6 public differentiation — startup reports v4-only/v6-only/v4+v6, “Public” no longer assumes Open filtering. Task cleanup via JoinSet::abort_all().

v0.2.7 (2026-03-11): Port scanning refinement — scan only the anchor-observed IP (relay-injected first address) instead of all self-reported addresses, avoiding wasted scan budget on unreachable VPN/cellular IPs. Scanning now triggers when peer NAT type is unknown, not just when explicitly EDM.

v0.2.6 (2026-03-11): Anchor self-verification implemented (Section 8) — AnchorProbeRequest/Result (0xC3/0xC4) wire messages, witness-based cold reachability testing via N2 strangers, candidacy checklist (UPnP/public + 50 connections + 2h uptime + non-mobile), periodic re-probe in anchor register cycle, 2-failure revocation. Advanced NAT traversal implemented (Section 10) — NatMapping (EIM/EDM) + NatFiltering (Open/PortRestricted) profile types, hole_punch_with_scanning() replaces hard+hard skip at all 5 call sites, tiered port scanning (±500, ±2000, full ephemeral) at 50 concurrent probes, behavioral filtering inference from connection outcomes, PortScanHeartbeat (0xC5) message type. NAT profile shared in InitialExchange (nat_mapping/nat_filtering fields).

v0.2.5 (2026-03-11): Advanced NAT traversal design (Section 10) — relay-assisted port scanning protocol for EDM/symmetric NATs, full NAT combination matrix (mapping × filtering), tiered scan from observed port at 250/sec, 2s relay heartbeat feedback loop, makes hard+hard pairs solvable without full relay. Reconnection race fix — run_mesh_streams checks stable_id() before cleanup to prevent reconnecting peers from losing their connection entry.

v0.2.4 (2026-03-11): Anchor self-verification probe design (Section 8) — witness-based cold reachability testing via N2 strangers, candidacy checklist, periodic re-probe. Anchor selection simplified to LIFO on last_seen, removed success_count weighting, stale anchor cleanup (7-day probe). BlobHeader separation from blob content (Section 18) — immutable BLAKE3-addressed blobs require separate mutable headers, BlobHeader struct replaces CdnManifest, 25+25 post neighborhood, BlobHeaderDiff incremental propagation. Removed 3x hosting quota — CDN is attention-driven delivery infrastructure, not storage; author owns durability. Keep-alive session ceilings (Section 16) — desktop ~300-500, mobile ~25-50, mobile priority stack, hysteresis for borderline reachability. Mesh stranger controls — mutual mesh blacklist for targeted stranger relationships, --max-mesh CLI flag for topology testing. Phase 2 reciprocity simplified — attention model makes quota enforcement unnecessary.

v0.2.3 (2026-03-11): NAT type detection implemented (Section 10) — raw STUN probing classifies NAT as Public/Easy/Hard/Unknown on startup, shared in InitialExchange, stored per-peer, skip hole punch for hard+hard NAT pairs. LAN Discovery spec (Section 12) — mDNS scan loop for automatic LAN peer connection, keep-alive LAN sessions, local relay design. Pruning & timeout tuning — preferred peer prune 24h→7d, watcher expiry 24h→30d, N2/N3 startup sweep. Growth loop lock fix — resolve_address no longer blocks conn_mgr during network I/O.

v0.2.2 (2026-03-10): Hole punch fixes (Section 10) — session peers now fully participate in relay introduction (observed address injection for both requester and target), all hole punch paths use hole_punch_parallel() (parallel addresses, no more sequential timeouts), requester self-reported addresses filtered to publicly-routable only.

v0.2.1 (2026-03-10): Added UPnP port mapping (Section 11) — best-effort NAT traversal for desktop/home networks, external address in N+10 and peer advertisements, lease renewal cycle.

v0.2.0 (2026-03-09): Major design updates — three-layer architecture (Mesh/Social/File), N+10 identification, keep-alive sessions, 3-tier revocation, multi-device identity, growth loop redesign, pull sync from social/file layers, relay pipes default to own-device-only, remove anchor register loop.

v0.1.0 (2026-03-09): First versioned edition. Consolidated from ARCHITECTURE.md, code review, and gap analysis into a single source of truth.

1. The Vision

"A decentralized fetch-cache-re-serve content network that supports public and private sharing without a central server. It replaces 'upload to a platform' with 'publish into a swarm' where attention creates distribution, privacy is client-side encryption, and availability comes from caching, not money."

The honest promise: The CDN is an attention-driven delivery amplifier, not a storage guarantee. Hot content spreads naturally through demand; cold content decays unless intentionally hosted. Authors are responsible for their own content durability — a post backup/export tool is the author's safety net, not the network's job. The system is a loss-risk network — best-effort availability, not durability guarantees.

Guiding principles

2. Identity & Bootstrap

First startup

  1. Identity: Load or generate ed25519 keypair from {data_dir}/identity.key. NodeId = 32-byte public key. A unique device identity is also generated for multi-device coordination (see Section 23).
  2. Storage: Open SQLite database (distsoc.db), auto-migrate schema.
  3. Blob store: Create {data_dir}/blobs/ with 256 hex-prefix shards (00/ through ff/).
  4. UPnP mapping: Attempt UPnP/NAT-PMP port mapping (2s timeout). If successful, store external address for advertisements. Do not block startup if unavailable. See Section 11.
  5. NAT type detection: STUN probes to two public servers (3s timeout each). Classifies as Public/Easy/Hard/Unknown. UPnP success overrides to Public. Anchors skip probing. Result stored on ConnectionManager, shared in InitialExchangePayload, stored per-peer. See Section 10.
  6. Stale N2/N3 sweep: Remove all N2/N3 entries tagged to peers not in the current mesh. Clears stale reach data from previous sessions (e.g., unclean shutdown).
  7. Bootstrap anchors: Load from {data_dir}/anchors.json. If missing, use hardcoded default anchor.
  8. Bootstrap: If peers table is empty, connect to a bootstrap anchor. Request referrals and matchmaking (unless self or the other node is an anchor). Persist on that anchor's referral list until released (at referral count limit) while beginning the growth loop immediately.
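Step 3's 256-shard layout can be sketched as a small path helper. This is a hedged illustration, not the actual implementation: the function name and signature are assumptions, on the premise that the first two hex characters of the blob's digest select the shard directory.

```rust
use std::path::PathBuf;

// Sketch only: derive the on-disk location of a blob from its hex digest,
// assuming the {data_dir}/blobs/<2-hex-prefix>/ layout described in step 3.
fn blob_shard_path(data_dir: &str, blob_hex: &str) -> PathBuf {
    let prefix = &blob_hex[..2]; // 256 shards: 00/ through ff/
    PathBuf::from(data_dir).join("blobs").join(prefix).join(blob_hex)
}

fn main() {
    let p = blob_shard_path("/data", "a3f1c0ffee");
    assert_eq!(p, PathBuf::from("/data/blobs/a3/a3f1c0ffee"));
    println!("{}", p.display());
}
```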

Startup cycles

Spawned after bootstrap completes:

| Cycle | Interval | Purpose |
| --- | --- | --- |
| Pull sync | On demand (3h Self Last Encounter threshold) | Pull new posts from social + upstream file peers |
| Routing diff | 120s (2 min) | Broadcast N1/N2 changes to mesh + keep-alive sessions |
| Rebalance | 600s (10 min) | Clean dead connections, reconnect preferred, signal growth |
| Growth loop | 60s + reactive (on N2/N3 receipt) | Fill empty mesh slots until 101 (90% threshold for reactive mode) |
| Recovery loop | Reactive (mesh empty) | Emergency reconnect via anchors |
| Social/File connectivity check | 60s | Verify <N4 access to N+10 of active social + file peers; open keep-alive sessions as needed |
| UPnP lease renewal | 2700s (45 min) | Refresh UPnP port mapping before TTL expiry (desktop only) |
Removed: Anchor register loop. Anchors are for forming initial mesh connections when bootstrapping, not for ongoing registration. Nodes only connect to anchors during bootstrap or recovery.

3. N+10 Identification

Concept

Every node is identified not just by its NodeId but by its N+10: the node's own NodeId plus the NodeIds of its 10 preferred peers. This makes any node far easier to locate: if you can reach any of the 11 nodes in someone's N+10, you can find them.

Where N+10 appears

| Context | What's included |
| --- | --- |
| Self identification | All self-identification messages include the sender's N+10 |
| Following someone | When you follow a peer, you store and maintain their N+10 in your social routes |
| Post headers | Every post header includes the author's current N+10. Updated whenever they post. |
| Blob headers | Blob/file headers include: (1) the author's N+10, (2) the upstream file source's N+10 (if not the author), (3) N+10s of up to 100 downstream file hosts |
| Recent post lists | Author manifests include the author's N+10 alongside their recent post list |

Why this works

Preferred peers are bilateral agreements — stable, long-lived connections. By including them in identification, any node that can find any of your 10 preferred peers can transitively find you within one hop. This eliminates most discovery cascades for socially-connected nodes.
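The one-hop findability property can be sketched as a tiny check. The struct shape here is illustrative only (not the wire format):

```rust
use std::collections::HashSet;

// Hedged sketch of an N+10 identity: the node's own id plus up to 10
// preferred-peer ids. Shape is an assumption for illustration.
struct NPlus10 {
    node_id: [u8; 32],
    preferred: Vec<[u8; 32]>, // up to 10 entries
}

// A node is findable if we can reach ANY of the 11 ids in its N+10.
fn findable(target: &NPlus10, reachable: &HashSet<[u8; 32]>) -> bool {
    reachable.contains(&target.node_id)
        || target.preferred.iter().any(|p| reachable.contains(p))
}

fn main() {
    let target = NPlus10 { node_id: [1; 32], preferred: vec![[2; 32], [3; 32]] };
    let mut reachable = HashSet::new();
    reachable.insert([3; 32]); // we can reach just one preferred peer
    assert!(findable(&target, &reachable));
    reachable.clear();
    assert!(!findable(&target, &reachable));
    println!("ok");
}
```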

Status: Partial

N+10 is partially implemented — preferred peers exist and are tracked, but N+10 is not yet included in all identification contexts (post headers, blob headers, self-identification messages). Currently preferred_tree in social routes provides similar functionality for relay selection.

4. Connections & Growth

Connection types

Slot architecture

| Slot kind | Desktop | Mobile | Purpose |
| --- | --- | --- | --- |
| Preferred | 10 | 3 | Bilateral agreements, eviction-protected |
| Non-preferred | 91 | 12 | Growth loop fills these with diverse peers |
| Total mesh | 101 | 15 | Long-lived routing backbone |
| Keep-alive sessions | No hard limit | No hard limit | Social/file layer peers not in mesh (max 50% of session capacity reserved for keep-alive) |
| Sessions (interactive) | No hard limit | No hard limit | Active DM, group interaction, anchor matchmaking |
| Relay pipes | 10 | 2 | Own-device relay by default; opt-in for relaying for others |
v0.2.0 change: Removed the distinction between "local" (71) and "wide" (20) non-preferred slots. The growth loop goes wide by default. Session counts are no longer hard-limited — an average computer can sustain ~1000 QUIC sessions without strain. The 50% keep-alive reservation ensures sessions remain available for interactive use.

MeshConnection struct

Each mesh connection tracks: node_id, connection (QUIC), slot_kind (Preferred or NonPreferred), remote_addr (captured from Incoming before accept), last_activity (AtomicU64), created_at.

Mutual mesh blacklist Planned

Targeted two-node stranger relationship. Both nodes opt in, maintaining genuine N2 stranger status indefinitely regardless of growth loop behavior. Stored in a local mesh_blacklist { node_id } table.

Production utility: Operators maintaining intentional stranger relationships for network diversity, preventing specific nodes from becoming preferred peers, or any scenario where two nodes want to cooperate at session level without mesh entanglement.

--max-mesh <n> CLI flag Planned

Topology control at network scale. Forces a node to cap its mesh connections, keeping it permanently in N2 of other nodes. Testing affordance only — not for production use.

Keepalive

5. Connection Lifecycle

5.1 Growth Loop (60s timer + reactive on N2/N3 receipt)

Timer: Fires every 60 seconds. Checks current mesh count. If < 101, runs a growth cycle.

Reactive trigger: Fires immediately after receiving a peer's N2/N3 list (from initial exchange or routing diff). Continues firing on each new N2/N3 receipt until mesh is 90% full (~91 connections). After 90%, switches to timer-only mode.

Candidate selection (N2 diversity scoring):

score = 1.0 / reporter_count + (0.3 if not_in_N3)
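The formula above can be written out as a small function (a sketch; the real implementation's signature is assumed). Fewer reporters means a rarer, more diverse candidate, and peers not yet visible in N3 get a flat bonus:

```rust
// Sketch of the N2 diversity score: peers reported by fewer mesh neighbors
// score higher, with a +0.3 bonus for candidates not already in our N3.
fn diversity_score(reporter_count: u32, in_n3: bool) -> f64 {
    let base = 1.0 / reporter_count.max(1) as f64;
    if in_n3 { base } else { base + 0.3 }
}

fn main() {
    assert!((diversity_score(1, true) - 1.0).abs() < 1e-9);
    assert!((diversity_score(2, false) - 0.8).abs() < 1e-9);
    // A rare, unseen peer outscores a widely-reported known one.
    assert!(diversity_score(1, false) > diversity_score(5, true));
    println!("ok");
}
```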

Connection attempt cascade:

  1. Direct connect (15s timeout) — use stored/resolved address
  2. Introduction fallback — find N2 reporters who know this peer, ask each to relay-introduce us

Failure handling: Track consecutive failures. After 3 consecutive failures, back off (break loop, wait for next signal). Mark unreachable peers for future skipping.

5.2 Rebalance Cycle (every 600s)

Executed in priority order:

  1. Dead connection removal: Remove connections with close_reason() set, or idle > 600s (zombie)
  2. Stale entry pruning: N2/N3 entries tagged to a peer that is no longer connected are pruned immediately (on disconnect and on startup sweep). Age-based fallback: entries older than 7 days. Social route watchers older than 30 days.
  3. Priority 0 — Preferred peer reconnection: Iterate preferred_peers table, reconnect any that are disconnected. If at capacity, evict the lowest-diversity non-preferred peer to make room. Prune preferred peers unreachable for 7+ days (slot released, does NOT auto-return on reconnect — must re-negotiate via MeshPrefer). After 7 days, social check-in frequency drops from 1–3 hours to daily until the 30-day reconnect watcher expires.
  4. Priority 1 — Reconnect recently dead: Re-establish dropped non-preferred connections. Skip blacklisted nodes — do not attempt reconnection to peers in mesh_blacklist.
  5. Priority 2 — Signal growth loop: Fill remaining empty slots via growth loop
  6. Idle session cleanup: Reap interactive sessions idle > 300s (5 min). Keep-alive sessions are NOT reaped by idle timeout.
  7. Relay intro dedup pruning: Clear seen_intros entries older than 30s, cap at 500
Note: Low diversity score alone does NOT trigger eviction. The only eviction path is Priority 0 (making room for a preferred peer).

5.3 Recovery Loop (reactive, mesh empty)

Trigger: disconnect_peer() fires when last mesh connection drops.

  1. Debounce 2 seconds (wait for cascading disconnects to settle)
  2. Gather anchors: known_anchors table ordered by last_seen DESC (LIFO — most recently seen is most likely still reachable) → fallback to hardcoded default anchor(s) only if known_anchors empty or exhausted
  3. For each anchor: connect, request referrals and matchmaking, try direct connect to each referral, fallback to hole punch via anchor for unreachable referrals
  4. Persist on anchor's referral list until released, begin growth loop immediately
  5. Post-bootstrap stale anchor cleanup: After successful bootstrap/recovery, probe known_anchors entries where last_seen > 7 days. Success: update last_seen. Failure: DELETE from known_anchors. Reuses existing anchor probe machinery (0xC3/0xC4). No new cycle or timer — runs as final step of bootstrap/recovery.

5.4 Initial Exchange (on every new connection)

When two nodes connect, they exchange:

Processing: Their N1 → our N2 table (tagged to reporter). Their N2 → our N3 table (tagged to reporter). Store profile, apply deletes, record replica overlaps. Trigger growth loop immediately with new N2/N3 candidates if mesh < 90% full.

5.5 Incremental Routing Diffs (every 120s + on change)

NodeListUpdate (0x01) contains N1 added/removed, N2 added/removed. Sent via uni-stream to all mesh peers and keep-alive sessions. Receiver processes: their N1 adds → our N2 adds, their N2 adds → our N3 adds, etc.

6. Network Knowledge Layers (N1/N2/N3)

| Layer | Source | Contains | Shared? | Stored in |
| --- | --- | --- | --- | --- |
| N1 | Our connections + social contacts | NodeIds only | Yes (as "N1 share") | mesh_peers + social_routes |
| N2 | Peers' N1 shares | NodeIds tagged by reporter | Yes (as "N2 share") | reachable_n2 |
| N3 | Peers' N2 shares | NodeIds tagged by reporter | Never | reachable_n3 |

<N4 access

A node has <N4 access to a target if the target appears in its N1, N2, or N3 tables. This means the target is reachable within 3 hops without needing worm search or relay introduction. The social/file connectivity check (see Section 16) uses <N4 access to determine whether keep-alive sessions are needed.

What is NEVER shared

Address resolution cascade (connect_by_node_id)

| Step | Method | Timeout | Details |
| --- | --- | --- | --- |
| 0 | Social route cache | n/a | social_routes table (cached addresses for follows/audience) |
| 1 | Peers table | n/a | Stored address from previous connection |
| 2 | N2 ask reporter | varies | Ask the mesh peer who reported target in their N1 |
| 3 | N3 chain resolve | varies | Ask reporter's reporter (2-hop chain) |
| 4 | Worm search | 3s total | Burst to all peers → nova to N2 referrals (each does own burst) |
| 5 | Relay introduction | 15s | Hole punch via intermediary relay |
| 6 | Session relay | n/a | Pipe traffic through intermediary (own-device or opt-in) |

7. Three-Layer Architecture (Mesh / Social / File)

The network operates across three distinct layers, each with its own connections, routing, and purpose. The separation enables specialized behavior without the layers interfering with each other.

| Layer | Purpose | Connections | Sync trigger |
| --- | --- | --- | --- |
| Mesh | Structural backbone: N1/N2/N3 routing, diversity, discovery | 101 mesh slots (preferred + non-preferred) | N/A — mesh is infrastructure, not content |
| Social | Follows, audience, DMs — the human relationships | Social routes + keep-alive sessions as needed | Pull posts when Self Last Encounter > 3 hours |
| File | Content storage and distribution — blobs, CDN trees | Upstream/downstream file peers + keep-alive sessions as needed | Pull on blob request, push on post creation |

Key principle: mesh is not for content

Pull sync does not pull posts from mesh peers. Mesh connections exist for routing diversity and discovery. Content flows through the social layer (posts from people you follow) and the file layer (blobs from upstream/downstream hosts). This separation means mesh connections can be optimized purely for network topology without social bias.

Cross-layer benefits

Each layer's connections contribute to finding nodes and referrals for the other layers. Keep-alive sessions from the social and file layers participate in N2/N3 routing, which improves <N4 access for all three layers. A social keep-alive session might provide the N2 entry that helps the mesh growth loop find a diverse new peer, and vice versa.

8. Anchors

Intent

Anchors are "just peers that are directly reachable" — standard ItsGoin nodes with a routable address. They run the same code with no special protocol. Their value comes from being directly connectable for bootstrapping new nodes into the network and matchmaking (introducing peers to each other). Anchors include VPS-deployed nodes (always-on) and desktop nodes with UPnP port mappings (see Section 11).

Each profile can carry a preferred anchor list — infrastructure addresses, not social signals.

Status: Complete (with gaps)

When anchors are used

Anchor referral mechanics

When a bootstrapping node connects, the anchor provides referrals from its mesh and referral list. The node persists on the anchor's referral list until released at the referral count limit. During this time, the anchor can matchmake — introducing the new node to other peers requesting referrals.

Anchor selection order

  1. known_anchors table: ORDER BY last_seen DESC (LIFO). The most recently seen anchor is most likely still reachable, particularly given short-lived home desktop anchors.
  2. Hardcoded default anchor(s) — only if known_anchors is empty or exhausted. A brand-new node hits hardcoded anchors once on first bootstrap, populates known_anchors from that session, and the hardcoded list recedes to pure fallback.

No scoring, no success counting, no prediction. Attempt, move to next on failure. The known_anchors table stores only: node_id, addresses, last_seen.
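The selection order above can be sketched in a few lines. Struct and function names here are illustrative assumptions, not the real schema:

```rust
// Sketch of the anchor attempt order: known anchors LIFO on last_seen,
// hardcoded bootstrap addresses strictly as fallback.
struct KnownAnchor {
    addr: String,
    last_seen: u64, // unix seconds
}

fn anchor_attempt_order(mut known: Vec<KnownAnchor>, bootstrap: Vec<String>) -> Vec<String> {
    known.sort_by(|a, b| b.last_seen.cmp(&a.last_seen)); // most recent first
    known.into_iter().map(|a| a.addr).chain(bootstrap).collect()
}

fn main() {
    let known = vec![
        KnownAnchor { addr: "old:4433".into(), last_seen: 100 },
        KnownAnchor { addr: "fresh:4433".into(), last_seen: 900 },
    ];
    let order = anchor_attempt_order(known, vec!["bootstrap:4433".into()]);
    assert_eq!(order, vec!["fresh:4433", "old:4433", "bootstrap:4433"]);
    println!("{:?}", order);
}
```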

Anchor self-verification Complete

Nodes with UPnP-mapped IPv4 or IPv6 public addresses cannot self-certify as anchors — they need external verification that they are genuinely reachable by cold direct connect. A node is a viable anchor only if a complete stranger can connect to it directly with no introduction, no hole punch, and no relay.

Witness selection

Node A (candidate anchor) selects a witness from its own N2 table entries NOT present in its N1. These are genuine strangers — no prior connection, no cached address, no warm path. A selects one (call it C) and knows C's address via the N1 reporter (call it B) who reported C in their N1 share.

Probe message flow

A → B (N1 reporter of C): AnchorProbeRequest {
    target_addr,     // A's external address to test
    witness,         // C's NodeId
    return_via,      // B's NodeId (for failure reporting)
}

B → C: forward AnchorProbeRequest

C: cold direct QUIC connect to target_addr
   — MUST use only raw QUIC connect (step 1 of connect_by_node_id)
   — MUST skip entire resolution cascade, hole punch, introduction, relay
   — 15s timeout

SUCCESS: C → A directly (on new connection): AnchorProbeResult { reachable: true }
FAILURE: C → B → A: AnchorProbeResult { reachable: false }

Asymmetric return path: If cold connect fails, by definition there is no direct path from C to A. C reports failure through B (who has a live connection to A). On success, C has a fresh direct connection and uses it. The return_via field tells C which node to route failure through.

Why bypass the cascade: The normal connect_by_node_id cascade has 7 steps including hole punch and relay. If C uses the full cascade, a successful result via relay is a false positive. The probe handler must be a special code path: raw QUIC connect only.

Anchor candidacy checklist

is_anchor_candidate():
  - has UPnP mapping OR has IPv6 public address
  - probe succeeded within last 30 minutes
  - mesh ≥ 50 peers (sufficient N2 density)
  - uptime ≥ 2 hours continuous
  - NOT mobile (platform check at build time)
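The checklist can be expressed directly as a predicate. This is a hedged Rust sketch; the field names are assumptions, not the real struct:

```rust
// Illustrative state snapshot for the candidacy check (field names assumed).
struct CandidateState {
    has_upnp: bool,
    has_public_v6: bool,
    secs_since_probe_ok: u64,
    mesh_peers: usize,
    uptime_secs: u64,
    is_mobile: bool,
}

fn is_anchor_candidate(s: &CandidateState) -> bool {
    (s.has_upnp || s.has_public_v6)         // externally mapped or public
        && s.secs_since_probe_ok <= 30 * 60 // probe succeeded within 30 min
        && s.mesh_peers >= 50               // sufficient N2 density
        && s.uptime_secs >= 2 * 3600        // 2h continuous uptime
        && !s.is_mobile                     // desktop/VPS only
}

fn main() {
    let mut s = CandidateState {
        has_upnp: true, has_public_v6: false, secs_since_probe_ok: 60,
        mesh_peers: 80, uptime_secs: 3 * 3600, is_mobile: false,
    };
    assert!(is_anchor_candidate(&s));
    s.mesh_peers = 10; // below the 50-peer floor
    assert!(!is_anchor_candidate(&s));
    println!("ok");
}
```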

Probe refresh schedule

| Trigger | Action |
| --- | --- |
| Startup (after UPnP attempt) | Run initial probe |
| UPnP renewal if address changed | Re-probe |
| Every 30 minutes while anchor-declared | Periodic re-probe |
| Any failed inbound connection | Immediate re-probe |
| Two consecutive probe failures | Stop advertising as anchor, revert to normal peer |

Session fallback for full anchors

When an anchor's mesh is full (101/101), new nodes fall back to a session connection for matchmaking. The anchor accepts referral requests over session connections, not just mesh.

Remaining gaps

| Gap | Impact |
| --- | --- |
| Profile anchor lists not used for discovery | Profiles have an anchors field but it's not consulted during address resolution |
| No anchor-to-anchor awareness | Anchors don't discover each other unless they connect through normal mesh growth |
| Bootstrap chicken-and-egg | A fresh anchor with few peers produces few N2 candidates for new nodes. Growth stalls because there's nothing to grow from. |

9. Referrals

Status: Complete

Referral list mechanics (anchor side)

Anchors maintain an in-memory HashMap of registered peers. Each entry: { node_id, addresses, use_count, disconnected_at }.

| Property | Value |
| --- | --- |
| Tiered usage caps | 3 uses if list < 50, 2 uses at 50+, 1 use at 100+ |
| Disconnect grace | 2 minutes before pruning |
| Sort order | Least-used first (distributes load) |
| Auto-supplement | When explicit list is sparse (< 3 entries), supplement with random mesh peers |
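The tiered caps reduce to a small lookup. A sketch (function name assumed):

```rust
// Tiered usage caps: smaller referral lists let each entry be handed out
// more times; large lists spread referrals thinner.
fn referral_use_cap(list_len: usize) -> u32 {
    match list_len {
        0..=49 => 3,  // list < 50
        50..=99 => 2, // 50+
        _ => 1,       // 100+
    }
}

fn main() {
    assert_eq!(referral_use_cap(10), 3);
    assert_eq!(referral_use_cap(49), 3);
    assert_eq!(referral_use_cap(50), 2);
    assert_eq!(referral_use_cap(100), 1);
    println!("ok");
}
```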

10. Relay & NAT Traversal

Status: Complete

Relay selection (find_relays_for)

Find up to 3 relay candidates, prioritized:

  1. Preferred tree intersection: Target's preferred_tree (from social_routes, ~100 NodeIds) intersected with our connections. Prefer our own preferred peers within that tree. TTL=0.
  2. N2 reporters: Our mesh peers who reported the target in their N1 share. TTL=0.
  3. N3 via preferred tree: Target's preferred_tree intersected with N3 reporters. TTL=1.
  4. N3 reporters: Any N3 reporter for the target. TTL=1.

RelayIntroduce flow (0xB0/0xB1)

  1. Requester → opens bi-stream to relay, sends RelayIntroduce { target, requester, requester_addresses, ttl }
  2. Relay handles three cases:
    • We ARE the target: Return our addresses, spawn hole punch to requester
    • Target is our mesh or session peer: Forward request to target on new bi-stream, relay response back. Inject observed public addresses for both parties (session peers carry remote_addr from their inbound connection).
    • TTL > 0 and target in our N2: Forward to the reporter with TTL-1 (chain forwarding, max TTL=2)
  3. Requester receives RelayIntroduceResult { target_addresses, relay_available }, then:
    • hole_punch_parallel(): Try all returned addresses in parallel, retry every 2s, 30s total timeout
    • If hole punch fails and relay_available: open SessionRelay (0xB2) pipe through the intermediary

Session relay (relay pipes)

Intermediary splices bi-streams between requester and target. Desktop: max 10 concurrent pipes. Mobile: max 2. Each pipe has a 50MB byte cap and 2-min idle timeout.

v0.2.0 change: Relay pipes are own-device-only by default. A node will only relay traffic between its own devices (same identity key, different device identity). Users can opt in to relaying for others in Settings, but this is not enabled automatically. This prevents nodes from unknowingly burning bandwidth for random peers while still enabling personal multi-device routing.

Deduplication & cooldowns

| Mechanism | Window | Purpose |
| --- | --- | --- |
| seen_intros | 30s | Prevents forwarding loops |
| relay_cooldowns | 5 min per target | Prevents relay spamming |

Hole punch mechanics

Both sides filter self-reported addresses to publicly-routable only (no Docker bridge, VPN, or LAN IPs) and prepend UPnP external address if available. The relay injects each party's observed public address (from the QUIC connection) at the front of the list. All paths use hole_punch_parallel(): parse returned addresses into QUIC EndpointAddr, spawn parallel connect attempts to every address simultaneously. Each attempt: 2s timeout, retried until 30s total deadline. First successful connection wins.
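The candidate-address assembly above can be sketched as follows. Helper names are illustrative; the IPv6 unique-local check is done manually since `Ipv6Addr::is_unique_local` is not stabilized:

```rust
use std::net::{IpAddr, SocketAddr};

// Sketch: keep only publicly-routable addresses from the self-reported set.
fn is_publicly_routable(ip: &IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => {
            !(v4.is_private() || v4.is_loopback() || v4.is_link_local() || v4.is_unspecified())
        }
        // fc00::/7 unique-local range checked via the first segment
        IpAddr::V6(v6) => {
            !(v6.is_loopback() || v6.is_unspecified() || (v6.segments()[0] & 0xfe00) == 0xfc00)
        }
    }
}

// Relay-observed address first, then UPnP external address, then the
// filtered self-reported list — all tried in parallel by the punch.
fn punch_candidates(
    observed: SocketAddr,
    upnp: Option<SocketAddr>,
    self_reported: Vec<SocketAddr>,
) -> Vec<SocketAddr> {
    let mut out = vec![observed];
    out.extend(upnp);
    out.extend(self_reported.into_iter().filter(|a| is_publicly_routable(&a.ip())));
    out.dedup();
    out
}

fn main() {
    let observed: SocketAddr = "203.0.113.9:4433".parse().unwrap();
    let lan: SocketAddr = "192.168.1.5:4433".parse().unwrap();
    let public: SocketAddr = "198.51.100.7:4433".parse().unwrap();
    let c = punch_candidates(observed, None, vec![lan, public]);
    assert_eq!(c, vec![observed, public]); // LAN address filtered out
    println!("{:?}", c);
}
```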

NAT type detection

Status: Complete (interim: public STUN servers)

On startup, each node classifies its NAT type as one of four categories:

Current implementation (interim)

Raw STUN Binding Requests (20 bytes, no crate dependency) sent to stun.l.google.com:19302 and stun.cloudflare.com:3478 from a single UDP socket. XOR-MAPPED-ADDRESS parsed from each response (IPv4 + IPv6 supported). Comparison: same mapped port = Easy, different = Hard, matches local = Public. 3s timeout per server. UPnP success overrides to Public. Anchors skip probing entirely (already Public).
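The comparison rule can be sketched as a pure function over the two mapped addresses and the local bound address (a simplified sketch; the real code also handles per-server timeouts and the UPnP override):

```rust
use std::net::SocketAddr;

#[derive(Debug, PartialEq)]
enum NatType {
    Public, // mapped address equals our local address: no NAT
    Easy,   // same mapped port at both servers: endpoint-independent mapping
    Hard,   // different mapped ports: endpoint-dependent mapping
}

fn classify_nat(local: SocketAddr, mapped_a: SocketAddr, mapped_b: SocketAddr) -> NatType {
    if mapped_a == local && mapped_b == local {
        NatType::Public
    } else if mapped_a.port() == mapped_b.port() {
        NatType::Easy
    } else {
        NatType::Hard
    }
}

fn main() {
    let local: SocketAddr = "192.0.2.1:5000".parse().unwrap();
    let a: SocketAddr = "203.0.113.1:6000".parse().unwrap();
    let b_diff: SocketAddr = "203.0.113.1:6001".parse().unwrap();
    assert_eq!(classify_nat(local, a, a), NatType::Easy);
    assert_eq!(classify_nat(local, a, b_diff), NatType::Hard);
    assert_eq!(classify_nat(local, local, local), NatType::Public);
    println!("ok");
}
```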

Target design (multi-anchor STUN)

When the network has enough anchors, replace public STUN servers with anchor-reported your_observed_addr from InitialExchange. Connecting to two or more anchors at different public IPs provides the same classification without external dependencies.

NAT type sharing

NAT type is included as a string field ("public"/"easy"/"hard"/"unknown") in InitialExchangePayload. Stored per-peer in the peers table (nat_type TEXT column). Available for hole punch decisions before any connection attempt.

Hole punch strategy

| Peer A | Peer B | Strategy |
| --- | --- | --- |
| Public / Easy | Any | Hole punch (likely success) |
| Hard NAT | Easy NAT | Hole punch (B's port is predictable) |
| Hard NAT | Hard NAT | Port scanning: hole_punch_with_scanning() tries standard punch first, then escalates to tiered port scanning (±500, ±2000, full ephemeral range) |

All hole punch paths use hole_punch_with_scanning() which replaces the former hard+hard skip. NAT profiles (NatMapping + NatFiltering) from InitialExchange determine whether scanning is attempted. Behavioral inference updates filtering classification from connection outcomes.

Advanced NAT traversal

Status: Complete

NAT "hardness" has two independent dimensions:

STUN probing at startup classifies mapping (EIM/EDM). Filtering is determined reliably via the anchor filter probe.

NAT filter probe (0xC6/0xC7)

After anchor registration, each node with Unknown filtering sends a NatFilterProbe bi-stream request to its anchor. The anchor creates a temporary QUIC endpoint on a random port and attempts to connect to the node’s observed address (2s timeout). If the connection succeeds, the node is Open (address-restricted or better — accepts packets from any port on the anchor’s IP). If it times out, the node is PortRestricted.

This probe runs once at startup (during anchor register cycle) and the result feeds into all subsequent InitialExchange payloads, so peers know each other’s exact filtering type.

Note: “Public” NAT type does not automatically mean Open filtering. A node may be public on IPv6 but NATed on IPv4. The filter probe tests actual reachability from a different port, regardless of self-declared NAT type. Startup logs now report public (v4 only), public (v6 only), or public (v4+v6).

NAT combination matrix

Side A | Side B | Result
addr-restricted, EIM | addr-restricted, EDM | Basic hole punch
port-restricted, EIM | addr-restricted, EDM | A scans to find+open port; B punches A's stable port regularly
addr-restricted, EDM | port-restricted, EDM | B scans to find+open port; A waits then responds
port-restricted, EDM | port-restricted, EDM | Both scan+punch alternately
addr-restricted, EIM | addr-restricted, EIM | Basic hole punch
port-restricted, EIM | addr-restricted, EIM | Basic hole punch
addr-restricted, EDM | port-restricted, EIM | B scans to find+open port; A punches B's stable port regularly
port-restricted, EDM | port-restricted, EIM | B scans to find+open port; A punches B's stable port regularly

Key insight: if both sides have Open (address-restricted) filtering, scanning is never needed — should_try_scanning() returns false and basic hole punch handles it.
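The matrix collapses to one compact rule. The sketch below is a derived transcription of the table, not the codebase's should_try_scanning(); enum and function names are illustrative:

```rust
#[derive(Clone, Copy, PartialEq)]
enum Filtering {
    AddrRestricted, // "Open": accepts packets from any port on a known IP
    PortRestricted,
}

#[derive(Clone, Copy, PartialEq)]
enum Mapping {
    Eim, // endpoint-independent: stable, predictable external port
    Edm, // endpoint-dependent: new port per destination
}

/// Rule consistent with every row of the matrix above: scanning is needed
/// only when at least one side filters by port AND at least one side's
/// mapping is endpoint-dependent. Both-addr-restricted pairs never scan.
fn needs_scanning(a: (Filtering, Mapping), b: (Filtering, Mapping)) -> bool {
    let any_port_restricted =
        a.0 == Filtering::PortRestricted || b.0 == Filtering::PortRestricted;
    let any_edm = a.1 == Mapping::Edm || b.1 == Mapping::Edm;
    any_port_restricted && any_edm
}
```

An EIM port-restricted side keeps a stable external port, so the other side can punch it directly; only EDM makes a port unpredictable enough to require scanning.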

Role-based scanning protocol

Each side independently determines its role based on its own NAT profile:

The scanner opens ports on its own firewall. The other side’s periodic punch (one every 2s to the scanner’s observed address) checks if the scanner has opened a port matching the puncher’s actual port. For both-EDM pairs, both sides scan and punch simultaneously.

Scan parameters

Why 5-minute scan duration is acceptable

The cost is time, not resources (~20 in-flight at any time, ~100 probes/sec). For connections that would otherwise be impossible (both EDM + port-restricted), accepting a longer setup time is far better than giving up entirely. Most successful connections resolve within the first 40 seconds (±2000 port range).
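A minimal sketch of the tier ordering described above (±500, then ±2000, then the full ephemeral range 49152–65535), assuming nearest-port-first within each tier and a 1024 lower bound; the real scanner's ordering and pacing may differ:

```rust
use std::collections::HashSet;

/// Produce the tiered scan order around the peer's observed port.
/// Each port appears at most once; earlier tiers win.
fn scan_order(observed: u16) -> Vec<u16> {
    let mut seen: HashSet<u16> = HashSet::new();
    let mut out: Vec<u16> = Vec::new();

    fn push(seen: &mut HashSet<u16>, out: &mut Vec<u16>, p: i32) {
        // Assumed bounds: skip well-known ports and anything out of range.
        if (1024..=65535).contains(&p) && seen.insert(p as u16) {
            out.push(p as u16);
        }
    }

    push(&mut seen, &mut out, observed as i32);
    // Tier 1 (±500) then tier 2 (±2000), nearest offsets first.
    for &radius in &[500i32, 2000] {
        for d in 1..=radius {
            push(&mut seen, &mut out, observed as i32 + d);
            push(&mut seen, &mut out, observed as i32 - d);
        }
    }
    // Tier 3: full ephemeral range.
    for p in 49152..=65535i32 {
        push(&mut seen, &mut out, p);
    }
    out
}
```

With ~100 probes/sec, tier 2 (4000 ports) finishes in roughly 40 seconds, matching the observation that most successes land within the ±2000 range.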

Design principle: This protocol eliminates the need for full relay in virtually all NAT scenarios. Session relay remains opt-in only — it is never used as an automatic fallback. The scanning approach respects the user’s intent that peers communicate directly whenever physically possible.

11. UPnP Port Mapping

Status: Complete

Purpose

UPnP (Universal Plug and Play) allows a node to request its home router to forward an external port to its local QUIC port. This makes the node directly reachable from the internet without hole punching — any peer with the external address can connect immediately. This dramatically improves connection success rates for desktop nodes on home networks.

Startup flow

bind Endpoint → attempt UPnP mapping (2s timeout) → store external addr → bootstrap
  1. Discover gateway: Search for UPnP/NAT-PMP gateway with a 2-second timeout. If no gateway found, proceed without — do not block startup.
  2. Request mapping: Map both UDP and TCP for the local QUIC port to the same external port (or next available). UDP is required for QUIC (existing). TCP enables HTTP post delivery (see Section 25). Both use the same external port number. If the router supports one but not the other, accept the partial mapping gracefully — QUIC connectivity is not affected by TCP mapping failure. Request lease TTL of 3600s.
  3. Store external address: The resulting external SocketAddr is stored alongside iroh's observed addresses. It feeds into N+10 identification, InitialExchange, anchor registration, and all peer address advertisements.
  4. Log result: Clearly log whether UPnP succeeded, failed, or was unavailable. This is critical for diagnosing connectivity issues.

Lease renewal cycle (every 2700s / 45 min)

UPnP mappings have a TTL (typically 3600s but varies by router). A renewal loop runs every 45 minutes to refresh the mapping before it expires. If renewal fails, the external address is removed from advertisements and the node falls back to hole punch / relay paths gracefully.

Shutdown

Explicitly release the UPnP mapping on clean shutdown. Routers have finite mapping tables — releasing is good citizenship. Tauri's shutdown hook handles this.

Integration with existing address logic

The UPnP external address is treated the same as any other address the node knows about. It feeds into N+10 identification, InitialExchange, anchor registration, and all peer address advertisements.

Why this matters for mobile

Mobile devices on cellular networks cannot use UPnP (carrier NAT doesn't expose it). However, if the peers they're trying to reach (especially desktop nodes and anchors) have UPnP mappings, those peers become directly reachable from the phone without hole punching. The phone doesn't need UPnP — the other side does.

Honest limitations

Limitation | Impact
UPnP disabled on router | Some ISPs ship routers with UPnP off. Mapping silently fails; fall back to hole punch.
Double NAT | ISP modem + user router: the mapping reaches the inner router but not the outer one. Partial help at best.
Cellular networks | No UPnP at all. This is purely a desktop/home-network feature.
Carrier-grade NAT (CGNAT) | The ISP shares one public IP across many customers. UPnP maps to the ISP's NAT, not the internet. Same as double NAT.

Design principle: UPnP is a best-effort enhancement that improves direct connection reliability for the common case. It is not a dependency. The hole punch + relay fallback chain already handles all failure cases — UPnP just reduces how often you fall back to them.

UPnP nodes are anchors

A node with a successful UPnP mapping is directly reachable from the internet — which is the only thing that makes an anchor an anchor. When UPnP mapping succeeds, the node self-declares as an anchor (is_anchor = true). Other peers will add it to their known_anchors table, providing diverse bootstrap paths back into the network.

When the UPnP mapping is lost (lease renewal fails, shutdown), the node reverts to non-anchor. Peers that stored it as an anchor will naturally age it out via last_seen — LIFO ordering means stale anchors drop to the bottom. The 7-day post-bootstrap cleanup probes stale entries and removes failures. No special cleanup needed beyond the existing anchor infrastructure.

This means any desktop on a home network with UPnP-capable router becomes a potential bootstrap point for the network, dramatically increasing the number of available anchors without any manual server deployment.

Implementation

Crate: igd-next (async support, well-maintained fork of igd). Implementation lives in network.rs alongside the iroh Endpoint — UPnP mapping is an Endpoint concern, not a connection concern.

12. LAN Discovery

Status: Planned

iroh's mDNS address lookup broadcasts peer presence on the local network via multicast DNS (service name "irohv1", backed by the swarm-discovery crate). Currently this is configured as a passive address resolver — if we already know a peer's NodeId, mDNS can resolve its LAN address. But mDNS also discovers unknown peers on the same network, and iroh exposes this via MdnsAddressLookup::subscribe().

Discovery flow

  1. Hold the mDNS handle: Build MdnsAddressLookup explicitly (not via the endpoint builder) so we retain a clone for subscribing.
  2. Spawn a LAN scan loop: Call mdns.subscribe().await to get a stream of DiscoveryEvent::Discovered and DiscoveryEvent::Expired events.
  3. On discovery: Extract NodeId + LAN addresses from the event. If not already connected, initiate a direct connection + initial exchange. Register as a LAN session (a keep-alive session tagged as local).
  4. On expiry: Clean up the LAN session. Peer left the network or powered off.

LAN sessions

LAN peers are special: zero-cost bandwidth, sub-millisecond latency, and very likely someone you know (same household/office). They deserve their own treatment beyond regular mesh or session slots:

Design rationale

Today, two distsoc devices on the same WiFi network can only find each other if they happen to share a peer that reports them in N2. This is absurd — they're on the same network segment. LAN discovery turns mDNS from a passive address resolver into an active peer source, exploiting the fact that local bandwidth is essentially unlimited.

The keep-alive + relay pattern means a household with one well-connected desktop and several phones creates its own mini-mesh: the desktop provides anchor-like connectivity, the phones stay connected through it, and everyone syncs instantly over the LAN even when the internet connection drops.

Implementation note: iroh's MdnsAddressLookup::subscribe() returns a Stream<DiscoveryEvent>. The DiscoveryEvent::Discovered variant includes EndpointInfo with NodeId + IP addresses. Custom user_data can be set via endpoint.set_user_data_for_address_lookup() to embed distsoc-specific metadata (e.g., display name) in the mDNS TXT record.

13. Worm Search

Status: Complete

Used at step 4 of connect_by_node_id, after N2/N3 resolution fails.

Algorithm

  1. Build needles: target NodeId + target's N+10 (up to 10 preferred peers from their profile/cached N+10)
  2. Local check: Search own connections + N2/N3 for any of the 11 needles. Also check local storage, CDN downstream tree, and blob store for any requested post/blob content.
  3. Burst (500ms timeout): Send WormQuery{ttl=0} (0x60) to all mesh peers in parallel. Each peer checks their local connections + N2/N3, plus local storage and CDN tree for post/blob content.
  4. Nova (1.5s timeout): Each burst response includes a random "wide referral" — an N2 peer. Connect to those referrals and send WormQuery{ttl=1}. The referred peer does its own 101-burst (fans out to all its mesh peers with ttl=0). This reaches ~10K nodes with only ~202 relay hops, keeping network pressure low by expanding one hop at a time rather than flooding.
  5. Total timeout: 3 seconds for the entire search.
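The ~10K node and ~202 relay hop figures follow from the fan-out arithmetic, sketched here with the desktop mesh size of 101 (function name illustrative):

```rust
/// Fan-out arithmetic for the burst + nova pattern: returns
/// (nodes reached, relay hops) for a given mesh size.
fn nova_reach(mesh_slots: u32) -> (u32, u32) {
    let burst = mesh_slots; // our own ttl=0 burst
    let referred = mesh_slots * mesh_slots; // each wide referral's own 101-burst
    let nodes = burst + referred;
    let hops = mesh_slots + mesh_slots; // one query per burst peer + one per referral
    (nodes, hops)
}
```

Expanding one referral hop at a time keeps pressure low: coverage grows quadratically while query volume grows only linearly.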

Content search

WormQuery carries optional post_id and blob_id fields, enabling unified search for nodes, posts, and blobs in a single query. Each peer checks:

  • Local post storage for the requested post_id
  • Its CDN downstream tree (up to 100 hosts per post)
  • Its blob store for the requested blob_id

WormResponse carries post_holder and blob_holder fields alongside the existing node search results. A content hit (post or blob holder found) is treated as a successful response even without a node match.

The CDN layer is the key multiplier: each node's downstream tree can cover hundreds of posts across dozens of hosts, giving every peer thousands of "I know where that is" answers. Combined with social layer knowledge, even a 202-hop nova covers enormous content space.

PostFetch (0xD4/0xD5)

Lightweight single-post retrieval after worm search identifies a holder. Opens a bi-stream to the holder and requests one post by ID. Much lighter than full PullSync — no follow filtering, no batch processing, just the target post.

Dedup & cooldown

Mechanism | Window | Purpose
seen_worms | 10s | Prevents loops during fan-out
Miss cooldown | 5 min (in DB) | Prevents repeated searches for unreachable targets

14. Preferred Peers

Status: Complete

Negotiation (MeshPrefer, 0xB3)

Properties

15. Social Routing

Status: Complete

Caches addresses for follows and audience members, separate from mesh connections.

social_routes table

Field | Purpose
node_id | The social contact's NodeId
nplus10 | Their N+10 (NodeId + 10 preferred peers)
addresses | Their known IP addresses
peer_addresses | Their N+10 contacts (PeerWithAddress list)
relation | Follow / Audience / Mutual
status | Online / Disconnected
last_connected_ms | When we last connected
reach_method | Direct / Relay / Indirect
preferred_tree | ~100 NodeIds for relay tree

Wire messages

Code | Name | Stream | Purpose
0x70 | SocialAddressUpdate | Uni | Sent when a social contact's address changes or they reconnect
0x71 | SocialDisconnectNotice | Uni | Sent when a social contact disconnects
0x72 | SocialCheckin | Bi | Keepalive with address + N+10 updates

Reconnect watchers

reconnect_watchers table: when peer A asks about disconnected peer B, A is registered as a watcher. When B reconnects, A gets a SocialAddressUpdate notification. Watchers pruned after 30 days. Low priority — daily check frequency for watchers older than 7 days.

Social route lifecycle

16. Keep-Alive Sessions

Status: Planned

Purpose

When the mesh 101 doesn't provide <N4 access to all the nodes we need for social and file operations, keep-alive sessions bridge the gap. These are long-lived connections that participate in N2/N3 routing but are not part of the mesh 101.

Social/File connectivity check (every 60s)

Periodically check whether we can reach every node we need. A node is considered reachable if either:

Only when neither condition is met do we open a keep-alive session. With UPnP auto-anchors (see Section 11) scattered throughout the network, the odds of an anchor being within N2 of any given peer increase significantly, reducing the number of keep-alive sessions needed.

Nodes to check:

For any node whose N+10 is NOT reachable within N3, open a keep-alive session to the closest available node in their N+10 (or to them directly if possible). This ensures we can always find and reach our social and file contacts without worm search.

Keep-alive session behavior

Practical ceilings

Platform | Ceiling | Binding constraint
Desktop | ~300–500 | Routing diff broadcast overhead — NodeListUpdate to all sessions every 120s. Memory and connection count are not the bottleneck.
Mobile | ~25–50 | Battery (radio wake-ups per heartbeat cycle) and OS background restrictions (iOS/Android will kill background sockets).

Mobile priority stack

When approaching the mobile ceiling, keep-alive sessions are prioritized:

  1. DMs in the last 30 min — active conversations take highest priority
  2. Follows — people you follow
  3. Audience — people following you
  4. File peers — upstream/downstream blob hosts

Lower-priority sessions are closed first to make room.
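The stack above can be sketched as an ordered enum plus a cut at the ceiling. Names are illustrative, not the codebase's:

```rust
/// Mobile keep-alive priority stack: lower discriminant = higher priority.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum KeepAlivePriority {
    RecentDm = 0, // DM within the last 30 min
    Follow = 1,
    Audience = 2,
    FilePeer = 3, // upstream/downstream blob hosts
}

/// Returns the session ids to close to get back under `ceiling`.
/// Sorting is stable, so order within a tier is preserved.
fn close_candidates(
    mut sessions: Vec<(u64, KeepAlivePriority)>,
    ceiling: usize,
) -> Vec<u64> {
    sessions.sort_by_key(|&(_, priority)| priority);
    let keep = ceiling.min(sessions.len());
    sessions.drain(keep..).map(|(id, _)| id).collect()
}
```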

Hysteresis

Don't open a keep-alive session for a contact who just barely fell outside N3. Wait for persistent unreachability — the contact must be absent from N1/N2/N3 for multiple consecutive connectivity checks (e.g., 3 checks = 3 minutes) before opening a keep-alive. This prevents churn from nodes that transiently appear and disappear at the N3 boundary.
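A minimal sketch of that hysteresis rule, assuming a per-contact miss counter (names illustrative); at the 60s check cadence, 3 misses is 3 minutes:

```rust
/// Consecutive misses required before opening a keep-alive session.
const MISS_THRESHOLD: u32 = 3;

struct ReachTracker {
    consecutive_misses: u32,
}

impl ReachTracker {
    /// Feed one connectivity-check result; returns true once the contact
    /// has been persistently unreachable and a keep-alive should open.
    fn observe(&mut self, reachable_within_n3: bool) -> bool {
        if reachable_within_n3 {
            self.consecutive_misses = 0; // any sighting resets the counter
        } else {
            self.consecutive_misses += 1;
        }
        self.consecutive_misses >= MISS_THRESHOLD
    }
}
```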

Reject + redirect

When a node is at its keep-alive session capacity (50% of total sessions), it refuses new keep-alive requests with a redirect — offering a random N2 node that also has <N4 access to the target. Same pattern as mesh RefuseRedirect but for the keep-alive pool. The requester tries the suggested peer instead.

Cross-layer benefit

Keep-alive sessions from the social and file layers feed N2/N3 entries back into the mesh layer. A social keep-alive to a friend's preferred peer might provide N2 entries that help the mesh growth loop. Similarly, a file keep-alive to an upstream host might provide access to nodes the mesh has never seen. The three layers compound each other's reach.

17. Content Propagation

Intent

"Attention creates propagation": when you view something, you cache it. The cache is optionally offered for serving. Hot content spreads naturally through demand. Cold content decays unless intentionally hosted.

The CDN vision: every file by author X carries an author manifest with the author's N+10 and recent post list. If you hold any file by author X, you passively know X's recent posts and can find X through their N+10.

Status: Partial

Passive discovery via neighborhood diffs

Passive file-chain propagation is enabled through BlobHeader neighborhood diffs. Every blob header carries the author's 25+25 post neighborhood (25 previous + 25 following). When a host receives a BlobHeaderDiff (0x96), they learn about the author's newer posts without explicit subscription. Hosts of old content are pulled toward new content by the same author naturally — attention creates propagation.

Remaining gaps

Gap | Impact
N+10 not yet in file headers | Blob headers should include author N+10, upstream N+10, and downstream N+10s. Currently only AuthorManifest travels with blobs.
No "fetch from any peer who has it" | Blobs are fetched from specific peers. No content-addressed routing ("who has blob X?").

18. Files & Storage

Blob storage: Complete

Property | Value
CID format | BLAKE3 hash of blob data (32 bytes, hex-encoded)
Filesystem path | {data_dir}/blobs/{hex[0..2]}/{hex} (256 shards)
Metadata table | blobs (cid, post_id, author, size_bytes, created_at, last_accessed_at, pinned)
Max blob size | 10 MB
Max attachments per post | 4

Blob content immutability

Blob data is BLAKE3-addressed — the CID is the hash of the content. This means blob content is immutable by definition. Any mutable metadata (neighborhood, host lists, signatures) MUST be stored separately in a BlobHeader. Inline mutable headers are architecturally incompatible with content addressing.

BlobHeader: Planned

Formal mutable structure replacing/extending CdnManifest. Stored and transmitted separately from blob data.

BlobHeader {
    cid,                    // BLAKE3 hash of blob content
    author_nplus10,         // Author's N+10 (NodeId + 10 preferred peers)
    author_recent_posts,    // 25 previous + 25 following PostIds (neighborhood)
    upstream_nplus10,       // Upstream file source's N+10 (if not author)
    downstream_hosts,       // Up to min(100, floor(170MB / blob_size)) downstream hosts
    author_signature,       // ed25519 signature over author fields
    host_signature,         // ed25519 signature by current host
    updated_at,             // Timestamp of last header update
}

Blob transfer flow (0x90/0x91)

  1. Requester sends BlobRequest { cid, requester_addresses }
  2. Host checks local BlobStore:
    • Has blob: Return base64-encoded data + CDN manifest + file header (N+10s, recent posts). Try to register requester as downstream (max 100). If full, return existing downstream as redirect candidates.
    • No blob: Return found: false
  3. Requester verifies CID, stores blob locally, records upstream in blob_upstream table. Updates Self Last Encounter for the author based on file header.

CDN hosting tree: Complete

Blob eviction: Complete

priority = pin_boost + (relationship * heart_recency * freshness / (peer_copies + 1))
Factor | Calculation
pin_boost | 1000.0 if pinned, else 0.0. Own blobs auto-pinned.
relationship | 5.0 (us), 3.0 (mutual follow+audience), 2.0 (follow), 1.0 (audience), 0.1 (stranger)
heart_recency | Linear decay over 30 days: max(0, 1 - age/30d)
freshness | 1 / (1 + post_age_days)
peer_copies | Known replica count (from post_replicas, only if < 1 hour old)
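The formula transcribes directly to code. Struct fields below are illustrative, not the actual blobs-table schema:

```rust
/// Inputs to the eviction priority formula above.
struct BlobMeta {
    pinned: bool,
    relationship: f64,   // 5.0 (us) down to 0.1 (stranger)
    heart_age_days: f64, // days since last reaction
    post_age_days: f64,
    peer_copies: u32,    // known replicas
}

fn eviction_priority(b: &BlobMeta) -> f64 {
    let pin_boost = if b.pinned { 1000.0 } else { 0.0 };
    // Linear decay over 30 days, clamped at zero.
    let heart_recency = (1.0 - b.heart_age_days / 30.0).max(0.0);
    let freshness = 1.0 / (1.0 + b.post_age_days);
    pin_boost + b.relationship * heart_recency * freshness / (b.peer_copies as f64 + 1.0)
}
```

Pinned blobs always outrank unpinned ones (the unpinned maximum is 5.0), and widely replicated blobs are evicted first among otherwise equal candidates.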

Pin modes: Planned

The CDN is delivery infrastructure, not storage. Authors own durability. Pinning extends content in the local delivery pool — it is not a network obligation.

Concept | Status
Anchor pin vs Fork pin | Not started. Anchor pin = host the original (author retains control). Fork pin = independent copy (you become key owner).
Personal vault | Not started. Private durability for saved/pinned items.

19. Sync Protocol

Wire format

[1 byte: MessageType] [4 bytes: length (big-endian)] [length bytes: JSON payload]

Max payload: 16 MB. ALPN: itsgoin/3.
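The framing is simple enough to sketch inline; this is an illustrative encoder/decoder, not the codebase's:

```rust
/// Wire framing: [1 byte: type][4 bytes: big-endian length][JSON payload].
const MAX_PAYLOAD: usize = 16 * 1024 * 1024; // 16 MB cap

fn frame(msg_type: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(5 + payload.len());
    out.push(msg_type);
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Returns (message type, payload) if `buf` holds a complete, in-bounds frame.
fn parse(buf: &[u8]) -> Option<(u8, &[u8])> {
    if buf.len() < 5 {
        return None;
    }
    let len = u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]) as usize;
    if len > MAX_PAYLOAD || buf.len() < 5 + len {
        return None;
    }
    Some((buf[0], &buf[5..5 + len]))
}
```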

Pull sync: social + file layers, not mesh

v0.2.0 change: Pull sync pulls posts from social layer peers (follows, audience) and upstream file peers, NOT from mesh peers. Mesh connections exist for routing diversity, not content. This separates infrastructure from content flow.

Self Last Encounter: For each peer we sync with, we track the timestamp of our last successful sync. When Self Last Encounter ages beyond 3 hours, a pull sync is triggered. Self Last Encounter is updated to the newer of: (a) what's currently stored, or (b) the "file last update" timestamp from file headers received during blob transfers. Since file headers include the author's recent post list, downloading a blob from any peer hosting that author's content can update Self Last Encounter for the author.

Pull sync filtering

Message types (47 listed below; the NatFilterProbe pair 0xC6/0xC7 used during anchor registration is documented in its own section above)

Hex | Name | Stream | Purpose
0x01 | NodeListUpdate | Uni | Incremental N1/N2 diff broadcast
0x02 | InitialExchange | Bi | Full state exchange on connect
0x03 | AddressRequest | Bi | Resolve NodeId → address via reporter
0x04 | AddressResponse | Bi | Address resolution reply
0x05 | RefuseRedirect | Uni | Refuse mesh + suggest alternative
0x40 | PullSyncRequest | Bi | Request posts filtered by follows
0x41 | PullSyncResponse | Bi | Respond with filtered posts
0x42 | PostNotification | Uni | Lightweight "new post" push to social contacts
0x43 | PostPush | Uni | Direct encrypted post delivery to recipients
0x44 | AudienceRequest | Bi | Request audience member list
0x45 | AudienceResponse | Bi | Audience list reply
0x50 | ProfileUpdate | Uni | Push profile changes
0x51 | DeleteRecord | Uni | Signed post deletion
0x52 | VisibilityUpdate | Uni | Re-wrapped visibility after revocation
0x60 | WormQuery | Bi | Burst/nova search for nodes, posts, or blobs beyond N3
0x61 | WormResponse | Bi | Worm search reply (node + post_holder + blob_holder)
0x70 | SocialAddressUpdate | Uni | Social contact address changed
0x71 | SocialDisconnectNotice | Uni | Social contact disconnected
0x72 | SocialCheckin | Bi | Keepalive + address + N+10 update
0x90 | BlobRequest | Bi | Fetch blob by CID
0x91 | BlobResponse | Bi | Blob data + CDN manifest + file header
0x92 | ManifestRefreshRequest | Bi | Check manifest freshness
0x93 | ManifestRefreshResponse | Bi | Updated manifest reply
0x94 | ManifestPush | Uni | Push updated manifests downstream
0x95 | BlobDeleteNotice | Uni | CDN tree healing on eviction
0xA0 | GroupKeyDistribute | Uni | Distribute circle group key to member
0xA1 | GroupKeyRequest | Bi | Request group key for a circle
0xA2 | GroupKeyResponse | Bi | Group key reply
0xB0 | RelayIntroduce | Bi | Request relay introduction
0xB1 | RelayIntroduceResult | Bi | Introduction result with addresses
0xB2 | SessionRelay | Bi | Splice bi-streams (own-device default)
0xB3 | MeshPrefer | Bi | Preferred peer negotiation
0xB4 | CircleProfileUpdate | Uni | Encrypted circle profile variant
0xC0 | AnchorRegister | Uni | Register with anchor (bootstrap/recovery only)
0xC1 | AnchorReferralRequest | Bi | Request peer referrals from anchor
0xC2 | AnchorReferralResponse | Bi | Referral list reply
0xC3 | AnchorProbeRequest | Bi | A → B → C: test cold reachability of address
0xC4 | AnchorProbeResult | Bi | C → A (success) or C → B → A (failure)
0xD0 | BlobHeaderDiff | Uni | Incremental engagement update (reactions, comments, policy, thread splits)
0xD1 | BlobHeaderRequest | Bi | Request full engagement header for a post
0xD2 | BlobHeaderResponse | Bi | Full engagement header response (JSON)
0xD3 | PostDownstreamRegister | Uni | Register as downstream for a post (CDN tree entry)
0xD4 | PostFetchRequest | Bi | Request a single post by ID from a known holder
0xD5 | PostFetchResponse | Bi | Single post response (SyncPost or not-found)
0xD6 | TcpPunchRequest | Bi | Ask holder to punch TCP toward browser IP
0xD7 | TcpPunchResult | Bi | Punch result + HTTP address for redirect
0xE0 | MeshKeepalive | Uni | 30s connection heartbeat

Engagement propagation

Reactions, comments, and policy changes propagate via BlobHeaderDiff (0xD0) through the CDN tree:

20. Encryption

Envelope encryption (1-layer): Complete

  1. Generate random 32-byte CEK (Content Encryption Key)
  2. Encrypt content: ChaCha20-Poly1305(plaintext, CEK, random_nonce)
  3. Store as: base64(nonce[12] || ciphertext || tag[16])
  4. For each recipient (including self):
    • X25519 DH: our_ed25519_private (as X25519) * their_ed25519_public (as montgomery)
    • Derive wrapping key: BLAKE3_derive_key("distsoc/cek-wrap/v1", shared_secret)
    • Wrap CEK: ChaCha20-Poly1305(CEK, wrapping_key, random_nonce) → 60 bytes per recipient
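The 60-byte figure and the ~500-recipient cap are simple arithmetic; the ~500-byte JSON size per WrappedKey (base64 plus metadata) is the figure from Appendix B:

```rust
/// Raw bytes per WrappedKey, from the wrapping steps above.
fn wrapped_key_raw_bytes() -> usize {
    let nonce = 12; // ChaCha20-Poly1305 nonce
    let cek_ciphertext = 32; // encrypted CEK matches the 32-byte CEK length
    let tag = 16; // Poly1305 authentication tag
    nonce + cek_ciphertext + tag
}

/// Recipients that fit under the visibility metadata cap.
fn max_recipients(cap_bytes: usize, json_bytes_per_key: usize) -> usize {
    cap_bytes / json_bytes_per_key
}
```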

Visibility variants

Variant | Overhead | Audience limit
Public | None | Unlimited
Encrypted { recipients } | ~60 bytes per recipient | ~500 (256KB cap)
GroupEncrypted { group_id, epoch, wrapped_cek } | ~100 bytes total | Unlimited (one CEK wrap for the group)

PostId integrity

PostId = BLAKE3(Post) covers the ciphertext, NOT the recipient list. Visibility is separate metadata. This means visibility can be updated (re-wrapped) without changing the PostId.

Group keys (circles): Complete

Three-tier access revocation

Three levels of revocation, chosen based on threat level:

Tier 1: Remove Going Forward (default)

Revoked member is excluded from future posts automatically. They retain access to anything they already received. This is the default behavior when removing a circle member — no special action needed.

When to use: Normal membership changes. Someone leaves a group, you unfollow someone. The common case.

Cost: Zero. Just stop including them in future recipient lists.

Tier 2: Rewrap Old Posts (cleanup)

Same CEK, re-wrap for remaining recipients only. The revoked member can no longer unwrap the CEK even if they later obtain the ciphertext. Propagate updated visibility headers via VisibilityUpdate (0x52).

When to use: Revoked member never synced the post (common with pull-based sync — encrypted posts only sent to recipients). You want to clean up access lists.

Cost: One WrappedKey operation per remaining recipient, no content re-encryption.

Tier 3: Delete & Re-encrypt (nuclear)

Generate new CEK, re-encrypt content, wrap new CEK for remaining recipients, push delete for old post ID, repost with new content but same logical identity. Well-behaved nodes honor the delete.

When to use: Revoked member already has the ciphertext and could unwrap the old CEK. Only for content that poses an actual danger/risk if the revoked member retains access. Recommended against in most cases.

Cost: Full re-encryption + delete propagation + new post propagation. Heavy.

Trust model: The app honors delete requests from content authors by default. A modified client could ignore deletes, but this is true of any decentralized system. For legal purposes: the author has proof they issued the delete and revoked access.

Private profiles (Phase D-4): Complete

Different profile versions per circle, encrypted with the circle/group key. A peer sees the profile version for the most-privileged circle they belong to. CircleProfileUpdate (0xB4) wire message. Public profiles can be hidden (public_visible=false strips display_name/bio).

21. Delete Propagation

Status: Complete

Delete records

DeleteRecord { post_id, author, timestamp_ms, signature } — ed25519-signed by author. Stored in deleted_posts table (INSERT OR IGNORE). Applied: DELETE from posts table WHERE post_id AND author match.

Propagation paths

  1. InitialExchange: All delete records exchanged on connect
  2. DeleteRecord message (0x51): Pushed via uni-stream to connected peers on creation
  3. PullSync: Included in responses for eventual consistency

CDN cascade on delete

  1. Send BlobDeleteNotice to all downstream hosts (with our upstream info for tree healing)
  2. Send BlobDeleteNotice to upstream
  3. Clean up blob metadata, manifests, downstream/upstream records
  4. Delete blob from filesystem

22. Social Graph Privacy

Status: Complete

Known temporary weakness: An observer who diffs your N1 share over time can infer your social contacts (they're the stable members while mesh peers rotate). This will be addressed when CDN file-swap peers are added to N1, making the stable set larger and harder to distinguish.

23. Multi-Device Identity

Status: Planned

Concept

Multiple devices share the same identity key (ed25519 keypair, same NodeId). All devices ARE the same node from the network's perspective. Posts from any device appear as the same author.

Device identity

Each device also generates a unique device identity (separate ed25519 keypair). This device-specific key is used to:

Setup

Export identity.key from one device, import on another. The device identity is generated automatically on each device. Once two devices share an identity key, they can discover each other through normal network routing (same NodeId appears at multiple addresses).

24. Phase 2: Reciprocity (Reconsidered)

Status: Reconsidered

The original Phase 2 design centered on hosting quotas (3x rule), chunk audits, and tit-for-tat QoS. On reflection, the attention-driven delivery model makes quota enforcement unnecessary. The CDN is a delivery amplifier, not a storage system — hot content propagates through demand, cold content decays. Authors are responsible for their own content durability.

Tit-for-tat QoS solves the wrong problem: it optimizes for fairness in a storage-obligation model that no longer exists. What matters is that the delivery network functions efficiently, which it does through natural attention dynamics.

If reciprocity mechanisms are needed at scale, they should address delivery quality (bandwidth, latency, uptime) rather than storage quotas. This remains an open design area.

25. HTTP Post Delivery

Intent

Every ItsGoin node that is publicly reachable can serve its cached public posts directly to browsers over HTTP — no extra infrastructure, no additional dependencies, no new binary. The same QUIC UDP port used for app traffic is accompanied by a TCP listener on the same port number. UDP goes to the QUIC stack as always. TCP goes to a minimal raw HTTP/1.1 handler baked into the binary.

This makes every publicly-reachable node a browser-accessible content endpoint, enabling share links that deliver content peer-to-browser without routing any post bytes through itsgoin.net.

Dual listener architecture

<port>/UDP  →  QUIC (existing app protocol)
<port>/TCP  →  HTTP/1.1 (new, read-only, single route)

Both listeners bind on the same port. The OS routes UDP and TCP to separate sockets — no conflict, no protocol ambiguity.

HTTP handler

The handler is intentionally minimal — implemented with raw tokio::net::TcpListener, no HTTP crate, no new dependencies. Approximately 150–200 lines of Rust.

Single valid route: GET /p/<postid_hex> HTTP/1.1

Response: Minimal HTML page containing the post content with a small footer:

<footer>
  This post is on the ItsGoin network — content lives on people's devices,
  not servers. <a href="https://itsgoin.com">Get ItsGoin</a>
</footer>

The footer HTML is a static string constant compiled into the binary (~2KB). No template engine, no dynamic footer generation.
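The route check itself is a few lines. A sketch of the single-route parser under the constraints above (hex post ids, one shared failure path); the function name is illustrative:

```rust
/// Accept only "GET /p/<postid_hex> HTTP/1.x" and return the post id.
/// Every failure returns None, which the handler maps to a hard close —
/// no distinguishable errors, no enumeration oracle.
fn parse_route(request_line: &str) -> Option<&str> {
    let mut parts = request_line.split_whitespace();
    if parts.next()? != "GET" {
        return None;
    }
    let path = parts.next()?;
    if !parts.next()?.starts_with("HTTP/1.") {
        return None;
    }
    let id = path.strip_prefix("/p/")?;
    if !id.is_empty() && id.bytes().all(|b| b.is_ascii_hexdigit()) {
        Some(id)
    } else {
        None
    }
}
```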

Security constraints

Concern | Mitigation
Connection exhaustion | Hard cap: 20 concurrent HTTP connections. New connections over the cap are immediately closed. No queue, no wait.
Slow HTTP attacks | 5-second read timeout for complete request headers. Exceeded → hard close.
Content enumeration | Identical response (hard close) for "post not found" and "post not public." No timing oracle, no distinguishable error codes.
Malformed requests | Hard close only. No error response.
Encrypted content | Never served. Public visibility check is mandatory before any response.

Which nodes serve HTTP

A node serves HTTP only if it is publicly TCP-reachable: a native public address (typically IPv6) or a successful UPnP TCP mapping (see Section 11).

302 load shedding via CDN tree

When a node is overwhelmed (at the 20-connection cap) or chooses to redirect:

  1. Query post_downstream table for the requested postid
  2. Filter downstream hosts to those with a known public address (IPv6 or UPnP-mapped IPv4)
  3. 302 → http://[their_address]:<port>/p/<postid>

The receiving node applies the same logic recursively if needed. This mirrors the app-layer CDN tree behavior at the HTTP layer — the same attention-driven propagation model, the same tree structure, now accessible to browsers.

Binary size impact

Zero new dependencies. Negligible compiled size delta (~10–20KB). No App Store size concerns. No install size impact for existing users.

Appendix A: Timeout Reference

| Constant | Value | Purpose |
| --- | --- | --- |
| MESH_KEEPALIVE_INTERVAL | 30s | Ping to prevent zombie detection |
| ZOMBIE_TIMEOUT | 600s (10 min) | No activity → dead connection |
| SESSION_IDLE_TIMEOUT | 300s (5 min) | Reap idle interactive sessions (NOT keep-alive) |
| SELF_LAST_ENCOUNTER_THRESHOLD | 10800s (3 hours) | Trigger pull sync when last encounter exceeds this |
| QUIC_CONNECT_TIMEOUT | 15s | Direct connection establishment |
| HOLE_PUNCH_TIMEOUT | 30s | Overall hole punch window |
| HOLE_PUNCH_ATTEMPT | 2s | Per-address attempt within window |
| RELAY_INTRO_TIMEOUT | 15s | Relay introduction request |
| RELAY_PIPE_IDLE | 120s (2 min) | Relay pipe idle before close |
| RELAY_COOLDOWN | 300s (5 min) | Per-target relay cooldown |
| RELAY_INTRO_DEDUP | 30s | Dedup intro forwarding |
| WORM_TOTAL_TIMEOUT | 3s | Entire worm search |
| WORM_FAN_OUT_TIMEOUT | 500ms | Per-peer fan-out query |
| WORM_BLOOM_TIMEOUT | 1.5s | Bloom round to wide referrals |
| WORM_DEDUP | 10s | In-flight worm dedup |
| WORM_COOLDOWN | 300s (5 min) | Miss cooldown before retry |
| REFERRAL_DISCONNECT_GRACE | 120s (2 min) | Anchor keeps peer in referral list after disconnect |
| N2/N3_STALE_PRUNE | Immediate on disconnect + 7 day fallback | Remove reach entries tagged to disconnected peers; age-based fallback for stragglers |
| N2/N3_STARTUP_SWEEP | On boot | Remove all N2/N3 entries tagged to peers not in current mesh |
| PREFERRED_UNREACHABLE_PRUNE | 7 days | Release preferred slot (must re-negotiate MeshPrefer on reconnect) |
| RECONNECT_WATCHER_EXPIRY | 30 days | Low-priority reconnect awareness; daily check after 7 days |
| GROWTH_LOOP_TIMER | 60s | Periodic growth loop check |
| CONNECTIVITY_CHECK | 60s | Social/file <N4 access check for keep-alive sessions |
| DM_RECENCY_WINDOW | 14400s (4 hours) | DM'd nodes included in connectivity check |
| UPNP_DISCOVERY_TIMEOUT | 2s | Gateway discovery on startup (do not block) |
| UPNP_LEASE_RENEWAL | 2700s (45 min) | Refresh port mapping before TTL expiry |
| ANCHOR_PROBE_INTERVAL | 1800s (30 min) | Periodic re-probe while anchor-declared |
| ANCHOR_PROBE_TIMEOUT | 15s | Cold connect attempt by witness |
| ANCHOR_STALE_THRESHOLD | 7 days | Post-bootstrap cleanup probes known_anchors older than this |
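
As Rust constants, a subset of the table might look like the following sketch. Names mirror the table (the values come from it), but this is not a claim about how the codebase actually declares them.

```rust
use std::time::Duration;

// A few Appendix A timeouts as illustrative Rust constants.
const MESH_KEEPALIVE_INTERVAL: Duration = Duration::from_secs(30);
const ZOMBIE_TIMEOUT: Duration = Duration::from_secs(600);
const WORM_TOTAL_TIMEOUT: Duration = Duration::from_secs(3);
const WORM_FAN_OUT_TIMEOUT: Duration = Duration::from_millis(500);
const WORM_BLOOM_TIMEOUT: Duration = Duration::from_millis(1_500);

/// The keep-alive interval must fire well inside the zombie window,
/// or healthy peers would be reaped as dead.
fn keepalive_fits_zombie_window() -> bool {
    MESH_KEEPALIVE_INTERVAL < ZOMBIE_TIMEOUT
}
```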

Appendix B: Design Constraints

| Constraint | Value | Notes |
| --- | --- | --- |
| Visibility metadata cap | 256 KB | Applies to WrappedKey lists in encrypted posts |
| Max recipients (per-recipient wrapping) | ~500 | 256KB / ~500 bytes JSON per WrappedKey |
| Max blob size | 10 MB | Per attachment |
| Max attachments per post | 4 | |
| Public post encryption overhead | Zero | No WrappedKeys, no sharding, unlimited audience |
| Max payload (wire) | 16 MB | Length-prefixed JSON framing |
| Mesh slots | 101 (Desktop) / 15 (Mobile) | Preferred + non-preferred, no local/wide distinction |
| Keep-alive session cap | 50% of session capacity | Ensures interactive sessions remain available |
| Keep-alive ceiling (desktop) | ~300–500 | Binding constraint: routing diff broadcast overhead |
| Keep-alive ceiling (mobile) | ~25–50 | Binding constraint: battery + OS background restrictions |
| mesh_blacklist table | { node_id } | Targeted mutual stranger relationships for testing/diversity |
| known_anchors table | { node_id, addresses, last_seen } | LIFO ordered, 7-day stale cleanup via probe |
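
The ~500-recipient figure follows directly from the first two rows. A back-of-envelope check, with illustrative constant names:

```rust
// Recipient cap derivation from the table above: the 256 KB visibility
// metadata cap divided by ~500 bytes of JSON per WrappedKey.
const VISIBILITY_METADATA_CAP_BYTES: usize = 256 * 1024;
const WRAPPED_KEY_JSON_BYTES: usize = 500; // approximate per-recipient cost

fn max_recipients() -> usize {
    VISIBILITY_METADATA_CAP_BYTES / WRAPPED_KEY_JSON_BYTES
}
```

262,144 / 500 ≈ 524, which the table rounds to "~500"; the exact bound depends on the real serialized size of a WrappedKey.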

Appendix C: Implementation Scorecard

| Area | Status |
| --- | --- |
| Mesh connection architecture (101 slots, preferred/non-preferred) | Complete |
| N1/N2/N3 knowledge layers | Complete |
| Growth loop (60s timer + reactive on N2/N3) | Partial (timer exists, reactive trigger needs update) |
| Preferred peers + bilateral negotiation | Complete |
| N+10 identification | Partial (preferred peers exist, N+10 not in all headers) |
| Worm search (nodes + content search for posts/blobs) | Complete |
| Relay introduction + hole punch | Complete |
| Session relay (own-device default) | Partial (relay works, own-device restriction not implemented) |
| Social routing cache | Complete |
| Three-layer architecture (Mesh/Social/File) | Partial (layers exist conceptually, pull sync still uses mesh) |
| Keep-alive sessions | Planned |
| Self Last Encounter sync trigger | Planned |
| Algorithm-free reverse-chronological feed | Complete |
| Envelope encryption (1-layer) | Complete |
| Group keys for circles | Complete |
| Three-tier access revocation | Partial (Tier 1+2 work, Tier 3 crypto exists but no UI) |
| Private profiles per circle | Complete |
| Pull-based sync with follow filtering | Complete |
| Push notifications (post/profile/delete) | Complete |
| Blob storage + transfer | Complete |
| CDN hosting tree + manifests | Complete |
| Blob eviction with priority scoring | Complete |
| Anchor bootstrap + referrals | Complete |
| Delete propagation + CDN cascade | Complete |
| Multi-device identity | Planned |
| UPnP port mapping (desktop) | Complete |
| NAT type detection (STUN) + hard+hard skip | Complete |
| Advanced NAT traversal (role-based scanning + filter probe) | Complete |
| LAN discovery (mDNS scan + auto-connect) | Planned |
| Content propagation via attention | Partial |
| BlobHeader separation from blob content | Complete |
| 25+25 neighborhood with HeaderDiff propagation | Partial (engagement diffs work, neighborhood diffs planned) |
| BlobHeaderDiff message (engagement) | Complete |
| Reactions (public + private encrypted) | Complete |
| Comments + author policy enforcement | Complete |
| Engagement sync via BlobHeaderRequest after pull sync | Complete |
| Notification settings (messages/posts/nearby) | Complete |
| Tiered DM polling (recency-based schedule) | Complete |
| Auto-sync on follow | Complete |
| Post CDN tree (post_downstream) | Complete |
| Anchor self-verification (reachability probe) | Complete |
| Mutual mesh blacklist | Planned |
| --max-mesh flag (test affordance) | Planned |
| Audience sharding | Planned |
| Custom feeds | Planned |
| HTTP post delivery (TCP listener, single route, load shedding) | Planned |
| Share link generation (postid + author NodeId) | Complete |
| itsgoin.net QUIC proxy handler (on-demand fetch + render) | Complete |
| PostFetch (0xD4/0xD5) single-post retrieval | Complete |
| Universal Links / App Links (itsgoin.net/p/*) | Planned |
| itsgoin.net ItsGoin node (anchor + web handler) | Complete |
| UPnP TCP port mapping alongside UDP | Planned |

Appendix D: Critical Path Forward

The highest-impact items, in priority order:

1. Three-layer separation (pull sync from social/file, not mesh)

Implement Self Last Encounter tracking and move pull sync to social + upstream file peers. This is the foundation for the layered architecture.
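
The trigger itself is a simple predicate over timestamps, using the SELF_LAST_ENCOUNTER_THRESHOLD value from Appendix A. A minimal sketch (function and parameter names are illustrative):

```rust
/// Appendix A threshold: 3 hours, in seconds.
const SELF_LAST_ENCOUNTER_THRESHOLD: u64 = 10_800;

/// Fire a pull sync when the last encounter with our own content/peers
/// is older than the threshold. Timestamps are Unix seconds.
fn needs_pull_sync(now: u64, last_encounter: u64) -> bool {
    now.saturating_sub(last_encounter) > SELF_LAST_ENCOUNTER_THRESHOLD
}
```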

2. N+10 in all identification

Add N+10 (NodeId + 10 preferred peers) to self-identification, post headers, blob headers, and social routes. Dramatically improves findability.
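
A hypothetical shape for the N+10 record as it might travel in headers and routes; the real wire format is not specified here, so field names and the string encoding are assumptions.

```rust
/// Hypothetical N+10 identification record: the node's own id plus up
/// to 10 preferred peers, attached to self-identification, post headers,
/// blob headers, and social routes.
struct NPlus10 {
    node_id: String,        // hex-encoded NodeId
    preferred: Vec<String>, // up to 10 preferred-peer NodeIds
}

impl NPlus10 {
    fn new(node_id: String, mut preferred: Vec<String>) -> Self {
        preferred.truncate(10); // enforce the "+10" cap regardless of input
        Self { node_id, preferred }
    }
}
```

The findability gain comes from the cap being generous but bounded: any holder of a header can attempt up to 11 contact points for the author instead of one.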

3. Keep-alive sessions

Implement the social/file connectivity check and keep-alive sessions for peers not reachable within N3, with cross-layer N2/N3 routing from keep-alive sessions.

4. UPnP port mapping

Best-effort NAT traversal for desktop/home networks. Makes nodes directly reachable without hole punching. External address feeds into N+10 and all peer advertisements. Especially impactful for mobile-to-desktop connectivity.

5. Growth loop reactive trigger

Fire growth loop immediately on N2/N3 receipt until 90% full. Currently only timer-based.
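
The reactive condition can be expressed as one integer comparison against the mesh slot capacity (101 on desktop, per Appendix B). A sketch under that assumption:

```rust
/// Fire the growth loop on N2/N3 receipt while the mesh is under 90%
/// of slot capacity; above that, fall back to the 60s timer alone.
fn should_fire_growth_loop(connected: usize, slot_cap: usize) -> bool {
    // Integer form of `connected < 0.9 * slot_cap`, avoiding floats.
    connected * 10 < slot_cap * 9
}
```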

6. Multi-device identity

Same identity key across devices with device-specific identity for self-discovery and own-device relay.

7. File-chain propagation

Make AuthorManifest with N+10 and recent posts work passively. Enable discovery of new content from any blob holder.

8. Share links + HTTP post delivery

The viral growth mechanism. Every share becomes a product demo for non-app users and opens natively for app users. Dependencies in order:

  1. UPnP TCP mapping (small addition to existing UPnP code)
  2. Raw TCP HTTP listener (150–200 lines, zero new dependencies)
  3. Host list generation at share time (query post_downstream, encode, embed in URL)
  4. itsgoin.net redirect handler + known_good DB (server-side, independent of app releases)
  5. itsgoin.net loading screen
  6. Universal Links / App Links registration (static JSON files + Tauri config)
  7. itsgoin.net ItsGoin node (run the binary, configure as anchor)

Steps 4–7 are itsgoin.net infrastructure, deployable independently of app releases. Steps 1–3 ship in the app. Step 6 requires an app store release to activate but can be deployed to itsgoin.net ahead of time.
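
For the link itself, the v0.3.1 changelog fixes the format as itsgoin.net/p/<postid_hex>/<author_nodeid_hex>. A sketch of the builder; the https scheme is an assumption, since the document gives only host and path.

```rust
/// Build a share link in the v0.3.1 format:
/// itsgoin.net/p/<postid_hex>/<author_nodeid_hex>
fn share_link(post_id: &[u8], author_node_id: &[u8]) -> String {
    // Lowercase hex encoding of raw id bytes.
    fn hex(bytes: &[u8]) -> String {
        bytes.iter().map(|b| format!("{:02x}", b)).collect()
    }
    format!("https://itsgoin.net/p/{}/{}", hex(post_id), hex(author_node_id))
}
```

Because the link carries only the post id and author NodeId, the itsgoin.net handler must locate a holder at request time via content search, which is exactly the QUIC-proxy flow described in the changelog.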

9. Own-device relay restriction

Restrict relay pipes to own-device by default, opt-in for relaying for others.

Appendix E: Features Designed But Not Built

| Feature | Source | Status |
| --- | --- | --- |
| Three-layer pull sync (social/file, not mesh) | v0.2.0 design | Planned |
| N+10 in all identification & headers | v0.2.0 design | Planned |
| Keep-alive sessions | v0.2.0 design | Planned |
| Multi-device identity | v0.2.0 design | Planned |
| Own-device relay restriction | v0.2.0 design | Planned |
| Self Last Encounter sync trigger | v0.2.0 design | Planned |
| Anchor pin vs Fork pin distinction | project discussion.txt | Planned |
| Audience sharding for groups > 250 | ARCHITECTURE.md | Planned |
| Repost as first-class post type | project discussion.txt | Planned |
| Custom feeds (keyword/media/family rules) | project discussion.txt | Planned |
| Bounce routing (social graph as routing) | ARCHITECTURE.md | Planned |
| Reactions (public + private encrypted) | v0.2.11 | Complete |
| RefuseRedirect handling (retry suggested peer) | protocol.rs | Partial (send-only) |
| Profile anchor list used for discovery | ARCHITECTURE.md | Partial (field exists) |
| File-chain propagation (passive post discovery) | Design | Partial (manifest exists) |
| Anchor-to-anchor gossip/registry | Observed gap | Planned |
| BlobHeader as separate mutable structure | v0.2.11 | Complete |
| BlobHeaderDiff incremental propagation (engagement) | v0.2.11 | Complete |
| Post export/backup tooling (author durability) | v0.2.4 design | Planned |
| Anchor reachability probe (self-verification) | v0.2.6 | Complete |
| Mutual mesh blacklist | v0.2.4 design | Planned |
| --max-mesh flag (test topology control) | v0.2.4 design | Planned |
| Relay-assisted port scanning (advanced NAT traversal) | v0.2.6 | Complete |

Appendix F: File Map

crates/core/
  src/
    lib.rs          — module registration, parse_connect_string, parse_node_id_hex
    types.rs        — Post, PostId, NodeId, PublicProfile, PostVisibility, WrappedKey,
                      VisibilityIntent, Circle, PeerRecord, Attachment
    content.rs      — compute_post_id (BLAKE3), verify_post_id
    crypto.rs       — X25519 key conversion, DH, encrypt_post, decrypt_post, BLAKE3 KDF
    blob.rs         — BlobStore, compute_blob_id, verify_blob
    storage.rs      — SQLite: posts, peers, follows, profiles, circles, circle_members,
                      mesh_peers, reachable_n2/n3, social_routes, blobs, group_keys,
                      preferred_peers, known_anchors; auto-migration
    protocol.rs     — MessageType enum (39 types), ALPN (itsgoin/3),
                      length-prefixed JSON framing, read/write helpers
    connection.rs   — ConnectionManager + ConnHandle/ConnectionActor (actor pattern):
                      mesh QUIC connections (MeshConnection), session connections,
                      slot management, initial exchange, N1/N2 diff broadcast,
                      pull sync, relay introduction. All external access via ConnHandle.
    network.rs      — iroh Endpoint, accept loop, connect_to_peer,
                      connect_by_node_id (7-step cascade), mDNS discovery
    node.rs         — Node struct (ties identity + storage + network), post CRUD,
                      follow/unfollow, profile CRUD, circle CRUD, encrypted post creation,
                      startup cycles, bootstrap, anchor register cycle
    web.rs          — itsgoin.net web handler: QUIC proxy for share links,
                      on-demand post fetch via content search, blob serving
    http.rs         — HTML rendering for shared posts (render_post_html)

crates/cli/
  src/main.rs       — interactive REPL + anchor mode (--bind, --daemon, --web)

crates/tauri-app/
  src/lib.rs        — Tauri v2 commands (38 IPC handlers), DTOs

frontend/
  index.html        — single-page UI: 5 tabs (Feed / My Posts / People / Messages / Settings)
  app.js            — Tauri invoke calls, rendering, identicon generator, circle CRUD
  style.css         — dark theme, post cards, visibility badges, transitions

License

ItsGoin is released under the Apache License, Version 2.0. You may use, modify, and distribute this software freely under the terms of that license.

This is a gift. Use it well.