fix: v0.7.3 — disable EDM scanner, bootstrap batching, stale-anchor prune
Bandwidth + bootstrap hardening on top of v0.7.2. Wire-compatible with v0.7.0/v0.7.1/v0.7.2; no protocol changes.

EDM port scanner DISABLED
- hole_punch_with_scanning() now does only single quick punch + parallel punch over 30s window. The EDM port-scanner branch is gone from the live path because per-probe endpoint.connect() amplifies catastrophically: iroh accumulates every connect() target into a per-endpoint paths set and probes them all under QUIC NAT-traversal in the background. A 100-probes/sec / 5-min scan inserted ~30k paths; iroh probed all of them. Observed at 22MB/s outbound from one client: DoS-grade.
- Scanner body preserved as edm_port_scan_disabled_v0_7_3() with all supporting helpers (PortWalkIter, scanner_semaphore, role-based scanner/puncher split, found_tx/found_rx channel pattern, deadline + tokio::select! orchestration) marked #[allow(dead_code)]. Refactor target: replace per-probe endpoint.connect() with raw socket.send_to() so probes don't enter iroh's path store.

Bootstrap probing batched
- New probe_anchors_batched() helper: 3 anchors in flight at a time, 2s stagger between batch dispatches, 10s per-anchor timeout, no abort on success. First success unblocks the bootstrap flow; remaining probes continue in background and fill peer connections naturally.
- Phase 2 (bootstrap fallback) still only fires when every discovered anchor failed (preserves load-distribution intent). Replaces the sequential 50s+ timeout cascade users observed with old data dirs.

Stale-anchor self-pruning
- New storage.get_known_anchor_last_seen() and storage.delete_known_anchor().
- maybe_prune_stale_anchor(): when a probe fails AND last_seen_ms > 3 days, delete the entry from known_anchors immediately. Recoverable anchors (failed once, succeeded recently) are preserved. Self-healing for old data dirs whose discovered anchors point to keypairs that rotated months ago.

Android close button kills NodeService
- New NodeService.stopFromNative() Kotlin static method called via JNI from android_wifi::stop_node_service(). exit_app invokes it on Android before app.exit(0). Previously the button ended the Activity but the foreground service kept networking running.

Cosmetic
- Power-icon SVG (inline) replaces ⏻ so Android webviews lacking U+23FB don't render a missing-image tofu box.

Docs
- design.html section 11 rewritten for portmapper (UPnP+NAT-PMP+PCP, v0.7.2) including per-platform contract and bidirectional anchor watcher.
- design.html section 10 marks session relay as opt-in (v0.7.2) and EDM scanner as disabled-pending-refactor (v0.7.3).
- download.html carries v0.7.3 release notes.
- MEMORY.md updated; older v0.7.0/v0.7.1 status sections condensed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
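Illustrative sketch of the stale-anchor prune rule described in the commit message. This is not the code from the commit: should_prune_anchor and STALE_AFTER_MS are hypothetical names, and reading "last_seen_ms > 3 days" as "last seen more than 3 days ago" is an assumption, as is leaving anchors with no recorded last_seen untouched.

```rust
/// Three days in milliseconds, the staleness threshold named in the commit message.
const STALE_AFTER_MS: u64 = 3 * 24 * 60 * 60 * 1000;

/// Hypothetical decision helper: prune only when the probe failed AND the
/// anchor was last seen more than three days ago.
/// `last_seen_ms` and `now_ms` are Unix timestamps in milliseconds.
fn should_prune_anchor(probe_failed: bool, last_seen_ms: Option<u64>, now_ms: u64) -> bool {
    match last_seen_ms {
        // saturating_sub guards against a last_seen that lies in the future
        // (clock skew); such an entry counts as age zero and is kept.
        Some(seen) => probe_failed && now_ms.saturating_sub(seen) > STALE_AFTER_MS,
        // Assumption: an anchor without a recorded last_seen is left alone.
        None => false,
    }
}
```

Under this reading, an anchor that failed once but succeeded yesterday is kept, while one that failed and was last seen four days ago is deleted.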
parent 4706e81603
commit 6ef11fa61c
14 changed files with 425 additions and 73 deletions
@ -474,8 +474,9 @@ FAILURE: C → B → A: AnchorProbeResult { reachable: false }</code></pre
|
|||
<h3>Session relay (relay pipes)</h3>
|
||||
<p>Intermediary splices bi-streams between requester and target. Desktop: max 10 concurrent pipes. Mobile: max 2. Each pipe has a 50MB byte cap and 2-min idle timeout.</p>
|
||||
<div class="note">
|
||||
<strong>v0.2.0 change</strong>: Relay pipes are <strong>own-device-only by default</strong>. A node will only relay traffic between its own devices (same identity key, different device identity). Users can opt in to relaying for others in Settings, but this is not enabled automatically. This prevents nodes from unknowingly burning bandwidth for random peers while still enabling personal multi-device routing.
|
||||
<strong>v0.7.2 change</strong>: Session relay is now <strong>OPT-IN ONLY and DISABLED BY DEFAULT</strong>, including for anchor-mode nodes (servers are most likely to pay for bandwidth). Gated by the <code>relay.session_relay_enabled</code> setting on both <em>serving</em> (<code>can_accept_relay_pipe</code>) and <em>using</em> (the auto-fallback after a failed hole punch). The Settings UI exposes the toggle. A failed hole punch no longer silently routes a peer-to-peer session through an unrelated third party's bandwidth.
|
||||
</div>
|
||||
<p style="color: var(--text-muted); font-size: 0.85rem;">This rule covers <em>only</em> full byte-piping. Small relay-style signaling/discovery — <code>RelayIntroduce</code> for hole-punch coordination, <code>worm_lookup</code> multi-hop search, N1/N2/N3 share-list exchange — remains always-on; that's not session relay. The anchor's HTTP proxy path (anchor fetches a post via QUIC and serves it back over HTTP) is also not session relay — it's the anchor doing its own QUIC fetch on the browser's behalf.</p>
|
||||
|
||||
<h3>Deduplication & cooldowns</h3>
|
||||
<table>
|
||||
|
|
@ -516,8 +517,15 @@ FAILURE: C → B → A: AnchorProbeResult { reachable: false }</code></pre
|
|||
</table>
|
||||
<p>All hole punch paths use <code>hole_punch_with_scanning()</code> which replaces the former hard+hard skip. NAT profiles (NatMapping + NatFiltering) from InitialExchange determine whether scanning is attempted. Behavioral inference updates filtering classification from connection outcomes.</p>
|
||||
|
||||
<h3>Advanced NAT traversal</h3>
|
||||
<h3>Status: <span class="badge badge-complete">Complete</span></h3>
|
||||
<h3>Advanced NAT traversal (EDM port scanner)</h3>
|
||||
<h3>Status: <span class="badge badge-planned">Disabled in v0.7.3 — refactor pending</span></h3>
|
||||
|
||||
<div class="note">
|
||||
<strong>v0.7.3:</strong> the EDM port scanner is DISABLED. <code>hole_punch_with_scanning()</code> currently does only Step 1 (quick punch to the anchor-observed address) and Step 2 (parallel punch to all known addresses over a 30s window). No port scan.
|
||||
<p style="margin: 0.5rem 0 0 0;"><strong>Why:</strong> iroh's <code>Endpoint</code> accumulates every <code>endpoint.connect()</code> target into a per-endpoint paths set and probes them all in the background under QUIC NAT-traversal. A scan running at 100 probes/sec for 5 minutes inserted ~30,000 paths, and iroh then probed all of them. We observed 22 MB/s of outbound traffic from a single client, which is DoS-grade.</p>
|
||||
<p style="margin: 0.5rem 0 0 0;"><strong>Refactor target:</strong> replace per-probe <code>endpoint.connect()</code> with raw <code>socket.send_to()</code> on the endpoint's bound UDP socket. The probe still opens a NAT mapping on our side; we just don't ask iroh to manage the path. The original scanner body is preserved as <code>edm_port_scan_disabled_v0_7_3</code> in <code>connection.rs</code>, including <code>PortWalkIter</code>, <code>scanner_semaphore</code>, role-based scanner/puncher split, and the <code>tokio::select!</code> orchestration — refactor against that.</p>
|
||||
<p style="margin: 0.5rem 0 0 0;">The description below documents the <em>intended design</em> that the refactor is meant to deliver.</p>
|
||||
</div>
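<p>The figures in the note above can be checked with a few lines of arithmetic. The sketch below assumes each probe adds exactly one path entry and that 1 MB is 10^6 bytes; the function names are hypothetical, and the per-path average is a back-of-envelope figure, not a measurement.</p>

```rust
/// Number of path entries a scan creates if every probe adds one entry
/// (the behavior the note attributes to per-probe endpoint.connect()).
fn paths_inserted(probes_per_sec: u64, scan_secs: u64) -> u64 {
    probes_per_sec * scan_secs
}

/// Average outbound bytes per second per path for an observed total rate.
/// Returns 0 when there are no paths, to avoid dividing by zero.
fn avg_bytes_per_path_per_sec(total_bytes_per_sec: u64, paths: u64) -> u64 {
    if paths == 0 {
        0
    } else {
        total_bytes_per_sec / paths
    }
}
```

<p>With 100 probes/sec over 300 s, <code>paths_inserted</code> gives 30,000, matching the note. Dividing 22,000,000 bytes/s across those paths gives roughly 733 bytes/s per path, so even light background probing per path adds up to a large total.</p>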
|
||||
|
||||
<p>NAT "hardness" has two independent dimensions:</p>
|
||||
<ul style="padding-left: 1.25rem; margin: 0.5rem 0; color: var(--text-muted);">
|
||||
|
|
@ -572,28 +580,44 @@ FAILURE: C → B → A: AnchorProbeResult { reachable: false }</code></pre
|
|||
</div>
|
||||
</section>
|
||||
|
||||
<!-- 11. UPnP Port Mapping -->
|
||||
<!-- 11. Port Mapping (UPnP-IGD + NAT-PMP + PCP) -->
|
||||
<section id="upnp">
|
||||
<h2>11. UPnP Port Mapping</h2>
|
||||
<h3>Status: <span class="badge badge-complete">Complete</span></h3>
|
||||
<h2>11. Port Mapping — UPnP-IGD + NAT-PMP + PCP</h2>
|
||||
<h3>Status: <span class="badge badge-complete">Complete (v0.7.2)</span></h3>
|
||||
|
||||
<h3>Purpose</h3>
|
||||
<p>UPnP (Universal Plug and Play) allows a node to request its home router to forward an external port to its local QUIC port. This makes the node <strong>directly reachable from the internet</strong> without hole punching — any peer with the external address can connect immediately. This dramatically improves connection success rates for desktop nodes on home networks.</p>
|
||||
<p>The node asks the local gateway router to forward an external port to its local QUIC port. A successful mapping makes the node <strong>directly reachable from the internet</strong> without hole punching: any peer with the external address can connect immediately. Three protocols are attempted in parallel; the first router response wins.</p>
|
||||
|
||||
<h3>Protocols (v0.7.2)</h3>
|
||||
<ul style="padding-left: 1.25rem; margin: 0.5rem 0; color: var(--text-muted);">
|
||||
<li><strong>UPnP-IGD</strong> — long-standing consumer-router default. Discovery via SSDP multicast on 239.255.255.250:1900. Behavior varies; many routers ship with UPnP disabled by default.</li>
|
||||
<li><strong>NAT-PMP</strong> (RFC 6886) — Apple lineage; widespread on routers that ever shipped Bonjour. Unicast to the gateway on UDP/5351.</li>
|
||||
<li><strong>PCP</strong> (RFC 6887) — modern IETF-track successor to NAT-PMP. Unicast on UDP/5351. Supports both IPv4 NAT mapping and IPv6 firewall pinholes. Works on iOS without the multicast networking entitlement.</li>
|
||||
</ul>
|
||||
<p>Implementation uses the <code>portmapper</code> crate (also used by iroh internally). Replaces the v0.7.1 hand-rolled <code>igd-next</code>-only path.</p>
|
||||
|
||||
<h3>Startup flow</h3>
|
||||
<pre><code>bind Endpoint → attempt UPnP mapping (2s timeout) → store external addr → bootstrap</code></pre>
|
||||
<pre><code>bind Endpoint → spawn portmapper Client (UDP) → wait up to 3s for first protocol response → bootstrap (TCP mapping fires in parallel for HTTP serving)</code></pre>
|
||||
<ol style="padding-left: 1.25rem; margin: 0.5rem 0; color: var(--text-muted);">
|
||||
<li><strong>Discover gateway</strong>: Search for UPnP/NAT-PMP gateway with a 2-second timeout. If no gateway found, proceed without — do not block startup.</li>
|
||||
<li><strong>Request mapping</strong>: Map both UDP and TCP for the local QUIC port to the same external port (or next available). UDP is required for QUIC (existing). TCP enables HTTP post delivery (see <a href="#http-delivery">Section 25</a>). Both use the same external port number. If the router supports one but not the other, accept the partial mapping gracefully — QUIC connectivity is not affected by TCP mapping failure. Request lease TTL of 3600s.</li>
|
||||
<li><strong>Store external address</strong>: The resulting external <code>SocketAddr</code> is stored alongside iroh's observed addresses. It feeds into N+10 identification, InitialExchange, anchor registration, and all peer address advertisements.</li>
|
||||
<li><strong>Log result</strong>: Clearly log whether UPnP succeeded, failed, or was unavailable. This is critical for diagnosing connectivity issues.</li>
|
||||
<li><strong>Probe all three protocols in parallel</strong>: portmapper's background service fires UPnP-IGD discovery + NAT-PMP unicast + PCP unicast concurrently. First success wins; failures from the others are absorbed silently.</li>
|
||||
<li><strong>UDP mapping for QUIC</strong>: maps the local QUIC port to an external port. Required for direct inbound. Address feeds N+10 identification, InitialExchange, anchor registration, and peer address advertisements.</li>
|
||||
<li><strong>TCP mapping for HTTP</strong>: separate parallel attempt for HTTP serving (see <a href="#http-delivery">Section 25</a>). Independent of UDP — either can succeed alone. Phones with permissive NAT can serve HTTP directly to browser fetches as of v0.7.2.</li>
|
||||
<li><strong>Per-platform behavior</strong>: All three protocols on desktop. On Android, a WiFi/Ethernet gate skips probing on cellular (no UPnP/PCP gateway exposed by carriers) and a <code>WifiManager.MulticastLock</code> is held for the lifetime of the mapping so UPnP-IGD's SSDP responses actually arrive. On iOS, PCP and NAT-PMP work without the multicast entitlement; UPnP-IGD silently fails until the entitlement is granted.</li>
|
||||
</ol>
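<p>The "probe in parallel, first success wins, bounded wait" pattern from the startup flow can be sketched with the standard library alone. This is not the <code>portmapper</code> API: <code>Probe</code>, <code>simulated_probe</code> and <code>first_success</code> are hypothetical names, and each protocol attempt is stood in for by a closure that sleeps and then returns an optional external port.</p>

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Stand-in for one protocol attempt (UPnP-IGD, NAT-PMP or PCP).
/// Returns the mapped external port on success, None on failure.
type Probe = Box<dyn FnOnce() -> Option<u16> + Send + 'static>;

/// Builds a fake probe that answers after `delay_ms` with `result`.
fn simulated_probe(delay_ms: u64, result: Option<u16>) -> Probe {
    Box::new(move || {
        thread::sleep(Duration::from_millis(delay_ms));
        result
    })
}

/// Runs every probe on its own thread and returns the first success,
/// or None if all probes fail or `wait` elapses first.
fn first_success(
    probes: Vec<(&'static str, Probe)>,
    wait: Duration,
) -> Option<(&'static str, u16)> {
    let (tx, rx) = mpsc::channel();
    for (name, probe) in probes {
        let tx = tx.clone();
        thread::spawn(move || {
            // Failures are absorbed silently; only successes are reported.
            if let Some(port) = probe() {
                let _ = tx.send((name, port));
            }
        });
    }
    // Drop the original sender so the receiver disconnects, and returns
    // early, once every worker has finished without sending.
    drop(tx);
    rx.recv_timeout(wait).ok()
}
```

<p>Calling <code>first_success</code> with a 3-second wait mirrors the startup budget described above: a fast PCP answer beats a slower NAT-PMP answer, and a gateway that never answers costs at most the wait.</p>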
|
||||
|
||||
<h3>Lease renewal cycle (every 2700s / 45 min)</h3>
|
||||
<p>UPnP mappings have a TTL (typically 3600s but varies by router). A renewal loop runs every 45 minutes to refresh the mapping before it expires. If renewal fails, the external address is removed from advertisements and the node falls back to hole punch / relay paths gracefully.</p>
|
||||
<h3>Auto-renewal</h3>
|
||||
<p>The <code>portmapper::Client</code> renews leases internally in a background task. No external renewal cycle to schedule. Dropping the <code>PortMapping</code> handle aborts the renewal task and releases the mapping.</p>
|
||||
|
||||
<h3>Bidirectional anchor reachability watcher</h3>
|
||||
<p>A startup-spawned task watches the UDP mapping's reactive external-address channel:</p>
|
||||
<ul style="padding-left: 1.25rem; margin: 0.5rem 0; color: var(--text-muted);">
|
||||
<li><strong>Mapping lost for >5 min</strong> → clear <code>is_anchor</code>. The node stops advertising itself as an anchor at a now-stale external address.</li>
|
||||
<li><strong>Mapping restored</strong> (None → Some) → re-evaluate auto-anchor. On non-mobile devices the anchor flag is set back on so the node re-joins the anchor set without a restart.</li>
|
||||
</ul>
|
||||
<p>Roaming between UPnP-capable WiFi networks self-heals. Mobile devices never auto-anchor regardless, because cellular IPs look public but sit behind CGNAT.</p>
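<p>The two watcher rules can be modelled as a small state machine. The sketch below is a simplification under stated assumptions: <code>AnchorWatcher</code> and <code>observe</code> are hypothetical names, time is passed in explicitly instead of being read from a channel, and "re-evaluate auto-anchor" is reduced to "set the flag on non-mobile devices".</p>

```rust
use std::time::Duration;

/// Grace period before a lost mapping clears the anchor flag (5 minutes).
const LOST_GRACE: Duration = Duration::from_secs(5 * 60);

/// Hypothetical watcher state; not the project's actual types.
struct AnchorWatcher {
    is_mobile: bool,
    is_anchor: bool,
    /// How long the mapping has been continuously absent; None while present.
    lost_for: Option<Duration>,
}

impl AnchorWatcher {
    fn new(is_mobile: bool, is_anchor: bool) -> Self {
        AnchorWatcher { is_mobile, is_anchor, lost_for: None }
    }

    /// Feeds one observation: whether a mapping currently exists and how
    /// much time passed since the previous observation.
    fn observe(&mut self, mapping_present: bool, elapsed: Duration) {
        if mapping_present {
            // None -> Some transition: re-evaluate auto-anchor.
            // Mobile devices never auto-anchor.
            if self.lost_for.take().is_some() && !self.is_mobile {
                self.is_anchor = true;
            }
        } else {
            // The first "absent" observation starts the clock at zero;
            // later ones accumulate the elapsed time.
            let lost = self.lost_for.map_or(Duration::ZERO, |d| d + elapsed);
            self.lost_for = Some(lost);
            if lost > LOST_GRACE {
                self.is_anchor = false;
            }
        }
    }
}
```

<p>A desktop node that loses its mapping keeps <code>is_anchor</code> through the first five minutes, drops it after that, and regains it as soon as a mapping reappears; a mobile node stays non-anchor throughout.</p>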
|
||||
|
||||
<h3>Shutdown</h3>
|
||||
<p>Explicitly release the UPnP mapping on clean shutdown. Routers have finite mapping tables — releasing is good citizenship. Tauri's shutdown hook handles this.</p>
|
||||
<p>Explicitly release the mapping on clean shutdown. Routers have finite mapping tables — releasing is good citizenship.</p>
|
||||
|
||||
<h3>Integration with existing address logic</h3>
|
||||
<p>The UPnP external address is treated the same as any other address the node knows about. It feeds into:</p>