fix: v0.7.3 — disable EDM scanner, bootstrap batching, stale-anchor prune

Bandwidth + bootstrap hardening on top of v0.7.2. Wire-compatible with
v0.7.0/v0.7.1/v0.7.2; no protocol changes.

EDM port scanner DISABLED
- hole_punch_with_scanning() now does only a single quick punch followed
  by a parallel punch over a 30s window. The EDM port-scanner branch is
  gone from the live path because per-probe endpoint.connect() amplifies
  catastrophically: iroh accumulates every connect() target into a
  per-endpoint paths set and probes them all under QUIC NAT-traversal in
  the background. A 100-probes/sec / 5-min scan inserted ~30k paths; iroh
  probed all of them. Observed at 22MB/s outbound from one client, which
  is DoS-grade.
- Scanner body preserved as edm_port_scan_disabled_v0_7_3() with all
  supporting helpers (PortWalkIter, scanner_semaphore, role-based
  scanner/puncher split, found_tx/found_rx channel pattern,
  deadline + tokio::select! orchestration) marked #[allow(dead_code)].
  Refactor target: replace per-probe endpoint.connect() with raw
  socket.send_to() so probes don't enter iroh's path store.
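
A minimal sketch of the two ideas above, not the project's code. It
shows where the ~30k figure comes from (100 probes/sec for 300s) and
what a raw probe could look like. send_raw_probe() and
scan_target_count() are hypothetical names, and a plain
std::net::UdpSocket stands in for the endpoint's bound UDP socket; how
that socket would actually be obtained from iroh is not shown here.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Number of distinct targets a scan produces at `rate_per_sec`
/// over `duration_secs`. With per-probe connect(), each target
/// would become one entry in the endpoint's path set.
fn scan_target_count(rate_per_sec: u64, duration_secs: u64) -> u64 {
    rate_per_sec * duration_secs
}

/// Hypothetical raw probe: a single one-byte datagram sent with
/// send_to(). It opens (or refreshes) a NAT mapping on our side but
/// creates no connection state, so nothing is left to re-probe.
fn send_raw_probe(socket: &UdpSocket, target: SocketAddr) -> io::Result<usize> {
    socket.send_to(&[0u8], target)
}
```

With the scanner constants from this commit (10ms interval, 300s cap),
scan_target_count(100, 300) gives 30,000 targets.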

Bootstrap probing batched
- New probe_anchors_batched() helper: anchors are dispatched 3 per batch
  with a 2s stagger between batches and a 10s per-anchor timeout. Probes
  are not aborted on success: the first success unblocks the bootstrap
  flow, and the remaining probes continue in the background and fill
  peer connections naturally.
- Phase 2 (bootstrap fallback) still only fires when every discovered
  anchor failed — preserves load-distribution intent. Replaces the
  sequential 50s+ timeout cascade users observed with old data dirs.
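
A rough worst-case comparison, as a sketch only. It assumes every probe
runs to its full timeout and that batches are dispatched on a fixed
stagger without waiting for the previous batch, as the dispatcher in
this commit does. The function names are illustrative.

```rust
/// Worst case for the old loop: each anchor is tried in turn and
/// every attempt runs to its timeout.
fn sequential_worst_case_secs(anchors: u64, timeout_secs: u64) -> u64 {
    anchors * timeout_secs
}

/// Worst case for staggered batches: the last batch is dispatched
/// after (batches - 1) staggers and then runs to its timeout.
/// `batch_size` must be non-zero.
fn batched_worst_case_secs(
    anchors: u64,
    batch_size: u64,
    stagger_secs: u64,
    timeout_secs: u64,
) -> u64 {
    if anchors == 0 {
        return 0;
    }
    // Ceiling division: number of batches needed to cover all anchors.
    let batches = (anchors + batch_size - 1) / batch_size;
    (batches - 1) * stagger_secs + timeout_secs
}
```

For five dead anchors, the sequential path takes 50s before Phase 2 can
start, while the batched path (3 per batch, 2s stagger, 10s timeout)
takes 12s.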

Stale-anchor self-pruning
- New storage.get_known_anchor_last_seen() and storage.delete_known_anchor().
- maybe_prune_stale_anchor(): when a probe fails AND the anchor's
  last_seen_ms is more than 3 days old, delete the entry from
  known_anchors immediately. Recoverable anchors (failed once, succeeded
  recently) are preserved. Self-healing for old data dirs whose
  discovered anchors point to keypairs that rotated months ago.
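
The prune rule reduces to a small predicate. This is a sketch under the
assumption that timestamps are milliseconds since the Unix epoch, as in
the diff; should_prune() is a hypothetical name, not a function in the
codebase.

```rust
/// Three days in milliseconds, matching the threshold in this commit.
const STALE_THRESHOLD_MS: u64 = 3 * 86_400 * 1000;

/// True only when the probe failed AND the last successful contact
/// is strictly older than `threshold_ms`. saturating_sub() keeps a
/// clock that moved backwards (last_seen_ms > now_ms) from looking
/// like a huge age: the age becomes 0 and nothing is pruned.
fn should_prune(probe_failed: bool, now_ms: u64, last_seen_ms: u64, threshold_ms: u64) -> bool {
    probe_failed && now_ms.saturating_sub(last_seen_ms) > threshold_ms
}
```

An anchor seen exactly 3 days ago is kept; one millisecond older and a
failed probe removes it.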

Android close button kills NodeService
- New NodeService.stopFromNative() Kotlin static method called via JNI
  from android_wifi::stop_node_service(). exit_app invokes it on Android
  before app.exit(0). Previously the button ended the Activity but the
  foreground service kept networking running.

Cosmetic
- Power-icon SVG (inline) replaces ⏻ so Android webviews lacking
  U+23FB don't render a missing-glyph tofu box.

Docs
- design.html section 11 rewritten for portmapper (UPnP+NAT-PMP+PCP,
  v0.7.2) including per-platform contract and bidirectional anchor
  watcher.
- design.html section 10 marks session relay as opt-in (v0.7.2) and EDM
  scanner as disabled-pending-refactor (v0.7.3).
- download.html carries v0.7.3 release notes.
- MEMORY.md updated; older v0.7.0/v0.7.1 status sections condensed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Scott Reimers 2026-05-15 14:33:45 -06:00
parent 4706e81603
commit 6ef11fa61c
14 changed files with 425 additions and 73 deletions

6
Cargo.lock generated

@ -2732,7 +2732,7 @@ checksum = "92ecc6618181def0457392ccd0ee51198e065e016d1d527a7ac1b6dc7c1f09d2"
[[package]]
name = "itsgoin-cli"
version = "0.7.2"
version = "0.7.3"
dependencies = [
"anyhow",
"hex",
@ -2744,7 +2744,7 @@ dependencies = [
[[package]]
name = "itsgoin-core"
version = "0.7.2"
version = "0.7.3"
dependencies = [
"anyhow",
"base64 0.22.1",
@ -2769,7 +2769,7 @@ dependencies = [
[[package]]
name = "itsgoin-desktop"
version = "0.7.2"
version = "0.7.3"
dependencies = [
"anyhow",
"base64 0.22.1",


@ -1,6 +1,6 @@
[package]
name = "itsgoin-cli"
version = "0.7.2"
version = "0.7.3"
edition = "2021"
[[bin]]


@ -1,6 +1,6 @@
[package]
name = "itsgoin-core"
version = "0.7.2"
version = "0.7.3"
edition = "2021"
[dependencies]


@ -213,3 +213,44 @@ impl Drop for MulticastLockGuard {
}
}
}
/// Stop the Android `NodeService` foreground service. Called from the
/// in-app close button so the network process actually exits rather
/// than continuing to run as a foreground service after the Activity
/// closes (foreground services are kept alive across Activity exit by
/// design).
///
/// Errors are logged but not propagated — best-effort cleanup before
/// `AppHandle::exit(0)` finishes the Activity.
pub fn stop_node_service() {
if let Err(e) = stop_node_service_inner() {
warn!("stop_node_service failed (will exit anyway): {}", e);
}
}
fn stop_node_service_inner() -> Result<(), String> {
let ctx = ndk_context::android_context();
if ctx.vm().is_null() {
return Err("ndk_context: null JavaVM".into());
}
if ctx.context().is_null() {
return Err("ndk_context: null activity context".into());
}
let vm = unsafe { JavaVM::from_raw(ctx.vm() as *mut _) }
.map_err(|e| format!("JavaVM init: {:?}", e))?;
let mut env = vm
.attach_current_thread()
.map_err(|e| format!("attach_current_thread: {:?}", e))?;
let activity = unsafe { JObject::from_raw(ctx.context() as *mut _) };
// NodeService.stopFromNative(activity)
env.call_static_method(
"com/itsgoin/app/NodeService",
"stopFromNative",
"(Landroid/content/Context;)V",
&[JValue::Object(&activity)],
)
.map_err(|e| format!("stopFromNative: {:?}", e))?;
Ok(())
}


@ -145,14 +145,22 @@ pub(crate) async fn hole_punch_parallel(
None
}
// EDM port scanner — DISABLED in v0.7.3 (see hole_punch_with_scanning).
// Constants and helpers preserved as the refactor target for a raw-UDP
// scanner that bypasses iroh's path-store accumulation.
/// Timeout for each individual scan connect attempt (200ms → ~20 in-flight at 100/sec)
#[allow(dead_code)]
const SCAN_CONNECT_TIMEOUT_MS: u64 = 200;
/// Scan rate: one attempt every 10ms = 100 ports/sec
#[allow(dead_code)]
const SCAN_INTERVAL_MS: u64 = 10;
/// How often to punch peer's anchor-observed address during scanning (seconds).
/// Each punch checks if the peer has opened a firewall port matching our actual port.
#[allow(dead_code)]
const SCAN_PUNCH_INTERVAL_SECS: u64 = 2;
/// Maximum scan duration (seconds) — accept the cost for otherwise-impossible connections
#[allow(dead_code)]
const SCAN_MAX_DURATION_SECS: u64 = 300; // 5 minutes
/// Global cap on concurrent port-scan hole punches. Each scanner fires
@ -164,11 +172,63 @@ const SCAN_MAX_DURATION_SECS: u64 = 300; // 5 minutes
/// at proxy timeouts. A permit is acquired before the scanning loop
/// starts and held until the scanner returns; extra callers fall back
/// to the cheaper `hole_punch_parallel`.
#[allow(dead_code)]
fn scanner_semaphore() -> &'static tokio::sync::Semaphore {
static SEM: std::sync::OnceLock<tokio::sync::Semaphore> = std::sync::OnceLock::new();
SEM.get_or_init(|| tokio::sync::Semaphore::new(1))
}
/// Hole punch orchestrator.
///
/// **v0.7.3:** the EDM port scanner is DISABLED. We do Step 1 (quick punch to
/// the anchor-observed address) → Step 2 (parallel punch over the 30s window
/// to all known addresses). No port scan.
///
/// **Why disabled:** iroh's `Endpoint` accumulates every `endpoint.connect()`
/// target into a per-endpoint paths set and probes them all in the background
/// under QUIC NAT-traversal. A 100-probes/sec / 5-min scan inserts ~30,000
/// paths; iroh then probes all of them. Observed at 22MB/s outbound from a
/// single client. Disabled until we replace per-probe `endpoint.connect()`
/// with a raw `socket.send_to()` on the endpoint's bound UDP socket — see
/// `edm_port_scan_disabled_v0_7_3` for the preserved scanner logic to
/// refactor against.
///
/// Original docstring is preserved on `edm_port_scan_disabled_v0_7_3`.
pub(crate) async fn hole_punch_with_scanning(
endpoint: &iroh::Endpoint,
target: &NodeId,
addresses: &[String],
_our_profile: crate::types::NatProfile,
_peer_profile: crate::types::NatProfile,
) -> Option<iroh::endpoint::Connection> {
if let Some(conn) = hole_punch_single(endpoint, target, addresses).await {
return Some(conn);
}
hole_punch_parallel(endpoint, target, addresses).await
}
/// **DISABLED in v0.7.3** — kept as the refactor target for a safe replacement.
///
/// **Why disabled:** iroh's `Endpoint` accumulates every `endpoint.connect()`
/// target into a per-endpoint paths set and probes them all in the background
/// under QUIC NAT-traversal. A 100-probes/sec / 5-min scan inserts ~30,000
/// paths; iroh then probes all of them. Observed at 22MB/s outbound from a
/// single client (DoS-grade).
///
/// **Refactor target:** replace `endpoint.connect()` in the per-probe path
/// with a raw `socket.send_to(...)` on the endpoint's bound UDP socket. The
/// probe still opens a NAT mapping on our side; we just don't ask iroh to
/// manage the path. The every-2s punch retains `endpoint.connect()` so the
/// real handshake completes when the peer's punch arrives.
///
/// Logic worth preserving below: role-based scanner/puncher split,
/// `PortWalkIter`, `scanner_semaphore`, `found_tx`/`found_rx` channel
/// pattern, deadline + `tokio::select!` orchestration.
///
/// ---
///
/// Original docstring:
///
/// Advanced hole punch with port scanning fallback for EDM/port-restricted NAT.
///
/// **Role-based behavior** (each side calls this independently):
@ -183,7 +243,8 @@ fn scanner_semaphore() -> &'static tokio::sync::Semaphore {
/// NAT mapping alive and checks if the peer's scan has opened their firewall for us.
///
/// For both-EDM pairs: both sides scan + punch simultaneously.
pub(crate) async fn hole_punch_with_scanning(
#[allow(dead_code)]
async fn edm_port_scan_disabled_v0_7_3(
endpoint: &iroh::Endpoint,
target: &NodeId,
addresses: &[String],
@ -389,12 +450,17 @@ pub(crate) async fn hole_punch_with_scanning(
/// Iterator that walks outward from a base port: base, base+1, base-1, base+2, base-2, ...
/// Skips ports outside [1, 65535].
///
/// Used by `edm_port_scan_disabled_v0_7_3` — preserved for the future
/// raw-UDP scanner refactor.
#[allow(dead_code)]
struct PortWalkIter {
base: u16,
offset: u32,
tried_plus: bool, // within current offset, have we tried base+offset?
}
#[allow(dead_code)]
impl PortWalkIter {
fn new(base: u16) -> Self {
Self { base, offset: 0, tried_plus: false }
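
The hunk above cuts off before the iterator body. As an illustration of the documented walk order (base, base+1, base-1, base+2, base-2, ..., skipping ports outside [1, 65535]), here is a standalone sketch. `PortWalk` and its fields are hypothetical and are not the preserved `PortWalkIter` implementation.

```rust
/// Hypothetical stand-in for the documented walk order.
struct PortWalk {
    base: i32,
    offset: i32,
    plus_next: bool, // at the current offset, is base+offset next?
}

impl PortWalk {
    fn new(base: u16) -> Self {
        Self { base: base as i32, offset: 0, plus_next: true }
    }
}

impl Iterator for PortWalk {
    type Item = u16;

    fn next(&mut self) -> Option<u16> {
        loop {
            // Both directions have left the valid range: exhausted.
            if self.base + self.offset > 65_535 && self.base - self.offset < 1 {
                return None;
            }
            let candidate = if self.offset == 0 {
                // The base port itself is visited once, first.
                self.offset = 1;
                self.base
            } else if self.plus_next {
                self.plus_next = false;
                self.base + self.offset
            } else {
                let c = self.base - self.offset;
                self.plus_next = true;
                self.offset += 1;
                c
            };
            // Skip anything outside [1, 65535] and keep walking.
            if (1..=65_535).contains(&candidate) {
                return Some(candidate as u16);
            }
        }
    }
}
```

Starting from port 100, the first five candidates are 100, 101, 99, 102, 98. Near the edges the walk continues in the one direction that is still valid.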


@ -92,6 +92,175 @@ async fn ensure_initial_v_me(
generate_and_store_initial_v_me(&s, persona_id, now_ms)
}
/// Probe a list of anchors with batched parallelism, returning the first
/// successful NodeId. Remaining probes continue in background tasks after
/// first success and naturally register additional mesh connections.
///
/// **Parameters fixed in v0.7.3:**
/// - 3 anchors dispatched per batch (batches overlap, so more than 3 can be in flight)
/// - 2-second stagger between batch dispatches
/// - 10s per-anchor connect timeout
/// - Failed probes to anchors with `last_seen_ms` older than 3 days
/// auto-delete from `known_anchors` (self-healing pruning)
///
/// Returns `None` only when every probe completed without success.
async fn probe_anchors_batched(
anchors: Vec<(NodeId, Vec<std::net::SocketAddr>)>,
network: Arc<crate::network::Network>,
storage: Arc<StoragePool>,
self_node_id: NodeId,
label: &'static str,
) -> Option<NodeId> {
use std::sync::atomic::{AtomicUsize, Ordering};
const BATCH_SIZE: usize = 3;
const BATCH_STAGGER_SECS: u64 = 2;
const PER_ANCHOR_TIMEOUT_SECS: u64 = 10;
const STALE_THRESHOLD_MS: u64 = 3 * 86_400 * 1000;
let total = anchors.len();
if total == 0 {
return None;
}
let (success_tx, success_rx) = tokio::sync::oneshot::channel::<NodeId>();
let success_tx = Arc::new(tokio::sync::Mutex::new(Some(success_tx)));
let completed = Arc::new(AtomicUsize::new(0));
let all_done = Arc::new(tokio::sync::Notify::new());
// Dispatcher: spawns per-anchor tasks in batches of BATCH_SIZE,
// sleeping BATCH_STAGGER_SECS between batches. The per-anchor tasks
// continue running after the dispatcher exits.
let dispatcher = {
let network = Arc::clone(&network);
let storage = Arc::clone(&storage);
let success_tx = Arc::clone(&success_tx);
let completed = Arc::clone(&completed);
let all_done = Arc::clone(&all_done);
tokio::spawn(async move {
let mut iter = anchors.into_iter();
loop {
let batch: Vec<_> = (&mut iter).take(BATCH_SIZE).collect();
if batch.is_empty() {
break;
}
let more = iter.size_hint().0 > 0;
for (nid, addrs) in batch {
let network = Arc::clone(&network);
let storage = Arc::clone(&storage);
let success_tx = Arc::clone(&success_tx);
let completed = Arc::clone(&completed);
let all_done = Arc::clone(&all_done);
tokio::spawn(async move {
let result = probe_one_anchor(&network, &storage, nid, addrs, self_node_id, label).await;
if let Some(nid) = result {
let mut guard = success_tx.lock().await;
if let Some(sender) = guard.take() {
let _ = sender.send(nid);
}
}
let prev = completed.fetch_add(1, Ordering::SeqCst);
if prev + 1 == total {
all_done.notify_one();
}
});
}
if more {
tokio::time::sleep(std::time::Duration::from_secs(BATCH_STAGGER_SECS)).await;
}
}
})
};
// Race: first success vs all probes complete unsuccessfully.
let result = tokio::select! {
Ok(nid) = success_rx => Some(nid),
_ = all_done.notified() => None,
};
// Detach the dispatcher; in-flight per-anchor tasks continue.
drop(dispatcher);
let _ = BATCH_STAGGER_SECS; // silence unused-const if compiler is picky
let _ = PER_ANCHOR_TIMEOUT_SECS;
let _ = STALE_THRESHOLD_MS;
result
}
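
The "first success unblocks, the rest keep running" shape of the function above can be shown with the standard library alone. This is a simplified sketch: OS threads stand in for tokio tasks, a `u32` stands in for `NodeId`, and `first_success` and `Probe` are hypothetical names. Channel closure plays the role of the `all_done` notification.

```rust
use std::sync::mpsc;
use std::thread;

/// A probe is any one-shot job that may yield an id.
type Probe = Box<dyn FnOnce() -> Option<u32> + Send + 'static>;

/// Runs every probe on its own thread and returns the first success.
/// Probes still running when this returns are not cancelled.
fn first_success(probes: Vec<Probe>) -> Option<u32> {
    let (tx, rx) = mpsc::channel::<Option<u32>>();
    for probe in probes {
        let tx = tx.clone();
        thread::spawn(move || {
            // The receiver may already be gone after an earlier
            // success, so a failed send is ignored.
            let _ = tx.send(probe());
        });
    }
    // Drop our own sender so recv() errors once all probes finish.
    drop(tx);
    while let Ok(result) = rx.recv() {
        if result.is_some() {
            return result;
        }
    }
    // Every probe completed without success.
    None
}
```

Returning `None` only after the channel closes mirrors the contract in the doc comment: `None` means every probe completed without success, not that a timer expired.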
async fn probe_one_anchor(
network: &crate::network::Network,
storage: &Arc<StoragePool>,
nid: NodeId,
addrs: Vec<std::net::SocketAddr>,
self_node_id: NodeId,
label: &'static str,
) -> Option<NodeId> {
const PER_ANCHOR_TIMEOUT_SECS: u64 = 10;
const STALE_THRESHOLD_MS: u64 = 3 * 86_400 * 1000;
if nid == self_node_id || network.is_peer_connected_or_session(&nid).await {
return None;
}
let endpoint_id = match iroh::EndpointId::from_bytes(&nid) {
Ok(eid) => eid,
Err(_) => return None,
};
let mut addr = iroh::EndpointAddr::from(endpoint_id);
for sa in &addrs {
addr = addr.with_ip_addr(*sa);
}
info!(peer = hex::encode(&nid), label, "Trying anchor");
let result = tokio::time::timeout(
std::time::Duration::from_secs(PER_ANCHOR_TIMEOUT_SECS),
network.connect_to_anchor(nid, addr),
).await;
match result {
Ok(Ok(())) => {
info!(peer = hex::encode(&nid), label, "Connected to anchor");
Some(nid)
}
Ok(Err(e)) => {
debug!(error = %e, peer = hex::encode(&nid), label, "Anchor connect failed");
maybe_prune_stale_anchor(storage, &nid, STALE_THRESHOLD_MS).await;
None
}
Err(_) => {
debug!(peer = hex::encode(&nid), label, "Anchor connect timed out");
maybe_prune_stale_anchor(storage, &nid, STALE_THRESHOLD_MS).await;
None
}
}
}
/// If the anchor's last successful contact was more than `threshold_ms`
/// ago, delete it from `known_anchors`. Future startups won't waste a
/// probe slot on it. Anchors that were recently successful are preserved
/// even when they fail a single probe (likely transient).
async fn maybe_prune_stale_anchor(
storage: &Arc<StoragePool>,
nid: &NodeId,
threshold_ms: u64,
) {
let s = storage.get().await;
let last_seen_ms = match s.get_known_anchor_last_seen(nid) {
Ok(Some(ms)) => ms,
_ => return,
};
let now_ms = std::time::SystemTime::now()
.duration_since(std::time::UNIX_EPOCH)
.map(|d| d.as_millis() as u64)
.unwrap_or(0);
if now_ms > last_seen_ms && now_ms - last_seen_ms > threshold_ms {
let _ = s.delete_known_anchor(nid);
debug!(
peer = hex::encode(nid),
age_ms = now_ms - last_seen_ms,
"Pruned stale anchor (>3 days since last success + failed probe)"
);
}
}
impl Node {
/// Create or open a node in the given data directory (Desktop profile)
pub async fn open(data_dir: impl AsRef<Path>) -> anyhow::Result<Self> {
@ -272,6 +441,11 @@ impl Node {
/// Bootstrap: connect to anchors, pull initial data, NAT probe, referrals.
/// Can be called during open_with_bind (blocking startup) or deferred to background.
///
/// v0.7.3: anchor probing is batched (3 dispatched per batch, 2s stagger between
/// batches, 10s per-anchor timeout, first success unblocks downstream, remaining
/// probes continue in background and naturally fill peer connections). Failed
/// probes to anchors >3 days stale auto-prune from `known_anchors`.
pub async fn run_bootstrap(&self, data_dir: &Path) -> anyhow::Result<()> {
let storage = &self.storage;
let network = &self.network;
@ -479,57 +653,28 @@ impl Node {
let (discovered, bootstrap_known): (Vec<_>, Vec<_>) = known.into_iter()
.partition(|(nid, _)| !bootstrap_anchor_ids.contains(nid));
// Phase 1: Try discovered (non-bootstrap) anchors first
let mut connected_anchor = None;
for (anchor_nid, anchor_addrs) in &discovered {
if *anchor_nid == node_id || network.is_peer_connected_or_session(anchor_nid).await {
continue;
}
let endpoint_id = match iroh::EndpointId::from_bytes(anchor_nid) {
Ok(eid) => eid,
Err(_) => continue,
};
let mut addr = iroh::EndpointAddr::from(endpoint_id);
for sa in anchor_addrs {
addr = addr.with_ip_addr(*sa);
}
info!(peer = hex::encode(anchor_nid), "Trying discovered anchor");
match tokio::time::timeout(std::time::Duration::from_secs(10), network.connect_to_anchor(*anchor_nid, addr)).await {
Ok(Ok(())) => {
info!(peer = hex::encode(anchor_nid), "Connected to discovered anchor");
connected_anchor = Some(*anchor_nid);
break;
}
Ok(Err(e)) => debug!(error = %e, peer = hex::encode(anchor_nid), "Discovered anchor: connect failed"),
Err(_) => debug!(peer = hex::encode(anchor_nid), "Discovered anchor: connect timed out"),
}
}
// Phase 1: probe discovered (non-bootstrap) anchors in batches.
// First success returns immediately; remaining probes continue in
// background. Failed probes to anchors >3 days stale auto-prune.
let mut connected_anchor = probe_anchors_batched(
discovered.clone(),
network.clone(),
Arc::clone(storage),
node_id,
"discovered",
).await;
// Phase 2: Fall back to bootstrap anchors only if no discovered anchor worked
// Phase 2: bootstrap anchors as fallback — only fires if every
// Phase 1 entry failed. Preserves the load-distribution intent
// (don't smash the central anchor when discovered anchors work).
if connected_anchor.is_none() {
for (anchor_nid, anchor_addrs) in &bootstrap_known {
if *anchor_nid == node_id || network.is_peer_connected_or_session(anchor_nid).await {
continue;
}
let endpoint_id = match iroh::EndpointId::from_bytes(anchor_nid) {
Ok(eid) => eid,
Err(_) => continue,
};
let mut addr = iroh::EndpointAddr::from(endpoint_id);
for sa in anchor_addrs {
addr = addr.with_ip_addr(*sa);
}
info!(peer = hex::encode(anchor_nid), "Trying bootstrap anchor (fallback)");
match tokio::time::timeout(std::time::Duration::from_secs(10), network.connect_to_anchor(*anchor_nid, addr)).await {
Ok(Ok(())) => {
info!(peer = hex::encode(anchor_nid), "Connected to bootstrap anchor");
connected_anchor = Some(*anchor_nid);
break;
}
Ok(Err(e)) => debug!(error = %e, peer = hex::encode(anchor_nid), "Bootstrap anchor: connect failed"),
Err(_) => debug!(peer = hex::encode(anchor_nid), "Bootstrap anchor: connect timed out"),
}
}
connected_anchor = probe_anchors_batched(
bootstrap_known.clone(),
network.clone(),
Arc::clone(storage),
node_id,
"bootstrap",
).await;
}
// Phase 3: NAT probe + referrals from whichever anchor we connected to


@ -2248,6 +2248,33 @@ impl Storage {
Ok(result)
}
/// Get the last successful contact time (ms since epoch) for a known anchor.
/// Returns None if the anchor isn't in the table.
pub fn get_known_anchor_last_seen(&self, node_id: &NodeId) -> anyhow::Result<Option<u64>> {
let mut stmt = self.conn.prepare(
"SELECT last_seen_ms FROM known_anchors WHERE node_id = ?1",
)?;
let mut rows = stmt.query(params![node_id.as_slice()])?;
if let Some(row) = rows.next()? {
let ms: i64 = row.get(0)?;
Ok(Some(ms as u64))
} else {
Ok(None)
}
}
/// Remove a known anchor entry. Used by the bootstrap connect path
/// when a stale anchor (>3 days since last successful contact) fails
/// to connect — self-healing pruning so future startups don't re-try
/// long-dead entries.
pub fn delete_known_anchor(&self, node_id: &NodeId) -> anyhow::Result<()> {
self.conn.execute(
"DELETE FROM known_anchors WHERE node_id = ?1",
params![node_id.as_slice()],
)?;
Ok(())
}
/// Prune known anchors to keep at most `max` entries (by highest success_count).
pub fn prune_known_anchors(&self, max: usize) -> anyhow::Result<usize> {
let count: i64 = self.conn.query_row(


@ -1,6 +1,6 @@
[package]
name = "itsgoin-desktop"
version = "0.7.2"
version = "0.7.3"
edition = "2021"
[lib]


@ -5,6 +5,7 @@ import android.app.NotificationChannel
import android.app.NotificationManager
import android.app.PendingIntent
import android.app.Service
import android.content.Context
import android.content.Intent
import android.content.pm.ServiceInfo
import android.os.Build
@ -16,6 +17,17 @@ class NodeService : Service() {
companion object {
const val CHANNEL_ID = "itsgoin_node"
const val NOTIFICATION_ID = 1
// Called via JNI from Rust when the user taps the in-app close
// button. Foreground services survive Activity exit by design
// (keeps connections alive when backgrounded). When the user
// explicitly wants to stop networking, we need to stop the
// service in addition to ending the Activity.
@JvmStatic
fun stopFromNative(context: Context) {
val intent = Intent(context, NodeService::class.java)
context.stopService(intent)
}
}
private var wakeLock: PowerManager.WakeLock? = null


@ -1144,6 +1144,13 @@ async fn list_vouches_given(state: State<'_, AppNode>) -> Result<Vec<VouchGivenD
#[tauri::command]
async fn exit_app(app: tauri::AppHandle) {
// On Android, the foreground NodeService survives Activity exit by
// design (keeps network alive when backgrounded). When the user
// explicitly hits the in-app close button, also stop the service
// so we actually free the device's network/wakelock.
#[cfg(target_os = "android")]
itsgoin_core::android_wifi::stop_node_service();
app.exit(0);
}


@ -1,6 +1,6 @@
{
"productName": "itsgoin",
"version": "0.7.2",
"version": "0.7.3",
"identifier": "com.itsgoin.app",
"build": {
"frontendDist": "../../frontend",


@ -25,7 +25,7 @@
<span id="net-dot"></span>
<span id="net-labels"></span>
</div>
<button id="close-app-btn" title="Close app (stops connections to save battery)" aria-label="Close app">&#x23FB;</button>
<button id="close-app-btn" title="Close app (stops connections to save battery)" aria-label="Close app"><svg viewBox="0 0 24 24" width="14" height="14" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M18.36 6.64a9 9 0 1 1-12.73 0"/><line x1="12" y1="2" x2="12" y2="12"/></svg></button>
</div>
<nav id="tabs">
<button class="tab" data-tab="feed"><span class="tab-icon">&#x1f4f0;</span><span class="tab-label">Feed</span></button>


@ -474,8 +474,9 @@ FAILURE: C &rarr; B &rarr; A: AnchorProbeResult { reachable: false }</code></pre
<h3>Session relay (relay pipes)</h3>
<p>Intermediary splices bi-streams between requester and target. Desktop: max 10 concurrent pipes. Mobile: max 2. Each pipe has a 50MB byte cap and 2-min idle timeout.</p>
<div class="note">
<strong>v0.2.0 change</strong>: Relay pipes are <strong>own-device-only by default</strong>. A node will only relay traffic between its own devices (same identity key, different device identity). Users can opt in to relaying for others in Settings, but this is not enabled automatically. This prevents nodes from unknowingly burning bandwidth for random peers while still enabling personal multi-device routing.
<strong>v0.7.2 change</strong>: Session relay is now <strong>OPT-IN ONLY and DISABLED BY DEFAULT</strong> &mdash; including for anchor-mode nodes (servers are most likely to pay for bandwidth). Gated by the <code>relay.session_relay_enabled</code> setting on both <em>serving</em> (<code>can_accept_relay_pipe</code>) and <em>using</em> (the auto-fallback after a failed hole punch). Settings UI exposes the toggle. Hole-punch-failure no longer silently routes a peer-to-peer session through an unrelated third party's bandwidth.
</div>
<p style="color: var(--text-muted); font-size: 0.85rem;">This rule covers <em>only</em> full byte-piping. Small relay-style signaling/discovery &mdash; <code>RelayIntroduce</code> for hole-punch coordination, <code>worm_lookup</code> multi-hop search, N1/N2/N3 share-list exchange &mdash; remains always-on; that's not session relay. The anchor's HTTP proxy path (anchor fetches a post via QUIC and serves it back over HTTP) is also not session relay &mdash; it's the anchor doing its own QUIC fetch on the browser's behalf.</p>
<h3>Deduplication &amp; cooldowns</h3>
<table>
@ -516,8 +517,15 @@ FAILURE: C &rarr; B &rarr; A: AnchorProbeResult { reachable: false }</code></pre
</table>
<p>All hole punch paths use <code>hole_punch_with_scanning()</code> which replaces the former hard+hard skip. NAT profiles (NatMapping + NatFiltering) from InitialExchange determine whether scanning is attempted. Behavioral inference updates filtering classification from connection outcomes.</p>
<h3>Advanced NAT traversal</h3>
<h3>Status: <span class="badge badge-complete">Complete</span></h3>
<h3>Advanced NAT traversal (EDM port scanner)</h3>
<h3>Status: <span class="badge badge-planned">Disabled in v0.7.3 &mdash; refactor pending</span></h3>
<div class="note">
<strong>v0.7.3:</strong> the EDM port scanner is DISABLED. <code>hole_punch_with_scanning()</code> currently does only Step 1 (quick punch to the anchor-observed address) and Step 2 (parallel punch to all known addresses over a 30s window). No port scan.
<p style="margin: 0.5rem 0 0 0;"><strong>Why:</strong> iroh's <code>Endpoint</code> accumulates every <code>endpoint.connect()</code> target into a per-endpoint paths set and probes them all in the background under QUIC NAT-traversal. A 100-probes/sec / 5-min scan inserted ~30,000 paths; iroh then probed all of them. Observed at 22MB/s outbound from a single client &mdash; DoS-grade.</p>
<p style="margin: 0.5rem 0 0 0;"><strong>Refactor target:</strong> replace per-probe <code>endpoint.connect()</code> with raw <code>socket.send_to()</code> on the endpoint's bound UDP socket. The probe still opens a NAT mapping on our side; we just don't ask iroh to manage the path. The original scanner body is preserved as <code>edm_port_scan_disabled_v0_7_3</code> in <code>connection.rs</code>, including <code>PortWalkIter</code>, <code>scanner_semaphore</code>, role-based scanner/puncher split, and the <code>tokio::select!</code> orchestration &mdash; refactor against that.</p>
<p style="margin: 0.5rem 0 0 0;">The description below documents the <em>intended design</em> the refactor will deliver against.</p>
</div>
<p>NAT "hardness" has two independent dimensions:</p>
<ul style="padding-left: 1.25rem; margin: 0.5rem 0; color: var(--text-muted);">
@ -572,28 +580,44 @@ FAILURE: C &rarr; B &rarr; A: AnchorProbeResult { reachable: false }</code></pre
</div>
</section>
<!-- 11. UPnP Port Mapping -->
<!-- 11. Port Mapping (UPnP-IGD + NAT-PMP + PCP) -->
<section id="upnp">
<h2>11. UPnP Port Mapping</h2>
<h3>Status: <span class="badge badge-complete">Complete</span></h3>
<h2>11. Port Mapping &mdash; UPnP-IGD + NAT-PMP + PCP</h2>
<h3>Status: <span class="badge badge-complete">Complete (v0.7.2)</span></h3>
<h3>Purpose</h3>
<p>UPnP (Universal Plug and Play) allows a node to request its home router to forward an external port to its local QUIC port. This makes the node <strong>directly reachable from the internet</strong> without hole punching &mdash; any peer with the external address can connect immediately. This dramatically improves connection success rates for desktop nodes on home networks.</p>
<p>Asks the local gateway router to forward an external port to this node's local QUIC port. A successful mapping makes the node <strong>directly reachable from the internet</strong> without hole punching &mdash; any peer with the external address can connect immediately. Three protocols are attempted in parallel; the first router-response wins.</p>
<h3>Protocols (v0.7.2)</h3>
<ul style="padding-left: 1.25rem; margin: 0.5rem 0; color: var(--text-muted);">
<li><strong>UPnP-IGD</strong> &mdash; long-standing consumer-router default. Discovery via SSDP multicast on 239.255.255.250:1900. Behavior varies; many routers ship with UPnP disabled by default.</li>
<li><strong>NAT-PMP</strong> (RFC 6886) &mdash; Apple lineage; widespread on routers that ever shipped Bonjour. Unicast to the gateway on UDP/5351.</li>
<li><strong>PCP</strong> (RFC 6887) &mdash; modern IETF-track successor to NAT-PMP. Unicast on UDP/5351. Supports both IPv4 NAT mapping and IPv6 firewall pinholes. Works on iOS without the multicast networking entitlement.</li>
</ul>
<p>Implementation uses the <code>portmapper</code> crate (also used by iroh internally). Replaces the v0.7.1 hand-rolled <code>igd-next</code>-only path.</p>
<h3>Startup flow</h3>
<pre><code>bind Endpoint &rarr; attempt UPnP mapping (2s timeout) &rarr; store external addr &rarr; bootstrap</code></pre>
<pre><code>bind Endpoint &rarr; spawn portmapper Client (UDP) &rarr; wait up to 3s for first protocol response &rarr; bootstrap (TCP mapping fires in parallel for HTTP serving)</code></pre>
<ol style="padding-left: 1.25rem; margin: 0.5rem 0; color: var(--text-muted);">
<li><strong>Discover gateway</strong>: Search for UPnP/NAT-PMP gateway with a 2-second timeout. If no gateway found, proceed without &mdash; do not block startup.</li>
<li><strong>Request mapping</strong>: Map both UDP and TCP for the local QUIC port to the same external port (or next available). UDP is required for QUIC (existing). TCP enables HTTP post delivery (see <a href="#http-delivery">Section 25</a>). Both use the same external port number. If the router supports one but not the other, accept the partial mapping gracefully &mdash; QUIC connectivity is not affected by TCP mapping failure. Request lease TTL of 3600s.</li>
<li><strong>Store external address</strong>: The resulting external <code>SocketAddr</code> is stored alongside iroh's observed addresses. It feeds into N+10 identification, InitialExchange, anchor registration, and all peer address advertisements.</li>
<li><strong>Log result</strong>: Clearly log whether UPnP succeeded, failed, or was unavailable. This is critical for diagnosing connectivity issues.</li>
<li><strong>Probe all three protocols in parallel</strong>: portmapper's background service fires UPnP-IGD discovery + NAT-PMP unicast + PCP unicast concurrently. First success wins; failures from the others are absorbed silently.</li>
<li><strong>UDP mapping for QUIC</strong>: maps the local QUIC port to an external port. Required for direct inbound. Address feeds N+10 identification, InitialExchange, anchor registration, and peer address advertisements.</li>
<li><strong>TCP mapping for HTTP</strong>: separate parallel attempt for HTTP serving (see <a href="#http-delivery">Section 25</a>). Independent of UDP &mdash; either can succeed alone. Phones with permissive NAT can serve HTTP directly to browser fetches as of v0.7.2.</li>
<li><strong>Per-platform behavior</strong>: All three protocols on desktop. On Android, a WiFi/Ethernet gate skips probing on cellular (no UPnP/PCP gateway exposed by carriers) and a <code>WifiManager.MulticastLock</code> is held for the lifetime of the mapping so UPnP-IGD's SSDP responses actually arrive. On iOS, PCP and NAT-PMP work without the multicast entitlement; UPnP-IGD silently fails until the entitlement is granted.</li>
</ol>
<h3>Lease renewal cycle (every 2700s / 45 min)</h3>
<p>UPnP mappings have a TTL (typically 3600s but varies by router). A renewal loop runs every 45 minutes to refresh the mapping before it expires. If renewal fails, the external address is removed from advertisements and the node falls back to hole punch / relay paths gracefully.</p>
<h3>Auto-renewal</h3>
<p>The <code>portmapper::Client</code> renews leases internally in a background task. No external renewal cycle to schedule. Dropping the <code>PortMapping</code> handle aborts the renewal task and releases the mapping.</p>
<h3>Bidirectional anchor reachability watcher</h3>
<p>A startup-spawned task watches the UDP mapping's reactive external-address channel:</p>
<ul style="padding-left: 1.25rem; margin: 0.5rem 0; color: var(--text-muted);">
<li><strong>Mapping lost for &gt;5 min</strong> &rarr; clear <code>is_anchor</code>. The node stops advertising itself as an anchor at a now-stale external address.</li>
<li><strong>Mapping restored</strong> (None &rarr; Some) &rarr; re-evaluate auto-anchor. On non-mobile devices the anchor flag is set back on so the node re-joins the anchor set without a restart.</li>
</ul>
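<p>The two transitions above amount to a small state machine. The sketch below is one way to express it under stated assumptions: <code>ReachabilityWatcher</code>, <code>AnchorAction</code>, and <code>observe</code> are hypothetical names, time is passed in as a monotonic <code>Duration</code> rather than read from the real mapping channel, and the 5-minute threshold is taken from the text.</p>

```rust
use std::time::Duration;

/// Grace period before a lost mapping clears the anchor flag (from the text above).
const LOSS_GRACE: Duration = Duration::from_secs(5 * 60);

/// What the caller should do after an observation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AnchorAction {
    Keep,
    ClearAnchor,
    ReevaluateAutoAnchor,
}

/// Hypothetical watcher state: when the mapping was lost, and whether
/// the anchor flag was already cleared for this outage.
pub struct ReachabilityWatcher {
    lost_since: Option<Duration>,
    cleared: bool,
}

impl ReachabilityWatcher {
    pub fn new() -> Self {
        Self { lost_since: None, cleared: false }
    }

    /// `now` is monotonic time since startup; `mapped` says whether the
    /// UDP mapping currently reports an external address.
    pub fn observe(&mut self, now: Duration, mapped: bool) -> AnchorAction {
        if mapped {
            // None -> Some transition: ask the caller to re-run auto-anchor.
            let was_lost = self.lost_since.take().is_some();
            self.cleared = false;
            if was_lost {
                AnchorAction::ReevaluateAutoAnchor
            } else {
                AnchorAction::Keep
            }
        } else {
            let since = *self.lost_since.get_or_insert(now);
            if !self.cleared && now.saturating_sub(since) > LOSS_GRACE {
                self.cleared = true; // emit ClearAnchor only once per outage
                AnchorAction::ClearAnchor
            } else {
                AnchorAction::Keep
            }
        }
    }
}

/// Auto-anchor re-evaluation: only non-mobile devices set the flag back on.
pub fn anchor_after_restore(is_mobile: bool) -> bool {
    !is_mobile
}
```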
<p>Roaming between UPnP-capable WiFi networks therefore self-heals. Mobile devices never auto-anchor regardless, because cellular IPs look public but sit behind CGNAT.</p>
<h3>Shutdown</h3>
<p>Explicitly release the UPnP mapping on clean shutdown. Routers have finite mapping tables &mdash; releasing is good citizenship. Tauri's shutdown hook handles this.</p>
<p>Explicitly release the mapping on clean shutdown. Routers have finite mapping tables &mdash; releasing is good citizenship.</p>
<h3>Integration with existing address logic</h3>
<p>The UPnP external address is treated the same as any other address the node knows about. It feeds into:</p>

@ -26,6 +26,36 @@
<h1 style="font-size: 2rem; font-weight: 800; letter-spacing: -0.03em; margin-bottom: 0.25rem;">Download ItsGoin</h1>
<p>Available for Android, Linux, and Windows. Free and open source.</p>
<h2 style="margin-top: 2rem;">v0.7.3 &mdash; May 15, 2026</h2>
<p style="color: var(--text-muted); font-size: 0.85rem;">Bandwidth + bootstrap hardening on top of v0.7.2. Wire-compatible with v0.7.0/v0.7.1/v0.7.2.</p>
<div class="downloads">
<a href="itsgoin-0.7.3.apk" class="download-btn btn-android">
Android APK
<span class="sub">v0.7.3</span>
</a>
<a href="itsgoin_0.7.3_amd64.AppImage" class="download-btn btn-linux">
Linux AppImage
<span class="sub">v0.7.3</span>
</a>
<a href="itsgoin-cli-0.7.3-linux-amd64" class="download-btn btn-linux">
Linux CLI / Anchor
<span class="sub">v0.7.3</span>
</a>
<a href="itsgoin-0.7.3-windows-x64-setup.exe" class="download-btn btn-windows">
Windows Installer
<span class="sub">v0.7.3</span>
</a>
</div>
<ul style="color: var(--text-muted); font-size: 0.85rem; line-height: 1.6; margin-top: 1rem;">
<li><strong>EDM port scanner disabled.</strong> The "advanced NAT traversal" port-scanner (Hard NAT &harr; Hard NAT) used <code>endpoint.connect()</code> as its probe primitive; iroh accumulates every connect target into its per-endpoint path store and probes them all in the background under QUIC NAT-traversal. A 5-min scan inserted ~30k paths; iroh then probed all of them &mdash; observed at 22MB/s outbound from a single client. DoS-grade at any scale. Disabled until we replace per-probe <code>connect()</code> with raw UDP sends. The scanner source is preserved as <code>edm_port_scan_disabled_v0_7_3</code> to refactor against.</li>
<li><strong>Bootstrap anchor probing batched.</strong> Discovered anchors are now probed 3 at a time with a 2s stagger between batches and a 10s per-anchor timeout. First success unblocks the bootstrap flow immediately; remaining probes continue in background and naturally fill peer connections. Phase 2 (bootstrap fallback) still only fires when every discovered anchor has failed &mdash; preserves the load-distribution intent for when the network scales.</li>
<li><strong>Stale-anchor self-pruning.</strong> When a probe fails AND the anchor's <code>last_seen_ms</code> is more than 3 days old, the entry is deleted from <code>known_anchors</code> immediately. Recoverable anchors (failed once, succeeded recently) are preserved. Users with old data dirs whose discovered anchors point to keypairs that rotated months ago no longer carry stale baggage forward.</li>
<li><strong>Close button kills the Android NodeService.</strong> The in-app close button now calls <code>NodeService.stopFromNative()</code> via JNI before exiting the Activity, so the foreground service actually stops &mdash; previously the button ended the UI but networking kept running.</li>
<li><strong>Power-icon SVG.</strong> The close-button glyph is now an inline SVG instead of <code>&amp;#x23FB;</code>. Android webview fonts that lack U+23FB previously rendered the button as a missing-glyph tofu box.</li>
</ul>
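<p>The batching and pruning rules in the notes above reduce to two small calculations. The following is a hedged sketch, not the shipped <code>probe_anchors_batched()</code> or <code>maybe_prune_stale_anchor()</code> code: <code>dispatch_offset</code> and <code>should_prune</code> are hypothetical helpers, and the schedule assumes batch <em>k</em> is dispatched <em>k</em> &times; 2s after the first one.</p>

```rust
use std::time::Duration;

/// Values taken from the release notes above.
pub const BATCH_SIZE: usize = 3;
pub const BATCH_STAGGER: Duration = Duration::from_secs(2);
pub const PROBE_TIMEOUT: Duration = Duration::from_secs(10);
pub const STALE_AFTER_MS: u64 = 3 * 24 * 60 * 60 * 1000; // 3 days

/// Delay before the anchor at `index` is dispatched, assuming each batch
/// of three starts one stagger interval after the previous batch.
pub fn dispatch_offset(index: usize) -> Duration {
    BATCH_STAGGER * (index / BATCH_SIZE) as u32
}

/// An anchor is pruned only when its probe failed AND it was last seen
/// more than three days ago. A recent success keeps the entry.
pub fn should_prune(probe_failed: bool, last_seen_ms: u64, now_ms: u64) -> bool {
    probe_failed && now_ms.saturating_sub(last_seen_ms) > STALE_AFTER_MS
}
```

<p>Under this schedule, seven discovered anchors are dispatched at 0s, 2s, and 4s, and each probe is bounded by <code>PROBE_TIMEOUT</code>.</p>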
<h2 style="margin-top: 2rem;">v0.7.2 &mdash; May 15, 2026</h2>
<p style="color: var(--text-muted); font-size: 0.85rem;">Network &amp; reachability improvements, plus a relay-privacy fix.</p>