Fix: GroupKeyDistribute admin forgery + cap concurrent port scanners

Two pre-release fixes found during audit.

1) GroupKeyDistribute admin forgery (critical)

   `group_key_distribution::try_apply_distribution_post` trusted the
   `admin` field inside the decrypted payload without checking that it
   matched the post's author. Exploit: a peer who knows a victim's
   posting NodeId (public, since it is listed as a recipient on DM and
   group posts) and has observed a target group_id could craft an
   encrypted distribution post whose payload names the legitimate
   admin, even though the attacker authored it. The victim's storage
   uses INSERT OR REPLACE on group_keys, so a successful forgery would
   overwrite the victim's legitimate group key record and stored seed,
   breaking future key rotations and distributions from the real admin.

   Fix: reject the distribution post when `content.admin != post.author`.
   Added test `forged_admin_is_rejected` that seeds a legitimate
   record, attempts a forgery, and asserts the legitimate record is
   untouched.
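   A minimal sketch of the check and of what it protects against. It assumes
   simplified types: `NodeId`, `DistributionContent`, `Post`, and
   `GroupKeyStore` are hypothetical stand-ins (the real definitions are not
   part of this diff), and a `HashMap` plays the role of the `group_keys`
   table because `HashMap::insert` overwrites an existing key the way
   INSERT OR REPLACE does. It also assumes `post.author` has already been
   authenticated before this point, which is what makes it safe to compare
   against.

```rust
use std::collections::HashMap;

// Hypothetical stand-in: the real `NodeId` type is not shown in this diff.
type NodeId = [u8; 32];

/// Hypothetical decrypted payload of a GroupKeyDistribute post.
struct DistributionContent {
    group_id: u64,
    admin: NodeId,
    seed: Vec<u8>,
}

/// Hypothetical post envelope; assumes `author` was authenticated upstream.
struct Post {
    author: NodeId,
}

/// Stand-in for the `group_keys` table: `HashMap::insert` overwrites an
/// existing key, mirroring INSERT OR REPLACE.
#[derive(Default)]
struct GroupKeyStore {
    rows: HashMap<u64, (NodeId, Vec<u8>)>,
}

#[derive(Debug, PartialEq)]
enum ApplyError {
    AdminMismatch,
}

impl GroupKeyStore {
    /// Applies a distribution only if the payload's claimed admin is the
    /// post's author; on mismatch the stored record is left untouched.
    fn try_apply(&mut self, post: &Post, content: DistributionContent) -> Result<(), ApplyError> {
        if content.admin != post.author {
            return Err(ApplyError::AdminMismatch);
        }
        self.rows.insert(content.group_id, (content.admin, content.seed));
        Ok(())
    }

    /// Stored seed for a group, if any.
    fn seed(&self, group_id: u64) -> Option<&[u8]> {
        self.rows.get(&group_id).map(|(_, seed)| seed.as_slice())
    }
}

fn main() {
    let admin: NodeId = [1; 32];
    let attacker: NodeId = [2; 32];
    let mut store = GroupKeyStore::default();

    // The real admin authors the post and names itself: applied.
    let genuine = DistributionContent { group_id: 7, admin, seed: vec![0xAA] };
    assert_eq!(store.try_apply(&Post { author: admin }, genuine), Ok(()));

    // Forgery: the attacker authors the post but names the real admin.
    let forged = DistributionContent { group_id: 7, admin, seed: vec![0xFF] };
    let result = store.try_apply(&Post { author: attacker }, forged);
    assert_eq!(result, Err(ApplyError::AdminMismatch));

    // The legitimate seed survives the attempted overwrite.
    assert_eq!(store.seed(7), Some(&[0xAAu8][..]));
}
```

   Without the `admin != author` guard, the forged call would reach
   `insert` and silently replace the genuine seed, which is the overwrite
   the `forged_admin_is_rejected` test guards against.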

2) Cap concurrent port-scan hole punches at 1 (bandwidth)

   `hole_punch_with_scanning` fires ~100 QUIC ClientHellos/sec for up
   to SCAN_MAX_DURATION_SECS (300 s), costing ~1 Mbps of upload per
   active scanner. With no cap, the growth loop, anchor referrals, and
   replication paths could spawn several scanners at once and drive
   sustained multi-Mbps upload. This is particularly pathological on
   obfuscated VPNs, where every probe stalls until a proxy timeout, and
   explains the reported 10 Mbps sustained upload after connecting to
   an anchor.
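   The ~1 Mbps figure can be checked with a back-of-the-envelope sketch.
   It assumes each probe is a single datagram at the 1200-byte minimum
   that RFC 9000 requires for datagrams carrying QUIC Initial packets, and
   it ignores IP/UDP headers and retransmissions, so real traffic is
   somewhat higher. The helper names are hypothetical.

```rust
/// Minimum UDP payload for a datagram carrying a QUIC Initial packet
/// (RFC 9000, Section 14.1). Real ClientHello datagrams may be larger.
const MIN_INITIAL_DATAGRAM_BYTES: u64 = 1200;
/// Probe rate stated in the commit message (~100 ClientHellos/sec).
const PROBES_PER_SEC: u64 = 100;
/// Same value as SCAN_MAX_DURATION_SECS in the patch.
const SCAN_MAX_DURATION_SECS: u64 = 300;

/// Upload rate in bits/sec for `scanners` concurrent scanners,
/// ignoring IP/UDP header overhead and retransmissions.
fn scan_upload_bps(scanners: u64) -> u64 {
    scanners * PROBES_PER_SEC * MIN_INITIAL_DATAGRAM_BYTES * 8
}

/// Bytes sent by one scanner that runs for the full scan window.
fn bytes_per_full_scan() -> u64 {
    PROBES_PER_SEC * MIN_INITIAL_DATAGRAM_BYTES * SCAN_MAX_DURATION_SECS
}

fn main() {
    // 100 * 1200 * 8 = 960,000 bit/s, i.e. ~1 Mbps per scanner.
    println!("one scanner:  {} bit/s", scan_upload_bps(1));
    // At this rate, roughly ten concurrent scanners reach ~10 Mbps.
    println!("ten scanners: {} bit/s", scan_upload_bps(10));
    // 100 * 1200 * 300 = 36,000,000 bytes (~36 MB) per full 5-minute scan.
    println!("full scan:    {} bytes", bytes_per_full_scan());
}
```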

   Fix: a process-wide `tokio::sync::Semaphore` with one permit (lazily
   initialized via `OnceLock`) guards entry to the scanning loop.
   Callers that find the permit taken fall back to the cheaper
   `hole_punch_parallel` (standard punching, no 100/sec port walk)
   instead of spawning another scanner. The permit is held for the
   scanner's lifetime and released on return. Added unit test
   `scanner_semaphore_caps_concurrent_scans_at_one`.
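   The try-acquire-or-fall-back pattern can be sketched without tokio.
   This is a std-only approximation, not the patch itself: an `AtomicBool`
   plus an RAII guard stands in for the one-permit `tokio::sync::Semaphore`,
   and `punch_with_scanning`, `Strategy`, and the closure-based scan body
   are hypothetical names for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Process-wide "a scanner is running" flag: a std-only stand-in for the
/// patch's one-permit `tokio::sync::Semaphore`.
static SCANNER_BUSY: AtomicBool = AtomicBool::new(false);

/// RAII permit: dropping it clears the flag, like dropping a
/// `SemaphorePermit`, so the slot is freed on every return path.
struct ScanPermit;

impl Drop for ScanPermit {
    fn drop(&mut self) {
        SCANNER_BUSY.store(false, Ordering::Release);
    }
}

/// Non-blocking acquire, analogous to `Semaphore::try_acquire`.
fn try_acquire_scan_permit() -> Option<ScanPermit> {
    SCANNER_BUSY
        .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
        .ok()
        .map(|_| ScanPermit)
}

#[derive(Debug, PartialEq)]
enum Strategy {
    PortScan,
    Parallel,
}

/// Hypothetical stand-in for `hole_punch_with_scanning`. `scan_body`
/// represents the port-scan loop and runs while the permit is held.
fn punch_with_scanning(scan_body: impl FnOnce()) -> Strategy {
    let _permit = match try_acquire_scan_permit() {
        Some(permit) => permit,
        // Slot taken: the real code calls `hole_punch_parallel` here.
        None => return Strategy::Parallel,
    };
    scan_body();
    Strategy::PortScan
} // `_permit` is dropped here, freeing the slot for the next caller.

fn main() {
    // A second caller arriving while the first scanner runs falls back.
    let mut inner = None;
    let outer = punch_with_scanning(|| inner = Some(punch_with_scanning(|| {})));
    assert_eq!(outer, Strategy::PortScan);
    assert_eq!(inner, Some(Strategy::Parallel));

    // After the first scanner returns, the next caller may scan again.
    assert_eq!(punch_with_scanning(|| {}), Strategy::PortScan);
}
```

   The non-blocking acquire is the important choice: awaiting
   `Semaphore::acquire` would queue callers behind a scan that can run for
   up to 300 s, whereas `try_acquire` lets them proceed immediately with
   the cheaper strategy.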

Both changes leave the normal path untouched (a single scanner still
runs; legitimate key distributions still apply). 120 / 120 core
tests pass.
Scott Reimers 2026-04-22 23:32:10 -04:00
parent f88618bb6f
commit dfd3253734
2 changed files with 127 additions and 0 deletions


@@ -155,6 +155,20 @@ const SCAN_PUNCH_INTERVAL_SECS: u64 = 2;
/// Maximum scan duration (seconds) — accept the cost for otherwise-impossible connections
const SCAN_MAX_DURATION_SECS: u64 = 300; // 5 minutes
/// Global cap on concurrent port-scan hole punches. Each scanner fires
/// ~100 QUIC ClientHellos/sec for up to `SCAN_MAX_DURATION_SECS`, which
/// is ~1 Mbps per active scanner. Without a cap, multiple parallel
/// referrals (growth loop, anchor referrals, replication) can spawn
/// several scanners at once and drive sustained multi-Mbps upload —
/// especially pathological on obfuscated VPNs where every probe stalls
/// at proxy timeouts. A permit is acquired before the scanning loop
/// starts and held until the scanner returns; extra callers fall back
/// to the cheaper `hole_punch_parallel`.
fn scanner_semaphore() -> &'static tokio::sync::Semaphore {
    static SEM: std::sync::OnceLock<tokio::sync::Semaphore> = std::sync::OnceLock::new();
    SEM.get_or_init(|| tokio::sync::Semaphore::new(1))
}
/// Advanced hole punch with port scanning fallback for EDM/port-restricted NAT.
///
/// **Role-based behavior** (each side calls this independently):
@@ -188,6 +202,21 @@ pub(crate) async fn hole_punch_with_scanning(
        return hole_punch_parallel(endpoint, target, addresses).await;
    }

    // v0.6.2: cap to one concurrent port scanner per node. Additional
    // callers fall back to the cheaper `hole_punch_parallel` instead of
    // spawning another 100-probes-per-second scanner. The permit is held
    // for the lifetime of the scanner loop below (dropped on return).
    let _scan_permit = match scanner_semaphore().try_acquire() {
        Ok(p) => p,
        Err(_) => {
            tracing::debug!(
                peer = hex::encode(target),
                "another port scan already in progress — falling back to parallel punch"
            );
            return hole_punch_parallel(endpoint, target, addresses).await;
        }
    };

    // Filter to reachable families, then use observed address (first in list, injected by relay)
    let reachable = filter_reachable_families(endpoint, addresses);
    let observed_addr = reachable.first()
@@ -8379,3 +8408,21 @@ fn now_ms() -> u64 {
        .unwrap_or_default()
        .as_millis() as u64
}

#[cfg(test)]
mod tests {
    use super::scanner_semaphore;

    #[test]
    fn scanner_semaphore_caps_concurrent_scans_at_one() {
        let sem = scanner_semaphore();
        // Fresh: one permit should be available.
        let p1 = sem.try_acquire().expect("first scan should acquire");
        // Second concurrent caller must be rejected.
        assert!(sem.try_acquire().is_err(), "second scan must not acquire while first holds permit");
        // Dropping the first permit returns it to the pool.
        drop(p1);
        let p2 = sem.try_acquire().expect("after release, next scan should acquire");
        drop(p2);
    }
}