# DOS Hardening TODO Identified during v0.4.0 audit (2026-03-21). Implement before v0.4.1. ## CRITICAL — Lock Contention (v4-introduced) ### L1. ManifestPush discovery holds cm lock during network I/O (connection.rs:4847-4981) - Spawned task grabs cm lock, calls send_post_fetch (QUIC I/O), waits for response — all while locked - Every connection operation queues behind it (5s+ freeze possible) - **Fix:** Gather connection handle before locking cm. PostFetch outside lock. Brief re-acquire for DB writes. ### L2. Pull request handler holds lock during filtering (connection.rs:1855-1905) - Loads ALL posts, loops through checking visibility + timestamps while holding storage lock - 50K posts = 500ms+ lock hold - **Fix:** Load posts under lock (brief), release, filter without lock (CPU only), re-acquire briefly for is_deleted() on filtered subset. ### L3. Pull sender's second lock too long (connection.rs:1650-1721, 1572-1624) - After receiving posts: store + add_upstream (count query each) + update_last_sync — all under one lock - 100 posts = 100 inserts + 100 count queries + 20 author updates - **Fix:** Split into two brief locks. First: bulk store posts. Second: batch upstream adds + last_sync updates. Collect unique authors during first lock. ### L4. Per-post engagement lock acquisitions (connection.rs:1777-1833) - Lock acquired/released 100 times in tight loop (once per post) - Each acquisition blocks behind other tasks - **Fix:** Batch writes. Collect all engagement results, acquire lock once, write all. Network I/O already outside lock. ### L5. Stale follows query (node.rs:2528-2530) — LOW - get_stale_follows every 60s, brief query, acceptable - **Fix (optional):** Add index on follows(last_sync_ms) if missing ## HIGH Priority — DOS ### 1. Stream handler cap (connection.rs:4252, 4270) - Max 10 concurrent workers per connection via Semaphore - Excess streams wait, not spawned unbounded ### 2. Slot index memory bomb (connection.rs:5754-5792) - Soft 1K slot limit per post - Author can sign capacity increase that propagates via BlobHeaderDiff - Without author signature, cap stays at 1K - Consider thread-split pattern for overflow (already exists for 16KB comments) ### 3. ManifestPush amplification (connection.rs:4877-4936) - Custom ManifestPush for new posts: only deliver [new_post_id, previous_post_id] - Each CDN partner updates their own local manifest copy - Same diff pattern useful for N+10 updates - Lower bandwidth, low-priority background task ## MEDIUM Priority ### 4. Post list pagination (connection.rs:1857) - Limit to 200 posts per pull response - ~100KB memory, <5ms lock hold - Next sync cycle catches remainder via since_ms timestamps ### 5. Eviction candidate cap (storage.rs:3678-3737) - Limit to 100 candidates per batch - ~40KB memory, <5ms lock hold - Next 5-min cycle catches more if needed ### 6. Payload element abuse — CDN consensus check - Before accepting a large engagement update, check 1-2 CDN neighbors - "Does your header for this post look like this?" If not → reject - Attacker must compromise multiple CDN nodes to pass - No trust scoring needed — just peer corroboration ### 7. Lock acquisition timeouts (connection.rs: throughout) - 5-second timeout on storage lock acquisition - On timeout: skip operation, try next cycle - Log: operation name, wait duration, who holds the lock - Add `last_lock_holder: AtomicU64` storing hash of acquiring function name ### 8. Discovery task cap (connection.rs:4938-4980) - One discovery task per peer at a time - AtomicBool flag per connection, skip if already running ## LOW Priority ### 9. Engagement rate limiting - Self-claimed: max 3 emoji + 1 comment per 10 seconds - Chain-propagated: CDN consensus check from #6 applies - Process only first 100 ops per BlobHeaderDiff message ### 10. Mesh stream spawn cap (connection.rs) - Same as #1 — 10 max concurrent handlers per connection - Supplements auth rate limiter (which handles connection-level, not stream-level) ### 11. Retry backoff per target - Start at 5 seconds, triple on each failure - 5s → 15s → 45s → 135s → 405s → 1215s → 3645s → 10935s → 14400s (cap at 4hr) - 8 failures to hit max backoff - Reset to 5s on success - Track per target peer, not global --- # Security Hardening TODO Identified during v0.4.0 security audit (2026-03-21). ## CRITICAL — Immediate (before next public release) ### S1. Comment signature verification — ONE LINE FIX (connection.rs:5711-5724) - `verify_comment_signature()` exists in crypto.rs but is NEVER called on receipt - Add `if !crypto::verify_comment_signature(...) { continue; }` before `store_comment()` - Infrastructure exists, just not wired up ### S2. Reaction removal auth check — TWO LINE FIX (connection.rs:5708-5709) - `RemoveReaction` accepts from any sender, no auth - Add: `if *reactor == sender || sender == payload.author { ... }` - Same pattern already used in EditComment/DeleteComment ### S3. Reaction signature — ~30 lines (types.rs, crypto.rs, connection.rs) - `Reaction` has no signature field — anyone can fake reactions from any NodeId - Add `signature: Vec` to Reaction struct (#[serde(default)] for compat) - Sign `(reactor + post_id + emoji + timestamp)` with reactor's ed25519 key - Verify in handle_blob_header_diff before storing - Follow existing `sign_comment` / `verify_comment_signature` pattern ### S4. BlobHeader author verification — ~5 lines (connection.rs:5821-5836) - Header rebuild uses `payload.author` without checking against stored post author - Look up actual author from `storage.get_post(&payload.post_id)` - Use stored author, not payload-claimed author ## HIGH — Short-term ### S5. PostId verification in all paths (connection.rs) - PostPush verifies with `verify_post_id()` but some pull paths don't - Audit all `store_post_with_visibility` call sites - Ensure `verify_post_id()` called before each store ### S6. Slot write protection — self-healing signature system (connection.rs:5749-5803) - Problem: any peer can overwrite encrypted slots with garbage - Solution (two layers): 1. CDN tree membership check: only accept slot writes from peers in post_downstream or post_upstream for that post. Rejects random peers. 2. Self-healing signatures: participants sign their own slot writes with the slot key (derived from CEK) and keep a local copy. On diff check, if their slot was overwritten with something they didn't sign, they re-write their signed version. Other participants verify signatures — keep the signed version, discard unsigned garbage. The legitimate version propagates through the CDN tree. Attacker must keep overwriting forever; the real version keeps coming back from every CDN node that received it. - Relay nodes can't verify signatures (don't have CEK) but pass through all writes — participants do client-side verification on decrypt ### S7. Comment edit/delete cryptographic proof (connection.rs:5726-5736) - Currently "trust-based" — checks sender == author at transport layer - QUIC connection IS authenticated (iroh ed25519), so sender identity is verified - Risk: compromised relay node - Fix: require new signature over edited content (editor proves they hold private key) - For post-author deletes: require post author's signature over delete request ### S8. Pull sync follow list privacy (connection.rs:1846-1915) - PullSyncRequest sends entire follow list unencrypted to every sync peer - Every mesh peer learns your complete social graph - Options: - Accept and document (mesh peers are semi-trusted infrastructure) — RECOMMENDED for now - Bloom filter: probabilistic set, leaks less, some irrelevant posts received (acceptable bandwidth cost) - Long-term: oblivious transfer / PIR (heavy crypto, probably not worth it for social network) ## MEDIUM — Design review ### S9. Nonce reuse guard (crypto.rs:54-56) - ChaCha20-Poly1305 catastrophic on nonce reuse - RNG is reliable on modern OS (getrandom syscall) - Add sanity check: if nonce is all zeros after generation, panic rather than encrypt - One-line guard ### S10. Slot timing metadata leakage (connection.rs:5757, 5776) - `header.updated_at` changes on slot writes, leaking WHEN engagement occurs on private posts - Passive observer can correlate timestamps with known user behavior - Fix: round updated_at to 10-minute buckets for private posts - Or batch slot writes on fixed schedule rather than immediately ### S11. Per-author engagement rate limiting (connection.rs:5699-5725) - A peer can send 10,000 fake reactions in one BlobHeaderDiff - Cap ops per message (100 max per DOS hardening #9) - Deduplicate by (reactor, post_id, emoji) — storage already does ON CONFLICT DO UPDATE - Combined with reaction signatures (S3), fake NodeId reactions become impossible ## LOW --- # Data Cleanup TODO ### D1. post_downstream not cleaned on post delete (storage.rs delete_post) - When a post is deleted, downstream registrations stay forever - Fix: add `DELETE FROM post_downstream WHERE post_id = ?1` in delete_post() - Also add: `DELETE FROM post_upstream WHERE post_id = ?1` - Also add: `DELETE FROM seen_engagement WHERE post_id = ?1` - One-line fixes each ### D2. Document BlobHeader-table relationship (storage.rs store_blob_header) - Header JSON is a snapshot, reactions/comments tables are authoritative - They can temporarily diverge (BlobHeaderResponse arrives with newer header than tables) - Header rebuilt from tables on next engagement op - Add clarifying comment to store_blob_header --- # Low Priority ### S12. Hex parse error logging (web.rs:110-119) - Malformed hex strings silently return 404 - Add debug logging for malformed inputs ### S13. Edit comment signature consistency (storage.rs:4338-4343) - edit_comment updates content without updating signature - If signature verification (S1) is enabled, edited comments would have invalid signatures - Fix: add signature parameter to edit_comment, re-sign edited content