Security: - Reaction signatures: ed25519 sign/verify (sign_reaction, verify_reaction_signature) Backward-compatible — unsigned reactions from old nodes still accepted - Comment signature verification: verify_comment_signature now called on receipt - Reaction removal authorization: only reactor or post author can remove - BlobHeader author verification: lookup actual author from storage, don't trust payload Lock contention (4 fixes): - ManifestPush discovery: cm lock released before PostFetch I/O - Pull request handler: load under lock, filter without lock, brief re-lock for is_deleted - Pull sender: split into two brief locks (store posts, then batch upstream+sync) - Engagement checker: batch all chunk results, single lock for writes Data cleanup: - Post deletion cleans post_downstream, post_upstream, seen_engagement tables - Added TODO-hardening.md documenting remaining DOS/security/lock/data issues Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
204 lines
9.8 KiB
Markdown
204 lines
9.8 KiB
Markdown
# DOS Hardening TODO
|
|
|
|
Identified during v0.4.0 audit (2026-03-21). Implement before v0.4.1.
|
|
|
|
## CRITICAL — Lock Contention (v4-introduced)
|
|
|
|
### L1. ManifestPush discovery holds cm lock during network I/O (connection.rs:4847-4981)
|
|
- Spawned task grabs cm lock, calls send_post_fetch (QUIC I/O), waits for response — all while locked
|
|
- Every connection operation queues behind it (5s+ freeze possible)
|
|
- **Fix:** Gather connection handle before locking cm. PostFetch outside lock. Brief re-acquire for DB writes.
|
|
|
|
### L2. Pull request handler holds lock during filtering (connection.rs:1855-1905)
|
|
- Loads ALL posts, loops through checking visibility + timestamps while holding storage lock
|
|
- 50K posts = 500ms+ lock hold
|
|
- **Fix:** Load posts under lock (brief), release, filter without lock (CPU only), re-acquire briefly for is_deleted() on filtered subset.
|
|
|
|
### L3. Pull sender's second lock too long (connection.rs:1650-1721, 1572-1624)
|
|
- After receiving posts: store + add_upstream (count query each) + update_last_sync — all under one lock
|
|
- 100 posts = 100 inserts + 100 count queries + 20 author updates
|
|
- **Fix:** Split into two brief locks. First: bulk store posts. Second: batch upstream adds + last_sync updates. Collect unique authors during first lock.
|
|
|
|
### L4. Per-post engagement lock acquisitions (connection.rs:1777-1833)
|
|
- Lock acquired/released 100 times in tight loop (once per post)
|
|
- Each acquisition blocks behind other tasks
|
|
- **Fix:** Batch writes. Collect all engagement results, acquire lock once, write all. Network I/O already outside lock.
|
|
|
|
### L5. Stale follows query (node.rs:2528-2530) — LOW
|
|
- get_stale_follows every 60s, brief query, acceptable
|
|
- **Fix (optional):** Add index on follows(last_sync_ms) if missing
|
|
|
|
## HIGH Priority — DOS
|
|
|
|
### 1. Stream handler cap (connection.rs:4252, 4270)
|
|
- Max 10 concurrent workers per connection via Semaphore
|
|
- Excess streams wait, not spawned unbounded
|
|
|
|
### 2. Slot index memory bomb (connection.rs:5754-5792)
|
|
- Soft 1K slot limit per post
|
|
- Author can sign capacity increase that propagates via BlobHeaderDiff
|
|
- Without author signature, cap stays at 1K
|
|
- Consider thread-split pattern for overflow (already exists for 16KB comments)
|
|
|
|
### 3. ManifestPush amplification (connection.rs:4877-4936)
|
|
- Custom ManifestPush for new posts: only deliver [new_post_id, previous_post_id]
|
|
- Each CDN partner updates their own local manifest copy
|
|
- Same diff pattern useful for N+10 updates
|
|
- Lower bandwidth, low-priority background task
|
|
|
|
## MEDIUM Priority
|
|
|
|
### 4. Post list pagination (connection.rs:1857)
|
|
- Limit to 200 posts per pull response
|
|
- ~100KB memory, <5ms lock hold
|
|
- Next sync cycle catches remainder via since_ms timestamps
|
|
|
|
### 5. Eviction candidate cap (storage.rs:3678-3737)
|
|
- Limit to 100 candidates per batch
|
|
- ~40KB memory, <5ms lock hold
|
|
- Next 5-min cycle catches more if needed
|
|
|
|
### 6. Payload element abuse — CDN consensus check
|
|
- Before accepting a large engagement update, check 1-2 CDN neighbors
|
|
- "Does your header for this post look like this?" If not → reject
|
|
- Attacker must compromise multiple CDN nodes to pass
|
|
- No trust scoring needed — just peer corroboration
|
|
|
|
### 7. Lock acquisition timeouts (connection.rs: throughout)
|
|
- 5-second timeout on storage lock acquisition
|
|
- On timeout: skip operation, try next cycle
|
|
- Log: operation name, wait duration, who holds the lock
|
|
- Add `last_lock_holder: AtomicU64` storing hash of acquiring function name
|
|
|
|
### 8. Discovery task cap (connection.rs:4938-4980)
|
|
- One discovery task per peer at a time
|
|
- AtomicBool flag per connection, skip if already running
|
|
|
|
## LOW Priority
|
|
|
|
### 9. Engagement rate limiting
|
|
- Self-claimed: max 3 emoji + 1 comment per 10 seconds
|
|
- Chain-propagated: CDN consensus check from #6 applies
|
|
- Process only first 100 ops per BlobHeaderDiff message
|
|
|
|
### 10. Mesh stream spawn cap (connection.rs)
|
|
- Same as #1 — 10 max concurrent handlers per connection
|
|
- Supplements auth rate limiter (which handles connection-level, not stream-level)
|
|
|
|
### 11. Retry backoff per target
|
|
- Start at 5 seconds, triple on each failure
|
|
- 5s → 15s → 45s → 135s → 405s → 1215s → 3645s → 10935s → 14400s (cap at 4hr)
|
|
- 8 failures to hit max backoff
|
|
- Reset to 5s on success
|
|
- Track per target peer, not global
|
|
|
|
---
|
|
|
|
# Security Hardening TODO
|
|
|
|
Identified during v0.4.0 security audit (2026-03-21).
|
|
|
|
## CRITICAL — Immediate (before next public release)
|
|
|
|
### S1. Comment signature verification — ONE LINE FIX (connection.rs:5711-5724)
|
|
- `verify_comment_signature()` exists in crypto.rs but is NEVER called on receipt
|
|
- Add `if !crypto::verify_comment_signature(...) { continue; }` before `store_comment()`
|
|
- Infrastructure exists, just not wired up
|
|
|
|
### S2. Reaction removal auth check — TWO LINE FIX (connection.rs:5708-5709)
|
|
- `RemoveReaction` accepts from any sender, no auth
|
|
- Add: `if *reactor == sender || sender == payload.author { ... }`
|
|
- Same pattern already used in EditComment/DeleteComment
|
|
|
|
### S3. Reaction signature — ~30 lines (types.rs, crypto.rs, connection.rs)
|
|
- `Reaction` has no signature field — anyone can fake reactions from any NodeId
|
|
- Add `signature: Vec<u8>` to Reaction struct (#[serde(default)] for compat)
|
|
- Sign `(reactor + post_id + emoji + timestamp)` with reactor's ed25519 key
|
|
- Verify in handle_blob_header_diff before storing
|
|
- Follow existing `sign_comment` / `verify_comment_signature` pattern
|
|
|
|
### S4. BlobHeader author verification — ~5 lines (connection.rs:5821-5836)
|
|
- Header rebuild uses `payload.author` without checking against stored post author
|
|
- Look up actual author from `storage.get_post(&payload.post_id)`
|
|
- Use stored author, not payload-claimed author
|
|
|
|
## HIGH — Short-term
|
|
|
|
### S5. PostId verification in all paths (connection.rs)
|
|
- PostPush verifies with `verify_post_id()` but some pull paths don't
|
|
- Audit all `store_post_with_visibility` call sites
|
|
- Ensure `verify_post_id()` called before each store
|
|
|
|
### S6. Slot write protection — self-healing signature system (connection.rs:5749-5803)
|
|
- Problem: any peer can overwrite encrypted slots with garbage
|
|
- Solution (two layers):
|
|
1. CDN tree membership check: only accept slot writes from peers in post_downstream or post_upstream for that post. Rejects random peers.
|
|
2. Self-healing signatures: participants sign their own slot writes with the slot key (derived from CEK) and keep a local copy. On diff check, if their slot was overwritten with something they didn't sign, they re-write their signed version. Other participants verify signatures — keep the signed version, discard unsigned garbage. The legitimate version propagates through the CDN tree. Attacker must keep overwriting forever; the real version keeps coming back from every CDN node that received it.
|
|
- Relay nodes can't verify signatures (don't have CEK) but pass through all writes — participants do client-side verification on decrypt
|
|
|
|
### S7. Comment edit/delete cryptographic proof (connection.rs:5726-5736)
|
|
- Currently "trust-based" — checks sender == author at transport layer
|
|
- QUIC connection IS authenticated (iroh ed25519), so sender identity is verified
|
|
- Risk: compromised relay node
|
|
- Fix: require new signature over edited content (editor proves they hold private key)
|
|
- For post-author deletes: require post author's signature over delete request
|
|
|
|
### S8. Pull sync follow list privacy (connection.rs:1846-1915)
|
|
- PullSyncRequest sends entire follow list unencrypted to every sync peer
|
|
- Every mesh peer learns your complete social graph
|
|
- Options:
|
|
- Accept and document (mesh peers are semi-trusted infrastructure) — RECOMMENDED for now
|
|
- Bloom filter: probabilistic set, leaks less, some irrelevant posts received (acceptable bandwidth cost)
|
|
- Long-term: oblivious transfer / PIR (heavy crypto, probably not worth it for social network)
|
|
|
|
## MEDIUM — Design review
|
|
|
|
### S9. Nonce reuse guard (crypto.rs:54-56)
|
|
- ChaCha20-Poly1305 catastrophic on nonce reuse
|
|
- RNG is reliable on modern OS (getrandom syscall)
|
|
- Add sanity check: if nonce is all zeros after generation, panic rather than encrypt
|
|
- One-line guard
|
|
|
|
### S10. Slot timing metadata leakage (connection.rs:5757, 5776)
|
|
- `header.updated_at` changes on slot writes, leaking WHEN engagement occurs on private posts
|
|
- Passive observer can correlate timestamps with known user behavior
|
|
- Fix: round updated_at to 10-minute buckets for private posts
|
|
- Or batch slot writes on fixed schedule rather than immediately
|
|
|
|
### S11. Per-author engagement rate limiting (connection.rs:5699-5725)
|
|
- A peer can send 10,000 fake reactions in one BlobHeaderDiff
|
|
- Cap ops per message (100 max per DOS hardening #9)
|
|
- Deduplicate by (reactor, post_id, emoji) — storage already does ON CONFLICT DO UPDATE
|
|
- Combined with reaction signatures (S3), fake NodeId reactions become impossible
|
|
|
|
## LOW
|
|
|
|
---
|
|
|
|
# Data Cleanup TODO
|
|
|
|
### D1. post_downstream not cleaned on post delete (storage.rs delete_post)
|
|
- When a post is deleted, downstream registrations stay forever
|
|
- Fix: add `DELETE FROM post_downstream WHERE post_id = ?1` in delete_post()
|
|
- Also add: `DELETE FROM post_upstream WHERE post_id = ?1`
|
|
- Also add: `DELETE FROM seen_engagement WHERE post_id = ?1`
|
|
- One-line fixes each
|
|
|
|
### D2. Document BlobHeader-table relationship (storage.rs store_blob_header)
|
|
- Header JSON is a snapshot, reactions/comments tables are authoritative
|
|
- They can temporarily diverge (BlobHeaderResponse arrives with newer header than tables)
|
|
- Header rebuilt from tables on next engagement op
|
|
- Add clarifying comment to store_blob_header
|
|
|
|
---
|
|
|
|
# Low Priority
|
|
|
|
### S12. Hex parse error logging (web.rs:110-119)
|
|
- Malformed hex strings silently return 404
|
|
- Add debug logging for malformed inputs
|
|
|
|
### S13. Edit comment signature consistency (storage.rs:4338-4343)
|
|
- edit_comment updates content without updating signature
|
|
- If signature verification (S1) is enabled, edited comments would have invalid signatures
|
|
- Fix: add signature parameter to edit_comment, re-sign edited content
|