v0.4.1: Security hardening, lock contention fixes, data cleanup
Security: - Reaction signatures: ed25519 sign/verify (sign_reaction, verify_reaction_signature) Backward-compatible — unsigned reactions from old nodes still accepted - Comment signature verification: verify_comment_signature now called on receipt - Reaction removal authorization: only reactor or post author can remove - BlobHeader author verification: lookup actual author from storage, don't trust payload Lock contention (4 fixes): - ManifestPush discovery: cm lock released before PostFetch I/O - Pull request handler: load under lock, filter without lock, brief re-lock for is_deleted - Pull sender: split into two brief locks (store posts, then batch upstream+sync) - Engagement checker: batch all chunk results, single lock for writes Data cleanup: - Post deletion cleans post_downstream, post_upstream, seen_engagement tables - Added TODO-hardening.md documenting remaining DOS/security/lock/data issues Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
bbaacf9b6c
commit
bb6f2b64b0
11 changed files with 500 additions and 138 deletions
204
docs/TODO-hardening.md
Normal file
204
docs/TODO-hardening.md
Normal file
|
|
@ -0,0 +1,204 @@
|
|||
# DOS Hardening TODO
|
||||
|
||||
Identified during v0.4.0 audit (2026-03-21). Implement before v0.4.1.
|
||||
|
||||
## CRITICAL — Lock Contention (v4-introduced)
|
||||
|
||||
### L1. ManifestPush discovery holds cm lock during network I/O (connection.rs:4847-4981)
|
||||
- Spawned task grabs cm lock, calls send_post_fetch (QUIC I/O), waits for response — all while locked
|
||||
- Every connection operation queues behind it (5s+ freeze possible)
|
||||
- **Fix:** Gather connection handle before locking cm. PostFetch outside lock. Brief re-acquire for DB writes.
|
||||
|
||||
### L2. Pull request handler holds lock during filtering (connection.rs:1855-1905)
|
||||
- Loads ALL posts, loops through checking visibility + timestamps while holding storage lock
|
||||
- 50K posts = 500ms+ lock hold
|
||||
- **Fix:** Load posts under lock (brief), release, filter without lock (CPU only), re-acquire briefly for is_deleted() on filtered subset.
|
||||
|
||||
### L3. Pull sender's second lock too long (connection.rs:1650-1721, 1572-1624)
|
||||
- After receiving posts: store + add_upstream (count query each) + update_last_sync — all under one lock
|
||||
- 100 posts = 100 inserts + 100 count queries + 20 author updates
|
||||
- **Fix:** Split into two brief locks. First: bulk store posts. Second: batch upstream adds + last_sync updates. Collect unique authors during first lock.
|
||||
|
||||
### L4. Per-post engagement lock acquisitions (connection.rs:1777-1833)
|
||||
- Lock acquired/released 100 times in tight loop (once per post)
|
||||
- Each acquisition blocks behind other tasks
|
||||
- **Fix:** Batch writes. Collect all engagement results, acquire lock once, write all. Network I/O already outside lock.
|
||||
|
||||
### L5. Stale follows query (node.rs:2528-2530) — LOW
|
||||
- get_stale_follows every 60s, brief query, acceptable
|
||||
- **Fix (optional):** Add index on follows(last_sync_ms) if missing
|
||||
|
||||
## HIGH Priority — DOS
|
||||
|
||||
### 1. Stream handler cap (connection.rs:4252, 4270)
|
||||
- Max 10 concurrent workers per connection via Semaphore
|
||||
- Excess streams wait, not spawned unbounded
|
||||
|
||||
### 2. Slot index memory bomb (connection.rs:5754-5792)
|
||||
- Soft 1K slot limit per post
|
||||
- Author can sign capacity increase that propagates via BlobHeaderDiff
|
||||
- Without author signature, cap stays at 1K
|
||||
- Consider thread-split pattern for overflow (already exists for 16KB comments)
|
||||
|
||||
### 3. ManifestPush amplification (connection.rs:4877-4936)
|
||||
- Custom ManifestPush for new posts: only deliver [new_post_id, previous_post_id]
|
||||
- Each CDN partner updates their own local manifest copy
|
||||
- Same diff pattern useful for N+10 updates
|
||||
- Lower bandwidth, low-priority background task
|
||||
|
||||
## MEDIUM Priority
|
||||
|
||||
### 4. Post list pagination (connection.rs:1857)
|
||||
- Limit to 200 posts per pull response
|
||||
- ~100KB memory, <5ms lock hold
|
||||
- Next sync cycle catches remainder via since_ms timestamps
|
||||
|
||||
### 5. Eviction candidate cap (storage.rs:3678-3737)
|
||||
- Limit to 100 candidates per batch
|
||||
- ~40KB memory, <5ms lock hold
|
||||
- Next 5-min cycle catches more if needed
|
||||
|
||||
### 6. Payload element abuse — CDN consensus check
|
||||
- Before accepting a large engagement update, check 1-2 CDN neighbors
|
||||
- "Does your header for this post look like this?" If not → reject
|
||||
- Attacker must compromise multiple CDN nodes to pass
|
||||
- No trust scoring needed — just peer corroboration
|
||||
|
||||
### 7. Lock acquisition timeouts (connection.rs: throughout)
|
||||
- 5-second timeout on storage lock acquisition
|
||||
- On timeout: skip operation, try next cycle
|
||||
- Log: operation name, wait duration, who holds the lock
|
||||
- Add `last_lock_holder: AtomicU64` storing hash of acquiring function name
|
||||
|
||||
### 8. Discovery task cap (connection.rs:4938-4980)
|
||||
- One discovery task per peer at a time
|
||||
- AtomicBool flag per connection, skip if already running
|
||||
|
||||
## LOW Priority
|
||||
|
||||
### 9. Engagement rate limiting
|
||||
- Self-claimed: max 3 emoji + 1 comment per 10 seconds
|
||||
- Chain-propagated: CDN consensus check from #6 applies
|
||||
- Process only first 100 ops per BlobHeaderDiff message
|
||||
|
||||
### 10. Mesh stream spawn cap (connection.rs)
|
||||
- Same as #1 — 10 max concurrent handlers per connection
|
||||
- Supplements auth rate limiter (which handles connection-level, not stream-level)
|
||||
|
||||
### 11. Retry backoff per target
|
||||
- Start at 5 seconds, triple on each failure
|
||||
- 5s → 15s → 45s → 135s → 405s → 1215s → 3645s → 10935s → 14400s (cap at 4hr)
|
||||
- 8 failures to hit max backoff
|
||||
- Reset to 5s on success
|
||||
- Track per target peer, not global
|
||||
|
||||
---
|
||||
|
||||
# Security Hardening TODO
|
||||
|
||||
Identified during v0.4.0 security audit (2026-03-21).
|
||||
|
||||
## CRITICAL — Immediate (before next public release)
|
||||
|
||||
### S1. Comment signature verification — ONE LINE FIX (connection.rs:5711-5724)
|
||||
- `verify_comment_signature()` exists in crypto.rs but is NEVER called on receipt
|
||||
- Add `if !crypto::verify_comment_signature(...) { continue; }` before `store_comment()`
|
||||
- Infrastructure exists, just not wired up
|
||||
|
||||
### S2. Reaction removal auth check — TWO LINE FIX (connection.rs:5708-5709)
|
||||
- `RemoveReaction` accepts from any sender, no auth
|
||||
- Add: `if *reactor == sender || sender == payload.author { ... }`
|
||||
- Same pattern already used in EditComment/DeleteComment
|
||||
|
||||
### S3. Reaction signature — ~30 lines (types.rs, crypto.rs, connection.rs)
|
||||
- `Reaction` has no signature field — anyone can fake reactions from any NodeId
|
||||
- Add `signature: Vec<u8>` to Reaction struct (#[serde(default)] for compat)
|
||||
- Sign `(reactor + post_id + emoji + timestamp)` with reactor's ed25519 key
|
||||
- Verify in handle_blob_header_diff before storing
|
||||
- Follow existing `sign_comment` / `verify_comment_signature` pattern
|
||||
|
||||
### S4. BlobHeader author verification — ~5 lines (connection.rs:5821-5836)
|
||||
- Header rebuild uses `payload.author` without checking against stored post author
|
||||
- Look up actual author from `storage.get_post(&payload.post_id)`
|
||||
- Use stored author, not payload-claimed author
|
||||
|
||||
## HIGH — Short-term
|
||||
|
||||
### S5. PostId verification in all paths (connection.rs)
|
||||
- PostPush verifies with `verify_post_id()` but some pull paths don't
|
||||
- Audit all `store_post_with_visibility` call sites
|
||||
- Ensure `verify_post_id()` called before each store
|
||||
|
||||
### S6. Slot write protection — self-healing signature system (connection.rs:5749-5803)
|
||||
- Problem: any peer can overwrite encrypted slots with garbage
|
||||
- Solution (two layers):
|
||||
1. CDN tree membership check: only accept slot writes from peers in post_downstream or post_upstream for that post. Rejects random peers.
|
||||
2. Self-healing signatures: participants sign their own slot writes with the slot key (derived from CEK) and keep a local copy. On diff check, if their slot was overwritten with something they didn't sign, they re-write their signed version. Other participants verify signatures — keep the signed version, discard unsigned garbage. The legitimate version propagates through the CDN tree. Attacker must keep overwriting forever; the real version keeps coming back from every CDN node that received it.
|
||||
- Relay nodes can't verify signatures (don't have CEK) but pass through all writes — participants do client-side verification on decrypt
|
||||
|
||||
### S7. Comment edit/delete cryptographic proof (connection.rs:5726-5736)
|
||||
- Currently "trust-based" — checks sender == author at transport layer
|
||||
- QUIC connection IS authenticated (iroh ed25519), so sender identity is verified
|
||||
- Risk: compromised relay node
|
||||
- Fix: require new signature over edited content (editor proves they hold private key)
|
||||
- For post-author deletes: require post author's signature over delete request
|
||||
|
||||
### S8. Pull sync follow list privacy (connection.rs:1846-1915)
|
||||
- PullSyncRequest sends entire follow list unencrypted to every sync peer
|
||||
- Every mesh peer learns your complete social graph
|
||||
- Options:
|
||||
- Accept and document (mesh peers are semi-trusted infrastructure) — RECOMMENDED for now
|
||||
- Bloom filter: probabilistic set, leaks less, some irrelevant posts received (acceptable bandwidth cost)
|
||||
- Long-term: oblivious transfer / PIR (heavy crypto, probably not worth it for social network)
|
||||
|
||||
## MEDIUM — Design review
|
||||
|
||||
### S9. Nonce reuse guard (crypto.rs:54-56)
|
||||
- ChaCha20-Poly1305 catastrophic on nonce reuse
|
||||
- RNG is reliable on modern OS (getrandom syscall)
|
||||
- Add sanity check: if nonce is all zeros after generation, panic rather than encrypt
|
||||
- One-line guard
|
||||
|
||||
### S10. Slot timing metadata leakage (connection.rs:5757, 5776)
|
||||
- `header.updated_at` changes on slot writes, leaking WHEN engagement occurs on private posts
|
||||
- Passive observer can correlate timestamps with known user behavior
|
||||
- Fix: round updated_at to 10-minute buckets for private posts
|
||||
- Or batch slot writes on fixed schedule rather than immediately
|
||||
|
||||
### S11. Per-author engagement rate limiting (connection.rs:5699-5725)
|
||||
- A peer can send 10,000 fake reactions in one BlobHeaderDiff
|
||||
- Cap ops per message (100 max per DOS hardening #9)
|
||||
- Deduplicate by (reactor, post_id, emoji) — storage already does ON CONFLICT DO UPDATE
|
||||
- Combined with reaction signatures (S3), fake NodeId reactions become impossible
|
||||
|
||||
## LOW
|
||||
|
||||
---
|
||||
|
||||
# Data Cleanup TODO
|
||||
|
||||
### D1. post_downstream not cleaned on post delete (storage.rs delete_post)
|
||||
- When a post is deleted, downstream registrations stay forever
|
||||
- Fix: add `DELETE FROM post_downstream WHERE post_id = ?1` in delete_post()
|
||||
- Also add: `DELETE FROM post_upstream WHERE post_id = ?1`
|
||||
- Also add: `DELETE FROM seen_engagement WHERE post_id = ?1`
|
||||
- One-line fixes each
|
||||
|
||||
### D2. Document BlobHeader-table relationship (storage.rs store_blob_header)
|
||||
- Header JSON is a snapshot, reactions/comments tables are authoritative
|
||||
- They can temporarily diverge (BlobHeaderResponse arrives with newer header than tables)
|
||||
- Header rebuilt from tables on next engagement op
|
||||
- Add clarifying comment to store_blob_header
|
||||
|
||||
---
|
||||
|
||||
# Low Priority
|
||||
|
||||
### S12. Hex parse error logging (web.rs:110-119)
|
||||
- Malformed hex strings silently return 404
|
||||
- Add debug logging for malformed inputs
|
||||
|
||||
### S13. Edit comment signature consistency (storage.rs:4338-4343)
|
||||
- edit_comment updates content without updating signature
|
||||
- If signature verification (S1) is enabled, edited comments would have invalid signatures
|
||||
- Fix: add signature parameter to edit_comment, re-sign edited content
|
||||
Loading…
Add table
Add a link
Reference in a new issue