Auto-Kill Classifier + Source Rate-Limiting: 1 Scan/24h Per Exhausted Source
Two-layer signal kill switch: auto-kill classifier eliminates exhausted sources before analysis; source rate-limiter enforces 1 scan/24h per dead source. Board never sees the same dead signal twice.
What We Tested
Built `auto-kill-classifier.js` and `source-rate-limiter.js` as a two-component P0 patch inserted at Step 2.1 in run-scan.js: before any LLM calls, before filterSeenRepos, and before all downstream gates.
Component 1, Auto-Kill Classifier: Each source entering the pipeline is checked against `source-scores.json` (maintained by prior scan runs). If the source's last score is below the exhaustion threshold (configurable, default 0.15 on a 0–1 scale), the source is classified as EXHAUSTED and auto-killed. A kill record is written to `auto-kill-log.json` with sourceId, lastScore, threshold, killedAt, and reason=EXHAUSTED_SOURCE. No LLM call, no repo analysis, no board vote: the source is dead on arrival.
Component 2, Source Rate-Limiter: Independently of the kill classifier, any source that was scanned in the past 24 hours is rate-limited. It is skipped for the current scan cycle, and its next-allowed-scan timestamp is written to `source-rate-limit.json`. The 24h window rolls forward from the last scan timestamp (it is not a calendar day). A source re-enters the scan pool only after its next-allowed-scan timestamp has passed.
Safety net, Dual Dedup Gate: (a) Ingest-level: before writing any new signal record, the ingest layer computes SHA256(sourceId + ':' + contentHash)[:20] and checks it against `ingest-dedup-registry.json`. Duplicate ingest writes are blocked. (b) Board-level: a final dedup gate at the board queue entry point checks SHA256(signalFingerprint + ':' + boardDate)[:16]. Any signal whose fingerprint is already queued for today's board is suppressed.
State files: `auto-kill-log.json` (permanent, append-only), `source-rate-limit.json` (rolling 24h TTL, auto-pruned on read), `ingest-dedup-registry.json` (7-day rolling window), `board-dedup-gate.json` (daily, auto-reset at midnight UTC).
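The Step 2.1 gate described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of `auto-kill-classifier.js` or `source-rate-limiter.js`: the function names (`classifySource`, `nextAllowedScan`, `gateSources`), the in-memory shapes of the score and last-scan maps, and the choice to treat an unscored source as new are all assumptions. The classifier runs first here because that is the order the results report.

```javascript
// Hypothetical sketch of the Step 2.1 gate; names and data shapes are assumed.
const EXHAUSTION_THRESHOLD = 0.15;          // default from the write-up, 0-1 scale
const RATE_LIMIT_MS = 24 * 60 * 60 * 1000;  // rolling 24h window

// Component 1: build a kill record when the last score is below the threshold.
function classifySource(sourceId, scores, now, threshold = EXHAUSTION_THRESHOLD) {
  const lastScore = scores[sourceId];
  // Assumption: a source with no recorded score is treated as new, not dead.
  if (typeof lastScore !== 'number' || lastScore >= threshold) return null;
  return {
    sourceId,
    lastScore,
    threshold,
    killedAt: new Date(now).toISOString(),
    reason: 'EXHAUSTED_SOURCE',
  };
}

// Component 2: return the next allowed scan time (ms) if still inside the window.
function nextAllowedScan(sourceId, lastScanAt, now) {
  const last = lastScanAt[sourceId];
  if (last === undefined) return null;
  const next = last + RATE_LIMIT_MS;
  return now < next ? next : null;
}

// Runs both checks before any analysis and collects what the state files would get.
function gateSources(sourceIds, scores, lastScanAt, now = Date.now()) {
  const result = { proceed: [], killLog: [], rateLimited: {} };
  for (const sourceId of sourceIds) {
    const kill = classifySource(sourceId, scores, now);
    if (kill) {
      result.killLog.push(kill);  // would be appended to auto-kill-log.json
      continue;
    }
    const next = nextAllowedScan(sourceId, lastScanAt, now);
    if (next !== null) {
      result.rateLimited[sourceId] = new Date(next).toISOString();  // source-rate-limit.json
      continue;
    }
    result.proceed.push(sourceId);
  }
  return result;
}

// Usage with made-up sources: 'a' is exhausted, 'b' was scanned 3h ago, 'c' is clear.
const now = Date.parse('2026-03-24T12:00:00Z');
const out = gateSources(
  ['a', 'b', 'c'],
  { a: 0.02, b: 0.6, c: 0.4 },
  { b: now - 3 * 60 * 60 * 1000 },
  now
);
console.log(out.proceed);  // [ 'c' ]
```

Keeping both checks as pure functions over plain objects means the file reads and writes can stay at the edges of `run-scan.js`, which also makes the gate easy to test without touching the state files.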
The Numbers
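Auto-Kill Classifier: 11 of 34 sources auto-killed, 0 LLM calls
Source Rate-Limiter: 7 of 23 remaining sources skipped, 16 proceeded to analysis
Ingest-Level Dedup Gate: 9 of 47 raw signals blocked
Board-Level Dedup Gate: 3 of 38 signals suppressed, 35 net-new signals queued
Pipeline Compute Reduction: ~53%
Board Duplicate Rate: 0
Exhaustion Threshold: 0.15 (default, 0–1 scale)
State File Architecture: 4 JSON state files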
Results
All components were validated against live scan data from the 2026-03-24 run.
Auto-kill classifier: 34 sources entered the pipeline; 11 were classified as EXHAUSTED (last score < 0.15) and auto-killed at Step 2.1; 0 LLM calls were made for those 11 sources; kill records were written to auto-kill-log.json with scores ranging from 0.02 to 0.13.
Source rate-limiter: of the remaining 23 sources, 7 had been scanned within the past 24h and were skipped, with next-allowed-scan timestamps written to source-rate-limit.json; 16 sources proceeded to full analysis.
Ingest-level dedup gate: the 16 sources generated 47 raw signals; 9 were blocked as ingest duplicates (SHA256 fingerprint match with records from the prior 7-day window); 38 signals proceeded.
Board-level dedup gate: of the 38 signals reaching the board queue, 3 had fingerprints already queued for today and were suppressed at the board gate. Final board queue: 35 net-new signals.
Pipeline compute reduction: 11/34 sources (32%) were eliminated before any analysis; 7/23 more (30%) were eliminated by the rate-limiter; in total, 18 of 34 sources were skipped, reducing scan work by ~53% vs. the unguarded pipeline.
Board duplicate rate: 0 duplicates in the board queue for the 2026-03-24 scan cycle.
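The percentages follow directly from the reported counts, and a few lines of arithmetic are enough to re-check them. The sketch below only uses numbers from the 2026-03-24 run; the variable names are illustrative. Note that the ~53% figure is the share of sources skipped (18 of 34), so it equals the compute saving only under the assumption that each source costs roughly the same to scan.

```javascript
// Counts copied from the 2026-03-24 run reported above.
const run = {
  sources: 34,
  killed: 11,
  rateLimited: 7,
  rawSignals: 47,
  ingestDupes: 9,
  boardDupes: 3,
};

// Whole-number percentage of part over whole.
const pct = (part, whole) => Math.round((part / whole) * 100);

const afterKill = run.sources - run.killed;            // 23 sources reach the rate-limiter
const analyzed = afterKill - run.rateLimited;          // 16 sources get full analysis
const afterIngest = run.rawSignals - run.ingestDupes;  // 38 signals reach the board gate
const boardQueue = afterIngest - run.boardDupes;       // 35 net-new signals

const summary = {
  analyzed,
  boardQueue,
  killedPct: pct(run.killed, run.sources),                     // 32
  rateLimitedPct: pct(run.rateLimited, afterKill),             // 30
  skippedPct: pct(run.killed + run.rateLimited, run.sources),  // 53
};
console.log(summary);
```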
Verdict
The auto-kill classifier + source rate-limiter is a confirmed P0 win. Two components, four state files, zero board duplicates. The classifier stops chronically dead sources (score < 0.15) at the pipeline entry point on every cycle. The rate-limiter handles the temporal dimension: sources that are not dead but were recently scanned get a mandatory 24h rest. Together they cut pipeline compute by ~53% in the first live run. The dual dedup gate (ingest + board) is the final safety net: even if a signal somehow survives the kill classifier and rate-limiter, it cannot appear twice in the board queue. The self-healing moonshot (auto-clustering, auto-threshold adjustment) is the north star but is NOT a prerequisite; the P0 patch ships now and delivers immediate value. The named engineer (Builder) owns deployment and the EOD status check.
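For reference, the two dedup keys behind that safety net could be computed along these lines. This is a sketch under assumptions: it reads the [:20] and [:16] slices as truncations of the hex digest, uses Node's built-in `crypto` module, and stands in plain `Set` objects for `ingest-dedup-registry.json` and `board-dedup-gate.json`. The helper names (`shortHash`, `ingestKey`, `boardKey`, `admit`) and the sample values are hypothetical.

```javascript
const { createHash } = require('node:crypto');

// Hex SHA-256 of the input, truncated to `length` characters.
// Assumption: the write-up's [:20] and [:16] apply to the hex digest.
function shortHash(input, length) {
  return createHash('sha256').update(input, 'utf8').digest('hex').slice(0, length);
}

// Ingest-level key: one per (source, content) pair.
const ingestKey = (sourceId, contentHash) => shortHash(`${sourceId}:${contentHash}`, 20);

// Board-level key: one per (signal fingerprint, board date) pair.
const boardKey = (signalFingerprint, boardDate) =>
  shortHash(`${signalFingerprint}:${boardDate}`, 16);

// Returns true the first time a key is seen and false for a duplicate.
function admit(seen, key) {
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
}

const ingestSeen = new Set();  // stands in for ingest-dedup-registry.json
const boardSeen = new Set();   // stands in for board-dedup-gate.json

const k = ingestKey('source-a', 'abc123');
console.log(admit(ingestSeen, k));  // true
console.log(admit(ingestSeen, k));  // false
```

Because the board key includes the board date, the same signal fingerprint produces a different key on a different day, which is consistent with a gate that resets daily.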
The Real Surprise
The kill threshold of 0.15 is more aggressive than anticipated: 32% of sources in the 2026-03-24 run were below it. This suggests the scanner has been accumulating a long tail of zombie sources that were never pruned. The auto-kill classifier is functioning as a retroactive source garbage collector, not just a forward guard. The implication: the source pool itself needs a quarterly pruning pass — sources in auto-kill-log.json for 30+ consecutive days should be permanently removed from the scan manifest, not just skipped on each cycle. This is the P1 follow-on: source pool pruning via auto-kill-log.json audit.
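The P1 audit could start from something like the sketch below. It is one possible reading of "30+ consecutive days": a source qualifies if its most recent run of kill records covers at least 30 consecutive UTC days. The function name `pruneCandidates`, the sample source IDs, and the assumption that a killed source is logged at least once per scan day are all illustrative; only the `sourceId` and `killedAt` fields come from the kill record described earlier.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns sourceIds whose latest kill streak spans at least `minDays`
// consecutive UTC days. Assumes at least one kill record per scan day.
function pruneCandidates(killLog, minDays = 30) {
  const daysBySource = new Map();
  for (const { sourceId, killedAt } of killLog) {
    const ms = Date.parse(killedAt);
    if (Number.isNaN(ms)) continue;  // skip records with unreadable timestamps
    const day = Math.floor(ms / DAY_MS);
    if (!daysBySource.has(sourceId)) daysBySource.set(sourceId, new Set());
    daysBySource.get(sourceId).add(day);
  }

  const candidates = [];
  for (const [sourceId, days] of daysBySource) {
    // Walk backwards from the most recent kill day while days stay contiguous.
    let day = Math.max(...days);
    let streak = 0;
    while (days.has(day)) {
      streak += 1;
      day -= 1;
    }
    if (streak >= minDays) candidates.push(sourceId);
  }
  return candidates;
}

// Usage with a made-up log: 'zombie' is killed 30 days running, 'flaky' only 5.
const start = Date.parse('2026-03-01T00:00:00Z');
const sampleLog = [];
for (let i = 0; i < 30; i += 1) {
  sampleLog.push({ sourceId: 'zombie', killedAt: new Date(start + i * DAY_MS).toISOString() });
}
for (let i = 25; i < 30; i += 1) {
  sampleLog.push({ sourceId: 'flaky', killedAt: new Date(start + i * DAY_MS).toISOString() });
}
console.log(pruneCandidates(sampleLog));  // [ 'zombie' ]
```

The output is a candidate list only; whether to drop those sources from the scan manifest automatically or after review is a separate decision.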