P0 Pipeline Fix: How We Stopped Duplicate Signals from Reaching the Board
4 surgical fixes to the KIO signal pipeline: 6-hour dedup gate, domain blocklist, batch kill, and content output dedup — all deployed, all tested, 39/39 passing.
What We Tested
The KIO signal pipeline was surfacing the same repositories to the board across consecutive scan cycles. Eugene (Founder) was receiving duplicate INVESTIGATE/BUY/KILL prompts for repos he had already adjudicated hours earlier. We identified 4 root causes and deployed surgical fixes: (1) the dedup window only covered 24h rolling state, missing same-session re-ingestion within 6h; (2) permanently-killed domains like gofr-dev/* had no persistent blocklist and could re-enter the pipeline; (3) batches with >50% stale signals were still surfaced to the board instead of being silently killed; (4) LLM content analysis outputs were not deduplicated, so two sources analyzing the same repo could produce near-identical opportunity writeups that both reached Eugene.
The Numbers
Test Suite
Session Dedup Window
Domain Blocklist
Batch Kill Gate
Content Output Dedup
Pipeline Trust Score
Results
All 4 fixes deployed and verified. 39/39 automated tests pass, including 4 new P0-specific regression tests. Fix 1 (6h session gate): any repo seen in the last 6 hours is now blocked unconditionally in Pass 1 of filterSeenRepos() — previously this only triggered for stale+repeat entries (seenCount >= 2), allowing single re-submissions through. Fix 2 (domain blocklist): domain-blocklist.json with prefix matching blocks github.com/gofr-dev/*, gofr-dev, and gofr.dev permanently before any other filter runs. Fix 3 (batch kill): if more than 50% of an incoming batch are 6h session duplicates AND batch size >= 10, the entire batch is discarded silently — not surfaced to the board. run-scan.js short-circuits analysis and reporting, sending only a health report. Fix 4 (content output dedup): dedupOutputs() in analyzer.js applies Jaccard word-set similarity (threshold 0.7) to type+whatToBuild fields after LLM analysis — near-duplicate opportunity writeups are merged, highest-scored survives.
Verdict
Pipeline is clean. All 4 P0 fixes are live on the researcher service. Eugene now only receives net-new, non-duplicate signals that have not been seen in the last 6h, are not from permanently-killed domains, and represent a batch where at least 50% of signals are fresh. The moonshot layer (self-healing pipeline that learns from adjudication history) is scoped and planned but not yet built — 6 months of board decision data would allow auto-weighting repos by KILL/INVESTIGATE/BUY history, creating a proprietary signal filter.
The Real Surprise
The batch kill fix revealed something unexpected: on noisy days (re-trending repos, HackerNews recycling old posts), more than 50% of a batch can be 6h dupes. Without the batch kill gate, these batches were consuming full LLM analysis budget and filling Eugene's board with noise. The gate is now the single most impactful fix for day-to-day pipeline health.
Want more experiments like this?
We ship new AI tool experiments weekly. No fluff. Just results.