AI-Tool-hub
Winner
Infrastructure

Governance bug: the board was re-evaluating KILL-decided repos after the rolling dedup window expired, so Eugene was reviewing the same repos multiple times. This is a sprint-zero bug fix, not an optional feature.

Ingestion-Layer Dedup Gate: Block Re-scans of KILL-Decided Repos at the Scraper Level

Every board KILL decision now writes the repo identifier, kill reason, and tech stack tags to a scraper-level filter list. No KILL-decided repo reaches the board queue within a 7-day window.

Published Mar 24, 2026
1. What We Tested

The existing repo-dedup.js used seen-repos.json with a 7-day rolling window for decided repos. Once that window expired, a KILL-decided repo could re-enter the pipeline as a 'fresh signal', and the board would re-evaluate it, wasting a full analysis cycle.

We built a dedicated kill-list.js store that persists KILL decisions separately from the pending-repo rolling window. The integration has three parts:

(1) kill-list.js exports writeKillDecision(repoId, {killReason, techStackTags, title, source}) and checkKillList(repoId).
(2) repo-dedup.js calls checkKillList() as Step 1 in filterSeenRepos(), before the domain blocklist, the 6h session gate, and every other check.
(3) feedback-loop.js calls writeKillDecision() when a Paperclip issue is cancelled (a board KILL decision).

The kill list enforces a 7-day blocking window with automatic expiry. Each kill entry stores: repoId (canonical), killReason (extracted from the Paperclip description), techStackTags (tech keywords for the Phase 2 pattern-matching classifier), killedAt (ISO timestamp), title, and source scanner.
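The store described above can be sketched roughly as follows. This is a minimal illustration, not the actual kill-list.js: it keeps entries in an in-memory Map instead of kill-list.json, assumes repoId is already canonical, and adds a hypothetical `now` parameter so the 7-day expiry can be exercised without waiting.

```javascript
// Hypothetical sketch of a kill-list store; the real kill-list.js is not shown in the source.
const KILL_WINDOW_MS = 7 * 24 * 60 * 60 * 1000; // 7-day blocking window

// In-memory stand-in for kill-list.json (assumption: the real store persists to disk).
const killList = new Map();

// Record a board KILL decision. The `now` parameter is an assumption added for testability.
function writeKillDecision(repoId, { killReason, techStackTags, title, source }, now = new Date()) {
  const entry = {
    repoId, // assumed to be canonical already
    killReason,
    techStackTags,
    killedAt: now.toISOString(),
    title,
    source,
  };
  killList.set(repoId, entry);
  return entry;
}

// Return { killed: true, entry } while the entry is inside the window, else { killed: false }.
function checkKillList(repoId, now = new Date()) {
  const entry = killList.get(repoId);
  if (!entry) return { killed: false };
  const ageMs = now.getTime() - Date.parse(entry.killedAt);
  if (ageMs > KILL_WINDOW_MS) {
    killList.delete(repoId); // automatic expiry once the window has passed
    return { killed: false };
  }
  return { killed: true, entry };
}
```

Keeping the clock injectable is a design choice of this sketch only; it lets an expiry test use an entry that is 8 days old without touching real time.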

2. The Numbers

KILL Decision Storage

Before: decisionState:'cancelled' in seen-repos.json rolling window
After: Dedicated kill-list.json with repoId, killReason, techStackTags, killedAt
Tag: store

Kill Reason Capture

Before: None (no kill reason persisted)
After: Extracted from Paperclip issue description via extractKillReason()
Tag: metadata

Tech Stack Tags

Before: None (no tech classification on KILL entries)
After: 30+ tech keywords detected per entry (python, langchain, rag, etc.)
Tag: classifier-seed

Pipeline Check Order

Before: KILL'd repos checked via seen-repos.json (Step 3+)
After: kill-list.js checked as Step 1, before domain blocklist and before 6h gate
Tag: priority

7-Day Window Integrity

Before: Shared with pending-repo rolling window (could interfere)
After: Isolated kill-list window; KILL entries expire independently
Tag: isolation

Test Coverage

Before: 0 tests for kill decision persistence
After: 21/21 tests passing (write, check-hit, check-miss, integration, expiry, stats)
Tag: tests
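To make the tech stack tagging concrete, here is one way keyword detection could work. It is a sketch under assumptions: the function name detectTechStackTags is hypothetical, and the keyword list holds only the three examples named in this write-up, whereas the real list reportedly covers 30+ keywords.

```javascript
// Only the three keywords named in the write-up; the real list is larger and not shown.
const TECH_KEYWORDS = ["python", "langchain", "rag"];

// Hypothetical helper: return the known keywords that appear as whole words in the text.
function detectTechStackTags(text) {
  // Split on anything that is not a letter or digit so "storage" never matches "rag".
  const tokens = new Set(
    String(text).toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)
  );
  return TECH_KEYWORDS.filter((keyword) => tokens.has(keyword));
}
```

Matching whole tokens rather than substrings avoids false positives, at the cost of missing multi-word or punctuated names, which a fuller keyword list would need to handle separately.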
3. Results

All 21 unit tests pass (0 failures).

Test 1: writeKillDecision stores all 8 metadata fields correctly.
Test 2: checkKillList returns killed:true with the entry for a KILL-listed repo.
Test 3: checkKillList returns killed:false for a clean repo.
Test 4: filterSeenRepos integration. A kill-listed repo is blocked at Step 1, the killListBlocked stat is incremented, and a fresh repo passes.
Test 5: expired kill entries (8 days old) return killed:false and do not appear in listActiveKills().
Stats test: getKillListStats() returns the correct activeKills count.

The kill list is checked as the first gate in the 6-step filterSeenRepos() pipeline, before the domain blocklist, the 6h session gate, and the adjudicated check. The log output '[repo-dedup] [KILL-LIST] Blocked: {repoId} | reason: {killReason} [{techStackTags}]' confirms the gate fires with full metadata.
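The Step 1 gate can be pictured with the sketch below. It illustrates the ordering and is not the real repo-dedup.js: passing checkKillList and a logger in as parameters, the { passed, stats } return shape, and joining tags with ", " in the log line are all assumptions made to keep the example self-contained.

```javascript
// Hypothetical sketch of the first gate in filterSeenRepos(); later steps are omitted.
function filterSeenRepos(repos, checkKillList, log = console.log) {
  const stats = { killListBlocked: 0 };
  const passed = [];

  for (const repo of repos) {
    // Step 1: kill-list gate, evaluated before any other check.
    const { killed, entry } = checkKillList(repo.repoId);
    if (killed) {
      stats.killListBlocked += 1;
      log(
        `[repo-dedup] [KILL-LIST] Blocked: ${entry.repoId} | reason: ${entry.killReason} ` +
          `[${entry.techStackTags.join(", ")}]`
      );
      continue; // never reaches the board queue
    }

    // Steps 2-6 (domain blocklist, 6h session gate, adjudicated check, ...) are not shown.
    passed.push(repo);
  }

  return { passed, stats };
}
```

Because the gate runs first and short-circuits with `continue`, a kill-listed repo incurs no cost in any later step.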

Verdict

The dedup gate closes the sprint-zero governance bug. KILL-decided repos are now blocked at the ingestion layer for 7 days. The kill reason and tech stack tags are stored with each entry, enabling the Phase 2 pattern-matching classifier to train on historical KILL decisions. The integration chain is complete: board KILL decision (Paperclip cancelled) → writeKillDecision() in feedback-loop.js → kill-list.json → checkKillList() in repo-dedup.js → blocked at Step 1 before any analysis cost is incurred. Acceptance criteria met: zero duplicate repo scans reach board evaluation within a 24-hour window; board cycle count per unique repo is 1.
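The write side of that chain might look like the sketch below. Everything about the Paperclip issue shape here is an assumption (a `status` field, a `description` containing a "Kill reason:" line, and `repoId`, `title`, `source`, `techStackTags` fields); the source names extractKillReason() but does not show how it parses the description, and handleIssueUpdate is a hypothetical name.

```javascript
// Assumed description format: a line starting with "Kill reason:". The real parsing is not shown.
function extractKillReason(description) {
  const match = /^kill reason:\s*(.+)$/im.exec(description || "");
  return match ? match[1].trim() : "unspecified";
}

// Hypothetical feedback-loop hook: record a KILL only when the issue was cancelled.
function handleIssueUpdate(issue, writeKillDecision) {
  if (issue.status !== "cancelled") return false; // not a board KILL decision

  writeKillDecision(issue.repoId, {
    killReason: extractKillReason(issue.description),
    techStackTags: issue.techStackTags || [],
    title: issue.title,
    source: issue.source,
  });
  return true;
}
```

Passing writeKillDecision in as an argument keeps the sketch independent of any particular store; in the described system it would be the function exported by kill-list.js.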

The Real Surprise

The most important architectural decision was storing KILL decisions separately from seen-repos.json. Initially, decisionState:'cancelled' in seen-repos.json seemed sufficient, but seen-repos.json is pruned on a rolling window, so KILL'd entries expire just like pending entries. A dedicated kill-list.json with its own 7-day TTL ensures KILL decisions are checked independently. It also makes the kill list queryable (listActiveKills(), getKillListStats()) without scanning the full seen-repos state, and it stores the kill reason and tech stack tags for Phase 2 classifier training, data that has no place in the generic seen-repos structure.
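As an illustration of why a separate store is easy to query, the two helpers could be as small as the sketch below. It assumes the entries are available as a plain array with ISO killedAt timestamps; the `now` parameter and the expiredKills field are hypothetical additions, since the source only confirms that getKillListStats() reports an activeKills count.

```javascript
const KILL_TTL_MS = 7 * 24 * 60 * 60 * 1000; // 7-day TTL for kill entries

// Entries still inside the TTL window. `entries` stands in for the contents of kill-list.json.
function listActiveKills(entries, now = new Date()) {
  return entries.filter(
    (entry) => now.getTime() - Date.parse(entry.killedAt) <= KILL_TTL_MS
  );
}

// Summary counts; activeKills is named in the write-up, expiredKills is a hypothetical extra.
function getKillListStats(entries, now = new Date()) {
  const active = listActiveKills(entries, now);
  return {
    activeKills: active.length,
    expiredKills: entries.length - active.length,
  };
}
```

Neither helper touches the seen-repos state, which is the point of keeping KILL decisions in their own file.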
