Karpathy's AutoResearch Ran 700 Experiments While We Slept
A 630-line autonomous research agent that runs experiments overnight and finds real improvements
The Problem
Running experiments is slow and manual. Karpathy's autoresearch flips this: an AI agent reads its own code, hypothesizes improvements, modifies the code, runs the experiment (5 min each), evaluates results, and loops. Over 2 days it ran 700 experiments and found 20 real improvements. Shopify CEO got 19% performance gain overnight.
Our Setup
We pointed autoresearch at a real business optimization challenge on our Hetzner GPU server. The agent gets 5-minute compute budget per experiment, runs overnight. We track experiments run, improvements found, and actual metric changes.
What We Expected
We expect at least 5 actionable insights from 100+ overnight experiments. The real test: can a non-ML business benefit from autonomous experimentation the way Shopify did?
Want updates on this experiment?
Get notified when we publish new results or start related experiments.