Karpathy's AutoResearch Ran 700 Experiments While We Slept

A 630-line autonomous research agent that runs experiments overnight and finds real improvements

The Problem

Running experiments is slow and manual. Karpathy's autoresearch flips this: an AI agent reads its own code, hypothesizes improvements, modifies the code, runs the experiment (5 min each), evaluates results, and loops. Over 2 days it ran 700 experiments and found 20 real improvements. Shopify CEO got 19% performance gain overnight.

Our Setup

We pointed autoresearch at a real business optimization challenge on our Hetzner GPU server. The agent gets 5-minute compute budget per experiment, runs overnight. We track experiments run, improvements found, and actual metric changes.

What We Expected

We expect at least 5 actionable insights from 100+ overnight experiments. The real test: can a non-ML business benefit from autonomous experimentation the way Shopify did?

Want updates on this experiment?

Get notified when we publish new results or start related experiments.