I think the main value lies in letting the agent try many things while you aren't working (while you sleep or do other activities): even if most runs turn out useless, with enough trials it can find something worthwhile without any effort on your part.
This is, of course, only applicable if doing a single test is relatively fast. In my work a single test can take half a day, so I'd rather not let an agent spend a whole night doing a bogus test.
Even if your tests take a long time, you can always (if hardware permits) run multiple tests in parallel. This would enable you to explore many approaches at the same time.
Experiments for us cost on the order of tens of dollars, so doing 100 of them every night quickly becomes the price of an entire new employee. And that’s not even including the cost of letting agents run all night.
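To make that arithmetic concrete, here's a rough back-of-envelope sketch. Only "tens of dollars per experiment" and "100 per night" come from the comment above; the $30 figure and the fully-loaded employee cost are illustrative assumptions.

```python
# Back-of-envelope cost of the "100 experiments a night" scenario.
# COST_PER_EXPERIMENT and ANNUAL_EMPLOYEE_COST are assumed numbers,
# not figures stated in the thread.

COST_PER_EXPERIMENT = 30        # dollars; "tens of dollars" per the comment
EXPERIMENTS_PER_NIGHT = 100     # per the comment
ANNUAL_EMPLOYEE_COST = 120_000  # assumed fully-loaded cost, dollars

nightly_cost = COST_PER_EXPERIMENT * EXPERIMENTS_PER_NIGHT
nights_to_match_employee = ANNUAL_EMPLOYEE_COST / nightly_cost

print(f"Nightly spend: ${nightly_cost:,}")
print(f"Nights until it equals an employee: {nights_to_match_employee:.0f}")
```

Under these assumptions the spend reaches an employee's annual cost in about 40 nights, i.e. well under two months of nightly runs.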
Definitely not in the budget for non-VC-backed companies who aren’t in the AI bubble.
The "price of an entire new employee" framing is spot on. I kept running into the same thing: individual experiments are cheap, but they add up fast, and nobody wants to approve that budget for speculative ideas.
I've been thinking of this as a gap between VC/Kickstarter and just doing it yourself. Most early ML experiments are too small for formal funding but too expensive to casually self-fund. So I built ML Patron where anyone can chip in a few bucks to sponsor an experiment they're curious about. I honestly don't have a good answer yet for how this turns into returns for sponsors in a traditional business sense. For now it's just open research patronage, like "I'd pay to know the answer to this". Platform runs it on cloud GPUs with public MLflow tracking.