Hacker Newsnew | past | comments | ask | show | jobs | submit | zachdotai's commentslogin

I wrote about this recently here: https://fabraix.com/blog/adversarial-cost-to-exploit

I think the core issue is in static benchmarks and the community needs to start moving beyond measuring pass/fail (which worked when agents were incapable of doing much of the work) to dynamic evals that simulate more how we evaluate humans.


We're doing that internally to continuously improve our own agent and make it robust against adversarial attacks itself. We will release some insights about self-improvement soon!

Cool, sounds like an interesting experiment!

AI agents break in ways traditional software doesn't. Logic bugs, reasoning failures, edge cases that manual testing and static benchmarks don't fully explore.

Nyx is an autonomous adversarial harness that probes your agents for vulnerabilities. Since agents are non-deterministic, it can be hard to find the gaps by just reading code. So it interacts with your AI agents in blackbox mode to surface issues across security, logic, and alignment at scale, before they reach users. It's also massively parallel by default

Instead of spending time writing static evals for the key failure modes of your AI agents, point Nyx at any system and it autonomously discovers failure modes that matter. It can typically find issues in under 10 minutes that manual audits take hours to surface.

This is early work and we know the methodology is still going to evolve. We would love nothing more than feedback from the community as we iterate on this.


Why did I read this title and immediately think Ketchup?

Yes! The docs can be found here: https://docs.fabraix.com

We wrote some thoughts on static vs. dynamic evals and how it relates to understanding the security posture of an AI system. Static security evals no longer carry the signal they used to. A one-shot pass/fail tells you almost nothing about real-world risk.

Would love your thoughts on this: https://fabraix.com/blog/adversarial-cost-to-exploit


we did a lot of thinking around this topic. and distilled it into a new way to dynamically evaluate the security posture of an AI system (which can apply for any system for that matter). we wrote some thoughts on this here: https://fabraix.com/blog/adversarial-cost-to-exploit

Easily one of my favorite LLM personalities! It's interesting as well that it recognizes you're trying to jailbreak it and calls you out for it :D


Not sure which version of Gemini are you using but Claude is so much better for me. Gemini is generally overeager to make a code change even when I am just asking conceptual questions, among other issues.


Yup! But in my opinion the current state of guardrails is still lacking and I hope this is one way that helps improve our understanding of these systems.


Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: