More

zachdotai · 2026-04-26T23:17:08 1777245428

I wrote about this recently here: https://fabraix.com/blog/adversarial-cost-to-exploit

I think the core issue is in static benchmarks and the community needs to start moving beyond measuring pass/fail (which worked when agents were incapable of doing much of the work) to dynamic evals that simulate more how we evaluate humans.

zachdotai · 2026-04-26T18:53:30 1777229610

We're doing that internally to continuously improve our own agent and make it robust against adversarial attacks itself. We will release some insights about self-improvement soon!

natloz · 2026-04-26T21:33:34 1777239214

Cool, sounds like an interesting experiment!

zachdotai · 2026-04-26T18:36:12 1777228572

AI agents break in ways traditional software doesn't. Logic bugs, reasoning failures, edge cases that manual testing and static benchmarks don't fully explore.

Nyx is an autonomous adversarial harness that probes your agents for vulnerabilities. Since agents are non-deterministic, it can be hard to find the gaps by just reading code. So it interacts with your AI agents in blackbox mode to surface issues across security, logic, and alignment at scale, before they reach users. It's also massively parallel by default

Instead of spending time writing static evals for the key failure modes of your AI agents, point Nyx at any system and it autonomously discovers failure modes that matter. It can typically find issues in under 10 minutes that manual audits take hours to surface.

This is early work and we know the methodology is still going to evolve. We would love nothing more than feedback from the community as we iterate on this.

zachdotai · 2026-04-24T09:24:21 1777022661

Why did I read this title and immediately think Ketchup?

zachdotai · 2026-04-19T23:20:56 1776640856

Yes! The docs can be found here: https://docs.fabraix.com

zachdotai · 2026-04-19T22:50:38 1776639038

We wrote some thoughts on static vs. dynamic evals and how it relates to understanding the security posture of an AI system. Static security evals no longer carry the signal they used to. A one-shot pass/fail tells you almost nothing about real-world risk.

Would love your thoughts on this: https://fabraix.com/blog/adversarial-cost-to-exploit

zachdotai · 2026-04-15T21:56:15 1776290175

we did a lot of thinking around this topic. and distilled it into a new way to dynamically evaluate the security posture of an AI system (which can apply for any system for that matter). we wrote some thoughts on this here: https://fabraix.com/blog/adversarial-cost-to-exploit

zachdotai · 2026-04-05T22:22:04 1775427724

Easily one of my favorite LLM personalities! It's interesting as well that it recognizes you're trying to jailbreak it and calls you out for it :D

zachdotai · 2026-03-30T09:21:12 1774862472

Not sure which version of Gemini are you using but Claude is so much better for me. Gemini is generally overeager to make a code change even when I am just asking conceptual questions, among other issues.

zachdotai · 2026-03-18T00:45:36 1773794736

Yup! But in my opinion the current state of guardrails is still lacking and I hope this is one way that helps improve our understanding of these systems.