Because the fundamental task many of these programs are doing is neither complicated nor resource intensive.
In the age of cheap custom software solutions everyone should at least try to make something themselves that's fit for purpose. It doesn't take much to be a dangerous professional these days, and certainly more than ever before can a dangerous professional be truly dangerous.
Thank you, I get so confused when people think a $5/month VPS shouldn't be able to do much. We're talking about the 99% of small businesses that might have 5 concurrent users max.
2 gigs of ram should be considered overkill to cover every single business case for a variety of tools (analytics, mailer/newsletter, CRM, socials, e-commerce).
I heartily agree with you except for the ongoing childhood-screentime pandemic where kids aren't going outside to play, but instead are staying inside, alone, and maybe playing with others virtually, but with more exposure to harm (e.g. gambling). This is clearly going to cause some serious long term generational fallout.
I'm grateful that I got the best of both worlds. When I was young I could play outside with freedom and climb around on highly dangerous playground equipment and now that I'm older and more fragile I get to stay inside on the couch and play amazing video games all day.
It's a shame that kids today don't get the option to do crazy kid stuff while they're young and healthy enough to bounce back from injury. I can't blame the tech for that though. It's parents who don't restrict screentime and our society that thinks it's okay to call the police on parents who let their kids walk down the street unattended.
Agreed--we're already seeing some of that, and I fully support minimizing kids' exposure to that.
I probably should have been explicit that I don't think technology has no downsides--it most certainly does. It's just, IMHO, the benefits outweigh the risks. And, over time, we figure out how to ameliorate the downsides.
I think you're confused. Kindles need to be connected to the Internet so you can purchase and read books on them. The SIM card removed friction from the process e.g. buying books while on vacation or at the airport or whatever.
They didn't put SIM cards in there to spy on you. They were always an opt-in (at additional cost) option for a better user experience.
Modern medicine absolutely short-circuits natural selection. If you have an older sibling who was delivered via C-section chances are you wouldn't exist.
Large awards in medical malpractice trials were the reason doctors push for a C-section if there’s any possibility of a complication. (Sometimes called defensive medicine.)
Most people point to the cases won by John Edwards, trial lawyer and vice presidential candidate, as the reason for the great increase in C-sections. His case wins include 30 trials at which he won at least $1 million each.
In my generation (80s-90s), pretty much everyone in Brazil who was born in a hospital was delivered by C-section. Only recently has the practice of defaulting to C-section begun to fade.
This is just a wrapper around sandbox-exec. It's nice that there are a ton of presets that have been thought out, since 90% of wielding sandbox-exec is correctly scoping it to whatever the inner environment requires (the other 90% is figuring out how sandbox-exec works).
I like that it's just a shell script.
I do wish that there was a simple way to sandbox programs with an overlay or copy-on-write semantics (or better yet bind mounts). I don't care if, in the process of doing some work, an LLM agent modifies .bashrc -- I only care if it modifies _my_ .bashrc
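For the curious, the scoping work mostly comes down to writing an SBPL profile. A rough sketch below — the paths and the allow-by-default posture are placeholder assumptions for illustration, not a hardened profile:

```shell
# Hypothetical sandbox-exec profile: allow everything by default, then
# deny writes, then re-allow writes only inside the project dir and /tmp.
# (In SBPL the last matching rule wins.)
cat > /tmp/ro-outside.sb <<'EOF'
(version 1)
(allow default)
(deny file-write*)
(allow file-write* (subpath "/Users/me/project")
                   (subpath "/private/tmp"))
EOF

# sandbox-exec only exists on macOS, so guard the invocation.
if [ "$(uname)" = "Darwin" ]; then
  sandbox-exec -f /tmp/ro-outside.sb touch /private/tmp/sandbox-ok
fi
```

Swap in your own project path; the hard part in practice is discovering which extra paths (caches, toolchains, sockets) the inner program actually needs.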
Thanks, I picked Bash because I’m scared of all Go and Rust binaries out there!
Re “overlay FS” - I too wish this was possible on Macs, but the closest I got was restricting agents to be read-only outside of CWD which, after a few turns, bullies them into working in $TMP. Not the same though.
I took a more paranoid approach to sandboxing agents. They can do whatever they want inside their container, and then I choose which of their changes to apply outside as commits:
┌─ YOLO shell ──────────────────────┬─ Outer shell ─────────────────────┐
│                                   │                                   │
│ yoloai new myproject . -a         │                                   │
│                                   │                                   │
│ # Tell the agent what to do,      │                                   │
│ # have it commit when done.       │                                   │
│                                   │ yoloai diff myproject             │
│                                   │ yoloai apply myproject            │
│                                   │ # Review and accept the commits.  │
│                                   │                                   │
│ # ... next task, next commit ...  │                                   │
│                                   │ yoloai apply myproject            │
│                                   │                                   │
│                                   │ # When you have a good set of     │
│                                   │ # commits, push:                  │
│                                   │ git push                          │
│                                   │                                   │
│                                   │ # Done? Tear it down:             │
│                                   │ yoloai destroy myproject          │
└───────────────────────────────────┴───────────────────────────────────┘
Works with Docker, Seatbelt, and Tart backends (I've even had it build an iOS app inside a Seatbelt sandbox).
I've been working on an OSS project, Amika[1], to quickly spin up local or remote sandboxes for coding workloads. We support copy-on-write semantics locally (well, "copy-and-then-write" for now... we just copy directories to a temp file-tree).
It's tailored to play nicely with Git: spin up sandboxes from the CLI, expose TCP/UDP ports of apps to check your work, and if running hosted sandboxes, share the sandbox URLs with teammates. I basically want running sandboxed agents to be as easy as `git clone ...`.
Docs are early and edges are rough. This week I'm starting to dogfood all my dev using Amika. Feedback is super appreciated!
FYI: we are also a startup, but local sandbox mgmt will stay OSS.
This is just a thin wrapper over Docker. It still doesn't offer what I want. I can't run macOS apps, and if I'm doing any sort of compilation, now I need a cross-compile toolchain (and need to target two platforms??).
Just use Docker, or a VM.
The other issue is that this does not facilitate unpredictable file access -- I have to mount everything up front. Sometimes you don't know what you need. And even then copying in and out is very different from a true overlay.
It sounds like a big part of your use case is to safely give an agent control of your computer? Like, for things besides codegen?
We're probably not going to directly support that type of use case, since we're focused on code-gen agents and migrating their work between localhost and the cloud.
We are going to add dynamic filesystem mounting, for after sandbox creation. Haven't figured out the exact implementation yet. Might be a FUSE layer we build ourselves. Mutagen is pretty interesting as well here.
This is what I was going for with Treebeard[0]. It combines sandbox-exec, worktrees, and a COW/overlay filesystem. The overlay filesystem is nice, in that you have access to git-ignored files in the original directory without having to worry about those files being modified in the original (due to the COW semantics). Though, truthfully, I haven’t found myself using it much since getting it all working.
This approach is too complex for what is provided. You're better off just making a copy of the tree and simply using sandbox-exec. macFUSE is a shitshow.
The main issue I want to solve is unexpected writes to arbitrary paths should be allowed but ultimately discarded. macOS simply doesn't offer a way to namespace the filesystem in that way.
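For contrast, here's roughly what that looks like on Linux with overlayfs, where writes land in an "upper" dir and the original tree stays untouched. Paths are illustrative, and the mount itself needs root (or a user namespace), hence the guard:

```shell
# Set up lower (original), upper (writes), work, and merged (the view).
mkdir -p /tmp/ovl/lower /tmp/ovl/upper /tmp/ovl/work /tmp/ovl/merged
echo original > /tmp/ovl/lower/file.txt

# The overlay mount requires root; skip it otherwise.
if [ "$(uname)" = "Linux" ] && [ "$(id -u)" = "0" ]; then
  mount -t overlay overlay \
    -o lowerdir=/tmp/ovl/lower,upperdir=/tmp/ovl/upper,workdir=/tmp/ovl/work \
    /tmp/ovl/merged
  # A write through the merged view is copied up, not written through.
  echo modified > /tmp/ovl/merged/file.txt
  umount /tmp/ovl/merged
fi

# Either way, the original file is untouched.
cat /tmp/ovl/lower/file.txt
```

APFS has snapshots and copy-on-write clones, but nothing that lets an unprivileged process present a writable union view of someone else's directory like this.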
Completely agree; my approach was not the most practical. I mostly wanted to know how hard it would be and, as I said, haven’t used it much since. Yes, macFUSE is messy to rely upon.
I feel as though the right abstraction is simply unavailable on macOS. Something akin to chroot jails — I don’t feel like I need a particularly hardened sandbox for agentic coding. I just need something that will prevent the stupid mistakes that are particularly damaging.
It's quite naive to assume that. There is a reason why it is deprecated by Apple.
Apple is likely preparing to replace it with a secure alternative, and all it takes is someone finding one vulnerability (or several) in sandbox-exec to give everyone a wake-up call about why they were using it in the first place.
I predict that there is a CVE lurking in sandbox-exec waiting to be discovered.
On the other hand, the underlying functionality for sandboxing is used heavily throughout the OS, both for App Sandboxes and for Apple’s own system processes. My guess is sandbox-exec is deprecated more because it never was adequately documented rather than because it’s flawed in some way.
> the underlying functionality for sandboxing is used heavily throughout the OS, both for App Sandboxes and for Apple’s own system processes.
Security researchers will leverage every part of the OS stack to bypass the sandbox in XNU, which they have done multiple times.
Now there is a good reason for them to break the sandbox, thanks to the hype around 'agents'. It could even take a single file to break it. [0]
> My guess is sandbox-exec is deprecated more because it never was adequately documented rather than because it’s flawed in some way.
You do not know that. I am saying that it has been bypassed before, and having it used all over the OS doesn't mean anything. It actually makes it worse.
You could apply this same reasoning to any feature or technology. Yes there could be a zero day nobody knows about. We could say that about ssh or WebKit or Chrome too.
I hear what you're saying about the deprecation status, but as I and others mentioned, the fact that the underlying functionality is heavily used throughout the OS by non-deprecated features puts it on more solid footing than a technology that's an island unto itself.
As I understand it, Chrome, Claude Code, and OpenAI Codex all use sandbox-exec. I’m not sure Apple could remove it even if they were sufficiently motivated to.
Is it though? If the way I'm going to edit those files is by typing the same natural-language command into Claude Code, and the edit operation to maintain it takes 20 seconds instead of 10, that seems pretty materially the same to me.
Modern developers are predisposed to reach for off-the-shelf solutions, full stop. They're afraid of, or perhaps allergic to, just reading and writing files.
If you can learn to get past this you can unlock a whole universe of problem solving.
This post is amusing to me because after solving the problem in ~2 seconds the author boils the ocean to get that down further, then finally ends with questioning what the problem statement even is?
Classic software engineer pitfall. First gather the requirements!
Second, if their initial interpretation was correct, and it's a one-shot operation, then the initial solution solves it. Done! Why go any further?
I get that it's fun to muse over solutions to these types of problems but the absurdity of it all made me laugh. Jeff's answer was the best, because it describes a solution which makes the assumptions crystal clear while outlining a straightforward implementation. If you wanted something else, it's obvious you need to clarify.
They don't actually solve the problem in 2 seconds - at that point, they are running on a sample of only 3,000 vectors! Then they get it down further, but still find it will take a loooooong time to get through all 3B:
"With these small improvements, we’ve already sped up inference to ~13 seconds for 3 million vectors, which means for 3 billion, it would take 1000x longer, or ~3216 minutes." ...which is about two days.