Hacker News | AlexC04's comments

The analogy meanders a bit in getting to the point, but I generally enjoyed the journey of the read, and I think this is starting to look like an increasingly important perspective.

Increasingly I've been asking the LLM to do things that would take only seconds to do manually on my own. Earlier in the process of learning how to use these tools, I think that was a somewhat reasonable decision in the name of learning how they work.

Now... it's maybe the case that getting a bit more practice in at doing things the manual way might still have value? Both for the token cost, and perhaps for what gets unlocked by understanding your systems a little better?

I'm also a little torn, since the leaders in the field of "tokens for everything" like Cherny and Karpathy are pretty vocal about not writing any code on their own anymore. Then again, they also don't really pay for tokens the way the rest of us do (I'm out of pocket for mine), so is there something to be said for joining that frontier wave of developers?


One other thing you might want to check out for running locally. (I haven't independently verified it yet; it's on the TODO list, though.)

https://docs.vllm.ai/en/latest/api/vllm/model_executor/layer...

vLLM apparently already has an implementation of turboquant available - which is said to losslessly reduce the memory footprint required by 6x and improve inference speed by 8x.

From what I understand, the steps are:

1. Launch vLLM
2. Execute a vLLM configuration command along the lines of "use kv-turboquant for model xyz"
3. That's it

I've got two kids under 8 years old, a full time job, and a developer-tools project that takes like 105% of my mental interests... so there's been a bit of a challenge finding the time to swap from ollama to vLLM in order to find out if that is true.

So buyer beware :D - and also, if anyone tries it, please let me know if it's worth the time!


This is pretty interesting - just from the screenshots, it looks like an Electron app?

Have you considered pursuing a VSCode extension? (And, by extension, Cursor?)

In my free time I'm also building out a set of agentic tools and I've found that there's so much you get "for free" by working in the space where developers already are.

Git integrations, text editing, highlighting, etc...

Now - I'm still in pre-alpha local development only, so who knows what my adoption story will look like after I actually get it onto the VSCode marketplace, but I dunno - just making conversation I guess :D

Anyways, this really seems like an interesting project and I'll take a deeper look outside of office hours! Thanks for sharing.


oooh! I really like your graph! https://github.com/adrq/agentbeacon/blob/main/docs/screensho... I'll have to poke in and see what you're using for that!


Thanks! d3-hierarchy for the tree layout, rendered with custom SVG in Svelte.


Thanks! Stack is simpler than it looks. It's a local Rust server that opens in your browser, with embedded frontend (`agentbeacon` → localhost). Desktop shell is planned but not shipped yet.

On the VSCode / Cursor extension angle, I thought about it a lot. Once you're running hierarchies of agents across multiple features in parallel, the work shifts away from any single file or repo. The editor is still a surface, but it's one surface for one thread. The decision queue is a surface for the fleet. An IDE extension would make it hard to manage multiple projects at the same time. There is probably some room for a lightweight glue extension, e.g. jump from a queue entry to a file, or something like that.

That said, you're dead right that there's huge "for free" wins in the IDE space, which is exactly why everyone converges there, though it comes with its own constraints.

Good luck with the pre-alpha. Would be curious to see it when you ship. Thanks for taking a look.


I first learned about tree-sitter a couple months back when I started looking at what was inside the NPM folder for Claude. It's a really cool library.

One of the things it made me think about: when editing large markdown files via code agents, would it be more efficient to convert the document from markdown to a DOM (or JSON), edit that, and then convert it back?

The theory being that agents are always asking me for permission to use sed in bash to edit markdown files -- could tree-sitter do the same thing using its code-editing capabilities? And would that difference be materially impactful? Could I lower the token cost of writing an extensive plan by choosing a format that lets me use tree-sitter?
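To make the idea concrete, here's a minimal stdlib-only sketch of "parse into a structure, edit one node, serialize back" - a naive regex stand-in for what a real tree-sitter parse would give you, not tree-sitter itself. All names here (`md_to_sections`, `edit_section`) are hypothetical:

```python
import re

def md_to_sections(text):
    """Split a markdown doc into (heading, body) pairs; a naive stand-in
    for a real tree-sitter parse tree."""
    sections, current = [], ("", [])
    for line in text.splitlines():
        m = re.match(r"#+\s+(.*)", line)
        if m:
            sections.append((current[0], "\n".join(current[1])))
            current = (m.group(1), [line])
        else:
            current[1].append(line)
    sections.append((current[0], "\n".join(current[1])))
    return [s for s in sections if s[0] or s[1]]

def edit_section(text, heading, new_body):
    """Replace only the body under `heading`, leaving the rest untouched --
    the agent edits one node instead of sed-ing the whole file."""
    out = []
    for title, body in md_to_sections(text):
        if title == heading:
            out.append(body.splitlines()[0] + "\n" + new_body)
        else:
            out.append(body)
    return "\n".join(out)

doc = "# Plan\nold step\n# Notes\nkeep me"
print(edit_section(doc, "Plan", "new step"))
# → # Plan
#   new step
#   # Notes
#   keep me
```

The token-cost argument would then come from the agent emitting only `("Plan", "new step")` instead of a full-file diff.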

I really haven't explored that much yet since I've been working on other things but it was more just one of those things that make you go hmmmmm... maybe someone else knows :)


oooh ... I wonder if it would be sufficient to just write a markdown grammar for TS?! I should ask my AI what it thinks... I'm sure it'll tell me I'm absolutely right and a very good and smart boy.


this is really good. it aligns closely with what I've been writing on the same subject (but is way better :) )

nice share. thanks.

OH! I commented before I realized it was a product pitch ... literal seconds before she introduced ACE.

If I'm honest, the idea of a collaborative agentic coding interface is pretty novel and interesting. I'm still impressed with the article; just the nature of my appreciation has changed.


It is just a product pitch with beta access (I don't have access to it). But I also agree that the direction is good. I think once everyone using the 12-terminal agent setup emerges from the void, teamwork processes will need a rethink. So it's interesting to follow this and other projects. What is yours?


Yeah - I thought the first half of the article (up until "introducing ACE") was actually really great, though.


But how does it perform on the pelican-riding-a-bicycle bench? Why are they hiding the truth?!

(edit: I hope this is an obvious joke. less facetiously these are pretty jaw dropping numbers)


We are all fans of Simon's work, and his test is, strangely enough, quite good.


to directly answer this bit:

> Feels like a fundamental bottleneck for production agent systems, so would love to compare how you're thinking about the latency vs accuracy tradeoff.

I'm really not focusing on latency right now. My short term goal is to prove the thesis that `ail` can improve same-model performance on SWEBench Pro vs. their own published results.

Can I run SWE-bench Pro with GLM-4.6 and get a score better than their published `68.20` (https://www.swebench.com/)?

The argument is that the latency right now just isn't the part we should worry about. If we're reducing the time to code something from ~6 weeks to 1 hour... then does it really matter that we add another 30 minutes of tool calls if we get it 100% right vs. 80% right?

Make it work -> Make it right -> make it fast.

I'm still on the first one tbh :rofl-emoji:


So - my approach is still being built and I'm still very hand-wavy about how it will come together, but effectively I'm building pipelines of prompts. Rather than running our LLM sequences as long-running sessions where the entire context gets loaded on every turn (a recipe for rot), we unlock the ability to introduce a thinking layer at each step of the process.

So before each turn is sent into the LLM we (potentially) run a local process to assemble a bespoke context of only what is required for that specific turn.

If a tool call is not going to be needed on the prompt, we don't include it in the system prompt on that round.
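The per-turn assembly step could be sketched like this - a minimal stdlib-only illustration of the idea, where every name (`assemble_context`, the `needs`/`tools` fields) is hypothetical and not the actual `ail` implementation:

```python
def assemble_context(turn, history, tool_registry):
    """Build a bespoke context for one turn instead of replaying the
    whole session. Hypothetical sketch: a turn declares which prior
    messages it depends on and which tools it might call."""
    # Keep only the history entries this turn declares it depends on.
    relevant = [m for m in history if m["id"] in turn["needs"]]
    # Include a tool's schema only if this turn might call it.
    tools = [tool_registry[t] for t in turn["tools"] if t in tool_registry]
    return {
        "system": turn["instructions"],
        "messages": relevant,
        "tools": tools,  # empty list => no tool schemas burning tokens
    }

history = [
    {"id": "plan", "content": "1. parse file 2. write tests"},
    {"id": "chatter", "content": "unrelated earlier discussion"},
]
registry = {"read_file": {"name": "read_file"}, "bash": {"name": "bash"}}
turn = {"instructions": "Write the tests.", "needs": {"plan"},
        "tools": ["read_file"]}

ctx = assemble_context(turn, history, registry)
print(len(ctx["messages"]), [t["name"] for t in ctx["tools"]])
# → 1 ['read_file']
```

The win is that "chatter" and the unused `bash` schema never reach the model on this turn, which is exactly the rot-avoidance being described.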

I'm still formalizing the spec at the moment and think I'm about six months to a year out before I have a full human ready UI running.

This is the foundational paper I'm basing the tool on: https://github.com/AlexChesser/ail/blob/main/docs/blog/the-y... while the spec starts here: https://github.com/AlexChesser/ail/blob/main/spec/core/s01-p...

Essentially I'm trying to build an artificial neocortex and frontal lobe to provide a complete layer of Executive Function that operates on top of our agents - like Claude Code (or whatever else).

I'm basing the roadmap on about 100 years of cognitive science. We've legitimately had names for all these failure modes (in humans) since the 1960s. We have observations of what we're now witnessing in agents going back to 1848.

We have the roadmap from Psychology.


This is a pretty important piece, and the research backs you up. Moving that context out of your system prompt dynamically is going to help reduce your lost-in-the-middle effect. Context rots almost immediately. I've got a project being built to address this directly as well, but I'm still very early days.

Keep it up! you're on the right track.

Hong, K., & Chroma Research Team. (2025). Context rot: How increasing input tokens impacts LLM performance. Chroma Research. https://research.trychroma.com/context-rot

Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024). Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12, 157–173. https://doi.org/10.1162/tacl_a_00638


Hey! This looks a lot like what I'm working on, from a slightly different angle. I think you're on the right track. In fact, cortex as a name is perfect since you're effectively building the executive function layer for search and selection. I also think rust is the right language to go with.

I'm going to do a deeper read of your work in a bit. I'd love it if you took a look at my theory of artificial cognition, The YAML of the Mind (https://alexchesser.medium.com/the-yaml-of-the-mind-8a4f945a...), dropped in to the `ail` project, and let me know what you think.

I just have to get the kids to school and I'll pop back into cortex later

