Hacker Newsnew | past | comments | ask | show | jobs | submit | xianshou's commentslogin

Even as someone extremely firmly on the other side of the AI debate, I must appreciate the craft.

Now, to give Claude the steganogravy skill...


Maybe call it steakanography so it stands out from mere steganogravy.

From the file: "Answer is always line 1. Reasoning comes after, never before."

LLMs are autoregressive (filling in the completion of what came before), so you'd better have thinking mode on or the "reasoning" is pure confirmation bias seeded by the answer that gets locked in via the first output tokens.


Yeah this seems to be a very bad idea. Seems like the author had the right idea, but the wrong way of implementing it.

There are a few papers actually that describe how to get faster results and more economic sessions by instructing the LLM how to compress its thinking (“CCoT” is a paper that I remember, compressed chain of thought). It basically tells the model to think like “a -> b”. There’s loss in quality, though, but not too much.

https://arxiv.org/abs/2412.13171


For the more important sessions, I like to have it revise the plan with a generic prompt (e.g. "perform a sanity check") just so that it can take another pass on the beginning portion of the plan with the benefit of additional context that it had reasoned out by the end of the first draft.


Is this true? Non-reasoning LLMs are autoregressive. Reasoning LLMs can emit thousands of reasoning tokens before "line 1" where they write the answer.


They are all autoregressive. They have just been trained to emit thinking tokens like any other tokens.


reasoning is just more tokens that come out first wrapped in <thinking></thinking>


there are no reasoning LLMs.


This is an interesting denial of reality.


A "reasoning" LLM is just an LLM that's been instructed or trained to start every response with some text wrapped in <BEGIN_REASONING></END_REASONING> or similar. The UI may show or obscure this part. Then when the model decides to give its "real" response, it has all that reasoning text in its context window, helping it generate a better answer.


I don't think Claude Code offers no thinking as an option. I'm seeing "low" thinking as the minimum.


Ugh. Dictated with such confidence. My god, I hate this LLMism the most. "Some directive. Always this, never that."


I appreciate not having to read this guy again.


Great work! Why no benchmarks though?


Nice! 5 bucks says you can swap this in for your average software kanban and it does a better job.


Safer than clawdbot/moltbot, I'll bet.


What makes you think it isn’t clawdbot under the hood?


it's not :)


Why? It seems just as likely to follow prompt injection commands.


Incidentally, Chroma also produced the single best study on long-context degradation that I've come across:

https://research.trychroma.com/context-rot

Before that, I cited nolima (https://www.reddit.com/r/LocalLLaMA/comments/1io3hn2/nolima_...) constantly to illustrate how difficult tasks involving reasoning or multi-step information gathering degraded much faster than the needle-in-haystack benchmarks cited by the major labs. Now Chroma is the first stop. Nice job on the research!


Came to point out that this is transparently LLM-authored, was not disappointed. The signs:

- neatly formatted lists with cute bolded titles (lower-casing this one just for that)

- ubiquitous subtitles like "Mental Health as Infrastructure" that only a committee would come up with

- emojis preceding every statement: "[sprout emoji] Every action and every word is a vote for who they are becoming"

- em-dash AND "it isn't X, it's Y", even in the same sentence: "Love isn't a feeling you wait to have—it's a series of actions you choose to take."

Could pick more, but I'll just say I'm 80% confident this is GPT-5 without thinking turned on.


Neatly-formatted lists Neatness could be a sign of a machine, or it could be a sign of a diligent human author.

Subtitles only a committee would come up with That seems to me like a matter of opinion and taste — and we all have different tastes.

Emojis preceding every statement I counted three emoji pull quotes in a multi-page document. I suppose it could be an LLM, but it could also just be a nice style.

Em-dashes and ‘it isn’t X, it’s Y' This is why I posted in the first place, and downvoted you. There is nothing wrong with em-dashes — I love them. I use them a lot. Frankly, I probably overuse them. I’ve used them since I was a kid: I am going to use them — and over-use them — as long as I live. As for ‘Love isn’t a feeling you wait to have — it’s a series of actions you choose to take,’ that just seems like normal English to me.

It’s very possible in 2025 that the article was LLM-written, or written by a man and cleaned up by an LLM, or written by a man and proofread by an LLM, or written by a man. It does not have the stilted feel of most LLM works to me, but I might just miss it.

An em-dash isn’t an indicator of an LLM — it’s a sign of someone who discovered typography early.


I initially read the title as "My 2.5 year old can write Space Invaders in JavaScript now (GLM-4.5 Air)."

Though I suppose, given a few years, that may also be true!


Given a few years your 2.5 year old will be a 5.5 year old, too!


Ugh don't remind me. My daughter's fifth birthday is tomorrow and with how fast she's growing I feel like her 15th is on Thursday.


Rug pulls from foundation labs are one thing, and I agree with the dangers of relying on future breakthroughs, but the open-source state of the art is already pretty amazing. Given the broad availability of open-weight models within under 6 months of SotA (DeepSeek, Qwen, previously Llama) and strong open-source tooling such as Roo and Codex, why would you expect AI-driven engineering to regress to a worse state than what we have today? If every AI company vanished tomorrow, we'd still have powerful automation and years of efficiency gains left from consolidation of tools and standards, all runnable on a single MacBook.


The problem is the knowledge encoded in the models. It's already pretty hit and miss, hooking up a search engine (or getting human content into the context some other way, e.g. copy pasting relevant StackOverflow answers) makes all the difference.

If people stop bothering to ask and answer questions online, where will the information come from?

Logically speaking, if there's going to be a continuous need for shared Q&A (which I presume), there will be mechanisms for that. So I don't really disagree with you. It's just that having the model just isn't enough, a lot of the time. And even if this sorts itself out eventually, we might be in for some memorable times in-between two good states.


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: