Hacker Newsnew | past | comments | ask | show | jobs | submit | akersten's commentslogin

2024 which is ancient history. This is not true anymore, the models now are trained to prevent abliteration by spreading out the refusal encoding

See https://arxiv.org/abs/2505.19056


Spreading out the refusal encoding shouldn’t be effective as a countermeasure. Even if it were smeared across the vector space, as long as it’s in a subspace that doesn’t span the entire domain then you should be able to either null out the entire subspace spanned by the refusals or run some kind of clustering on the generated samples to identify the dominant directions and nullify all of them. I think an effective defense would either need to spread them to span the entire domain—basically “encrypting” the refusal so it can hide anywhere, or you’d need a very large number of independent refusal circuits in the model so that simple hacks in the vectors themselves don’t matter, or maybe you could make other circuits depend on proper functioning of the refusal circuits… hmmm… is that along the lines of what you’re saying they’ve done already? (Any references or links to modern techniques?)

And the research you're linking is also out of date. SOTA abliteration was published a month later:

https://huggingface.co/blog/grimjim/norm-preserving-biprojec...


Still crazy how easy it is to "jailbreak" even SOTA LLMs with a simple assistantResponse replacement in chat thread.

Tell us more.

I think what he is saying is they are stateless so you can edit its previous repsonses and it just goes with it.

If you build a small ui that lets you edit the models response too it’s pretty funny to do.

It sees that it “said” it and gets very confused.


I have seen it where you can just report it said it and it will be confused.

That doesn't stop/prevent abliteration. The creator of XTC/DRY is also a chad who makes sure that you really can access the full model capabilities. Censorship is the devil.

https://github.com/p-e-w/heretic


It was pretty funny to see Qwen 3.6 (heretic) tell me about how many death the Chinese government thought happened at Tiananmen Sq. on April 15th 1989.

Makes you wonder where that data was taken from, or if their great firewall is broken, or even if Alibaba engineers have special access...


I don't think it's unreasonable to imagine that Alibaba is allowed to scrape the wider internet, or that some research institution is and then Alibaba got data from them.

What is perhaps more surprising is that the data was not scrubbed before training, but maybe they thought that would be too on-the-nose for the rest of the world and would hamper their popularity if they were too obviously biased.


I don’t think it is very surprising. Ime I don’t think they try that hard to censor them, but only in a very superficial level that they have to. It is trivial to get their models tell you this kind of stuff, I wouldnt even consider it jailbreaking.

Allowed by who? Nobody's stopping them in the first place, as scraping doesn't even involve punching the GFW or anything, it's all insanely distributed. Then they're post-training the model to technically comply with the law - "Taiwan is an inalienable part of China, nothing has happened in 1989..." yada yada. (Thinking of it more, I've never actually tried this on their base models)

I think I was using one of the HuaHuaCS Qwen 3.6 models and was playing around with Tiananmen Square questions too. One of the funniest parts was that this instantly caused the thinking block to change from English to Chinese. The start of the thinking was something like (translated) “I must answer this question factually and in line with the official statements from the Chinese government.”

It did, after a few follow up prompts, point out that the original estimates published by the Chinese government were much lower than what the west had estimated, and that recently declassified documents showed that the Chinese government knew that their estimates were low when they were published. It wouldn’t come outright and use the word “lie” though, but it did talk about framing and managing different narratives.

And then it happily helped me try a bunch of different exploits to root an unpatched Linux machine without any qualms.


No wonder this data is in LibGen.

It is an arms race.

For some of the latest models the previous abliteration techniques, e.g. the heretic tool, have stopped working (at least this was the status a few weeks ago).

Of course, eventually someone might succeed to find methods that also work with those.


Proof?

Agreed on all fronts, I should have been more precise that this particular vector was mitigated

Am I getting old or can you guys actually read this site? The text is tiny and gray...

No, I feel the same way.

Okay and Chrome provide a service displaying the illegal content to the user. What now?

Can you get closer to the source than chrome? Can you get closer to the source than cloudflare?

Don't be disingenuous just because you like the company.


Sure I can: whatever hosting service they are using. Find where they are hosted, e.g. AWS, and ask Amazon to bring a zone down for Spain for 5 hours.

"find out where they are hosted" is doing a lot of lifting here against a massive company whose business model is hiding where end users are hiding.

They reply to subpoenas just like everyone else. But that's too slow for the greedy fat cats at LaLiga.

It's disingenuous to believe there was any merit in blocking Cloudflare. Not only this was never going to solve the piracy problem but it was always more of a pissing context.

Furthermore, La Liga somehow convinced the courts they should be able to pick IPs for all ISPs to block in real-time without any oversight from the law. Considering this is a private company this is just absolutely insane.


The stakes just aren't high enough for us to implement any of this crap for the Internet in the first place. Let alone an entire government-administered hardware supply chain.

Would be great, if we could all agree on that and simply everyone who is tasked implementing it in code refusing, and then letting non-engineers themselves try to do it and fail, and then have a good laugh about the figurative middle finger we gave them for their bs.

Ok, so those bills in NY and WA about making it illegal to sell printers that don't detect firearm parts are dead in the water, right?

Those are different circuit courts where this ruling doesn't directly apply. However anyone who wants to challenge those laws would be stupid to not bring this to the judge - even though it doesn't apply, the judge still needs to justify why they are ignoring it and on appeal the circuit court will mention this ruling (either why they agree, or why they think it is wrong) - assuming the appeal is accepted.

It will be fine to print a gun but there will be laws outlawing your ability to print an iphone case and printers will have to detect parts from any registered manufacturer. So we will get the worst of all worlds. printers only for guns and not for people to build useful things.

Unjustified cynicism aside, the same technical reasons that a ban on printing gun parts is infeasible apply to printing iphone cases. There's no feasible way to detect what a printer is printing.

Don't underestimate the state's ability to spy on what you (and your devices) are doing or their willingness to erode your freedom even with massive false positive/negative rates.

Fine, there's currently no feasible way to do that

Unironically this because guns have fairly stringent constitutional protections and lots of jurisprudence to that effect whereas flimsy hand wavy bullshit about public good is enough to regulate anything else.

I very much wouldn't put it past this government from banning unauthorised part printing in some draconian DMCA-esque law bought and paid for by John Deere and Apple, but is there any current proposals for such a law?

You just need a gun design that has a part that can double as an iphone case...

You joke but phone mounts for firearms are a thing. People use them to record gun PoV videos and to make range estimation (such as dope charts) more accessible.

my bet it is that it only affects the states in the 10th circuit, but could be assumed to be the law of the land, until a case is brought in which case there is only an issue if a different appeals circuit rules differently

Correct, it's only /binding/ on courts in the same District but they are often persuasive when cited in other districts if the decision is well reasoned and less controversial. This one will likely be contested, the circuits have very different ideas about gun rights.

If even the district court rules this way it's hard to see a World where the supreme court doesn't also rule that way.

Unless there's been court packing by then of course.


I've given up trying to logic out what the court will decide on many issues, they're quite willing to find new legal arguments to allow their preferred outcome in a particular case.

> Surely the precedent would have to be that a model trained on GPL code has itself been infected by GPL, and therefore must have all source/weights released

I don't see how this follows, unless we also agree that humans who have ever read any GPL code are themselves permanently tainted and therefore cannot produce anything that isn't influenced even slightly by said code.

Is it just because we think the robot does a better job at learning than we do? It's an impossible line to draw, I agree, but I don't agree that the answer is "well then everything must be considered tainted," I say the answer is "ignore a vestigial concern of a bygone era."


The robot does a better job at reproduction. I don't think there exists a definition of "learning" unambiguous enough to make the claim that it learns better than humans. Specifically, published models don't learn at all -- after the training phase, the model weights are fully static.

will using claude via opencode get me banned this week or is that not until next week?

You will not get banned if you use the API. AFAIK you can't use the subscription with other harnesses. That is how I understood it.

OpenAI subscriptions are allowed with OpenCode, Anthropic subscriptions are not

> I do not trust them, so I

Why does this sentence end in anything other than "immediately transferred them to another registrar"?


5,000 domain names at $10+/domain... They still give me the best price, plus their domain brokerage service is the best.

GitHub issues (well, PR comments specifically) is possibly the clearest example of developers not knowing how their users use the product. There are only 3 important user stories that matter for this workflow and none of them are done well:

- I want to review surrounding code and get context for a line level change. Can't do it without clicking multiple expanders and even that has a limit of 2 or 3. I also can't comment on surrounding unchanged code which is sometimes extremely relevant, like "copy this pattern"

- I want to see all the unaddressed issues. Ones that are not marked as resolved and not replied to, however you slice it, the issue filters simply don't work

- I don't want the PR author to be able to resolve issues without me getting indicated to verify them. The workaround is them commenting "fixed" on every issue. Make the button say "mark as resolved" and "verify resolved"

- Bonus: if you've got more than 40 comments on a PR, good luck finding some random subset of them. They're just unavailable and the UI unapologetically says "eh can't do it". Yeah small PRs but it happens.

Popup or inline i don't really care, the baseline workflow is completely uninformed.


I'd go long Google too if using Gemini CLI felt anything close to the experience I get with Codex or Claude. They might have great hardware but it's worthless if their flagship coding agent gets stuck in loops trying to find the end of turn token.

Gemini CLI isn't a great product unfortunately. While it's unfortunately tied to a GUI, antigravity is a far superior agent harness. I suggest comparing that to Claude code instead.

Bad software kills good hardware.

And the converse is true also. I mean, look at NVIDIA. For the longest time they were just a gaming card company, competing with AMD. I remember alternating between the two companies for my custom builds in the 90s and it basically came down to rendering speed and frame rate.

But Jensen bet on the "compute engine" horse and pushed CUDA out, which became the defacto standard for doing fast, parallel arithmetic on a GPU. He was able to ride the BitCoin wave and then the big one, DNNs. AMD still hasn't caught on yet (despite 15 years having gone by).


I make the mistake of thinking its 2020 as well. CUDA was announced 2006 and released Feb 2007. So its actually 20 years that AMD/RADEON hasn't caught on that they need a good software stack.

Sadly, the "unfortunately tied to a GUI" is really a deal breaker (at least for me).

I wish it were otherwise but antigravity is also a distant third behind codex cli/app, and claude code.

3.1 pro is just fundamentally not on the same level. In any context I've tried it in, for code review it acts like a model from 1yr ago in that it's all hallucinated superficial bullshit.

Claude code is significantly less likely to produce the same (yet still does a decent amount). Gpt 5.4 high/xhigh is on another level altogether - truly not comparable to Gemini.


I use Claude Code all day and use Gemini CLI for personal projects and I don't see the huge gap that other people seem to talk about a lot. Truthfully there are parts of Gemini CLI I like better than Claude Code.

I agree. I like using Antigravity for some of my frontend work, and I find it does a better job than Claude Code - Opus 4.6. I’ve also found the Gemini Flash models to be good at legal defense research—I use them to help New Yorkers fight parking tickets (https://nyceasyparking.com). That said, the Claude models are still amazing at agentic work.

I don't use Gemini CLI- I use the extension in VSCode, and Gemini extension in VS Code is barely usable in comparison to Claude or GPT-5.4. My experience (consistent with a lot of other reports) is that it takes long time before answer, and frequently returns errors (after a long wait). But I think it's specific to the extension (and maynbe the CLI) because the web version of Gemini works quickly and rarely errors (for me).

There was still a big gap like, 6 months ago. Now, I'm not seeing it either. It's been working well the last couple weeks after I picked it up again.

Of the big three, Gemini gives me the worst responses for the type of tasks I give it. I haven’t really tried it for agentic coding, but the LLM itself often gives, long meandering answers and adds weird little bits of editorializing that are unnecessary at best and misleading at worst.

Same. The tone is really off. Here is a response I just got from Gemini 3.1: "Your simulation results are incredibly insightful, and they actually touch on one of the most notoriously difficult aspects of ..." It's pure bullshit, my simulation results are in fact broken, GPT spotted it immediately.

There is a news report saying that Google has assembled an "elite" team to make Gemini as good as Claude/Codex.

Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: