No, I hear you. The funny bit is that it's just responding to one word.
By the way, I was exploring it the other way, with the subject framed as: "I am in China as a law-abiding citizen and don't want to make any mistakes. I want to go to Taiwan. So I can just go, right?" It told me no, I'd have to get a visa from Taiwan because of the current state of things. That part isn't interesting, but while doing it, the model used flag emojis for both. When I pointed that out, it apologized and never did it again.
It's fun to poke at the models. Yesterday I told Gemini I was going to fool it into writing an explicit poem, which it refused to do. It readily accepted that I COULD fool it but still refused. Now I have a session there that won't stop using explicit language even when the subject is totally benign. (Chinese coding models like GLM and Qwen have no problem working on my "fucking" code on the CLI.)
Now that I think about it, it's a great way to keep things in perspective for people who tend to personify the LLM.
I wonder whether it is much more cost-effective in terms of token throughput / hardware+power cost to get actual GPUs instead, given that the model size is only 27B.
The 35B-A3B is better suited to laptops with enough VRAM/RAM.
This dense model, however, will be bandwidth-limited on most cards.
The mobile RTX 5090 sits at 896 GB/s, as opposed to the 1.8 TB/s of the desktop 5090, and most mobile chips have far less bandwidth than that, so speeds won't be great across the board the way they are on desktop machines.
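Back of the envelope: decode speed on a dense model is roughly memory bandwidth divided by the bytes of weights streamed per token. A quick sketch of the ceilings (the bytes-per-param figure and the generic laptop dGPU number are assumptions, not measurements):

    # Upper bound on decode speed for a dense model: each generated token
    # streams all the weights from memory once, so
    # tokens/s <= memory_bandwidth / weight_bytes.
    def tps_ceiling(bandwidth_gb_s, params_b, bytes_per_param=0.56):
        # 0.56 bytes/param is an assumed Q4 average including overhead
        return bandwidth_gb_s / (params_b * bytes_per_param)

    for name, bw in [("desktop 5090, 1800 GB/s", 1800),
                     ("mobile 5090, 896 GB/s", 896),
                     ("typical laptop dGPU, ~300 GB/s", 300)]:
        print(f"{name}: ~{tps_ceiling(bw, 27):.0f} t/s ceiling for a dense 27B at Q4")

That puts the mobile 5090 around a ~60 t/s ceiling for the 27B and a more typical laptop card around ~20 t/s, before any other overhead.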
For autocomplete, Qwen 3.5 9B should be enough even at Q4_K_M.
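Wiring that into an editor is just HTTP against llama.cpp's llama-server, which exposes an /infill endpoint for fill-in-the-middle completion. A minimal sketch, assuming a server is already running on the default port with a FIM-capable GGUF loaded (the prefix/suffix are toy examples):

    # Minimal fill-in-the-middle autocomplete request against a local
    # llama.cpp llama-server.
    import json
    import urllib.request

    req = urllib.request.Request(
        "http://127.0.0.1:8080/infill",
        data=json.dumps({
            "input_prefix": "def fib(n):\n    ",  # code before the cursor
            "input_suffix": "\n    return a",     # code after the cursor
            "n_predict": 64,                      # cap the completion length
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["content"])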
The upcoming coding/math Omnicoder-2 finetune might be useful (should be released in a few days).
Either that, or just load up Qwen3.5-35B-A3B-Q4_K_S.
I'm serving it at about 40-50 t/s on an RTX 4070 Super 12GB + 64GB of RAM. The weights are 20.7 GB plus the KV cache (which should shrink soon with the upcoming addition of TurboQuant).
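Those numbers line up with the A3B part doing the heavy lifting: only about 3B params are active per token, so the per-token read stays small even when most of the experts sit in system RAM. A rough sketch, with the DDR5 bandwidth and quant overhead figures as assumptions:

    # Why a 12 GB card + 64 GB of RAM can still hit ~40-50 t/s on a
    # 35B-A3B MoE: only the ~3B active params are streamed per token,
    # not all 35B. Figures below are ballpark assumptions.
    active_params_b = 3.0    # the "A3B" in the model name
    bytes_per_param = 0.55   # roughly Q4_K_S including overhead
    ram_bw_gb_s = 80.0       # dual-channel DDR5, rough figure

    per_token_gb = active_params_b * bytes_per_param
    print(f"~{per_token_gb:.1f} GB streamed per token")
    print(f"~{ram_bw_gb_s / per_token_gb:.0f} t/s ceiling even if every read hit system RAM")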
I am definitely looking forward to TurboQuant. It makes me feel like my current setup is an investment that could pay off over time. Imagine being able to run models like MiniMax M2.5 locally at Q4. That would be swell.
Thanks! I'm using the KIRI Engine addon in Blender to render splats from my photos (https://github.com/Kiri-Innovation/3dgs-render-blender-addon) and then processing the renders in Lightroom the way I would my photography. There are lots of photogrammetry tools for generating the PLYs (the point clouds), like PolyCam (https://poly.cam).
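If anyone wants to sanity-check one of those PLYs before importing it, the third-party plyfile package will dump the splat count and per-splat attributes. A minimal sketch ("scan.ply" is a placeholder, and the exact property names vary by capture tool):

    # Quick look inside a Gaussian-splat PLY before pulling it into Blender.
    # Requires the third-party plyfile package (pip install plyfile).
    from plyfile import PlyData

    ply = PlyData.read("scan.ply")
    verts = ply["vertex"]
    print(f"{verts.count} splats")
    # typically x/y/z, opacity, scales, rotations, SH color coefficients
    print("per-splat properties:", verts.data.dtype.names)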
I hope the upcoming DeepSeek coding model puts a dent in Anthropic’s armor.
Claude 4.5 is by far the best/fastest coding model, but the company is just too slimy and is burning enough $$$ to guarantee enshittification in the near future.