This is a compute/memory trade, not compression vs. TurboQuant? Lemma 1 is something like "the forward pass is deterministic because it's deterministic," which means the input tokens were always the lower bound... which isn't caching? Smells tautological. What am I missing?
Well yeah, I just wrote it as a lemma, but it's basically close to tautological. Its only job is to formally ground the entropy argument that follows it. The interesting claim is what comes after: because KV vectors are deterministic functions of tokens, and because the model is a near-optimal predictor of its own distribution, the conditional entropy of each new KV vector given all previous ones is bounded by token-level perplexity. TurboQuant compresses against the marginal distribution of each vector in isolation -- that's the gap.
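Spelled out, the step that does the work is just that a deterministic function can't create entropy (my notation, condensed from the paper's setup):

    % the whole cache is a deterministic function of the token prefix
    (K, V)_{1:T} = f_\theta(x_{1:T})
    \implies H\big((K, V)_{1:T}\big) \le H(x_{1:T})
            = \sum_{t=1}^{T} H(x_t \mid x_{<t})
            \approx T \cdot \log_2 \mathrm{PPL}

So the whole cache carries at most about log2(perplexity) bits per token no matter how wide the vectors are, while a marginal quantizer's rate scales with vector dimension.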
And yes, it's a compute/memory tradeoff, all caching is. The claim is just that the memory floor is much lower than anyone had formally established. Whether the compute cost of getting there is worth it is a fair open question the paper doesn't settle. But what if it is? Caching is the thread running through most of my work, and I intend to find out.
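As a minimal sketch of the extreme memory end of that spectrum (hypothetical model API, not the paper's actual scheme): keep only the token ids and pay a full prefill to rebuild the cache on demand.

    # Memory-floor end of the cache spectrum: store token ids
    # (~log2(vocab) bits each) instead of K/V tensors
    # (layers * tokens * d floats each).
    # `model.forward_kv` is a hypothetical API, not from the paper.
    class TokenOnlyCache:
        def __init__(self, model):
            self.model = model
            self.tokens = []

        def append(self, token_id):
            self.tokens.append(token_id)

        def materialize(self):
            # Pay compute (a full prefill over the stored tokens) to
            # rebuild what a resident KV cache would have kept in memory.
            return self.model.forward_kv(self.tokens)

Everything in between, quantized KV, partial recompute, is trading some of that prefill back for memory.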
HN used to provide a really high signal-to-noise ratio for me, but it's degrading pretty quickly. There are new accounts below saying "hey I just learned what RTOS means, thanks!"
I reflexively reload HN many times per day, but I'm wondering if I need a walled garden with some sort of curation of individuals - which sucks - to get the signal level I want.
This probably isn't just an HN problem (GH's model is broken now). It's so cheap to make software that the old process of associating a new release with a person is probably outdated. AI knows what sounds impressive, too. So now we're drowning in software releases, and attributing software to a person is meaningless.
This site is very much drowning in all the slop. It's over half of posts now I think, not just the "Show HN" posts. Those are 100% slop, as are all the non-show-hn new project announcements.
All the moderators have done is drop Show HN posts by newish accounts. It fixed nothing. I have to hope they have some ambitious plan along the lines of what you suggest.
I've noticed a gigantic uptick in text messages and phone calls where people try to bypass the call screening. It may get to the point where I'll only want to see comms from people in an allowlist.
My standard response in such cases is “Hello unknown number, who are you and why should I not immediately hang up?”.
The response “Am I speaking to…” gets cut off with “Nope, you answer my questions first”. If they _must_ speak to Mr [MySurname], I claim to be my PA and say that they aren't talking to him (me) without convincing me they aren't a junk call first. If I have a few minutes to spare, it can be quite an entertaining little game keeping them on the line so they can't be conning someone more vulnerable. Unfortunately most junk calls these days are either initially automated or the humans are wise to people like me being a waste of their time, so they hang up, cutting that fun short.
I solved this by renting a small office that has reception, and they handle deliveries. It's not far, so if I get something I get a text and then collect it when it's convenient for me. I really hate waiting for couriers to ring, so it's a massive stress relief.
That’s also what I want to do. I currently have my office/lab at home, and waiting for deliveries is very stressful: I basically have to be ready to answer the door at any time, sometimes for many hours.
I usually answer unknown numbers only if they are from my own country. And then I open with a sound like 'huh??' so they can't do the voice cloning. If no one says anything, I hang up. Usually it's robocalls using crappy TTS, but there are crews with more advanced capabilities out there.
devs have really got to start using NSA-style naming conventions, where they use a Joycean compound of random stuff that sounds cool, e.g. BANANADAIQUIRI or FOXACID.
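A ten-line generator gets you most of the way there (word lists are placeholders, pick your own):

    import random

    # Two-part Joycean compounds, FOXACID / BANANADAIQUIRI style.
    FIRST = ["FOX", "BANANA", "EGOTISTICAL", "IRATE", "QUANTUM"]
    SECOND = ["ACID", "DAIQUIRI", "GIRAFFE", "MONSOON", "TUBA"]

    def codename():
        return random.choice(FIRST) + random.choice(SECOND)

    print(codename())  # e.g. IRATEGIRAFFE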
Longer answer: About 10 years ago I moved into leadership roles (VP Eng), and while I continued to write code for POCs, it hasn't been my primary role for quite some time. DDIA has been a book I pull out often when guiding leaders and members of my teams on building distributed systems. I'm writing more code these days because I can, and I still reference DDIA and have the second edition preordered.
There's almost no shot of getting a hand-authored post any views (I tried with one of mine recently). I submitted it, and it felt like a moment later there were like 20 new, very obviously AI-generated posts ahead of it.
Shameless plug - I sort of alluded to this in a post I wrote about Dark Factories generally, and about Rust being better than Go for building software (not just agents) with AI - but I think something generally important is feedback loops. Not all feedback loops are created equal and some will be superior, but my argument is that a holistic approach of including diverse, valuable feedback loops matters more.
The G14 definitely matches or exceeds in build quality, keyboard, trackpad, speakers, and display. Battery life is shorter, though. But it has a better GPU and supports Linux, which is way more important to me than an extra hour or two of battery.
The Framework is also excellent, but with different compromises: that sweet display aspect ratio for instance, but no OLED.
Every day, millions of people read content checked by Acrolinx. Our AI platform ensures that what companies publish is accurate, on-brand, and high-quality. You'll join the team building the next generation of these language-aware services. You'll work in Python, integrating LLMs, optimizing performance, and delivering features to customers ranging from Fortune 500s to startups.
You should have:
3+ years of Python (HTTP APIs, relational/NoSQL databases, experience with a cloud platform like AWS/GCP/Azure)
Strong grasp of system design, algorithms, data structures, and testing.
Experience with LLMs and a desire to build products utilizing them.
Bonus (not required): Experience with any of the following - PyTorch, Hugging Face, FastAPI, Docker/K8s, async Python, Temporal/Airflow. Experience integrating LLMs into products.
We're a small, remote-first team working in cross-functional squads. If you're excited about building AI-powered products, let's talk: recruiting-mb@acrolinx.com