
Yes, one could potentially increase accuracy greatly. One big problem would be occlusion.

There is already a solution to this that would be very hard to beat (and one can choose to use or not use an LLM to assist): prepare food yourself and use the information provided by the manufacturer.


If you consider time at all, what you suggest is hardly a solution. It is the most accurate, but even 50% accuracy at orders of magnitude faster to calculate would be more useful for the main use case, which is losing weight.

However, for diabetes, accuracy is likely preferred, and I’m not sure any computer vision would be palatable.


It’s just an impossible problem. Photons don’t provide sufficient information to determine calories (at least not in any way they could practically be captured). The inside of that sandwich could be drenched with olive oil, or it could be hollow cheese with lettuce. It’s impossible to tell.

The average person has no idea this is true. And the average person cannot tell when this is the case. So we have a bunch of people going through school and then, when they get stuck, relying on AI. The future is gonna be wild.

Yep. And it doesn't help that the people selling AI products act as if they're going to build God. Going, "well AI can't do that" isn't going to fly when you are lax about communicating its limitations!

It also doesn't help when the messaging is linked to how "there will be no jobs where you use your brain anymore; everything will be automated". What motivation does the average 16-year-old have to try hard and learn anything beyond what they immediately need?

No jobs, AI Jesus is coming, and if you use AI it will use all of the world's compute power to try to convince you it's correct even when it's not.


Here's the technical literacy of the population on display. I love these prank examples, which show the true education level of the populace.

https://www.youtube.com/shorts/B7c9qJcRnVk


A more robust measurement might be the (former) US Department of Education's "Adult Literacy in the United States" survey, most recently conducted in 2019. The results of this are sobering enough:

<https://nces.ed.gov/pubs2019/2019179/index.asp>.

There's a related study of adult technical literacy conducted in 33 OECD nations:

<https://www.oecd-ilibrary.org/education/skills-matter_978926...>.

Both show that only a small fraction (5-10%) of adults operate at high levels of literacy (whether of text, numeracy, or technology), and that a large fraction (roughly 50%) operate at a minimal or below-minimal level.


True education? What idiot would say yes to this?

Even if you _know_ the debit card transaction is safe, there’s no reason to risk it when a weirdo is filming you with some wild contraption.


Anything like this is going to have a very heavy selection bias; don't take any of this kind of content as a reflection of the average person.

Many of us witnessed the technical literacy of the general population when we ran to show them ChatGPT 3.5, and they just kind of shrugged, like "So? What are you showing me?"

I am asking a lot here, but school needs to be teaching people what AI is, what its weaknesses are, and how to use it... My school taught me to use a calculator. It also taught me how to check my work when I relied on the calculator.

AI is a very complicated calculator - you give it an input, magic happens, it gives you an output. Really no different to a layman.


To be fair, this should probably be covered by basic physics or maybe cooking classes. “You can’t determine the calories in food by looking at it” isn’t really ML specific.

It won't help much if kids are AI'ing their way through physics and then, ten years later, need to go on a diet having possibly never applied the knowledge or exercised their critical thinking skills.

Yes, the documentary movie 'Idiocracy' comes to mind

Considering the lack of basic math skills I encounter each and every day, I don't think schools did enough; they certainly aren't going to do enough w/LLMs.

Given the lack of understanding of basic chemistry and physics, like fundamental thermodynamics... I have little hope any population can be trained to understand LLMs sufficiently...

My HVAC guy doesn't understand thermodynamics; he can't even spell it. He's just trained to follow a flow chart. LLMs will be the same.

It's more complicated than a calculator. Even researchers who have dedicated their lives to the field don't know all of the limitations of any given model. That fact alone isn't helpful when a model is 80% correct in one area but 2% in another.

If even experts in the field don’t know all of the limitations then it’s even more important to stress that relying on the output of an LLM is a poor choice without additional checking and verification.

Even with calculators, I was taught that you should double check by hand sometimes to make sure you got it right.


Yes, OP is making a similar point: limitations need to be better communicated.

If you're looking for a citation about this, the 1999 Dunning-Kruger paper "Unskilled and Unaware of It" [1] is about exactly this.

People who are unskilled at a task are also unaware of what correct performance of that task looks like. So somebody who can't count calories is unable to tell that the AI can't perform the task correctly either.

[1]: https://pubmed.ncbi.nlm.nih.gov/10626367/


Fwiw invoking Dunning-Kruger is beyond trite at this point.

Which is a good thing because it means we can talk like normal humans ("people don't know that it's unreliable") instead of acting like we're making such a profound claim that it needs a citation and psychological dissection.


It is and it isn't. If you ask a human how many calories (or carbs) are in that sandwich, they can give you a qualified guess based on how a sandwich like that is typically constructed. They may not know the calories for a slice of bread or a slice of cheese by heart, but if you give them a food database, they can look it up.

They absolutely won't be 100% correct (bread sizes, e.g., are going to be an estimate), but unless it's a trick sandwich drenched in olive oil or with hollow cheese, they're probably going to be in the right ballpark.

I don't think it's outside the realm of possibility for an LLM to be in the right ballpark as well, but that doesn't seem to be where we're at now.
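To make the lookup idea concrete, here's a minimal sketch of how a human (or a tool) would ballpark it from a food database. The per-item carb numbers below are illustrative, not from any authoritative database:

    # Ballparking carbs from a lookup table, the way a human with a
    # food database would. FOOD_DB values are made-up examples.
    FOOD_DB = {"white bread, slice": 13.0, "cheddar, slice": 0.4}  # g carbs each

    def estimate_carbs(items):
        # items: list of (food name, count) pairs
        return sum(FOOD_DB[name] * count for name, count in items)

    print(estimate_carbs([("white bread, slice", 2), ("cheddar, slice", 2)]))
    # ~26.8 g -- in the right ballpark, but blind to hidden oil or odd breads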


I'm surprised how many comments in this thread swear by the position that you literally can't tell based on a picture, as if eating trick foods designed to mislead you were an everyday occurrence. Most of the time in typical use you could make a reasonable guess, maybe with some obvious caveats such as "well idk if that Coke is Diet or not so"

And furthermore, once the person has "determined" how many calories the sandwich contains, they are likely to give you the same answer next time you ask instead of randomly changing their minds.

As a human, in the photo of that sandwich I see 4 slices of bread and 4 slices of cheese (distributed unevenly). I have no idea about the weight of the bread, flour type or its sugar content. I don't know the type of the cheese, dimensions of the slices or total amount of cheese inside the bread. I don't know if there is butter or anything else inside. I can guess the size of the plate as a size reference but I can't be sure. Human or AI, it's an ill-posed problem. There can be widely different estimates which can be equally plausible.

But why would the same LLM give you wildly different answers EACH TIME you ask?

There is a parameter in LLMs called temperature that controls creativity/randomness. If you set it to 0 it makes the model deterministic. I think some LLMs expose this as a tunable parameter.
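Roughly, temperature rescales the logits before sampling. A toy sketch of the idea (my simplification, not any vendor's actual decoder):

    import numpy as np

    def sample_token(logits, temperature, rng=np.random.default_rng()):
        logits = np.asarray(logits, dtype=float)
        if temperature == 0:
            return int(np.argmax(logits))       # greedy: always the same token
        scaled = logits / temperature           # higher T flattens the distribution
        probs = np.exp(scaled - scaled.max())   # numerically stable softmax
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    print(sample_token([2.0, 1.0, 0.5], temperature=0))    # deterministic
    print(sample_token([2.0, 1.0, 0.5], temperature=1.0))  # varies run to run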

> If you set it to 0 it makes the model deterministic.

No, it doesn't. It can help make the model more deterministic, but it does not guarantee it.


The hardware can also add nondeterminism. GPUs reorder floating-point operations, and float arithmetic isn't associative, so the same computation can give different results.

Vendors might also be running A/B testing or who knows what, even when you ask for a temperature of 0.

But, if you run a fixed model with temperature 0 on your local CPU, it will be deterministic (unless there are bugs).
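The reordering point is just floating-point non-associativity; a two-line demo:

    # Float addition isn't associative, so evaluation order matters:
    a = (0.1 + 1e20) - 1e20   # 0.0 -- the 0.1 is absorbed by the huge addend
    b = 0.1 + (1e20 - 1e20)   # 0.1
    print(a, b, a == b)       # 0.0 0.1 False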


The study used a temperature of 0.01.

> "Thirteen food photographs were each submitted 495–561 times to four LLM vision APIs (GPT-5.4, Claude Sonnet 4.6, Gemini 2.5 Pro, Gemini 3.1 Pro Preview) using an identical structured prompt adapted from the iAPS automated insulin delivery system (26,904 total queries, temperature 0.01)"


Because that's how they work? They aren't knowledge machines, they are random generators.

They're next word predictors. They explicitly add in randomness at various stages of the transformer itself, otherwise it'd be too obvious it's not actually intelligent and just a next word predictor

No that's not why.

It's not impossible to tell. Diabetics and others with dietary restrictions have to do that sort of thing every day to decide what they can and can't eat. If you pick up a loaf of bread at the baker's, the baker usually has no idea how many carbs, or how much salt or sugar, is in that loaf. Try it. Ask the baker: "how many carbs in this loaf of bread?" They'll just stare at you. They can tell you whether the loaf has salt or sugar in it, but they can't tell you how much, because they don't calculate the amounts per loaf. So if you have dietary restrictions you have to know what you can and can't eat, and that requires the ability to judge the contents of a piece of food from the way it looks.

Photons don't carry that information? Sure. But you don't just have photons to go by. You can rely on a large database of prior knowledge about how food is usually made and with what ingredients.

Other people who have to rely on their imperfect human senses to decide what they can and can't eat: people with allergies, people with heart problems, hypertensives, kidney patients, etc. etc.


The question isn't about calories, it's about carbs. Drenching that sandwich in olive oil won't change its carb count. From the picture it's a thin cheese sandwich -- we can see cheese, and we can see it's thin enough that there's little else. Might be no butter, might be lots of butter, but that won't affect the carb count. If there's lettuce in the sandwich, there's likely a negligible amount. Hand it to a knowledgeable human and you're going to get a very consistent carb reading -- 30g, the value of two slices of Wonder Bread.

It could be much different -- it could be one of those breads with weird macros, or fake cheese, or it could be hollowed out and packed full of hidden vegetables. But a human is going to give you the answer for two slices of plain white bread.


From personal experience, one can get close enough in practice that the error isn't more significant than the errors in insulin-to-carb ratios/sensitivity factors/...

I am pretty good at this, and the cheese sandwich example threw me; I would have estimated around 10-15g of carbs for each slice. So the 28g is fairly consistent with that, not 40g. The only reliable way would be to weigh it and use the labelling. Another thing that often trips people up is that the label may give a serving size of, say, 2 slices with a weight that does not reflect the actual weight of 2 slices.

Luckily, with good tools the significance is reduced; for people using closed-loop insulin pumps, the system will automatically correct for that. Lots more room to wiggle.
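For a sense of scale, the arithmetic (illustrative numbers only, not dosing advice): a meal bolus is roughly carbs divided by the insulin-to-carb ratio, so a 10g carb miss at an ICR of 10g/U is about a 1U error.

    # Illustrative only -- made-up numbers, not medical advice.
    carbs_g = 28          # estimated meal carbs
    icr_g_per_unit = 10   # insulin-to-carb ratio (g of carbs per unit)
    print(carbs_g / icr_g_per_unit)  # 2.8 units; a 10 g miss shifts this by 1 unit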


Then it should refuse to answer 100% of the time.

I don't think refusal is the right approach. I would much prefer that it respond with something like:

> There is not enough information to make an accurate estimate, but if you'd like, I can take a stab at it. If so, how much effort should I put into it?

> Yes, go ahead and spend up to 5mins and $1 to analyze it.

> Done, I've had 100 subagents analyze the image and have arrived at a 95% confidence interval of the portion containing ...


I know this is just an example but my eyes kind of bugged out thinking about paying $1 every time I want to estimate the calories in my sandwich.

Indeed, I think any reasonable human might say “A few hundred calories but without measuring the ingredients I might be way off”. I think LLMs could get there, I don’t see anything stopping that. Though they have been notoriously bad at this so far.

If the problem is so evidently impossible then the LLM itself should recognise this, state that the problem isn't solvable, *not* provide what's certain to be an inaccurate result, and suggest better approaches to arriving at a reasonable answer.

That said, it's notable that diabetes education materials often suggest estimating glycemic loads by rough portion size / plate ratios. Which is to say that absent accurate weight measurements (themselves subject to variations in ingredients, moisture levels, etc.) current clinical recommendations are themselves pretty rough.


That’s exactly the point of this article.

Many of the comments here assume the authors are stupid and were surprised by the result, but the point of the article is to inform readers that AI carb counting apps don’t work. That’s why they did the study.


It's not even impossible just from a technical point of view.

Your cheese sandwich may contain considerably more or fewer calories, even if you take the numbers from the packaging and calculate the correct ratios by weight. The calories on the label are based on an average, and individual packages may contain more or less of any listed nutrient, within some margin. Of course, counting calories is meaningless if not done on a long-term scale anyway, but on a long-term scale the LLM doesn't need to guess the correct amount either.
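The by-weight calculation from the label is trivial; all the uncertainty lives in the label's averaged numbers (the values below are made up):

    # Scaling a label's per-100g figure to a weighed portion:
    kcal_per_100g = 265.0   # example label value for a bread
    portion_g = 85.0        # weighed portion
    print(portion_g * kcal_per_100g / 100)  # 225.25 kcal, +/- labelling tolerance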


And what if that guy in the surveillance video is just 2 kids in a trench coat? There's no way for AI to be sure from the photons: we should scrap it.

usually Claude Code and I just tape a bunch of cats together

I was thinking that with an advanced phone with lidar, like some iPhones, you could at least get volume. But yeah, the hidden/inner mass is a problem, plus the oil, as mentioned.

This is a bad take. If LLMs are supposed to work as general-purpose assistants, as they are being sold by both the companies making them and the majority of AI believers, then it is very much a solvable problem. The LLM could give a high-level estimate (a sandwich is not going to be 0 Cal, and it's not going to be 5000 Cal, so you can give some kind of range), and then ask for the type of information needed to make a more accurate estimate.

Then the correct answer is “I can’t tell.”

Not “Here’s a random guess that I just pulled out of my ass.”

LLMs have picked up, from scientists, the bad habit of trying to give an answer when no answer can be given; scientists overall don’t say “I don’t know” nearly as often as they should.


I tried asking LLMs about food before. They all say "I can't tell for certain, but this is an estimate based on the ingredients I can spot/infer/guess".

You need to write a specific prompt to avoid any warnings.

Of course a lot of people don't know what limitations LLMs have, so there's some value to a blog post about it, but it's not as black-and-white as the article might suggest with its graphs.

The prompt (documented here: https://www.diabettech.com/wp-content/uploads/2026/04/Supple...) lists specific instructions and a specific output format that doesn't allow the LLM any room for explanation or warning in processable data (only in notes fields). In fact, the prompt explicitly tells the LLM to ignore visual inferencing for some statistics and to rely on a nutrition authority instead.

Even in that intentionally restricted format, the English language output uses words like "roughly" and "estimated" in the LLMs I've tested.

Sure, if you take the numeric values and plot them in graphs, you get wildly inconsistent results, but that research method intentionally restricts the usefulness and reliability of the LLMs being researched.

What's much more troubling is this line from the preprint:

> The open-source iAPS automated insulin delivery (AID) system now offers food analysis through APIs from OpenAI, Anthropic and Google [8]

The linked app does seem to have a disclaimer, though:

> "AI nutritional estimates are approximations only. Always consult with your healthcare provider for medical decisions. Verify nutritional information whenever possible. Use at your own risk."


> Then the correct answer is “I can’t tell.”

From the paper, they're using structured JSON schema mode as opposed to freeform answers, so it can't. Models do typically caveat their answers for questions like this, in my experience.


They'll qualify their answers in English but as the article mentions, if your prompt asks for a confidence score, that "uncertainty" doesn't translate into low numerical confidence.

Quantifying their own confidence is also something they're not good at, and the format would prevent them from refusing to do it, or prefacing it with a caveat, if that's what you'd want of them. Particularly since the response format seems backwards: it gives confidence, then the carbs estimate, then observations/notes, rather than basing the carbs estimate on the observations/notes and then the confidence estimate on both of those.
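Something like this hypothetical shape (the field names are my guess, not the study's exact schema): the model commits to a confidence number before producing the estimate and the notes that would need to justify it.

    # Hypothetical response layout mirroring the backwards ordering:
    response = {
        "confidence": 0.9,   # emitted first, before the estimate it describes
        "carbs_g": 30,
        "notes": "Appears to be a thin cheese sandwich; portion size uncertain.",
    }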

> They'll qualify their answers in English but [...]

That the default user-facing chat, as a normal user would use it, gives a warning is the key part IMO. I don't think expectations of there being no "wrong way" to use the model can necessarily extend to API usage with a long custom system prompt and a restricted output format.


LLMs have no agency to choose such a course of action.

They’re algorithms and they were designed this way.


Correct. But why doesn’t the AI just say that?

Why is the AI answering questions without answers then?

could be because, at the end of the day, it's just predicting the next likely token

The next likely tokens for a response to a question that can't possibly be answered from the context should be "I" followed by "don't", followed by "know".

The paper millionaires did NOT like your comment.

How is the revenue recurring? They offer a subscription?

The most common reply to this is: “people said the same about Books/TV/Comics/Games/Facebook stop being such a Luddite”

My answer to this is approximately: “yes, they’re all relatively bad compared to the thing that came before, so they were all right!”

Even books. If the option were between my child reading about blacksmithing/compsci and DOING blacksmithing/compsci I’d choose the latter every time. It gives you real experience and opinions.

The difference with each successive new wave is that it becomes increasingly addictive. It’s possible to read one book and stop for a while. Shorts can hook you for hours and then draw you back the next day with no natural stopping point.


It's also not possible to carry around enough books with you to allow you to jump from one to another whenever you get bored, while it is possible to effectively carry around a device that lets you watch YouTube/TikTok at all times. I think this is an important factor.

> It's also not possible to carry around enough books with you to allow you to jump from one to another whenever you get bored

It most certainly is.


What really is the point of this comment?

Is it not obvious?

I think this was an “if” scenario

This makes more sense than my initial reading of it, indeed.

I think the only thing that matters is whether the people on the team care deeply about the product; whether they care more about the product than their own careers (in the short term). Without that, any metric or way of thinking can and will be gamed.

Unfortunately, even with all the management techniques in the world, there are just some projects that are impossible to care about. There’s simply a significantly lower cap on productivity on these projects.


It’s funny, I think most people roll their eyes when Trump says things like “you’ll be tired of winning, you’ll say ‘please no more winning’”.

But recommendations to tax efficiency are unironically that (just dressed in more serious language). “Please stop giving us what we want so efficiently, we want to work more for it!”


Those winning and those asking for this to stop are two different categories of people. The former are the capital holders and the latter are those with no source of capital in their future. If you can merge the two categories, we may talk again. Until then, you need to come up with something better than trickle down in a world where there is no trickling.


> But recommendations to tax efficiency are unironically that (just dressed in more serious language). “Please stop giving us what we want so efficiently, we want to work more for it!”

You're trying to make it sound ridiculous, but most people aren't pure consumers. They're laborers and consumers. Policies that hurt while wearing the consumer hat may be more than justified by the benefits while wearing the labor hat.


It's also a context-specific scale. I work in computer vision. Building the surrounding app, UI, checkout flow, etcetera is easily Level 6/7 (sorry...) on this scale.

For building the rendering pipeline, algorithms, maths, I've turned off even Level 2. It is just more of a distraction than it's worth for that deep state of focus.

So I imagine at least some of the disconnect comes from the area people work in and its novelty or complexity.


This is exactly true in my experience! The usefulness of AI varies wildly depending on the complexity, correctness-requirements, & especially novelty of the domain.

This attribute plus a bit of human tribalism, social echo-chambering, & some motivated reasoning by people with a horse in the race, easily explains the discord I see in rhetoric around AI.


am layman. is CV "solved" at this point, or is there more work to be done?


Far from solved! Though, like seemingly everything, it has benefited from the transformer architecture. And computer vision is kind of the "input"; it usually sits at the intersection with some other field, e.g. CV for medical analysis is different from self-driving, which is different from reconstruction for games/movies.


But why are you making projects in so many languages? The language is very rarely the barrier to performance, especially if you don't even understand the language.


I try to pick the language best suited to the situation rather than giving in to my own biases. I need to broaden my horizons to be able to cover the full stack of stuff that I need, not just the things I've been doing a lot myself for years. There's a lot of stuff that used to be out of my comfort zone that I can now tackle easily. Stepping over my own biases is part of that.

I know not everybody is quite ready for this yet. But I'm working from the point of view that I won't be manually programming much professionally anymore.

So, I now pick stuff I know AIs supposedly do well (like Go) with good solid tool and library ecosystems. I can read it well enough; it's not a hard language and I've seen plenty other languages. But I'm clearly not going to micro manage a Go code base any time soon. The first time I did this, it was an experiment. I wanted to see how far I could push the notion. I actually gave it some thought and then I realized that if I was going to do this manually I would pick what I always pick. But I just wasn't planning to do this manually and it wasn't optimal for the situation. It just wasn't a valid choice anymore.

Then I repeated the experiment again on a bigger thing and I found that I could have a high level discussion about architectural choices well enough that it did not really slow me down much. The opposite actually. I just ask critical questions. I try to make sure to stick with mainstream stuff and not get boxed into unnecessary complexity. A few decades in this industry has given me a nose for that.

My lack of familiarity with the code base is so far not proving to be any issue. Early days, I know. But I'm generating an order of magnitude more code than I'll ever be able to review already and this is only going to escalate from here on. I don't see a reason for me to slow down. To be effective, I need to engineer at a macro level. I simply can't afford to micro manage code bases anymore. That means orchestrating good guard rails, tests, specifications, etc. and making sure those cover everything I care about. Precisely because I don't want to have to open an editor and start fixing things manually.

As for Rust, that was me not thinking about my prompt too hard and it had implemented something half decent by the time I realized so I just went with it. To be clear, this one is just a side project. So, I let it go (out of curiosity) and it seems to be fine as well. Apparently, I can do Rust now too. It's actually not a bad choice objectively and so far so good. The thing is, I can change my mind and redo the whole thing from scratch and it would not be that expensive if I had to.


In my experience Rust and Go are both opinionated languages with strong types which makes them work well with agentic coding.


Yes, because in most contexts where it has seen "caveman" talk, the conversations haven't been about rigorously explained maths/science/computing/etc., so it is less likely to predict that output.

