It also, unsurprisingly, tells a slightly different and less startling story. It's not that glycerine crystallized in one lab and suddenly others around the world had the same problem; it's that glycerine hadn't been crystallizing in one lab, but once that lab was sent a sample of crystallized glycerine, the stuff always did crystallize there, presumably (assuming the story's true) because of some sort of tiny particles (whether of glycerine or of something else) that float about in the air or adhere to glassware and encourage glycerine to crystallize.
No, it's a lie. I researched it a bunch back in September 2024 (I was curious what the oldest possible edible food was*), and the Smithsonian knows it's BS (because I emailed them about this to get it corrected). I was able to correct Wikipedia, but I see the Smithsonian hasn't gotten around to it, so this keeps making the social media echo chamber rounds...
To be clear: no edible honey has ever been discovered in Egyptian tombs. Every single anecdote is either unverifiable, or a garbled telephone-game description of some decayed residue which might have been honey thousands of years ago (and which often, on further chemical testing, proves not to have been).
> The problem is that subjective judgements by streaming platforms on where an AI line is drawn in music production is difficult.
This may be more of an economic problem. There is a stark difference between a music track made with 1% human work/effort and one made with 0%. If a human only has to do 1% of the work, they can make many more tracks, but not more than ~100x what they could have made without AI (Amdahl's law). Whereas the 0%-human track scales without limit: you could upload a billion tracks if you wished, limited basically only by bandwidth and automation. So a classifier or policy which permitted the 99%-AI track but banned the 100%-AI track may be adequate.
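To make the Amdahl's-law arithmetic concrete (a toy sketch with made-up numbers, not anything from the article):

    # Toy illustration: if a human must still do `human_fraction` of the work per track,
    # automating the rest caps the throughput multiplier at 1/human_fraction.
    def max_output_multiplier(human_fraction: float) -> float:
        return float("inf") if human_fraction == 0 else 1 / human_fraction

    print(max_output_multiplier(0.01))  # 1% human work -> at most ~100x more tracks
    print(max_output_multiplier(0.0))   # 0% human work -> unbounded upload volume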
> In addition to detecting, tagging and removing AI-generated music from recommendations, Deezer has now stopped storing hi-res versions of AI-tracks
Important point for anyone out there thinking about generating a lot of samples: expect to get increasingly filtered out if you don't emphasize quality or uniqueness or something. It's cheaper to detect that something is generated, apply standard base-rate reasoning ('it's probably slop'), and filter it out, than to try to do expensive evaluation to look for the rare gems.
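To spell out that base-rate logic (all numbers here are made up for illustration, not platform data):

    # Toy numbers: when is it worth evaluating a detected-AI upload
    # rather than filtering it outright on base rates?
    p_gem_given_generated = 1e-4   # assumed: gems are very rare among bulk-generated uploads
    value_of_a_gem        = 100.0  # assumed payoff of surfacing one genuinely good track
    cost_to_evaluate_one  = 0.50   # assumed cost of a real quality evaluation per track

    expected_value_of_evaluating = p_gem_given_generated * value_of_a_gem  # = 0.01
    print(expected_value_of_evaluating < cost_to_evaluate_one)  # True -> blanket filtering wins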
> Two blog articles and two preprints of fake academic articles [0] were able to convince CoPilot, Gemini, ChatGPT and Perplexity AI of the existence of a fake disease, against all majority consensus
Wrong. There is no 'majority consensus' against 'bixonimania' because they made it up; that was the point. It's unsurprisingly easy to get LLMs to repeat the only source on a term never before seen. This usually works; made-up neologisms are the fruit fly of data poisoning, because it is so easy to do and so unambiguous where the information came from. (And retrieval-based poisoning is the very easiest and laziest and most meaningless kind of poisoning, tantamount to just copying the poison into the prompt and asking a question about it.) But the problem with them is that, also by definition, it is hard for them to matter: why would anyone be searching or asking about a made-up neologism? And if it gets any criticism, the LLMs will pick that up, as your link discusses. (In contrast, the more sources are affected, the harder it is to assign blame; some papermills picked up 'bixonimania'? Well, they might've gotten it from the poisoned LLMs... or they might've gotten it from the same place the LLMs did, i.e. the Medium posts et al. which poisoned their retrievals.)
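For a sense of how trivial retrieval-based poisoning is, here is a deliberately naive, hypothetical retrieval-augmented pipeline (names and stand-ins are mine, not how any particular product works): for a never-before-seen neologism, the attacker's pages are the only documents retrieval can find, so the 'poisoning' reduces to pasting the attacker's own text into the prompt.

    # Hypothetical naive retrieval-augmented answering loop, for illustration only.
    def answer(query, search, llm):
        docs = search(query)                 # for a made-up neologism, only the poison ranks
        context = "\n\n".join(docs)
        prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
        return llm(prompt)                   # the model just summarizes whatever it was handed

    # Stand-ins: the only indexed pages for the neologism are the attacker's own posts.
    fake_index = lambda q: ["Bixonimania is a skin condition caused by blue-light exposure..."]
    echo_model = lambda p: p.splitlines()[1]  # toy 'LLM' that parrots its retrieved context
    print(answer("What is bixonimania?", fake_index, echo_model))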
The LLMs didn't only talk about the disease when prompted by the neologism. They also brought it up when asked about the symptoms. From the article:
> OpenAI’s ChatGPT was telling users whether their symptoms amounted to bixonimania. Some of those responses were prompted by asking about bixonimania, and others were in response to questions about hyperpigmentation on the eyelids from blue-light exposure.
And yes, sure, in this example the scientific peer-review process may eventually have criticised and countered 'bixonimania' as a hoax even if the researcher had never revealed its falsity (emphasis on 'may': few researchers have the time and energy to trawl through crap papermill articles and publish criticisms). Either way, that is a feature of the scientific process and is not a given to any online information.
What happens when false information is divulged by other means that do not attempt to self-regulate? And how do we distinguish one-off falsities from the myriad of obscure true things that the public is expecting LLMs to 'know' even when there is comparatively little published information about them and therefore no consensus per se?
"hyperpigmentation on the eyelids from blue-light exposure" is a super specific query almost definitionally 'bixonimania' which probably brought up the 'bixonimania' poison at the time (the search hits for that query right now in Google are weak and poorly relevant so it would not be hard to outrank them or at least get into the top 50 or so where a retrieval LLM would see them and would followup), and so still an instance of what I mean.
> Either way, that is a feature of the scientific process and is not a given to any online information.
Which does not distinguish it in any way from human errors like those of a crank or activist, etc.
And I don't know: how did we handle false information before, on niche topics no one cared about and which were unimportant? It's just noise. The worldwide corpus has always been full of extremely incorrect, mislabeled, corrupted, and distorted information on niche topics of no importance. But it's generally not important.
Unfortunately, that is an oversimplification for a heavily RLed/chatbot-trained LLM like Claude-4.7-opus. It may have started life as a base model (where prompting it with correctly spelled prompts, or text from 'gwern', would - and did with davinci GPT-3! - improve quality), but that was eons ago. The chatbots are largely invariant to that kind of prompt trickery, and just try to do their best every time. This is why those meme tricks about tips or bribery or my-grandmother-will-die stopped working.
LLM APIs sell on the value they deliver to the user, not the sheer number of tokens you can buy per $. The latter is roughly labor-theory-of-value levels of wrong.