
theophrastus · 2026-05-13T15:36:50 1778686610

After spending an entire career working 'by hand' (and doing a helluva lot of molecular orbital calculations) on the problem this post is about, I've got to tersely weigh in with: there's (still) not enough available data, given the size of protein 'phase space', to hope for a proper covering with one's trained-up linear algebra model. Or typed another way: you've got to include at some stage some physical modeling parameters, like molecular orbitals [1]; otherwise the 'response curve' will only optimize if one gets quite lucky (which is actually unlucky, as then you'll delude yourself into thinking it's generally applicable, which it isn't). For instance, swap in a carboxylic acid moiety where there was previously an aldehyde, a protein side-chain flips over, and you're in a completely different corner of the energetic 'galaxy'.

[1] e.g. https://proteindf.github.io/

phreeza · 2026-05-13T16:18:58 1778689138

That seems possible for generating completely new proteins.

Do you think it's also the case for lead optimization where you typically have some degree of measurements around your starting point, and you are expecting to stay in that local neighborhood for the generated candidates, too?

(Disclaimer: former Cradle employee here)

patrickkidger · 2026-05-13T16:25:22 1778689522

Oh hello Thomas, fancy seeing you here :D ex-Cradlers unite!

phreeza · 2026-05-13T18:06:15 1778695575

A rare breed!

patrickkidger · 2026-05-13T16:29:29 1778689769

I'll offer a +1 to the sibling comment here.

Yeah it's totally true you can't build a one-size-fits-all foundation model, the data just isn't there. But also... no-one needs that. It's totally fine to tweak a foundation model for any individual problem, and that's the bulk of what is being described in the linked blog post / in the underlying paper.

FWIW whilst at Cradle we had a lot of doubts going into this. Like, thermostability is clearly evolutionarily correlated so it was always pretty likely that by hook or by crook the models could do that correctly. But, binding? Aggregation? Not at all clear that the same principles should hold. And the exciting finding was that yes, yes they do.

theophrastus · 2026-03-21T23:28:40 1774135720

Jameco sells one [1]

[1] https://www.jameco.com/z/KIT-EFK-BUNDLE-Jameco-Kitpro-Compon...

theophrastus · 2026-01-04T23:11:42 1767568302

Had a QuantSci Prof who was fond of asking "Who can name a data collection scenario where the x data has no error?" and then taught Deming regression as a generally preferred analysis [1]

[1] https://en.wikipedia.org/wiki/Deming_regression
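
For anyone who wants to try it, here is a minimal sketch of a Deming fit in plain Python. It assumes the closed-form slope from the Wikipedia article above, with delta defined as the ratio of the y-error variance to the x-error variance. The function name `deming_fit` and the toy data are illustrative only and do not come from any library.

```python
import math
from statistics import fmean


def deming_fit(x, y, delta=1.0):
    """Return (intercept, slope) of a Deming regression line.

    delta is the assumed ratio var(y errors) / var(x errors);
    delta = 1 corresponds to orthogonal regression.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    # Sample variances and covariance.
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    if s_xy == 0:
        raise ValueError("slope is undefined when the sample covariance is zero")
    a = s_yy - delta * s_xx
    slope = (a + math.sqrt(a * a + 4 * delta * s_xy * s_xy)) / (2 * s_xy)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Toy data lying exactly on y = 2x + 1 (made up for illustration).
intercept, slope = deming_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(intercept, slope)
```

On noise-free data any delta recovers the same line; the choice of delta only starts to matter once both coordinates carry error.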

moregrist · 2026-01-05T01:19:49 1767575989

Most of the time, if you have a sensor that you sample at, say, 1 kHz and you’re using a reliable MCU and clock, the noise terms in the sensor will vastly dominate the jitter of sampling.

So for a lot of sensor data, the error in the Y coordinate is orders of magnitude higher than the error in the X coordinate and you can essentially neglect X errors.

sigmoid10 · 2026-01-05T09:02:12 1767603732

That is actually the case in most fields outside of maybe clinical chemistry and such, where Deming became famous for explaining it (despite not even inventing the method). Ordinary least squares originated in astronomy, where people tried to predict movement of celestial objects. Timing a planet's position was never an issue (in fact time is defined by celestial position), but getting the actual position of a planet was.

Total least squares regression also is highly non-trivial because you usually don't measure the same dimension on both axes. So you can't just add up errors, because the fit will be dependent on the scale you chose. Deming skirts around this problem by using the ratio of variances of errors (division also works for different units), but that is rarely known well. Deming works best when the measurement method for both dependent and independent variable is the same (for example when you regress serum levels against one another), meaning the ratio is simply one. Which of course implies that they have the same unit. So you don't run into the scale-invariance issues, which you would in most natural science fields.
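
A small sketch of the scale problem described above, assuming the standard closed-form Deming slope with delta = var(y errors) / var(x errors). The data, the helper `deming_slope`, and the rescaling factor `c` (think seconds to milliseconds) are all made up for illustration.

```python
import math


def deming_slope(x, y, delta):
    """Deming slope for an assumed error-variance ratio delta."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s_yy = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    a = s_yy - delta * s_xx
    return (a + math.sqrt(a * a + 4 * delta * s_xy ** 2)) / (2 * s_xy)


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.3, 2.8, 4.2]
c = 1000.0  # hypothetical unit change for x
x_scaled = [c * xi for xi in x]

# Orthogonal fit (delta = 1) in the original units.
slope = deming_slope(x, y, delta=1.0)
# Same data with x rescaled, still pretending delta = 1.
slope_naive = deming_slope(x_scaled, y, delta=1.0)
# Rescaled data with delta rescaled too (x-error variance grows by c**2).
slope_adjusted = deming_slope(x_scaled, y, delta=1.0 / c ** 2)

# Converted back to the original units, only the adjusted fit agrees.
print(slope, slope_naive * c, slope_adjusted * c)
```

Keeping delta fixed at one after a unit change silently gives a different line, whereas carrying the variance ratio through the rescaling reproduces the original fit. That is the sense in which the ratio has to be known for the method to be well defined.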

jmpeax · 2026-01-04T23:32:58 1767569578

From that wikipedia article, delta is the ratio of y variance to x variance. If x variance is tiny compared to y variance (often the case in practice) then will we not get an ill-conditioned model due to the large delta?

kevmo314 · 2026-01-05T07:24:27 1767597867

If you take the limit of delta -> infinity then you will get beta_1 = s_xy / s_xx which is the OLS estimator.

In the wiki page, factor out delta^2 from the sqrt and take delta to infinity and you will get a finite value. Apologies for not detailing the proof here, it's not so easy to type math...
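
For reference, one way to spell out that limit, assuming the slope formula as given on the Wikipedia page with delta > 0, s_xx > 0 and s_xy nonzero. The shorthand A is introduced here only for brevity.

```latex
\begin{align*}
\hat\beta_1
  &= \frac{A + \sqrt{A^2 + 4\delta s_{xy}^2}}{2 s_{xy}},
  \qquad A = s_{yy} - \delta s_{xx} \\
  % multiply numerator and denominator by \sqrt{A^2 + 4\delta s_{xy}^2} - A
  &= \frac{2\delta s_{xy}}{\sqrt{A^2 + 4\delta s_{xy}^2} - A} \\
  % factor \delta^2 out of the square root, then cancel \delta
  &= \frac{2 s_{xy}}
          {\sqrt{\left(\frac{s_{yy}}{\delta} - s_{xx}\right)^2
                 + \frac{4 s_{xy}^2}{\delta}}
           + s_{xx} - \frac{s_{yy}}{\delta}} \\
  &\xrightarrow{\;\delta \to \infty\;}
     \frac{2 s_{xy}}{s_{xx} + s_{xx}} = \frac{s_{xy}}{s_{xx}}
\end{align*}
```

The second line is also the better-behaved form to evaluate numerically when delta is large, since it avoids subtracting two nearly equal large numbers.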

ghc · 2026-01-05T13:26:39 1767619599

In my field, the X data error (measurement jitter) is generally <10ns, which might as well be no error.

Beretta_Vexee · 2026-01-05T09:28:25 1767605305

For most time series, noise in time measurement is negligible. However, this does not prevent complex coupling phenomena from occurring for other parameters, such as GPS coordinates.

RA_Fisher · 2026-01-05T11:48:55 1767613735

The issue in that case is that OLS is BLUE, the best linear unbiased estimator (best in the sense of minimum variance). This property is what makes OLS exceptional.

theophrastus · 2025-12-25T04:39:05 1766637545

Merry Christmas!! you gentle nerds you! Peace to us all.

theophrastus · on Dec 19, 2024

What was the source of the oxygen to maintain ethyl alcohol combustion in a sealed WWII torpedo?

ceejayoz · on Dec 19, 2024

Compressed air. https://en.wikipedia.org/wiki/Torpedo#Wet-heater

theophrastus · on Oct 30, 2024

Curiously the default has audio output off. That is, was the little speaker icon unmuted?

theophrastus · on June 16, 2024

Here's from 2014 with some capacity calculations [0]. "ssh private key (900 bytes): 15 feet of tape."

[0] https://heepy.net/index.php/Data_storage_capacity_of_teletyp...

Nzen · on June 16, 2024

If we prefer paper but relax the teletype constraint, Oleh Yuschuk's PaperBack [0] allows encoding 500 KB on a sheet of printer paper.

[0] https://ollydbg.de/Paperbak (posted various times to HN)

theophrastus · on May 17, 2024

Circa 1980, when I was a hobbyist beekeeper with six hives near Seattle, they came down with foul-brood. (It was never certain whether it was American Foul-brood or European.) I duly reported it, and the county agent came in, sealed them, and carted them away. For an additional fee (which I paid) they would fumigate them and return just the hive bodies, but none of the frames (some of which would contain the infected brood). Those were burned. I believe the fumigant at the time was phosphine [1].

[1] https://en.wikipedia.org/wiki/Phosphine

theophrastus · on April 19, 2024

Shouldn't this make some initial direct reference to the author: Silvanus P. Thompson[1]?

[1] https://en.wikipedia.org/wiki/Calculus_Made_Easy

Jtsummers · on April 19, 2024

https://calculusmadeeasy.org/

The link isn’t to the front page with his name.

aaronbrethorst · on April 19, 2024

Ohh, this really is from 1910. And here I just thought the author was being obnoxiously cutesy with their language.

theophrastus · on Jan 13, 2024

Additional small molecule pharmaceutical candidates via molecular descriptors