convolvatron's comments | Hacker News

I have been in the same position. Maybe I was naive, but I believed that weapons design, while not the most moral thing in the world, was sadly necessary, and I actually trusted the military to... I guess act in legitimate and legal ways. That if those weapons were used in a conflict, it would be defensive and morally defensible.

Of course that was before the inexplicable adventurism in the Middle East.


as a comment about a particular project and its goals and timelines, this is fine. as a general statement that we should never revisit things, it's pretty offensive. llvm makes a lot of assumptions about the structure of your code and the way it's manipulated. if I were working on a language today I would try my best to avoid it. the back ends are where most of the value is and why I might be tempted to use it.

we should be really happy that language evolution has started again. language monoculture was really dreary and unproductive.

20 years ago you would be called insane for throwing away all the man-years of optimization baked into oracle, and I guess postgres or mysql if you were being low-rent. and look where we are today: thousands of people can build databases.


that's not strictly true. if I move the code that does TCP from the kernel into the application (not some other daemon, which is perhaps what you're suggesting), then the performance is, to first order, the same.

ok, what are the niggly details? we don't have interrupts, and we're running under the general scheduler, so there may be some effects from not getting scheduled as aggressively.

we still need to coordinate through the kernel wrt port bindings, since those are global across the machine, but that's just a tiny bit.

clearly we may be re-opening a door to syn-flooding, since the path to rejection is possibly longer. or maybe not; we could also leave the 3-way handshake in the kernel and put only the datapath in userspace.

we probably lose rtt-estimates hanging off of routes.

none of that suggests 'several times slower'.


are you kidding? if we spent all that money on food, you guys would just use it to bullshit all day and make funny pictures, while if we spend it on AI...

you repeat several times that IAB was too ivory tower and refused to address the critical issues of the day, but don't really go into much detail. I wrote an early implementation of v6, before ratification (and even won the UNH interop prize!). and I struggle to understand exactly what blame you are placing at their feet. just that maybe they took the e2e principle too seriously and should have backed the awful bodge that was NAT?

CLNP had existing implementations and was fundamentally sound. On its technical merits, RFC1347 TCP and UDP with Bigger Addresses (TUBA) wins hands-down. But it took too long for the ISO to agree to a hand-off (the IETF wanted to be able to _fork_ it, which seems nuts to me) and the IAB required ownership.

But aside from that, I actually do think we could have baked address extensions into the existing packet format's option fields and had a gradual upgrade that relied on that awful bodge that was (and is) NAT. And had a successful transition wherein it died a well-deserved death by now. :-)


> we could have baked address extensions into the existing packet format's option fields and had a gradual upgrade that relied on that awful bodge that was (and is) NAT

We did and do have this. I wrote about the option fields part in [1] but we also have NAT as part of the migration, in the form of NAT64.

Not only was doing these things not enough for us to be done by now, they weren't even enough to stop you from moaning that we didn't do them! How could anything have been good enough if these are the standards it's judged by?

[1]: https://news.ycombinator.com/item?id=47829991


My point was meant purely as an intellectual exercise, not a critique of engineering choices made in the face of adverse practical realities. My apologies if it came across otherwise.

With the luxury of hindsight, allowing an admixture of 32-bit and 64-bit addresses strikes me as an obviously clean solution to the one real problem IPv6 solves. But in 1992, that was a complete non-starter.


But mine was that you don't need to do this as an intellectual exercise, because we got basically all the things you're asking for.

We have address extensions in v4 packets, we have NAT to help with partial upgrades, and we have a mix of 32-bit and 128-bit addresses (which should be just as obviously clean as a mix of 32-bit and 64-bit addresses, or rather more so, 64 bits being too small). You don't need to think about whether any of this would have been doable, because we already went and did it.


I didn't have too much visibility into the CLNP world, although we did have a test network where I worked. My personal issue was that I just couldn't read the massively overwrought ISO specs. My admittedly biased viewpoint was that there wasn't anything really wrong with IPv6, but the providers were quite happy with the way things were and actually kind of liked the internet-as-television model that we ended up with.

I do think that the IETF didn't realize that they were losing their agency, so it's very likely that TUBA would have made the difference. not for any technical reason, but because it would have come a few years earlier, when people were still listening.


I only read up on CLNP based on a fascination with counterfactuals. I will say there is a fair bit to IS-IS and ES-IS that's directly relevant to the original article's points on the circuits-to-bus-to-circuits physical evolution. There was no blanket assumption that the underlying layer would look like Ethernet. The subnet equivalent was at a higher level, and the assumption was that there would be an actual network of links to manage.

The fact that IS-IS survived as a relevant IP routing protocol says a lot on its own.


It is hard to cover decades of politics in one post on here, but rather than the IAB being in an ivory tower, at least for the first 15 years, I think it was ruled by inertia that was changing, and suffering a bit from The Mythical Man-Month's second-system effect.

In the beginning it was an experiment and should have been ambitious; the IETF had just moved to CIDR, which bought almost a decade of time, and they should have aimed high.

It is just when you significantly change a system, you need to show users how to accomplish the work they are doing with the old system, even if how they do that changes. If you can't communicate a way to replace their old needs, or how that system is fitting new needs that you could never have predicted, you need to be flexible and demonstrate that ability.

If you look at the National Telecommunications and Information Administration [Docket No. 160810714-6714-01] comments:

Microsoft: https://www.ntia.gov/sites/default/files/publications/micros...
ARIN: https://www.ntia.gov/sites/default/files/publications/arin_c...

You will see that the address space argument is the only real one they make. It isn't a coincidence that rfc7599 came about ~20 years later, when 160810714-6714-01 and federal requirements for IPv6 were being discussed.

If you look at the #nanog discussions between RFC 1883 (IPv6) being proposed in late 1996 and IPv4 exhaustion in early 2011, it wasn't just the IAB that was having philosophical discussions around this.

Both rfc3484 and rfc6724 suffered from the lack of executive sponsorship called out in the above public comments. And the following from rfc6724's intro is often ignored in favor of pure compliance:

> They do not override choices made by applications or upper-layer protocols, nor do they preclude the development of more advanced mechanisms for address selection.

There are many ways that could have played out differently, but I noticed Avery Pennarun's last update to that post says pretty much the same thing in different words.

https://tailscale.com/blog/two-internets-both-flakey

> IPv6 was created in a new environment of fear, scalability concerns, and Second System Effect. As we covered last time, its goal was to replace The Internet with a New Internet — one that wouldn’t make all the same mistakes. It would have fewer hacks. And we’d upgrade to it incrementally over a few years, just as we did when upgrading to newer versions of IP and TCP back in the old days


I was going to reply that just because intel did something funny doesn't mean that it was the beginning of the story. but it turns out that the release of the 8087 predates the ratification of IEEE floats by 2 years. in addition, the primary numeric designer for the 8087 was apparently Kahan, which means that they were both part of the same design process. of course there were other formats predating both of these.

The Intel 8087 design team, with Kahan as their consultant (he was the author of most of the novel features, drawing on his experience with the design of the HP scientific calculators), realized that instead of keeping their much-improved floating-point format proprietary, it would be much better to agree with the entire industry on a common floating-point standard.

So Intel initiated the discussions for the future IEEE standard with many relevant companies, even before the launch of the 8087. AMD was convinced immediately, so AMD was able to introduce an FP accelerator (Am9512) based on the 8087 FP formats, which were later adopted in IEEE 754, also in 1980 and a few months before the launch of the Intel 8087. So in 1980 there were already 2 implementations of the future IEEE 754 standard. Am9512 was licensed to Intel, and Intel made it under the 8232 part number (it was used in 8080/8085/Z80 systems).

Unlike AMD, the traditional computer companies agreed that an FP standard was needed to solve the mess of many incompatible FP formats, but they thought that the Kahan-Intel proposal would be too expensive for them, so they came up with a couple of counter-proposals, based on the tradition of giving priority to implementation costs over usefulness to computer users.

Fortunately the Intel negotiators eventually succeeded in convincing the others to adopt the Intel proposal, by explaining how the new features could be implemented at an acceptable cost.

The story of IEEE 754 is one of the rare stories in standardization where it was chosen to do what is best for customers, not what is best for vendors.

Like the use of encryption in communications, the IEEE standard has been under continuous attack throughout its history, from each new generation of logic designers who think they are smarter than their predecessors and are too lazy to implement some features of the standard properly. Older designs have demonstrated that those features can in fact be implemented efficiently, but the newcomers take the easy path and implement them inefficiently, on the assumption that users will not care.


The floating point "standard" was basically codifying multiple different vendor implementations of the same idea, hence the mess where floating point is not consistent across implementations.

IEEE 754 basically had three major proposals that were considered for standardization. There was the "KCS draft" (Kahan, Coonen, Stone), which was the draft implemented for the x87 coprocessor. There was DEC's counter proposal (aka the PS draft, for Payne and Strecker), and HP's counter proposal (aka the FW draft, for Fraley and Walther). Ultimately, it was the KCS draft that won out and became what we now know as IEEE 754.

One of the striking things, though, is just how radically different KCS was. By the time IEEE 754 forms, there is a basic commonality of how floating-point numbers work. Most systems have a single-precision and double-precision form, and many have an additional extended-precision form. These formats are usually radix-2, with a sign bit, a biased exponent, and an integer mantissa, and several implementations had hit on the implicit integer bit representation. (See http://www.quadibloc.com/comp/cp0201.htm for a tour of several pre-IEEE 754 floating-point formats). What KCS did that was really new was add denormals, and this was very controversial. I also think that support for infinities was introduced with KCS, although there were more precedents for the existence of NaN-like values. I'm also pretty sure that sticky bits as opposed to trapping for exceptions was considered innovative. (See, e.g., https://ethw-images.s3.us-east-va.perf.cloud.ovh.us/ieee/f/f... for a discussion of the differences between the early drafts.)

Now, once IEEE 754 came out, pretty much every subsequent implementation of floating-point has started from the IEEE 754 standard. But it was definitely not a codification of existing behavior when it came out, given the number of innovations that it had!
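To make the denormal innovation concrete, here's a small Python sketch (the function name is mine) that unpacks an IEEE 754 double into its fields. Denormals are the values whose exponent field is zero; they fill in the gap between zero and the smallest normal number:

```python
import struct

def fp_fields(x: float):
    """Unpack an IEEE 754 double into (sign, biased exponent, mantissa)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

# A normal number: nonzero exponent field, implicit leading 1 in the mantissa.
print(fp_fields(1.0))     # (0, 1023, 0)

# A denormal: exponent field of 0, no implicit bit. Denormals guarantee that
# x - y == 0.0 implies x == y, the controversial KCS feature.
print(fp_fields(5e-324))  # (0, 0, 1) -- the smallest positive double
```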


I don't think that's accurate. Mojo is explicitly compiling tensor graphs that run on accelerators. it's not like PyTorch, where python is providing the chassis but not the engine.

I don't think it's going to be a good general HPC language, just because it's targeting a specific set of AI workloads, but they have shown some examples of synthesizing code that is comparable to hand-written kernels.

but it's not out of the question from first principles.


I worked in parallel computing in the late 80s and early 90s, when parallel languages were really a thing. in HPC applications memory bandwidth is certainly a concern, although usually the global communications bandwidth (assuming they are different) is the roofline. by saying c++ you're implying that MPI is really sufficient. it's certainly possible to prop up parallel codes with MPI, but it's really quite tiresome, and it makes it hard to play with the really interesting problem, which is the mapping of the domain state across the entire machine.

other hugely important problems that c++ doesn't address are latency hiding, which avoids stalling out your entire core waiting for a distributed message, and the related problem of interleaving computation and communication.
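the overlap pattern can be sketched in a few lines. this is a toy stand-in, using a thread where a real code would use a nonblocking exchange; the MPI analogy in the comments is mine:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_halo():
    time.sleep(0.05)     # stands in for a neighbour exchange in flight
    return [0.0] * 8     # boundary data the edge computation will need

def compute_interior():
    # work that needs no remote data, done while the transfer is pending
    return sum(i * i for i in range(100_000))

# Post the exchange, do interior work while it is in flight, and block only
# at the point where the boundary cells actually need the halo -- the same
# shape as MPI_Irecv ... compute ... MPI_Wait.
with ThreadPoolExecutor(max_workers=1) as pool:
    pending = pool.submit(fetch_halo)   # start the "transfer"
    interior = compute_interior()       # overlapped with it
    halo = pending.result()             # rendezvous before the boundary work
```

the point is that the wait moves from the start of the timestep to the last possible moment, so the core never idles while bytes are on the wire.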

another related problem is that a lot of the very interesting hardware that might exist, to do things like RDMA or in-network collective operations or even memory-controller-based rich atomics, isn't part of the compiler's view and thus usually ends up as library implementations or really hacky inlines.

is there a good turnkey parallel language? no. is there sufficient commonality in architecture, or even much investment in the interesting ideas that were abandoned because of cost? no. but there remains a huge potential to exploit parallel hardware with implicit abstractions, and I think saying 'just use c++' misses almost all of the picture here.

addendum: even if you are working on a single-die multicore machine, if you don't account for locality it doesn't matter how good your code generator is: you will saturate the memory network. so locality is important, and languages like Chapel are explicitly trying to provide useful abstractions for you to manage it.


if you're using thread-level parallelism, there is always a benefit to having a per-thread allocator, so that you don't have to take global locks to get memory; those locks become highly contended.

if you take that one step further and only use those objects on a single core, now your default model is lock-free non-shared objects. at large scale that becomes kind of mandatory. some large shared-memory machines even forgo cache coherence because you really can't do it effectively at large scale anyway.
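a toy sketch of the per-thread free-list idea, in python for brevity (real allocators like tcmalloc and jemalloc do this with per-thread arenas in C; all names here are mine):

```python
import threading

class PerThreadPool:
    """Each thread keeps a private free list, so the hot path for
    allocation and release never takes a lock."""

    def __init__(self, size=4096):
        self.size = size
        self._local = threading.local()   # one free list per thread

    def alloc(self):
        free = getattr(self._local, "free", None)
        if free:
            return free.pop()             # fast path: no lock, no contention
        return bytearray(self.size)       # slow path: global allocator

    def release(self, buf):
        if not hasattr(self._local, "free"):
            self._local.free = []
        self._local.free.append(buf)      # goes back to this thread only
```

since a buffer released on thread A is only ever handed back out on thread A, the free list needs no synchronization at all; the tradeoff (as in real allocators) is that memory can pool up in idle threads.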

but all of this is highly platform dependent, and I wouldn't get too wrapped up around it to begin with. I would encourage you though to worry first about expressing your domain semantics, with the understanding that some refactoring for performance will likely be necessary.

if you have the patience, both personally and within the project, it can be a lot of fun to really get in there and think about the necessary dependencies and how they can be expressed on the hardware. there are a lot of cool tricks, for example trading off redundant computation to reduce the frequency of communication.
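that last trick can be sketched concretely with ghost/halo cells: give each worker k extra cells on each side, let it take k stencil steps locally (redundantly recomputing its neighbours' edge cells), and only then exchange. all names here are illustrative:

```python
def step(u):
    """One Jacobi-style step on a 1D stencil; endpoints are held fixed."""
    return [u[i] if i in (0, len(u) - 1) else (u[i - 1] + u[i + 1]) / 2
            for i in range(len(u))]

def local_steps(chunk_with_halo, k):
    """With a halo of width k on each side, a worker can take k steps
    before it must communicate again. Staleness creeps inward from the
    halo by one cell per step, so the owned interior stays exact."""
    u = chunk_with_halo
    for _ in range(k):
        u = step(u)
    return u[k:-k]  # the owned cells, identical to a global run
```

the cost is O(k) redundant edge updates per worker; the payoff is one message round every k steps instead of every step, which is a good trade whenever latency dominates.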


Thank you for such a great reply!

There's a lot of useful advice here that'll surely come in handy later. For now, yeah, I'm just going to try to make things work. So far I have mostly written intra-node code, for which rayon has been adequate. I haven't gotten around to testing the ergonomics of rs-mpi, but it feels like quite an exciting prospect.


in general these aren't in conflict. in particular, once I have a system that can distribute work among faulty nodes and maintain serializability, exploiting parallelism _within_ a fault domain just falls out.
