Decompilation to C (and even C++!) has been done automatically for 2-3 decades at least. I am not sure what has changed in recent years other than people playing fast and loose with copyright (and GitHub allowing it, likely because their LLMs also stand to benefit). Bringing LLMs in here will only add errors and delays, and will likely push you away from a reliable result.
The challenge here is readability. Reading the TP source leak you link, I think it's even behind the current state of the art, as it's barely above assembly. This is where I suspect even the smallest of LLMs may help, since you don't care that much if it introduces errors.
>Decompilation to C (and even C++!) has been done automatically for 2-3 decades at least.
Only in a very rudimentary sense and definitely not in a working compilation (much less binary equivalent) sense. LLMs have turned this from a gimmick for static analysis into something that actually works pretty well for recompilation projects.
> Only in a very rudimentary sense and definitely not in a working compilation (much less binary equivalent) sense.
Working is the easy part; the hard part is getting something that qualifies as readable C. LLMs do not really help reach the "working compilation" part, but they benefit from it.
We are way past "working compilation" when it comes to LLMs. They are already really good at writing readable, compilable code. The big problem with LLMs is making sure the output binary actually does what you wanted it to do. But if you define the goal not as instructions in vague, unspecific human language but as recreating a given set of binary instructions after compilation, this big drawback goes away. So in a sense they are better suited for recompilation projects than for developing new applications.
My point is that we were past "working compilation" long before LLMs, and I do not think anything about LLMs helps with it; at best, agents use the same tools with the same efficiency. I disagree that they're good at writing compilable code, but I agree on the readable part.
Which decompiler reliably produced working, high-level C/C++ from assembly? I would have loved to use this thing you are describing 15 years ago. Compilation is inherently lossy, so any system that could have given you this would have needed pretty heavy LLM-like features anyway.
>I disagree that they're good at writing compilable code
That was never part of the discussion, because as explained several times now, it is irrelevant in this case. The existence of the original binary means all you need to do is match things up, which can be automated completely.
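To make the "automated completely" part concrete: a matching pipeline compiles the candidate source with the original toolchain and byte-compares the output against the original binary, and nothing counts as done until the diff is empty. A minimal sketch in C (filenames and usage are made up for illustration; real projects diff per-function rather than whole-file):

    /* match.c -- byte-compare a recompiled binary against the original. */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s original recompiled\n", argv[0]);
            return 2;
        }
        FILE *a = fopen(argv[1], "rb"), *b = fopen(argv[2], "rb");
        if (!a || !b) { perror("fopen"); return 2; }

        long off = 0;
        int ca, cb;
        do {
            ca = fgetc(a);
            cb = fgetc(b);
            if (ca != cb) {
                printf("mismatch at offset 0x%lx\n", off);
                return 1;
            }
            off++;
        } while (ca != EOF);

        printf("OK: %ld bytes identical\n", off - 1);
        return 0;
    }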
The problem with this argument is that you can justify an infinite amount of crap with it, the security equivalent of cockroach papers, which people inevitably end up treating as real security.
One example I remember is Pidgin storing its passwords in plain text in $HOME. They could have encrypted them with some hardcoded string, and made a lot of people happy because they would no longer grep their $HOME and find their passwords sitting right there. However, this would have had the side effect of people dropping the ball and sharing their config files with others. Or forgetting to set up proper permissions for their $HOME, etc.
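For the avoidance of doubt, "encrypted with some hardcoded string" means something like the sketch below (the key is hypothetical): it stops a casual grep and nothing else, because the key ships inside the binary:

    /* Obfuscation, not encryption: the hardcoded key ships with the
       program, so anyone who looks can reverse it. Illustrative only. */
    #include <stdio.h>
    #include <string.h>

    static const char KEY[] = "some-hardcoded-string"; /* hypothetical */

    static void xor_buf(char *buf, size_t len)
    {
        for (size_t i = 0; i < len; i++)
            buf[i] ^= KEY[i % (sizeof KEY - 1)];
    }

    int main(void)
    {
        char pw[] = "hunter2";
        size_t len = strlen(pw);
        xor_buf(pw, len);   /* "encrypt": no longer greppable in $HOME */
        xor_buf(pw, len);   /* ...and trivially reversed the same way  */
        printf("recovered: %s\n", pw);
        return 0;
    }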
In addition, these layers of obscurity are also not overhead-free: they may complicate debugging, they may introduce dangerous dependencies, they may tie you to a vendor, they may reduce computing freedom (e.g. Secure Boot), etc.
Why a hardcoded string and not a user-specific password the user chose for Pidgin? Then you've got real security, and even using a password stored in the user's keychain means the passwords are not trivially accessible.
The whole point of security in depth is that you use non-collinear layers of protection to raise the cost of an attack and reduce the blast radius of a successful one.
Pidgin predates keychains, but if I remember correctly you had the option to set a master password or to simply disable storing passwords, which were the only options that truly improved security. But most users would not do that (they want autologin for a reason), so the example still applies.
(Note also most keychain implementations are not truly improving security in any way, but this is a separate topic)
That said, purple3/pidgin3 (still in development) only supports keyrings and doesn't try to do any password management on its own, even though password managers fall into the "store a password behind a password" category detailed on the above page.
> The problem with this argument is that you can justify an infinite amount of crap with it, the security equivalent of cockroach papers, which people inevitably end up treating as real security.
I almost missed the twist at the end because I had no idea what the hell cockroach papers were. I still don't understand the reference, but at least it sounds mildly interesting. So, well done.
Now, as for this strawman argument of yours about justifying an infinite amount of crap, that's true of all manner of disingenuous arguments. Who cares about that in this case?
> Or forgetting to set up proper permissions for their $HOME, etc.
This is Pidgin's fault how?
Now, if you wanted to argue that Pidgin should have put the passwords into a separate file and chmod 400'ed it, that would make much more sense.
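Something like this POSIX sketch (path handling and function name made up) would at least have kept a stray grep or a shared dotfile tarball from exposing anything, for near-zero overhead:

    /* Store a secret in its own owner-read-only file. POSIX sketch. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int save_secret(const char *path, const char *secret)
    {
        /* O_EXCL: refuse to reuse a pre-planted file or symlink.
           0400: owner may read; group and others get nothing. */
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0400);
        if (fd < 0) { perror("open"); return -1; }
        ssize_t n = write(fd, secret, strlen(secret));
        if (n < 0) perror("write");
        close(fd);
        return n < 0 ? -1 : 0;
    }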
> In addition, these layers of obscurity are also not overhead-free: they may complicate debugging, they may introduce dangerous dependencies, they may tie you to a vendor, they may reduce computing freedom (e.g. Secure Boot), etc.
Not many good things have zero cost, do they... The point of TFA is that a little bit of well thought out obscurity pays huge dividends when applied in the real world. His example about the WP exploit ought to be all you need to read to get on board with that.
But NAT is _not_ a firewall. Its entire purpose is to allow traffic through. There are a million tricks attackers can play; e.g., tricking a PC into sending traffic to some address will usually allow all traffic from that address back into one's precious network.
This is a common and very dangerous misconception -- I see a lot of people, ISPs even, who seem to think NAT is enough and that you only need a firewall for IPv6.
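A toy model of why that trick works: the NAT table is keyed on whatever an inside host talks to, so one induced outbound packet (an img tag, a DNS lookup, a game ping) creates a return path. This is nothing like a real conntrack implementation, just the logic of the hole-punch:

    /* Toy NAT: outbound traffic creates a mapping; anything matching a
       mapping is let back in. Note there is no policy anywhere. */
    #include <stdio.h>
    #include <string.h>

    #define MAX_MAP 64
    struct mapping { char inside[32]; char outside[32]; };
    static struct mapping table[MAX_MAP];
    static int n_map;

    static void outbound(const char *inside, const char *outside)
    {
        if (n_map >= MAX_MAP) return;
        snprintf(table[n_map].inside, 32, "%s", inside);
        snprintf(table[n_map].outside, 32, "%s", outside);
        n_map++;
        printf("out: %s -> %s (mapping created)\n", inside, outside);
    }

    static void inbound(const char *outside)
    {
        for (int i = 0; i < n_map; i++)
            if (strcmp(table[i].outside, outside) == 0) {
                printf("in:  %s -> %s (ALLOWED: mapping exists)\n",
                       outside, table[i].inside);
                return;
            }
        printf("in:  %s (dropped: no mapping)\n", outside);
    }

    int main(void)
    {
        inbound("attacker.example");                  /* dropped... for now */
        outbound("192.168.1.10", "attacker.example"); /* induced by attacker */
        inbound("attacker.example");                  /* now it sails right in */
        return 0;
    }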
You absolutely aren't copying the work; recompilation projects are intensive work and a re-imagining of what the source code could look like. Compilation is still a one-way process.
And then for the legal part, that's why it's called an exception.
There is little to no creative work whatsoever if you end up with exactly the same game; and often they end up with exactly the same binary as well. Source translations are derivative works almost by definition. It doesn't matter what magic you use to generate it.
And again, where is the interoperability here? The interoperability exception would apply if there were whitebox cryptography, Nintendo logo-style checks, or anything else where the only way for the work to run would be to violate the copyright of _exactly that_. Under no circumstances can you simply copy & distribute the entire work (or derivatives) while claiming "interoperability exception!". It makes utterly no sense.
I disagree; the creative work is in figuring out what the game does, and the resulting recompilation is completely different from the original source code.
And as for the interoperability part, these decompilation projects are primarily made to target other systems, not the original platform. That's the textbook definition of interoperability.
Let's be real: the N64 and the PS1/PS2 (where most of these projects are based) are crumbling old platforms at this point, and these projects are sometimes the best way to run the games, when they exist.
Decompilation produces a derivative work. This is not up for debate or disagreement.
The exception for interoperability only applies to _the minimum required_ for interoperability. You can use this exception to distribute e.g. game authorization code even if copyright would not allow you to do it.
You _cannot_ use this as an excuse to pirate the entire program, much less to create your own derivative work and distribute it!
This is just wishful thinking that comes up every so often in these threads (this is now the 5th time I have seen it parroted here). And then, when Nintendo inevitably shuts everything down, cue the crying. This ignorance is simply setting these projects up for failure.
Your interpretation, I have mine. As far as I know, none of these recompilation projects ended up in any EU court yet so your interpretation is as valid as mine.
And Nintendo can pound sand, sorry. The only realistic way to play those aging games nowadays is an emulator or a recompilation project.
Nintendo also hasn't struck these projects down; maybe they are afraid of setting a precedent.
There is a bazillion cases' worth of jurisprudence about decompilation in the EU. Just search for your favorite case. I'm based in the EU (France). But FYI, despite what you may think, in practice the US is more lax about this than the EU is.
In the EU, for example, decompilation even without distribution may very well be illegal (because it would be an unauthorized temporary copy of the program); US courts are way more lax when it comes to these temporary, never-distributed copies (which are almost always fair use, a concept that doesn't exist per se in the EU). This is a big problem in the EU for security research (which obviously does not fall under interoperability).
Emulation would be acceptable, which is yet another reason the interoperability clause does not apply (since you _already_ have a way to interoperate that doesn't require distributing copyrighted software, and the EU interoperability clause very explicitly says that then it does _not_ apply).
Derivative works aren't some unknowable arcane legal term. They're a pretty fundamental aspect of copyright law. The canonical examples of derivative works are things like adaptation of a book to a film, translation of a book, or a sequel.
And given these examples, it's very clear that recompilation to play on modern hardware is quite similar in spirit to translating a book into a different language, which makes it a derivative work. The other alternative is that there is insufficient creativity in the recompilation effort to merit independent copyright at all, in which case it's just plain copying of the original work. In either case, it's infringement.
And in the process immediately convert huge numbers of devices into e-waste. Then check the excuse calendar again for tomorrow's reason to deprecate yet another batch of "legacy" ciphers from OpenSSL.
It's not another story; the quality of the reasons for scrapping or upgrading devices is the most important thing here.
If the reasons are "the current devices are insecure or likely to become insecure" that's very different from "the new encryption system is a little bit better so there's not much point in upgrading".
If quantum computing never becomes a practical thing, the current hardware and software will stay secure. If it becomes practical, they won't. Seems simple enough.
> given the price point of these switches when buying new I would highly recommend that you instead look for a used managed Gigabit switch.
Power, power, power... one of these older gigabit switches is going to use 10x the power at idle compared with one of the cheapest crappy Realtek-based "unmanaged" switches. That matters for something that is going to be on 24h a day. And of course, since no one reviews these things, you'll only know once you have spent the money.
So if you really can get away with a crappy web interface on top of the crappy low-power Realtek chip, you may get the best of both worlds.
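The arithmetic is worth doing once, because the difference often exceeds the price of the cheap switch within a year or two. All the figures below are illustrative assumptions (~20W idle for an old enterprise switch, ~2W for a cheap Realtek one, EUR 0.30/kWh):

    /* Yearly cost of idle draw for a 24/7 device. Assumed figures. */
    #include <stdio.h>

    int main(void)
    {
        const double hours = 24.0 * 365.0;  /* always on */
        const double eur_per_kwh = 0.30;    /* assumed electricity price */
        const double old_w = 20.0;          /* used enterprise switch, idle */
        const double new_w = 2.0;           /* cheap Realtek switch, idle */

        double old_cost = old_w / 1000.0 * hours * eur_per_kwh;
        double new_cost = new_w / 1000.0 * hours * eur_per_kwh;
        printf("old: %.0f EUR/yr, new: %.0f EUR/yr, delta: %.0f EUR/yr\n",
               old_cost, new_cost, old_cost - new_cost);
        return 0;
    }

That prints roughly 53 vs 5 EUR a year: close to 50 EUR/yr of difference for a switch that may have cost 30 EUR new.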
Gigabit seems undersized for a home LAN these days. 2.5Gb equipment isn't significantly more expensive and any Cat6 should handle it. Fiber is cheap enough too if you want some 10Gb devices. The only expensive thing is SFP ethernet adapters, but you can put an SFP NIC in your PC and bypass the problem.
I've been using the equipment below in my home LAN for a mix of 10Gb fiber and 2.5Gb ethernet; the small number of devices that came with 10Gb ethernet ports (Tyan motherboards) get SFP ethernet adapters.
Unmanaged 4x 2.5Gb ethernet + 2x 10Gb fiber - https://a.co/d/08J99UjH - I daisy-chain these with fiber connections to form a kind of 10Gb backbone that terminates at my main PC with the fiber NIC.
Managed 10x fiber - https://a.co/d/06927QeJ - This is the most economical 10Gb fiber switch I could find at the time, and it's had no problems for the low price. It has a serial management interface in addition to web, and extensive management capabilities. I've used its link aggregation successfully.
Managed 4x 2.5Gb ethernet + 2x 10Gb fiber - https://a.co/d/0fud7jzF - First hop off my modem before the fiber switch; good management capabilities.
It's kind of funny: my LAN is all random Amazon brands people would warn against relying on, but I picked out ones that have been solid and reliable over years of use. No need to break the bank if you find the right stuff.
> 2.5Gb equipment isn't significantly more expensive and any Cat6 should handle it. Fiber is cheap enough too if you want some 10Gb devices
You don't need Cat6 for 2.5G. Regular Cat5e that has been installed everywhere for years is fine for 2.5G.
Cat6 is enough for 10G ethernet within the lengths you find in a typical residential house. You don't need fiber.
For short runs even 10G works on quality Cat5e.
I think some home-lab fans overestimate network cabling requirements. With Cat6 I could string a cable from one end of my house and back and not even come close to breaking the spec for 10G. For 2.5G ethernet, cheap Cat5e will get you anywhere you need to reach in a residential home.
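The "house and back" math, with the standard reach figures (reach per IEEE 802.3an/bz; the run length is my assumption for a typical house):

    /* Spec reach vs a worst-case residential run. */
    #include <stdio.h>

    struct reach { const char *link; int metres; };

    int main(void)
    {
        const struct reach specs[] = {
            { "1000BASE-T on Cat5e", 100 },
            { "2.5GBASE-T on Cat5e", 100 },
            { "5GBASE-T   on Cat6 ", 100 },
            { "10GBASE-T  on Cat6 ",  55 },
            { "10GBASE-T  on Cat6a", 100 },
        };
        const int run = 2 * 15 + 5; /* across a ~15m house and back, plus slack */

        for (unsigned i = 0; i < sizeof specs / sizeof *specs; i++)
            printf("%s: %3dm spec vs %dm run -> %s\n", specs[i].link,
                   specs[i].metres, run,
                   run <= specs[i].metres ? "fine" : "too long");
        return 0;
    }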
- Equipment with 10Gb ethernet ports is much more expensive than comparable fiber equipment, and it runs hot at the ports: 10GBASE-T RJ45 draws 2.0-5.0W per port, often enough to burn your finger. Especially if something's going to be inside your walls, generating less heat is a plus
- Fiber's somewhat easier to run since it's lighter; it's easier to break but the bend radius is much more forgiving than you might assume. I have yet to damage a fiber cable myself.
- More electrical isolation between equipment is always a benefit, and fiber gives you that naturally
The tradeoffs lead me to prefer running fiber for 10G which then branches out to 2.5G ethernet for most equipment in the house, but if I didn't have these Tyan boards prompting me to try out 10G equipment then I would probably stick to 2.5G ethernet for everything for simplicity. If you're aiming for 10G then I don't think ethernet would make sense in most situations for both upfront cost and heat generation/power usage.
The biggest problem with fiber is that you cannot do cable work without special equipment, which partially negates the advantage of the thinner wires, since you always have to account for the plug.
But it is true that, otherwise, and in a surprising turn of events, fiber is cheaper to run than 10GBASE-T.
I think there are probably two ways to address this.
1) Most likely we make self-retracting reels, or just very easy cable/slack management, and sell fixed-length cables with a hard connector on one end and a snap-on connector on the other (https://youtu.be/6dop-9_0_g8?t=43&si=DdAXLMU_A7wTuCTn). That solves the problem of accounting for the plug when drilling holes. We could easily do this tomorrow on 2mm white 1f or 3mm 2f cable. The size is important, as it is about the maximum you can stick to the top of skirting with just adhesive.
2) We use plastic optical fibre and build a whole bunch of infrastructure around it. That is much easier to terminate and cut, and safer to use, but a load of work would be required.
> Fiber is cheap enough too if you want some 10Gb devices.
The problem with fiber, for now, remains that so few consumer devices can actually connect to it without first converting to RJ45. You are pretty much limited to some enthusiast networking gear and server gear; everything else needs you to convert.
I recently had my family's home ethernet upgraded and we went with Cat8 for now (it wasn't meaningfully more expensive than any other Cat cable, all things considered). It is compatible with networking gear that is commonly available today, and hopefully in the future some switch will appear that makes full use of it (I am slightly sceptical, but I assume 10G at least will still be seen over Cat for consumers).
I'm not sure if we'll see >10G over twisted pair/Cat, but I'm sure we'll definitely see 5G and 10G BASE-T become far cheaper, with 2.5G as the baseline (e.g. standard on cheap things like the Raspberry Pi).
The base-level Mac Studio already comes with 10G as standard, and it's only $100 extra on a Mac mini.
It will be a long time until 10G per device isn't enough at home.
I also doubt we'll see above 10G, at least on consumer-grade hardware; but if we start seeing SFP+ or similar on consumer hardware (and not just enthusiast and server hardware), there is a realistic chance. That is so far out in the future, though, that making predictions about it doesn't make sense.
The problem with all these arguments is that the DOS extenders of the day used to do the same, or more. Do you call DOS extenders "operating systems, not shells" too?
On one hand, the technical answer to the above question is likely "yes". DOS is so simple that any non-trivial application likely qualifies as an operating system. Implementing some kind of virtual memory support is almost a given, and process control is not unheard of.
But on the other hand, most people would refer to anything pre-95 as a "shell" for the simple reason that it requires DOS to boot; even though, complexity-wise, 95 and the later versions of 3.x are practically the same: if you call one an OS, you ought to call the other a full OS too.
So this question is at the "angels on the head of a pin" level; but that simply means there's no answer that doesn't require a lot of nuance, and this also applies to the "it can't be a shell, it does too many things" answer.
doslinux is some tricky sleight of hand where it looks like Linux is running inside DOS, but it's actually the other way around (even though DOS boots first).
WSL9x takes quite a different approach. Windows boots first, but once Linux starts both kernels are running side-by-side in ring 0 with full privileges. They are supposed to cooperate, but if either crashes then both go down.
Magnetic core memory's final form was single large perforated plates with many conductors plated onto the through-holes in the ferrite surface, with only the vertical stack of bus wires threaded through the plates. This meant weaving was less of an issue, and larger >1kiB modules were feasible in a smaller area. The main drawback is that it still had destructive read-once access, so it always had higher latency in addition to being slow.
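The destructive part, as a toy model: sensing a core drives it to 0, and a pulse on the sense line tells you whether it flipped; if it held a 1, you must spend another cycle writing the 1 back before the location is usable again:

    /* Toy model of core memory's read-restore cycle. */
    #include <stdio.h>

    static int core[8];   /* one tiny "plane" of cores */
    static int cycles;    /* memory cycles spent */

    static int core_read(int addr)
    {
        int sensed = core[addr]; /* sense pulse only if the core flips */
        core[addr] = 0;          /* the read drove it to 0: value destroyed */
        cycles++;
        if (sensed) {
            core[addr] = 1;      /* restore cycle: write the 1 back */
            cycles++;
        }
        return sensed;
    }

    int main(void)
    {
        core[3] = 1;
        printf("core[3] = %d\n", core_read(3));
        printf("core[3] = %d (still 1, thanks to the write-back)\n",
               core_read(3));
        printf("cycles for two reads of a 1: %d\n", cycles);
        return 0;
    }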
The DDR market will adapt as China's grey-market state fabs smell the opportunity. They have been counterfeiting CMOS chips for decades already, and DRAM is not as complex as people like to assume.
Neuromorphic computing will likely kick over the LLM sand pile at some point, and all that discounted hardware will need to be re-homed. We can wait for the bubble to run its course and for actual investors to realize they were conned. =3
> Neuromorphic computing will likely kick over the LLM sand pile at some point, and all that discounted hardware will need to be re-homed.
I don't see a lot of work going on in neuromorphic - there was some work at Intel, IIRC. Not saying you're wrong, but just wondering where you think it's going to come from?
> just wondering where you think it's going to come from?
It will likely evolve like any regular biological system, consuming translated LLM weight sets as its initial training-condition 3D propagation structure.
The speed at which this occurs will likely be measured in weeks at first, due to slower growth-state writes, but once bootstrapped the GC is self-propagating.
I normally don't like to speculate, but the barrier to entry would actually be much lower than for traditional silicon fabrication processes. It was an old idea from science fiction that until recently was highly impractical. Asimov was likely wrong about the physical process, but not about how it is made.
Due to theoretical constant morphological changes under GC, one must acknowledge the inherent lack of safety such systems would pose. Have a nice day. =3
I expect you'd carry out most such work in the form of simulations, only moving to hardware once you'd demonstrated an efficient algorithm. If I'm right about that then it would be easy for any corporate R&D on the topic to fly under the radar indefinitely.
On the academic side of things there's a steady drip of papers on things like spiking neural networks, so I'd say the general theme is being explored.
I figure if a breakthrough happens it will be overnight just like what happened with transformers.
The next big shift will be HBF (high-bandwidth flash). All that DRAM holding essentially static weights that are read in nice, long linear reads in inference machines is wasted; if you had a proper interface to it, you could replace it all with flash for a tenth of the cost.
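The back-of-the-envelope behind that claim (all figures are illustrative assumptions, not any particular product): in single-stream inference the weights are read once per generated token, strictly sequentially, so what you need is raw bandwidth rather than DRAM-style random access:

    /* Bandwidth needed to stream static weights once per token. */
    #include <stdio.h>

    int main(void)
    {
        const double weights_gb = 70.0;   /* e.g. a 70B-param model at 8 bits */
        const double tokens_per_s = 20.0; /* assumed generation rate */

        printf("required read bandwidth: %.0f GB/s, all linear\n",
               weights_gb * tokens_per_s);
        /* Enormous for random access, but pure linear streaming is exactly
           the pattern flash handles well, given a wide enough interface. */
        return 0;
    }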