No it isn't, and it's frustrating when the "common wisdom" tries to boil it down to this. If this were true, then the models with "infinitely many" parameters would be amazing. What about just training a gigantic two-layer network? There is a huge amount of work trying to engineer training procedures that work well.
The actual reason is the complex biases that arise from the interaction between network architectures and optimizers, and that persist in the regime where data scales proportionally to model size. The multiscale nature of the data induces neural scaling laws that enable better performance than any other class of models can hope to achieve.
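To make "neural scaling laws" concrete, here is a minimal sketch of the power-law form L(N) = (N_c / N)^alpha; the constants are the illustrative language-model values reported in Kaplan et al. (2020), not universal truths:

```python
# Sketch of a neural scaling law: loss falls as a power law in parameter
# count N. Constants are the language-model values reported by Kaplan et
# al. (2020); they are illustrative only and vary by architecture and data.
import numpy as np

N = np.logspace(6, 10, num=5)    # model sizes from 1M to 10B parameters
N_c, alpha = 8.8e13, 0.076       # fitted constants from the LM literature

for n, loss in zip(N, (N_c / N) ** alpha):
    print(f"N = {n:.0e} params -> predicted loss {loss:.3f}")
```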
> The actual reason is the complex biases that arise from the interaction between network architectures and optimizers, and that persist in the regime where data scales proportionally to model size. The multiscale nature of the data induces neural scaling laws that enable better performance than any other class of models can hope to achieve.
That’s a lot of words to say that, if you encode a class of things as numbers, there’s a formula somewhere that can approximate an instance of that class. It works for linear regression and it works just as well for neural networks. The key thing here is approximation.
No, it is relatively few words to quickly touch on several different concepts that go well beyond basic approximation theory.
I can construct a Gaussian process model (essentially fancy linear regression) that will fit _all_ of my medical image data _exactly_, but it will perform like absolute rubbish at determining tumor presence compared to a convolutional neural network trained on the same data and problem that _also_ perfectly fits the data.
I could even train a fully connected network on the same data and problem, get any degree of fit you like, and it would still be rubbish.
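Here's a hedged toy version of that comparison (synthetic data and sklearn, not the actual medical-imaging setup): a GP interpolates the training set exactly, yet on a task whose signal is a local pattern, the kind of structure a CNN's inductive bias exploits, its test predictions do barely better than guessing the majority class.

```python
# Toy stand-in for the claim above: exact fit != good generalization when
# the model's inductive bias doesn't match the data. Synthetic 8x8 "images"
# whose label depends on a local 2x2 patch.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
n, side = 200, 8
imgs = rng.normal(size=(n, side, side))
# Label: does any 2x2 patch sum exceed a threshold? (local, translation-
# invariant -- exactly the kind of feature a CNN is biased toward)
patch = imgs[:, :-1, :-1] + imgs[:, 1:, :-1] + imgs[:, :-1, 1:] + imgs[:, 1:, 1:]
y = (patch.max(axis=(1, 2)) > 5.0).astype(float)
X = imgs.reshape(n, -1)

Xtr, Xte, ytr, yte = X[:150], X[150:], y[:150], y[150:]
gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0), alpha=1e-10)
gp.fit(Xtr, ytr)                 # alpha ~ 0 forces near-exact interpolation

print("train MSE:", ((gp.predict(Xtr) - ytr) ** 2).mean())    # ~ 0
print("test acc :", ((gp.predict(Xte) > 0.5) == yte).mean())  # ~ majority-class baseline
```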
There was also massive human work done on them that wasn't done before.
Data labeling is a pretty big industry in some countries, and I guess dropping 200 kilodollars on labeling is beyond the reach of most academics, even if they didn't care about the ethics of it.
Normally, more parameters lead to overfitting (like fitting a high-degree polynomial to points), but neural nets are for some reason not as susceptible to that and can scale well with more parameters.
That's been my understanding of the crux of the mystery.
Would love to be corrected by someone more knowledgeable though.
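(For the polynomial intuition above, a quick numpy sketch, purely illustrative:)

```python
# Classical overfitting: a degree-15 polynomial through 16 noisy points
# interpolates them (train error ~ 0) but typically oscillates wildly
# between points, especially near the interval's edges. numpy may emit a
# conditioning warning here -- that is part of the point.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 16)
y = np.sin(3 * x) + 0.1 * rng.normal(size=x.size)  # targets stay around |y| <= 1

coeffs = np.polyfit(x, y, deg=15)                  # exact interpolation
grid = np.linspace(-1, 1, 400)
print("max |prediction| between points:", np.abs(np.polyval(coeffs, grid)).max())
```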
This absolutely was the crux of the (first) mystery, and I would argue that "deep learning theory" really only took off once it recognized this. There are other mysteries too, like the feasibility of transfer learning, neural scaling laws, and now more recently, in-context learning.
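That first mystery now goes by "double descent" / "benign overfitting", and you can reproduce a crude version of it with minimum-norm least squares on random ReLU features. A hedged sketch, not a careful experiment; the error spike near p ≈ n is typical but depends on noise level and seed:

```python
# Crude double-descent demo: test error typically spikes when the number of
# random features p nears the number of samples n (the interpolation
# threshold), then falls again as p grows past it. np.linalg.lstsq returns
# the minimum-norm solution in the underdetermined regime p > n.
import numpy as np

rng = np.random.default_rng(0)
n, d, n_test = 100, 20, 1000
w_true = rng.normal(size=d)
X, Xt = rng.normal(size=(n, d)), rng.normal(size=(n_test, d))
y = X @ w_true + 0.5 * rng.normal(size=n)
yt = Xt @ w_true

for p in [10, 50, 90, 100, 110, 200, 1000, 5000]:
    W = rng.normal(size=(d, p)) / np.sqrt(d)       # fixed random ReLU features
    F, Ft = np.maximum(X @ W, 0), np.maximum(Xt @ W, 0)
    beta, *_ = np.linalg.lstsq(F, y, rcond=None)
    print(f"p = {p:5d}  test MSE = {np.mean((Ft @ beta - yt) ** 2):.2f}")
```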
bro do you really not understand that that's a game played for your sake - it checks boxes yes but you have no idea what effect the checking of the boxes actually has. like do you not realize/understand that anthropic/openai is baking this kind of stuff into models/UI/UX to give the sensation of rigor.
The checkboxes inform the model as well as the user, and you can observe this yourself. For example, in a C++ project with MyClass defined in MyClass.cpp/h:
I ask the model to rename MyClass to MyNewClass. It will generate a checklist like:
- Rename references in all source files
- Rename source/header files
- Update build files to point at new source files
Then it will do those things in that order.
Now you can re-run it but prefill the start of the model's response with the order in that list changed. It will follow the new order. The list plainly provides real information that influences future predictions; it isn't just a facade for the user.
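If you want to try the injection yourself, here's a minimal sketch using the Anthropic SDK's assistant-prefill mechanism (the model id, prompt, and plan text are placeholders, and you'd normally include the project files as context too):

```python
# Sketch of the "inject a reordered checklist" experiment via assistant
# prefill: the API continues a response that we start for it. Placeholder
# model id and prompt; adapt to your setup.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

task = "Rename MyClass to MyNewClass across MyClass.cpp/.h and the build files."

# The checklist, in a different order than the model originally produced.
# (Prefill must not end in trailing whitespace, hence no final newline.)
reordered_plan = (
    "Plan:\n"
    "- Update build files to point at new source files\n"
    "- Rename source/header files\n"
    "- Rename references in all source files"
)

resp = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=2000,
    messages=[
        {"role": "user", "content": task},
        {"role": "assistant", "content": reordered_plan},  # injected prefix
    ],
)
# The continuation tends to follow the injected order, which suggests the
# list conditions later predictions rather than being pure UI dressing.
print(reordered_plan + resp.content[0].text)
```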
Are you seriously saying that breaking a large, complex problem down into its constituent steps, and then trying to solve each one as an individual problem, is just a sensation of rigour?
To some extent, I could agree with that idea. One purpose of that process is to match the impedance between the problem and human cognition. But that presumes problem solving inherently requires human cognition, which is false; it's just the tool that we have for problem solving. When the problem-solving method matches the cognitive strengths and weaknesses of the problem solvers, they do have a certain sensation of having the upper hand over the problem. Part of that comes from the chunking/division allowing the problem solvers to more easily talk about the problem, to have conversations and narratives around it. The ability to spin coherent narratives feels like rigor.
I'm saying that's not what the stupid bot is actually doing, it's what anthropic added to the TUI to make you feel good in your feelies about what the bot is actually doing (spamming).
Edit: I'll give you another example that I realized because someone pointed it out here: when the stupid bot tells you why it fucked up, it doesn't actually understand anything about itself - it's just generating the most likely response given the enormous amount of pontification on the internet about this very subject...
I'm not disagreeing in principle, but the detritus left after an Anthropic outage is usually quite usable in a completely fresh session. The amount of context pulled and stored in the sandbox is quite hefty.
Whilst I can't usually start from the exact same point in the decision process, I can usually bootstrap a new session. It's not all ephemeral.
To your edit: that's the thing I find most galling, finding out that the thinking is discarded at cache clear.
Reconstructing the logical route it took to get to the end state is just not the same as the step-by-step process it took in the first place, which again I feel counters your "feelies".
"The market sees all, knows all and will be there from the beginning of time until the end of the universe (the market has already priced in the heat death of the universe)."
i'll bet you $1000 my vocab is wider, deeper, and more sophisticated than yours despite my profuse use of profanity. interested? happily able to provide various standardized tests (SAT/GRE/LSAT/etc.) and/or your preferred method (wordle/crossword/etc.).
> because of the truly marvelous human experiences that they’ll miss
when people wax philosophical/poetical about what is essentially capital production already i'm always so perplexed - do you not realize that you're not doing art/you're not an artisan? your labor is always actively being transformed into a product sold on a market. there are no "marvelous human experiences", there is only production and consumption.
> They’ll be impoverished and confuse output with agency
> your labor is always actively being transformed into a product sold on a market. there are no "marvelous human experiences", there is only production and consumption.
The first time I used Mac OS X, circa 2004-2005, I was blown away by the design and how they managed to expose the power of the underlying Unix-ish kernel without making it hurt for people who didn't want that experience. My SO couldn't have cared less about Terminal.app, but loved the UI. I also loved the UI and appreciated how they took the time to integrate the CLI tools with it.
I would say it was a marvelous human experience _for me_.
Sure, it was the Apple engineers' and designers' labor transformed into a product, but it was a fucking great product and something that I'm sure those teams were very proud of. The same was true with the iPod and the iPhone.
I work on niche products, so I've never done something as widely appreciated as those examples, but on the products I've worked on, I can easily say that I really enjoy making things that other people want to use, even if it's just an internal tool. I also enjoy getting paid for my labor. I've found that this is often a win-win situation.
Work doesn't have to be exploitive. Products don't have to exploit their users.
Viewing everything through the lens of production and consumption is like viewing the whole world as a big constraint optimization problem: (1) you end up torturing the meaning of words to fit your preconceived ideas, and (2) by doing so you miss hearing what other people are saying.
> Sure, it was the Apple engineers' and designers' labor transformed into a product, but it was a fucking great product and something that I'm sure those teams were very proud of. The same was true with the iPod and the iPhone.
...
> Work doesn't have to be exploitive. Products don't have to exploit their users.
bruh do people have any idea what they're writing as they write it? you're talking about "work doesn't have to be [exploitative]" in the same breath as Apple, which is the third-largest company in the world by market cap and which is well known for exploiting child labor to produce its products. like has this comment "jumped the shark"?
> Viewing everything through the lens of production and consumption
i don't view everything through any lens - i view work through the lens of work (and therefore production/consumption). i very clearly delineated between this lens and at least one other lens (art).
The guys in Cupertino aren't the ones behind bars, so they can't jump to their deaths; for someone who supposedly "clearly delineated", you sure are mixing up those who are being exploited with the people who benefited.
Ultimately the exploitative pyramid always terminates in a peak, and the guys working up there can for sure be having a hecking great time doing their jobs.
Maybe you'll dismiss it as another poetic waxing but what I understand they're saying is that capitalism hasn't yet captured all the inefficiencies of the human experience.
just repeating the same mistake as op: sadness/happiness is completely outside the scope here. these are aspects of a job - "design" explicitly relates to products, not art. and wondering about the sadness/happiness of a job is like wondering about the marketability of a piece of art - it's completely beside the point!
OP never talked about art. Design is not art, it's problem solving. And good design according to Dieter Rams:
1. Good design is innovative
2. Good design makes a product useful
3. Good design is aesthetic
4. Good design makes a product understandable
5. Good design is unobtrusive
6. Good design is honest
7. Good design is long-lasting
8. Good design is thorough down to the last detail
9. Good design is environmentally friendly
10. Good design is as little design as possible
Generative AI just tries to predict based on its training data.
A product can be a piece of art, and in practice design often goes hand in hand with art; practically, most designers practice the artistic role in addition to the utilitarian one. Whether you'd want to group art within design is a matter of definitions.
Whatever the merits or demerits of 'marvelous human experiences' are from the point of view of production and consumption, the OP's conclusion leaves out the important point that Alexander's 'rationalization of forces that define a problem' produces designs that come closer to solving real-life problems (even in production and consumption) than simply putting attractive lipstick on an economic utility pig. If production isn't solving real human problems, consumers will go elsewhere.
> If production isn't solving real human problems, consumers will go elsewhere.
of course but that's well within the scope of the whole paradigm (as opposed to how it was originally phrased, in relation to a loss of "marvelous human experiences"): if i use a bad tool to solve my customers' problems in an unsatisfactory way then my customers will no longer be my customers (assuming the all-knowing guiding hand of the free market). so there's no new observation whatsoever in OP.
Sometimes the only way you can get basic engineering practices done like "have tests", "have a build system", "run the tests and the builds automatically", "insist that the above work" without management freaking out is to pay someone a lot of money.
You're wrong but for the right reasons: all of academic software pedagogy is about "abstractions" because academics do not work for a living (they teach). That's why whenever I hear anyone use the word abstraction I bucket them under roughly the same category (people who write software that does not matter). Think about it: if you can afford to not care about cache misses or latencies or memory hierarchies or any of the other physical details which are extremely specific (the opposite of abstract) then you are writing code that has no constraints. no scale, no externalities.
lol you joking? point to a single piece of software on your computer that is maintained by academics/researchers ("came from" means absolutely nothing - this isn't a discussion about royalties or credit).
Weaselly moving the goalposts. If we were stuck with mere "maintaining", you'd still be using the most primitive CPU and OS. Besides, once something is invented and shaped and studied, even a monkey can maintain it.
The point is that the things you get to use, and that the tech industry gets to maintain, come from research in academic fields, in corporate R&D labs staffed by people with PhDs and everything (from Xerox's to Google's and Anthropic's), and of course from direct partnerships with universities as well.
Not as in "they created that in 1976", but as in: the past, the current, and the next things you'll use will come from that too. This includes everything from Algol, Lisp, OO, and TeX to monads, futures, prototype inheritance, NNs, and LLMs.
my guy this is the most bog standard defense of academia that exists - that they are the original progenitors of everything. it's not even true (industry pioneers plenty of things, especially in tech/swe) but even if it were, it would still be banal because by the same logic i might as well be worshipping prokaryotes instead of academics.
> in corporate R&D research labs from people with PhDs
lol tell me you've never been in a research group without telling me. hate to break it to you, as someone with a PhD and someone who spent some time in an industry research group at the beginning of my career: almost nothing comes out of these groups in tech (the stuff that does see the light of day is the exception that proves the rule).
> I have ever had on cold-apply, including internships
FYI, cold-applying to big tech (e.g., FAANG) is like throwing your application away. Pro tip: ping people on LinkedIn and ask for a referral. If you're a decent candidate they'll happily do it, because there's an O(1k) referral/hiring bonus at all of these companies.
The only people for whom this is an open question are the academics - everyone else understands it's entirely because of the bagillions of parameters.