
The problem with deep learning is the opposite. You can understand most of it with just high school math. Advanced math is mostly useless because of the dimensionality of neural nets.


> Advanced math is mostly useless because of the dimensionality of neural nets.

It depends on what you mean by advanced math. There is a lot of math that only really comes into play because of the high dimensionality! For example math related to tensor wrangling, low-rank approximations, spectral theory, harmonic analysis, matrix calculus derivatives, universality principles, and other concepts that could be interesting or bewildering or horrifying depending on how you react to it. Of course some of it is only linear algebra of the 'just high school math' kind, but that's not how I would normally describe it. If you look at the math in the proofs in the appendices of the more technical AI papers on arxiv, there is often some weird stuff in there, not just matrix multiply and softmax.
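
To pick one item from that list and make it concrete: here's a minimal sketch, assuming only numpy, of a low-rank approximation of a weight matrix via truncated SVD. This is the kind of thing that shows up in model compression and LoRA-style fine-tuning, and the optimality of the truncation (Eckart–Young) is a bit past 'just high school math':

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((512, 512))   # stand-in for a trained weight matrix

    # Truncated SVD: keep only the top-k singular directions.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = 32
    W_k = (U[:, :k] * s[:k]) @ Vt[:k, :]  # best rank-k approximation in Frobenius norm

    # Storage falls from 512*512 floats to roughly 2*512*32, at the cost of some error.
    # (A Gaussian random matrix has slowly decaying singular values, so the error here
    # is large; trained weight matrices often compress much better.)
    err = np.linalg.norm(W - W_k) / np.linalg.norm(W)
    print(f"relative error at rank {k}: {err:.3f}")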


Yes, but do you have examples of "higher" math not being just a curiosity and actually making it into real-world models and training algorithms?


Well, I suppose that in some sense you are right. You can do deep learning without even knowing any math at all, by plugging together libraries and frameworks that other people wrote.

Also, maybe you will say that "higher" math is by definition a curiosity, and that if it's practical then it's not "higher".

But if those aren't your arguments, then consider one example: the tensor 'differentiable programming' libraries used in deep learning are built on automatic differentiation and matrix calculus. Matrices are taught in high school, and calculus is taught in high school, but matrix calculus generally isn't, as far as I know. Or at least not at my high school. https://en.wikipedia.org/wiki/Matrix_calculus
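
For a sense of what that looks like on paper, here's a minimal sketch, again assuming only numpy rather than any actual autodiff library, of the matrix-calculus identity dL/dW = 2 X^T (XW - Y) for the squared-error loss L(W) = ||XW - Y||^2, checked against a finite-difference approximation. Reverse-mode autodiff in those frameworks is essentially a mechanized, composable version of derivations like this one:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((8, 5))
    Y = rng.standard_normal((8, 3))
    W = rng.standard_normal((5, 3))

    def loss(W):
        R = X @ W - Y
        return np.sum(R * R)              # squared Frobenius norm of the residual

    # Matrix-calculus result: dL/dW = 2 * X^T (X W - Y)
    grad_analytic = 2 * X.T @ (X @ W - Y)

    # Central finite differences, one entry of W at a time.
    eps = 1e-6
    grad_fd = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            E = np.zeros_like(W)
            E[i, j] = eps
            grad_fd[i, j] = (loss(W + E) - loss(W - E)) / (2 * eps)

    print(np.max(np.abs(grad_analytic - grad_fd)))  # agreement to ~1e-8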


That's like saying you understand state-of-the-art CFD code because you can read Fortran.

There are many aspects of learning systems that we still don't have any kind of grasp on, and it will take more than a little advanced math (statistics/probability theory, transport theory, topology, etc.) for us to understand them as a community.

Dunning-Kruger is probably more common in spaces like this one, where people carry social capital for being able to "spin up quickly". But the true meta-skill of upskilling is turning unknown unknowns (UU) into known unknowns (KU), and then into known knowns (KK). It's not enough to just jump from UU to KK through osmosis by reading blog posts on a news aggregator, because there will still be a huge space of unknowns not covered by that approach.


Yes, it’s really rather like alchemy in some sense. Stuff works, and often nobody knows exactly why.


"I don't follow the latest ML scaling and theory research" does not in any way equate to "these things are unknowable".


Hm, I've been watching Neel Nanda videos recently, and I do get the feeling that there are lots of unknowns in ML, and also in what trained networks have learnt.


Can you elaborate further on what you mean by 'dimensionality of neural nets'? Thanks!


Yes, I mean the huge number of trainable parameters.
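
As a rough back-of-the-envelope illustration (a hypothetical three-layer MLP, with the sizes made up for the example), even a tiny network already has a couple of million trainable parameters, and modern models have billions:

    # Each fully connected layer contributes a weight matrix plus a bias vector.
    layer_sizes = [784, 1024, 1024, 10]   # hypothetical MNIST-sized MLP

    n_params = sum(m * n + n for m, n in zip(layer_sizes[:-1], layer_sizes[1:]))
    print(n_params)                       # 1,863,690 trainable parameters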



