
You're not wrong, but I believe you're so far off on the necessary scale that it'll never solve the problem.

For an AI to learn to play Bomberman at an acceptable level, you need to run 2-3 billion training steps with RL, where the AI is free to explore new actions to collect data about how well they work. I'm part of team CloudGamepad and we'll compete in the Bomberland AI challenge finals tomorrow, so I do have some practical experience there. Before I looked at things in detail, I also vastly overestimated reinforcement learning's capabilities.
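To make "free to explore new actions" concrete, here's a minimal epsilon-greedy sketch of the kind of action selection an RL agent uses during training. The action names and epsilon value are illustrative assumptions, not the actual Bomberland agent.

    import random

    EPSILON = 0.1  # fraction of steps spent exploring (illustrative value)

    def choose_action(q_values: dict[str, float]) -> str:
        """q_values maps an action name to its current estimated value."""
        if random.random() < EPSILON:
            return random.choice(list(q_values))  # explore: try an action to learn what it does
        return max(q_values, key=q_values.get)    # exploit: best known action

    # Example: mostly places a bomb here, but occasionally tries a move instead.
    print(choose_action({"move_left": 0.2, "move_right": 0.1, "place_bomb": 0.7}))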

For an AI to learn a useful policy without the ability to confirm what an action does, you need exponentially more data. There are great papers from DeepMind and OpenAI that try to ease the pain a bit, but as-is, I don't think even a trillion miles driven would be enough data. Letting the AI try things out, of course, is dangerous, as we have seen in the past.

But the truly nasty part about AI and RL in particular is that the AI will act as if anything that it didn't see often enough during training simply doesn't exist. If it never sees a pink truck from the side, no "virtual neurons" will grow to detect this. AIs in general don't generalize. So if your driving dataset lacks enough examples of 0.1% black swan events, you can be sure that your AI is going to go totally haywire when they happen. Like "I've never seen a truck sideways before => it doesn't exist => boom."



> But the truly nasty part about AI and RL in particular is that the AI will act as if anything that it didn't see often enough during training simply doesn't exist. If it never sees a pink truck from the side, no "virtual neurons" will grow to detect this. AIs in general don't generalize. So if your driving dataset lacks enough examples of 0.1% black swan events, you can be sure that your AI is going to go totally haywire when they happen. Like "I've never seen a truck sideways before => it doesn't exist => boom."

Let's not overstate the problem here. There are plenty of AI systems that would do well at recognizing a sideways truck. Look at CLIP, which can also be plugged into DRL agents (per LeCun's cake analogy: self-supervised learning as the cake, RL as the cherry); take an image of your pink truck, prompt CLIP with "a photograph of a pink truck" and a bunch of random prompts, and I bet it'll pick the correct one. Small-scale DRL trained solely on a single task is extremely brittle, yes, but train it over a diversity of tasks and you start seeing transfer to new tasks, composition of behaviors, and flexibility (look at, say, OpenAI's Hide-and-Seek or DeepMind's XLand).
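A minimal zero-shot sketch of that "bet", using the public CLIP weights via Hugging Face transformers. The checkpoint name, image file, and prompt list are illustrative assumptions, not anything specified above.

    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("pink_truck_side_view.jpg").convert("RGB")  # hypothetical image
    prompts = [
        "a photograph of a pink truck",
        "a photograph of a bicycle",
        "a photograph of an empty road",
        "a photograph of a traffic cone",
    ]

    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # logits_per_image holds image-text similarity scores; softmax turns them
    # into a distribution over the candidate prompts.
    probs = outputs.logits_per_image.softmax(dim=-1)[0]
    for prompt, p in zip(prompts, probs):
        print(f"{p.item():.3f}  {prompt}")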

These are all in line with the bitter-lesson hypothesis: much of what is wrong with these systems is not some fundamental flaw that will require special hand-designed "generalization modules" bolted on by generations of grad students laboring in the math mines, but simply that they are still trained on too narrow a range of problems, for too short a time, with too little data, using too-small models. And just as past scaling has already given us strikingly better generalization, composition, and handling of rare datapoints, we'll see more of it in the future.

What goes wrong with Tesla cars specifically, I don't know, but I will point out that Waymo manages to kill many fewer people, so we shouldn't consider Tesla's performance to even be SOTA on the self-driving task, much less to tell us anything about fundamental limits of self-driving cars and/or NNs.


> What goes wrong with Tesla cars specifically, I don't know, but I will point out that Waymo manages to kill many fewer people, so we shouldn't consider Tesla's performance to even be SOTA on the self-driving task, much less to tell us anything about fundamental limits of self-driving cars and/or NNs.

Side note, but I think Waymo is treating this more like a JPL "moon landing" style problem, while Tesla is trying to sell cars today. Starting by making it possible at all and then scaling the cost down is very different from working backwards from the sensors and compute that are economical to ship today.


My theory is that gradient descent won't work for AGI and we'll need a completely new approach before we get safe general driving. But that's just my theory.

"Let's not overstate the problem here. There are plenty of AI things which would work well to recognize a sideways truck"

Yes, but nothing reliable enough to trust your life with. CLIP also has some pretty weird bugs.

https://aiweirdness.tumblr.com/post/660687015733559296/galle...


Gradient descent has seen off many challengers who were sure they could do things gradient descent could never do, even with bigger compute. I'm not worried. (My own belief is that gradient descent is so useful that any better optimization approach will simply evolve gradient descent as an intermediate phase for bootstrapping problem-specific learning. It's a tower of optimizers all the way down/up.)

You can't call that a "CLIP bug", because using CLIP for gradient ascent on a diffusion model is not remotely what it was trained or intended to do, and it's not much like your use case of detecting real-world objects. It's basically adversarial pixel-wise hacking, which is not what real-world pink trucks are like. Also, that post is 7 months old, and the AI art community advances really fast. (However bad you think those samples are, I assure you the first BigSleep samples, back in February 2021 when CLIP had just come out, were far worse.) "Unicorn cake" may not have worked 7 months ago, but maybe it does now. Check out the Midjourney samples all over AI-art Twitter from the past month.
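For contrast with the classification use above, a rough sketch of what "gradient ascent on CLIP" looks like, simplified to optimizing raw pixels directly rather than a generator's latents (as BigSleep or diffusion guidance do). The checkpoint, prompt, learning rate, and step count are illustrative assumptions, and proper CLIP input normalization is omitted for brevity.

    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    for p in model.parameters():
        p.requires_grad_(False)  # only the image pixels get optimized

    text = processor(text=["a unicorn cake"], return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(**text)
        text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

    pixels = torch.rand(1, 3, 224, 224, requires_grad=True)  # start from noise
    opt = torch.optim.Adam([pixels], lr=0.05)

    for _ in range(200):
        img_emb = model.get_image_features(pixel_values=pixels)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        loss = -(img_emb * text_emb).sum()  # maximize cosine similarity
        opt.zero_grad()
        loss.backward()
        opt.step()
        pixels.data.clamp_(0, 1)  # keep the "image" in a valid range

    # The result satisfies CLIP's similarity score rather than a human's idea
    # of a cake, which is why it says little about CLIP as a plain classifier.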


The sensors self-driving cars use are far less sensitive to color than human eyes.

You can generalize your point to the other sensors, but sensor fusion compensates somewhat: the odds of an input being something never seen across all sensor modalities at once become pretty low.

(And when it does see something weird, it can generally handle it the way humans do: drive defensively.)
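Back-of-envelope version of the "odds become pretty low" point, under an (optimistic) independence assumption. The per-sensor miss probabilities are made-up numbers, not measurements from any real stack.

    miss_prob = {"camera": 0.05, "lidar": 0.02, "radar": 0.10}  # hypothetical

    p_all_miss = 1.0
    for sensor, p in miss_prob.items():
        p_all_miss *= p  # all modalities must fail simultaneously

    print(f"P(every modality misses) ~= {p_all_miss:.5f}")  # 0.00010 if independent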


> with RL, where the AI is free to explore new actions to collect data about how well they work

Self-driving cars aren't free to explore new actions. That would be frightening. Self-driving cars use a limited form of AI to recognise the world around them, but the rules that decide what they do with that information are simple algorithms.
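A toy sketch of that split: a learned perception stack (the "limited form of AI") produces detections, and plain hand-written rules decide what to do with them. The Detection fields, the detect_objects stub, and the thresholds are all hypothetical, made up for illustration.

    from dataclasses import dataclass

    @dataclass
    class Detection:
        label: str            # e.g. "truck", "pedestrian"
        distance_m: float     # estimated distance ahead, in meters
        closing_speed: float  # m/s, positive if we are closing in

    def detect_objects(camera_frame, lidar_scan) -> list[Detection]:
        """Stand-in for the learned perception stack."""
        raise NotImplementedError  # in reality: neural nets over camera/lidar/radar

    def plan(detections: list[Detection]) -> str:
        """Simple, auditable rules: no exploration, no learned policy."""
        for d in detections:
            if d.distance_m < 10:
                return "emergency_brake"
            if d.distance_m < 40 and d.closing_speed > 5:
                return "slow_down"
        return "keep_lane"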


What were the new data augmentation methods for optical flow that you referred to in a previous comment on this topic?


I'm not quite sure what you mean, but I do remember writing this the last time there was a discussion of AI approaches: https://news.ycombinator.com/item?id=29898425

And https://news.ycombinator.com/item?id=29911293 was about my Sintel approach.

That was a while ago, and newer submissions have since pushed me further down, but archive.org confirms that I was leading in "d0-10" and "EPE matched" on 25 Sep 2020:

https://web.archive.org/web/20200925202839/http://sintel.is....

On a completely unrelated note: I'm so looking forward to the Bomberland finals in 15 minutes :)



