While this is an interesting project and impressive work, I do want to call attention to the fact that this is a highly specialized AI, trained and customized for each track it races on. It's not a generic one that can beat humans (or even reach the finish line) on a random track you plop it down on.
Impressive work though, and amazing what the community as a whole has accomplished to even enable this sort of thing. Doubly so for a game that, as far as I'm aware, doesn't offer any API access that would make training this model easy without all the tools that have been developed for it in the endless quest for more speed.
I wonder how much of the computer's advantage is due to actual intelligence versus its superhuman frequency of actions and sensitivity of perception.
A human cannot tell when the car is off by a single pixel, but the AI can detect and correct it. Since the rounded top of the track is an unstable equilibrium, sensitive perception gives a great advantage, and so does a high frequency of correction and control.
Yes, I'd be interested to see how it compares if it's limited to an input frequency more similar to a human's, and then to run that second test against Wirtual.
I think, though, that the main reason it's almost always beating humans is that it can do the run thousands of times, something a human will never realistically have time to do. It also doesn't have that human element of worry, the "if I fall now, it will be a wasted run" feeling, so it can just commit to everything. If a human player were able to devote that much time to it and not care about any particular run, I feel like humans could still win that second map.
This is a very difficult thing to make a good metric for: take AlphaStar (DeepMind's StarCraft 2 AI), for example. They made a big deal about keeping its average APM (actions per minute, more or less how many button presses and mouse clicks a player makes) within human limits, and yet the general sentiment is that it's better because of extreme micro performance that's not replicable by humans. The issue is twofold: firstly, average APM is not a particularly good constraint, because humans cannot generally burst APM as high as an AI can (and it's generally useful for them to 'waste' APM in slower periods to keep a pace going). Secondly, not all actions are equal: the vast majority of actions a human player takes are pretty low-impact, and the high-impact actions are harder to do at the same rate as the low ones, whereas the AI can spend an APM budget far more efficiently even if a peak rate is enforced. Which basically leads to an AI that beats pro players because it can perfectly micro blink Stalkers, making that army substantially more powerful, as opposed to being particularly good at strategy.
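A toy back-of-the-envelope calculation (all numbers here are hypothetical, not AlphaStar's actual limits) shows why an average-APM cap is a weak constraint: an agent that idles during quiet periods can bank its budget and burst far beyond any human during the moments that matter.

```python
# Toy illustration (hypothetical numbers): an agent capped at the same
# *average* APM as a human can still burst far beyond human limits
# by staying nearly idle the rest of the game.

GAME_SECONDS = 600                            # a 10-minute game
AVG_APM_CAP = 300                             # cap on average actions per minute
BUDGET = AVG_APM_CAP * GAME_SECONDS // 60     # total allowed actions: 3000

# Human-like profile: actions spread roughly evenly, bursts near a ceiling.
human_peak_aps = 8                            # ~480 APM burst, near human limits

# AI profile: 1 action/s for 590 s, then spend the rest in a 10 s fight.
idle_actions = 1 * 590
fight_budget = BUDGET - idle_actions          # 2410 actions left for the fight
ai_peak_aps = fight_budget / 10               # actions per second during the fight

print(f"average APM for both: {AVG_APM_CAP}")
print(f"human burst: ~{human_peak_aps * 60} APM")
print(f"AI burst:    ~{ai_peak_aps * 60:.0f} APM")
```

Both profiles satisfy the same average cap, yet the AI's burst rate during the decisive fight is more than an order of magnitude above the human's, which is roughly the dynamic people observed with the blink-Stalker micro.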
(Which, all in all, means I would consider this kind of AI more comparable to a TAS (tool-assisted speedrun) than to a human run. It would be very interesting to see whether such an approach can beat a human-created TAS.)
Yep, and then AlphaStar started winning matches against top players by individually microing ~20 Stalkers at the same time. Not something any human, even a world champion, can do.
The fun thing is that AlphaStar did have an influence on the StarCraft scene. Overbuilding drones used to be taboo; now everybody does it. And even individually microing Stalkers... it's now quite normal for players to go out with 2-3 Stalkers and indeed control them individually. These are tactically sound, even when you're not an AI.
I'm very out of the loop on the SC2 meta; I only played in the past, up to Gold, as Protoss. Why is overbuilding drones considered a good thing now? Is it to reduce the economic setback in case you get harassed?
There are more reasons than just that. Overbuild drones while expanding (or even before), then transfer them, and the new base is up and fully droned within construction time + 10 s. You overbuild because you're going to lose at least 1-2 drones to any capable attack. You can risk more on defense when overbuilt. You have a better chance of being able to expand during a base trade, ...
It's hard to tell what "input frequency" even means in this case. For example, tapping is a common technique for keyboard players, while controller/joystick players rely more on fluid control. Does it mean the AI can do a single-frame tap/release? What about a gradual increase in steering between steps? There are lots more edge cases than that, too.
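A small sketch (hypothetical frame rate and action encoding, not the project's actual input model) of why an "actions per second" cap alone doesn't pin this down: a binary one-frame tap and a gradual analog ramp both issue exactly one value per frame, yet give very different control.

```python
# Sketch (hypothetical numbers): two input styles that an
# "actions per second" limit cannot tell apart.

FPS = 60
frame_ms = 1000 / FPS        # duration of a single-frame tap at 60 fps

# Keyboard-style: binary steer per frame; full lock for exactly one frame.
tap = [0, 1, 0, 0]

# Analog-style: fractional steer per frame; a gradual ramp at the SAME rate.
ramp = [0.0, 0.25, 0.5, 0.75]

print(f"one-frame tap lasts ~{frame_ms:.1f} ms")
# Both streams emit one value per frame, so any per-second cap treats
# them identically, even though their control semantics differ.
```

So before limiting an AI to "human" input frequency, you'd first have to decide which of these styles (and which tick rate) counts as the human baseline.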
That's not really how it works. It's trained specifically for one track, and it doesn't really process the visual information in terms of "the car is off course, correct it." It's more like "when the image looks like this, do this." If you kept the same map but added a bunch of trees off the track, it would probably need to be retrained. Assuming I'm right about that, it shows that the "AI" doesn't really look at the track the way you might think it does.
It's basically just a fancy brute force algorithm.
Am I wrong in thinking that falling off is not punished enough in this approach? From the numbers provided, falling off still adds some distance and incurs no penalty, just an end of episode. I'd be tempted to subtract 1 for each step after the car falls off; otherwise the RL will count the distance gained while falling as progress.
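The suggested tweak can be sketched like this (function and constant names are hypothetical, not from the project): keep the per-step distance reward, but subtract a fixed penalty on every step once the car has fallen off, so distance gained while falling can't score as net progress.

```python
# Reward-shaping sketch (hypothetical names/values, not the project's code):
# penalize every step after the car falls off, instead of only ending
# the episode, so the fall's residual distance gain nets out negative.

FALL_PENALTY = 1.0   # the "subtract 1 per step" suggested above

def shaped_reward(distance_gain: float, fallen: bool) -> float:
    """Per-step reward: distance progress, minus a penalty while off-track."""
    if fallen:
        return distance_gain - FALL_PENALTY
    return distance_gain

# A fall step that still "gains" 0.3 units of distance now scores negative:
print(shaped_reward(0.3, fallen=True))    # -0.7
print(shaped_reward(0.3, fallen=False))   # 0.3
```

Whether this helps in practice depends on how large the residual distance gain is relative to the penalty; if falls can gain more than 1 unit per step, the penalty would need scaling up.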
Can anyone spot better opportunities for improvement?
The current trend for YouTube videos really seems to be 30-minute videos instead of the old meta of 10 minutes. Frankly, that makes them much less accessible for adults.