
Google DeepMind’s AI agents exceed "human-level" gameplay in Quake III CTF


Recommended Posts

Interesting that they used procedurally generated maps. I can only imagine the strategies that would develop if you allowed an AI like this to play 500k games on a standard map.

 

Now that I think about it, I wonder if that will become a standard way to find map glitches or loopholes while still in development. Let an AI play a few hundred thousand games and see if it figures out anything interesting.


3 hours ago, TwinIon said:

Interesting that they used procedurally generated maps. I can only imagine the strategies that would develop if you allowed an AI like this to play 500k games on a standard map.

 

Now that I think about it, I wonder if that will become a standard way to find map glitches or loopholes while still in development. Let an AI play a few hundred thousand games and see if it figures out anything interesting.

 

That would probably be a really good idea for finding flaws.


2 hours ago, Raggit said:

How do they handle the AI's ability to aim? Because that would strongly impact their performance. If they're basically using an aimbot, it's kind of unfair. :p 

Despite using Q3 as the basis, the agents don't actually have guns. They can only tag opponents to get them to drop the flag.

 

They were mostly interested in how multiple AI agents would work together in a team exercise.

 

The DeepMind site has a better look at the game they played.


3 hours ago, TwinIon said:

Interesting that they used procedurally generated maps. I can only imagine the strategies that would develop if you allowed an AI like this to play 500k games on a standard map.

 

Now that I think about it, I wonder if that will become a standard way to find map glitches or loopholes while still in development. Let an AI play a few hundred thousand games and see if it figures out anything interesting.

 

13 minutes ago, Scape Zero said:

 

That would probably be a really good idea for finding flaws.

This is exactly what they did in The Talos Principle:

 

https://blog.us.playstation.com/2015/09/25/the-talos-principle-on-ps4-designing-ai-to-test-a-game-about-ai/


2 minutes ago, Rev said:

That's very cool. It's a bit more like a test script for the whole game than the kind of machine learning that DeepMind is doing, but it's a good example of what I was thinking about.


This is cool work. Using evolutionary algorithms to learn the reward function is an idea that's been explored in the literature before (including work I was involved in), but not at this scale. Their reports of learning time are a bit unclear and potentially misleading. I'll have to look at the paper, but I think it's 500,000 games for each member of the population (which is huge).
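For anyone wondering what "evolving the reward function" even looks like mechanically, here's a massively simplified toy sketch. To be clear, this is not the paper's actual algorithm; the target weights and the fitness measure are made up purely for illustration. Each population member carries a weight vector over internal reward signals, the fittest half survives each generation, and the rest are replaced with mutated copies of survivors:

```python
import random

# Toy illustration only: a hypothetical "true" objective that the evolved
# internal reward weights should come to approximate.
TRUE_WEIGHTS = [1.0, 0.5, -0.2]

def fitness(weights):
    # Negative squared distance to the true objective's weights:
    # closer weight vectors score higher.
    return -sum((w - t) ** 2 for w, t in zip(weights, TRUE_WEIGHTS))

def mutate(weights, rng, scale=0.1):
    # Perturb each weight with small Gaussian noise.
    return [w + rng.gauss(0, scale) for w in weights]

def evolve(pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    # Start from random weight vectors.
    population = [[rng.uniform(-1, 1) for _ in range(3)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the fittest half, refill with mutated copies of survivors.
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]
        population = elite + [mutate(rng.choice(elite), rng) for _ in elite]
    return max(population, key=fitness)

best = evolve()
```

In the real system each "fitness evaluation" is of course thousands of games of CTF rather than a distance computation, which is where the huge game counts come from.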

 

3 hours ago, TwinIon said:

Interesting that they used procedurally generated maps. I can only imagine the strategies that would develop if you allowed an AI like this to play 500k games on a standard map.

 

Now that I think about it, I wonder if that will become a standard way to find map glitches or loopholes while still in development. Let an AI play a few hundred thousand games and see if it figures out anything interesting.

 

It actually might not do as well as you'd think. Correlated experience, which is common when a single agent learns only from its own experience, can really fuck with learning and make it behave badly. Increasingly, you're seeing papers that use these huge distributed learning setups to try and avoid that issue. That may not have been the motivation for the many maps here, but there's a reasonable chance it was.
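To make the correlated-experience point concrete: one standard trick for decorrelating a lone agent's experience is a replay buffer (a generic sketch here, not anything from the paper). Transitions arrive in the correlated order the agent lived them, but the learner samples its minibatches uniformly at random:

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores transitions in arrival order, samples them i.i.d."""

    def __init__(self, capacity):
        # Old transitions fall off the front once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive transitions.
        return random.sample(list(self.buffer), batch_size)

buf = ReplayBuffer(capacity=1000)
# A single agent's experience arrives in strongly correlated order...
for step in range(100):
    buf.add(("state", step, "action", "reward"))
# ...but the learner trains on a decorrelated minibatch.
batch = buf.sample(8)
```

Massively parallel actors attack the same correlation problem from the other direction: many independent streams of experience instead of one shuffled stream.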

 

It's also a direction that grinds my gears a bit. The reinforcement-learning problem is at its heart about learning when you have to suffer the consequences of your actions. This direction of huge numbers of parallel actors sidesteps that issue and isn't practical for many real-world scenarios.

 

I do like that they're looking at the evolution angle though because I've always found the interaction of evolution with internal reward functions interesting. For that reason, I'll cut them some slack in this case :p 

 

3 hours ago, Raggit said:

How do they handle the AI's ability to aim? Because that would strongly impact their performance. If they're basically using an aimbot, it's kind of unfair. :p 

 

It's just looking at the image and deciding when to tag, but they do say its reaction time and aiming account for some of its excellent performance, so in further experiments they artificially added some random inaccuracy and it still did well.


11 minutes ago, legend said:

It actually might not do as well as you'd think. Correlated experience, which is common when a single agent learns only from its own experience, can really fuck with learning and make it behave badly. Increasingly, you're seeing papers that use these huge distributed learning setups to try and avoid that issue. That may not have been the motivation for the many maps here, but there's a reasonable chance it was.

 

It's also a direction that grinds my gears a bit. The reinforcement-learning problem is at its heart about learning when you have to suffer the consequences of your actions. This direction of huge numbers of parallel actors sidesteps that issue and isn't practical for many real-world scenarios.

I'm not sure what you mean when you say the number of parallel actors sidesteps the issue of consequences.


16 minutes ago, TwinIon said:

I'm not sure what you mean when you say the number of parallel actors sidesteps the issue of consequences.

 

So in RL, you have an agent. It doesn't know how to behave, so it takes an action and observes what happens. If that action was bad, too bad. If you want to know how things would have turned out had you behaved differently, too bad. You don't get to magically reset to where you were. You'll have to somehow work your way back to that scenario (or something similar, where "similar" is itself something you have to learn) to see what else could have been. The agent is forced to deal with the consequences of what it did.

 

Because of how challenging that scenario is, a seriously important concept in RL is how the agent balances exploration vs exploitation. Is the current state a good state to try something new? Or should the agent instead try to exploit what it's learned at this point? This balance, and even how you choose to explore, are huge challenging topics, for which we by and large don't have good answers.
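The most trivial face of that exploration/exploitation trade-off is epsilon-greedy action selection. This is a textbook sketch on a toy two-armed bandit, not what DeepMind's agents do, but it shows the basic dial:

```python
import random

def epsilon_greedy(values, epsilon, rng=random):
    """With probability epsilon pick a random arm (explore);
    otherwise pick the arm with the highest estimated value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=values.__getitem__)

# Running value estimates for two arms; arm 1 looks better so far.
values = [0.2, 0.7]
picks = [epsilon_greedy(values, epsilon=0.1) for _ in range(1000)]
# Most picks exploit arm 1, but a small fraction still explores arm 0.
```

The hard open questions are exactly what this sketch glosses over: *when* exploring is worth it, and *how* to explore in a directed way rather than uniformly at random.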

 

So what are people doing now instead? They've given the agent the ability to clone itself, or versions of itself, at huge scales so that in effect, any single decision isn't all that important and the agents don't have to worry so much about careful and useful exploration.

 

 

The way these huge numbers of actors learn is also often not merely equivalent to learning longer, because of how the sequence of learning updates works. But that might require a bit more mathematical discussion to explain.

 

 

As I said, I think the fact that they were explicitly looking at the evolution of reward functions is interesting, and you have to investigate learning across a population to do that. So I'm less down on it here. But other stuff lately has been especially abusive, IMO.


53 minutes ago, legend said:

 

Psh. The Talos Principle is one of the best puzzle games out there.

 

I played it through from start to finish and it was literally the same puzzle mechanic from beginning to end. By the time I finished I was so fuckin' tired of pointing lasers, and the whole sentient A.I. philosophy got old about halfway through. I literally forced myself to finish it.


1 hour ago, XxEvil AshxX said:

 

I played it through from start to finish and it was literally the same puzzle mechanic from beginning to end. By the time I finished I was so fuckin' tired of pointing lasers, and the whole sentient A.I. philosophy got old about halfway through. I literally forced myself to finish it.

 

It literally was not the same puzzle mechanic from beginning to end. Not only are you being reductive, you're also wrong! :p 


1 minute ago, Rev said:

I need to finish that game. The only part that annoyed me was the philosophy part. I'm not sure why I shelved it.

 

What annoyed me about the philosophy part is I had great answers for all the questions, but they were never options :p 

 

In that way, it's a lot like too many modern philosophers who haven't kept up with what we learned in math since the 50s.


2 minutes ago, legend said:

 

What annoyed me about the philosophy part is I had great answers for all the questions, but they were never options :p 

 

In that way, it's a lot like too many modern philosophers who haven't kept up with what we learned in math since the 50s.

Yep, exactly the same experience for me. These are topics I've devoted a *ton* of time to thinking about and I was forced to answer badly. Not a deal-breaker, just annoying.


16 minutes ago, mo1518 said:

I wonder if we'll get to a stage where instead of watching human streamers play games, we'll watch various AIs built by different teams compete against each other. Like battle bots but for video games. 

It's huge in chess already, so maybe.

