New IIoT Tool Demonstrates Potential of Digitized Pneumatics

31 May 2019
Chess and Go were originally developed to mimic warfare, but they do a bad job of it. War, like most other competitions, typically involves more than one opponent and more than one ally, and the action generally unfolds not on an orderly, flat grid but across varied landscapes built up in three dimensions.
 
That’s why Alphabet’s DeepMind, having crushed chess and Go, has now taken on the far harder challenge posed by the three-dimensional, multiplayer, first-person video game. Writing today in Science, lead author Max Jaderberg and 17 DeepMind colleagues describe how a completely unsupervised program of self-learning allowed software to exceed human performance in playing “Quake III Arena.” The experiment involved a version of the game in which each of two teams tries to capture as many of the other team’s flags as possible.
 
The teams begin at base camps set at opposite ends of a map, which is generated at random before each round. Players roam about, interacting with buildings, trees, hallways, and other features of the map, as well as with allies and opponents. They try to use their laser-like weapons to “tag” members of the opposing team; a tagged player must drop on the spot any flag it is carrying and return to its team’s base.
 
DeepMind represents each player with a software agent that sees the same screen a human player would see. The agents have no way of knowing what other agents are perceiving; this, again, is much closer to real strategic contests than most board games are. Each agent begins by making choices at random, but as evidence accumulates over successive iterations of the game, it feeds a process called reinforcement learning, which gradually shapes the agent’s behavior into a purposeful pattern, called a “policy.”
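To make that idea concrete, here is a minimal, hedged sketch of reinforcement learning in Python. It uses simple tabular Q-learning with epsilon-greedy exploration purely as an illustration of how random choices plus reward evidence converge on a policy; it is not the recurrent-network agent DeepMind actually used, and the action names are invented.

```python
import random

# Illustrative action set for a capture-the-flag-style agent (hypothetical names).
ACTIONS = ["move_forward", "turn_left", "turn_right", "tag"]

def choose_action(q_table, state, epsilon):
    """Epsilon-greedy: act at random early on, exploit learned values later."""
    if random.random() < epsilon or state not in q_table:
        return random.choice(ACTIONS)                        # explore
    return max(q_table[state], key=q_table[state].get)       # exploit best-known action

def update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One-step Q-learning update: fold observed reward into the value estimate."""
    q_table.setdefault(state, {a: 0.0 for a in ACTIONS})
    q_table.setdefault(next_state, {a: 0.0 for a in ACTIONS})
    best_next = max(q_table[next_state].values())
    q_table[state][action] += alpha * (reward + gamma * best_next - q_table[state][action])
```

Run over many episodes, the table of values stops changing much and the greedy choices it implies become the agent’s “policy” in the sense used above.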
 
Each agent evolves its policy on its own, which means it can specialize a bit. However, there’s a limit: after every 1,000 iterations of play the system compares policies and estimates how well the overall team would do if it were to mimic this or that agent. If one agent’s winning chances turn out to be less than 70 percent as high as another’s, the weaker agent copies the stronger one. Meanwhile, the reinforcement learning process is itself adjusted by comparing its results against other metrics. Such tweaking of the tweaker is known as meta-optimization.
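The population-level comparison described here can be sketched in a few lines. The class, field names, and perturbation range below are assumptions made for illustration; only the 70 percent threshold and the copy-the-stronger-agent rule come from the article.

```python
import copy
import random

class Agent:
    """Toy stand-in for one player in the population (illustrative only)."""
    def __init__(self, policy, learning_rate):
        self.policy = policy                  # whatever parameterizes behavior
        self.learning_rate = learning_rate    # a hyperparameter subject to meta-optimization
        self.estimated_win_rate = 0.0         # updated from recent match results

def evolve_population(agents, threshold=0.70):
    """Every comparison round: weak agents copy the strongest and perturb hyperparameters."""
    strongest = max(agents, key=lambda a: a.estimated_win_rate)
    if strongest.estimated_win_rate == 0.0:
        return
    for agent in agents:
        if agent is strongest:
            continue
        if agent.estimated_win_rate < threshold * strongest.estimated_win_rate:
            agent.policy = copy.deepcopy(strongest.policy)                       # copy the stronger policy
            agent.learning_rate = strongest.learning_rate * random.uniform(0.8, 1.2)  # small perturbation
```

The perturbation of the copied hyperparameters is one simple way to picture the “tweaking of the tweaker”: the population search adjusts the knobs that govern each agent’s own learning.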
 
Agents start out as blank slates, but they do have one feature built into their way of evaluating things. It’s called a multi-time-scale recurrent neural network with external memory, and it keeps an eye not only on the score at the end of the game but also on the score at earlier points. The scientists note that “Reward purely based on game outcome, such as win/draw/loss signal...is very sparse and delayed, resulting in no learning. Hence, we get more frequent rewards by considering the game points stream.”
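The contrast between a sparse end-of-game reward and a dense reward built from the points stream can be shown in a short sketch. The event names and point values below are made up for illustration; they are not the game’s or the paper’s actual scoring.

```python
# Hypothetical in-game point values (illustrative only).
POINTS = {"flag_capture": 1.0, "flag_return": 0.3, "tag_opponent": 0.2}

def sparse_reward(game_over, won):
    """Outcome-only reward: zero on every step until the match ends."""
    if not game_over:
        return 0.0
    return 1.0 if won else -1.0

def dense_reward(events):
    """Reward from the game points stream: feedback on every time step."""
    return sum(POINTS.get(event, 0.0) for event in events)
```

With the sparse signal, an agent acting at random almost never sees a nonzero reward and so has nothing to learn from; crediting the stream of in-game points gives it frequent feedback long before the final whistle.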
 
The program generally beats human players when starting from a randomly generated position. Even after the humans had practiced for a total of 12 hours, they were still able to win only 25 percent of the games, drawing 6 percent of the time and losing the rest.
 
However, when two expert game testers were given a particularly complex map that had not been used in training and were allowed to play games on that map against two software agents, the pros needed just 6 hours of practice to come out on top. This result was discussed not in the Science paper but in a supplementary document made available to the press. The pros used their in-depth study of the map to identify the routes the agents preferred and to work out how to avoid them.
 
So, for the time being, people can still beat the software in a well-studied set-piece battle. Of course, real life rarely provides such opportunities: Robert E. Lee got to fight the Battle of Gettysburg just one time.



This article is originally posted on Tronserve.com
