agent — The AI player. In this case, it is the shooter at the bottom attacking the aliens
environment — The complete surroundings of the agent (e.g. the barriers in front of it and the aliens above)
action — Something the agent has the option of doing (e.g. move left, move right, shoot, do nothing)
step — Choosing and performing one action
state — The current situation the AI is in
When we apply this to Space Invaders, the agent extracts observations from the environment which will help it decide the right action to take. For example, if it sees a bullet coming towards it, it will move out of the way!
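That observe-then-act cycle can be sketched in a few lines of Python. This is a minimal toy sketch, not the real Space Invaders interface: the environment class, action names, and reward rule below are all made up for illustration.

```python
import random

# Hypothetical stand-in environment: the agent picks one of four actions
# each step and receives a reward. Real environments return richer states.
ACTIONS = ["left", "right", "shoot", "nothing"]

class ToyEnvironment:
    def reset(self):
        self.t = 0
        return "start"                           # initial observation

    def step(self, action):
        self.t += 1
        reward = 1 if action == "shoot" else 0   # toy reward rule
        done = self.t >= 10                      # episode ends after 10 steps
        return f"state_{self.t}", reward, done

# The agent-environment loop: observe a state, choose an action, repeat.
env = ToyEnvironment()
state = env.reset()
total_reward = 0
done = False
while not done:
    action = random.choice(ACTIONS)   # a trained agent would choose smartly
    state, reward, done = env.step(action)
    total_reward += reward
print(total_reward)
```

Each pass through the loop is one "step" in the vocabulary above: the agent reads the state, performs an action, and collects the reward.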
Neural Network Diagram
Neural Networks (NNs) are modeled after the human brain. Just as the brain has neurons which transmit information between themselves, a NN has “nodes” (the white circles) which process information and perform operations. In a standard NN, there are three basic types of layers: an input layer, hidden layers, and an output layer. The input layer takes the input, the hidden layers process it, and the output layer produces the final result.
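A single forward pass through such a network is just a chain of matrix multiplications. Here is a minimal numpy sketch; the layer sizes and random weights are purely illustrative.

```python
import numpy as np

# Input -> hidden -> output, with made-up sizes and random weights.
rng = np.random.default_rng(0)

def relu(x):
    # A common activation: keeps positive values, zeroes out negatives.
    return np.maximum(0, x)

input_layer = rng.random(4)            # 4 input values
W_hidden = rng.random((4, 3))          # weights: input layer -> 3 hidden nodes
W_output = rng.random((3, 2))          # weights: hidden layer -> 2 outputs

hidden = relu(input_layer @ W_hidden)  # hidden layer processes the input
output = hidden @ W_output             # output layer produces the result
print(output.shape)                    # (2,)
```

Training a network means adjusting those weight matrices so the outputs become useful; the forward pass itself stays this simple.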
Convolutional Neural Network Diagram
We will be using a special type of Neural Network known as a Convolutional Neural Network (CNN). What makes CNNs unique is their ability to analyze imagery. They contain convolutional layers which take images as input and sweep small filters across them, region by region. Each filter searches for something specific in the image: one layer might look for something as simple as a square, while a deeper layer might detect something as complicated as a bird! Okay, so our AI can tell what is what in our environment. AGENT, LOOK OUT, A BULLET!
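That filter-sweeping idea can be shown directly. Below is a minimal hand-rolled 2D convolution over a tiny grayscale "image"; the 3x3 filter is a classic vertical-edge detector, chosen only to illustrate what "searching for something specific" means.

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the filter over every position and record how strongly it matches.
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply the filter against one patch of the image and sum.
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Tiny image: dark on the left, bright on the right (a vertical edge).
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# This filter responds strongly wherever dark pixels sit next to bright ones.
edge_filter = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

print(convolve2d(image, edge_filter))
```

Real CNN layers learn their filter values during training instead of having them hand-written, and they stack many filters per layer, but the sweeping operation is the same.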
Static state
Which way is the bullet going? How fast is it moving? Will we be able to cross without getting hit? None of these questions can be answered from a single static image, so just as we don’t have the answers, neither does the AI.
The last thing we need to do so that our AI can properly understand the environment is stack images to give it a sense of motion and direction. If we take one image, wait a couple of frames, take another image, and so on, we create something similar to a GIF for our AI to perceive.
4 states stacked on top of each other
With the frames stacked, it’s a lot easier to see that the bullets are coming towards us, and at what speed!
Example Q-Table
A Q-Table contains a column for every possible action in your game, and a row for every possible state in your game. In the cell where they meet, we store the maximum expected future reward for taking that action in that state. Intuitively, the AI will always perform the action with the highest expected reward.
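Concretely, a Q-Table is just a 2D array of numbers, and "choosing the best action" is an argmax over one row. The states and values below are invented for illustration.

```python
import numpy as np

# One row per state, one column per action. Values are made up.
ACTIONS = ["left", "right", "shoot", "nothing"]

q_table = np.array([
    # left  right  shoot  nothing
    [ 0.1,   2.5,   0.3,   0.0],   # state 0: bullet incoming from the left
    [ 3.0,   0.2,   0.1,   0.0],   # state 1: bullet incoming from the right
    [ 0.0,   0.0,   4.2,   0.5],   # state 2: alien lined up in our sights
])

def best_action(state_index):
    # Pick the column (action) with the highest expected future reward.
    return ACTIONS[np.argmax(q_table[state_index])]

print(best_action(2))   # shoot
```

For a game with raw-pixel states there are far too many rows to store explicitly, which is exactly why a neural network is used to approximate the table instead.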
We can’t expect to know all of the future rewards right away because then we wouldn’t need an AI to play the game in the first place. We predict these numbers through the Bellman Equation.
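In standard reinforcement-learning notation, the Q-learning form of the Bellman Equation reads:

```latex
Q(s, a) = r + \gamma \max_{a'} Q(s', a')
```

Here Q(s, a) is the expected future reward for taking action a in state s, r is the immediate reward, s' is the next state, and γ (the discount factor, between 0 and 1) makes rewards further in the future count a little less.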
If you are like me, you probably got a headache the first time you saw the Bellman Equation. To break it down: we take the reward from an action/state pair and use a neural network to predict the highest expected reward in the next state. We do this for each possible action, take the highest value, and perform the action that yields it. Our AI is always looking one step ahead!
When the AI actually starts playing the game, we use an exploration strategy called epsilon-greedy. At the start, the value of epsilon is very high. While epsilon is high, the AI chooses its actions randomly, since it has no knowledge to base its actions upon. As the AI unintentionally stumbles upon rewards or penalties, it starts to associate performing certain actions in certain states as either good or bad, and fills in the Q-Table accordingly. As training goes on, the value of epsilon slowly decays. With a lower value of epsilon, the AI is more likely to make decisions based on its Q-Table instead of choosing randomly.
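The epsilon-greedy rule fits in a few lines. The starting value, floor, and decay rate below are illustrative choices, not values from the original project.

```python
import random

EPSILON_START = 1.0     # begin fully random
EPSILON_MIN = 0.01      # never stop exploring entirely
DECAY = 0.995           # shrink epsilon a little after every episode

ACTIONS = ["left", "right", "shoot", "nothing"]

def choose_action(q_values, epsilon):
    # With probability epsilon: explore (pick a random action).
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    # Otherwise: exploit the Q-Table (pick the best known action).
    best_index = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best_index]

epsilon = EPSILON_START
for episode in range(1000):
    # ... play one episode here, filling in the Q-Table ...
    epsilon = max(EPSILON_MIN, epsilon * DECAY)

print(round(epsilon, 3))   # 0.01 — decayed down to the floor
```

Keeping a small floor on epsilon is a common design choice: a tiny amount of ongoing exploration lets the AI keep discovering situations it has never seen before.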
By the time epsilon becomes practically 0, the AI should have a well-filled-out Q-Table to help it make decisions. Overall, the AI plays pretty well considering the amount of training it did. AIs generally take an EXTREMELY long time to train and need a lot of computational power. Just when you think your AI has learned it all, it stumbles across a new problem! My AI recently discovered that it can lose if the aliens reach the bottom. As seen in the video, it completely ignored the aliens getting closer and closer. With some more training, it should realize that shooting the aliens before they reach the bottom prolongs the game, which increases its total reward.
Reinforcement Learning is a system where the AI is rewarded for doing the right thing, so it learns what to do and what not to do
Convolutional Neural Networks are a special type of Neural Network which can recognize images using convolutional layers
Q-Learning is when the AI plays the game and records which actions in which states give which rewards in a Q-Table. After a lot of training, it uses the developed Q-Table to play to the best of its ability