Adaptive AI Engine for RTS Games

Discussing the theory and practice

Archive for the ‘Reinforcement Learning’ Category

Introduction to Reinforcement Learning

Posted by merothehero on November 3, 2009


Posted in Presentations, Reinforcement Learning | Leave a Comment »

Reinforcement Learning – A Fast Overview

Posted by merothehero on October 25, 2009

Reinforcement Learning is one of the Machine Learning techniques. It may be considered a combination of supervised and unsupervised learning; it may also be viewed as an approach to machine intelligence that solves problems using both dynamic programming and supervised learning.

Reinforcement learning appeals to many researchers because of its generality. In RL, the computer is simply given a goal to achieve. The computer then learns how to achieve that goal by trial-and-error interactions with its environment. Thus, many researchers are pursuing this form of machine intelligence and are excited about the possibility of solving problems that have been previously unsolvable.

The Reinforcement Learner starts in a certain state and can choose, from a list of actions, an action that transfers it to another state. With each new state it reaches, the learner receives a reward or a punishment. These rewards and punishments affect the values of the states and actions. The aim of the Reinforcement Learner is to reach the state with the highest value by accumulating as much reward as possible in the long run. The learner uses a policy for choosing its actions and assigning values to its states.
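The loop described above can be sketched in Python. This is a minimal illustration with hypothetical names: `step` stands in for whatever environment is used, and `policy` for whatever action-selection rule the learner follows.

```python
def run_episode(step, actions, policy, start_state, max_steps=100):
    """Generic agent-environment loop: the learner observes a state,
    picks an action with its policy, and receives a reward plus the
    next state (and a done flag) from the environment."""
    state, total_reward = start_state, 0.0
    for _ in range(max_steps):
        action = policy(state, actions)          # choose an action
        state, reward, done = step(state, action)  # environment responds
        total_reward += reward                   # accumulate reward
        if done:
            break
    return total_reward
```

Any concrete learner (Monte-Carlo, TD, etc.) plugs its own `policy` and value updates into a loop of this shape.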

One of the most crucial problems in RL is the exploration-exploitation problem: whether to try a new, unguaranteed action (which could turn out to be a better one) or to select the best-known action (which might not be the real best one). There are many algorithms used to balance the two.
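One common and simple way to balance the two is epsilon-greedy selection, sketched below. The function name and the flat list of action values are illustrative assumptions, not something from the original post.

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon, pick a random action (explore);
    otherwise pick the best-known action (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))               # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

With `epsilon = 0` the agent always exploits; raising `epsilon` makes it try unguaranteed actions more often.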

Another problem is what’s called the credit assignment problem: determining which of the learner’s behaviors should be credited for which rewards.

There are 3 main Reinforcement Learning Approaches:

Dynamic Programming

1- Needs a complete model of the environment.

2- Uses bootstrapping (learns its estimates on the basis of other estimates, i.e. learns a guess from a guess!)


Monte-Carlo Methods
1- Needs only sample experience from the environment, not a complete model.

2- Doesn’t use bootstrapping.

3- Rewards are given after a complete episode (an episode may be a game in chess or a lap in a racing game).
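The episode-based nature of Monte-Carlo can be sketched as first-visit value estimation in Python. This is a hypothetical illustration: each episode is assumed to be a list of `(state, reward)` pairs, and values are only updated once the episode is complete.

```python
def mc_value_estimates(episodes, gamma=1.0):
    """First-visit Monte Carlo: average the full return observed
    after the first visit to each state, over many episodes."""
    totals, counts = {}, {}
    for episode in episodes:  # episode = [(state, reward), ...]
        g = 0.0
        returns = {}
        # Walk backwards, accumulating the discounted return;
        # overwriting as we go keeps the FIRST-visit return per state.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            returns[state] = g
        for state, ret in returns.items():
            totals[state] = totals.get(state, 0.0) + ret
            counts[state] = counts.get(state, 0) + 1
    return {s: totals[s] / counts[s] for s in totals}
```

Note that nothing is learned mid-episode: the rewards only become usable once the episode has finished, which is exactly the limitation TD learning addresses below.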

Temporal-Difference Learning

1- You need not wait until the end of the episode (this is very useful because some applications have very long episodes (such as our domain) and some have no episodes at all).

2- Uses bootstrapping (learns a guess from the next guess, without waiting for the actual outcome).

3- Does not require a model of the environment, unlike dynamic programming.

4- It’s considered a combination of dynamic programming and Monte-Carlo methods.
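Points 1 and 2 are the heart of TD learning. A minimal TD(0) value update might look like the sketch below; the function name and default parameters are illustrative assumptions.

```python
def td0_update(v, state, reward, next_state, alpha=0.1, gamma=0.9):
    """TD(0): nudge V(state) toward the bootstrapped target
    reward + gamma * V(next_state), immediately after one step,
    without waiting for the episode to end."""
    target = reward + gamma * v.get(next_state, 0.0)   # a guess built from the next guess
    v[state] = v.get(state, 0.0) + alpha * (target - v.get(state, 0.0))
```

Because the update happens after every single step, it works even in domains with very long episodes, or with no episodes at all.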

We will talk later about Monte-Carlo and Temporal-Difference Learning in detail!

Posted in Reinforcement Learning | 5 Comments »