Adaptive AI Engine for RTS Games

Discussing the theory and practice

Reinforcement Learning – A Fast Overview

Posted by merothehero on October 25, 2009

Reinforcement Learning (RL) is one of the Machine Learning techniques. It may be considered to sit between supervised and unsupervised learning: the learner does get feedback, but only as a reward signal rather than as the correct answer. It may also be considered an approach to machine intelligence that solves problems by combining dynamic programming and supervised learning.

Reinforcement learning appeals to many researchers because of its generality. In RL, the computer is simply given a goal to achieve. The computer then learns how to achieve that goal by trial-and-error interactions with its environment. Thus, many researchers are pursuing this form of machine intelligence and are excited about the possibility of solving problems that have been previously unsolvable.

The Reinforcement Learner starts in a certain state and chooses, from a list of available actions, one that transfers it to another state. With each new state it reaches, the Reinforcement Learner receives a reward or a punishment. These rewards and punishments update the values of the states and the actions. The aim of the Reinforcement Learner is to reach the states with the highest value, that is, to gain as much reward as possible in the long run. The Reinforcement Learner follows a certain policy for choosing its actions and assigning values to its states.
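To make the state / action / reward loop concrete, here is a minimal sketch. The corridor environment, its reward numbers, and the names `step`, `run_episode`, `random_policy` and `always_right` are all made up for illustration; they are not part of our engine.

```python
import random

# Hypothetical toy environment: a corridor of 5 cells (states 0..4).
# Reaching cell 4 gives a reward of +1; every other step is punished with -0.1.
N_STATES = 5
ACTIONS = (-1, +1)  # move left or move right

def step(state, action):
    """Return (next_state, reward, done) for one move in the corridor."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.1
    return next_state, reward, done

def run_episode(policy, max_steps=100, seed=0):
    """Let the learner interact with the environment; return the total reward."""
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(max_steps):
        action = policy(state, rng)                 # the policy picks an action
        state, reward, done = step(state, action)   # the environment answers
        total += reward
        if done:
            break
    return total

def random_policy(state, rng):
    """Pure trial and error: pick any action."""
    return rng.choice(ACTIONS)

def always_right(state, rng):
    """A hand-written policy that heads straight for the goal."""
    return +1
```

Calling `run_episode(always_right)` walks straight to the goal, while `run_episode(random_policy)` usually wanders and collects more punishments on the way. Learning means moving from the second kind of behavior toward the first.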

One of the most crucial problems in RL is the exploration-exploitation problem: whether to try a new, unproven action (which could be a better one) or to select the best known action (which might not be the REAL best one). There are many strategies for balancing the two, such as epsilon-greedy and softmax action selection.
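One common and very simple answer to the exploration-exploitation problem is epsilon-greedy selection. The sketch below assumes the action values are kept in a plain list; the function name and parameters are illustrative only.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))   # explore: any action, even an untested one
    best = max(q_values)
    return q_values.index(best)               # exploit: the best known action

# Usage: with epsilon = 0.1 the learner exploits about 90% of the time.
rng = random.Random(0)
choice = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1, rng=rng)
```

A larger `epsilon` means more exploration. Many setups start with a high value and lower it over time, but that schedule is a design choice, not something this overview prescribes.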

Another problem is the so-called credit assignment problem: when a reward arrives, we must determine how much of it should be credited to each of the behaviors that led to it.
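One standard way to spread credit backwards in time is to use discounted returns, so that actions closer to the reward get more of the credit. This is only one possible answer to the credit assignment problem; the discount factor `gamma` and the function name below are assumptions made for the sketch.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_(t+1) + gamma^2 * r_(t+2) + ... for each step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# A reward of 1 arrives only at the last step; earlier steps get a discounted share.
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
```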

There are 3 main Reinforcement Learning Approaches:

Dynamic Programming

1- Needs a complete model of the environment.

2- Uses bootstrapping (it learns its estimates on the basis of other estimates, or learns a guess from a guess!)
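The two Dynamic Programming points above can be sketched with value iteration, one classic DP algorithm. The corridor model, the discount factor, and all names here are hypothetical. Note that `model` is known completely in advance, and that each new estimate is built from the current estimates of neighboring states.

```python
N_STATES = 5          # hypothetical corridor; state 4 is the terminal goal
ACTIONS = (-1, +1)    # move left or move right
GAMMA = 0.9           # assumed discount factor

def model(state, action):
    """Complete, deterministic model of the environment: (next_state, reward)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

def value_iteration(tol=1e-9):
    """Sweep over all states until the value estimates stop changing."""
    values = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES - 1):   # the terminal state keeps value 0
            # Bootstrapping: the new estimate is computed from current estimates.
            best = max(r + GAMMA * values[s2]
                       for s2, r in (model(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values
```

In this toy model `value_iteration()` converges to values that shrink by a factor of `GAMMA` for every extra step away from the goal.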


Monte-Carlo Methods

1- Need only sample experience, not a complete model of the environment

2- Don't use bootstrapping

3- Values are updated only after a complete episode, once its rewards are known (an episode may be a game of chess or a lap in a racing game)
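For the Monte-Carlo approach, a minimal sketch of first-visit value estimation could look like this. It assumes each finished episode has been recorded as a list of `(state, reward)` pairs; that representation and the function name are illustrative choices. No model is used and no estimate depends on another estimate: only the actual returns observed after complete episodes.

```python
from collections import defaultdict

def mc_first_visit(episodes, gamma):
    """Estimate V(s) as the average return following the first visit to s.

    Each episode is a list of (state, reward) pairs, where reward is what
    was received after leaving that state.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        # Compute the return from every step, walking backwards.
        g = 0.0
        returns = []
        for state, reward in reversed(episode):
            g = reward + gamma * g
            returns.append((state, g))
        returns.reverse()
        seen = set()
        for state, g in returns:
            if state not in seen:       # count only the first visit per episode
                seen.add(state)
                totals[state] += g
                counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}

# Two recorded episodes: one ended with a reward, one did not.
episodes = [[("a", 0.0), ("b", 1.0)], [("a", 0.0), ("b", 0.0)]]
print(mc_first_visit(episodes, gamma=1.0))  # {'a': 0.5, 'b': 0.5}
```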

Temporal-Difference Learning

1- You need not wait until the end of the episode (this is very useful because some applications have very long episodes (such as our domain) and some have no episodes at all)

2- Uses bootstrapping (learns a guess from the next guess without waiting for an actual outcome)

3- Unlike Dynamic Programming, does not require a model of the environment.

4- It’s considered to be a combination of Dynamic Programming and Monte-Carlo
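The simplest Temporal-Difference method, TD(0), shows all four points in one update rule. This is a sketch under assumptions: values are kept in a dictionary, and `alpha` (step size) and `gamma` (discount factor) are parameters introduced for the example.

```python
def td0_update(values, state, reward, next_state, alpha, gamma, done=False):
    """Apply one TD(0) update in place and return the TD error.

    The target uses the current estimate of the next state (bootstrapping),
    so the update can happen right after a single step, with no model and
    without waiting for the episode to end.
    """
    target = reward + (0.0 if done else gamma * values[next_state])
    td_error = target - values[state]
    values[state] += alpha * td_error
    return td_error

# Usage: one observed transition s -> s2 with reward 0.5.
values = {"s": 0.0, "s2": 1.0}
error = td0_update(values, "s", reward=0.5, next_state="s2", alpha=0.5, gamma=0.5)
print(error, values["s"])  # 1.0 0.5
```

Like Monte-Carlo, it learns from sampled experience; like Dynamic Programming, it updates one guess from another guess.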

We will talk later about Monte-Carlo and Temporal-Difference Learning in Detail!


5 Responses to “Reinforcement Learning – A Fast Overview”

  1. ZiKaS said

    Thanks Omar for this post. I have a question:
    How do we assign a reward in a problem? Does the activated behavior simply take the reward?

    Thanks Dude

  2. ferasferas said

    First, a very nice overview to read in the morning :D.

    Second,
    about the ‘credit assignment problem’, I think the same as Omar said.
    For example, in an RTS: if the mine collector goes to collect from a mine,
    it is given credit, but if it goes to one that is far away,
    or to a mine that isn’t guarded while there is a safer one, then it should take a punishment.

    Actually, this is a real problem that drove me nuts in Red Alert 1 :D.

    Anyway, have fun and keep going.

    May God be with you, God willing.
    God is the one who grants success and whose help is sought.
    Peace be upon you, and the mercy of God and His blessings.
