Adaptive AI Engine for RTS Games

Discussing the theory and practice

Author Archive

Thank You Kariem Mohamed Ali !

Posted by merothehero on February 7, 2010

Kariem Mohamed Ali – our friend in our very same college- managed to design our Project’s Poster and Brochure yesterday to be used in the ICT Conference today.He proved to be a true professional in his work. This was observed through his great art work and his speed. His Patience to our many demands made us even much more grateful.

Our Designer Friend Kariem Mohamed Ali

Our Designer Friend Kariem Mohamed Ali

Oh ! I Forgot to say that we have chosen the name “I-Strategizer” as the “cool” or “commercial” name of our project instead of the current scientific name.

Some of Kariem’s Work in less than 2 days :

Thank You Kariem … We Owe you a lot !

The I-Strategizer Team


Posted in Thanks | Tagged: , | 4 Comments »

Thank You Hassan Ibraheem !

Posted by merothehero on December 5, 2009

Thanks to Hassan Ibraheem -our colleague and friend- we were able to compile and run the Wargus Game on windows at last. In case we switch our test-bed from BosWars to Wargus , that would be a great favor to us.

Hassan spent with us a lot of hours and effort to do this, THANK YOU HASSAN !

Posted in Thanks | 5 Comments »

Introduction To Heuristically accelerated Hierarchical RL in RTS Games

Posted by merothehero on December 5, 2009

Posted in HAHRL-RTS Platform | Tagged: , , , | 1 Comment »

Just an Intro : Heuristically Accelerated Hierarchical Reinforcement Learning in RTS Games

Posted by merothehero on December 3, 2009


In this document I will analyze the game play and strategies of RTS Games, and then I will give a brief about how the Heuristically Accelerated Hierarchical Reinforcement Learning System (HAHRL-RTS System) will work.

The Strategy Game: An Analysis

There’s no doubt that strategy games are complex domains: Gigantic set of allowed Actions (almost infinite), Gigantic set of Game States (almost infinite), imperfect information, nondeterministic behavior, and with all this: Real-time Planning and Reactions are required.

Many of the Approaches to applying learning or planning to RTS Games considered the AI as one solid learning part; this leads to the great complexity at applying the techniques used.

I thought about: How can I simplify everything?

Firstly, I listed all the primitive actions which could be done by a human player:

1-      Move a unit

2-      Train/Upgrade a unit

3-      Gather a resource

4-      Make a unit attack

5-      Make a unit defend

6-      Build a building

7-      Repair a building

NB: Upgrading units or buildings is not available in BosWars but found in most RTS Games.

Any player wins by doing 2 types of actions simultaneously, either an action that strengthens him or an action that weakens his enemy (Fig 1).
NB: We neglect useless actions here and suppose the player is perfect.

When a human plays a strategy game, he doesn’t learn everything at the same time. He learns each of the following 6 sub-strategies separately:

1-      Train/Build/Upgrade attacking Units: What unit does he need to train??

  1. Will he depend on fast cheep units to perform successive fast attacks or powerful expensive slow units to perform one or two brutal attacks to finish his enemy? Or will it be a combination of the two which is often a better choice?
  2. Does his enemy have some weak points concerning a certain unit? Or his enemy has units which can infiltrate his defenses so he must train their anti-units?
  3. Does he prefer to spend his money on expensive upgrades or spend it on more amounts of non-upgraded units?

NB: I deal with attacking Buildings as static attacking units

2- Defend: How will he use his current units to defend?

  1. Will he concentrate all his units in one force stuck to each other or will he stretch his units upon his borders? Or a mix of the two approaches?
  2. Will he keep the defending units (which maybe an attacking building) around his buildings or will he make them guard far from the base to stop the enemy early. Or a mix of the two approaches?
  3. If he detects an attack on his radar, will he order the units to attack them at once, or will he wait for the opponent to come to his base and be crushed? Or a mix of the two approaches?
  4. How will he defend un-armed units? Will he place armed units near them to for protection or will he prefer to use the armed units in another useful thing? If an un-armed unit is under attack how will he react?
  5. What are his reactions to different events while defending?

3- Attack: How will he use his current units to attack?

  1. Will he attack the important buildings first? Or will he prefer to crush all the defensive buildings and units first? Or a mix of the two approaches?
  2. Will he divide his attacking force to separate small forces to attack from different places, or will he attack with one big solid force? Or a mix of the two approaches?
  3. What are his reactions to different events while attacking?

4- Gather Resources: How will he gather the resources?

  1. Will he train a lot of gatherers to have a large rate of gathering resources? Or will he train a limited amount because it would be a waste of money and he wants to rush (attack early) in the beginning of the game so he needs that money? Or a mix of the two approaches?
  2. Will he start gathering the far resources first because the near resources are more guaranteed? Or will he be greedy and acquire the resources the nearer the first? Or a mix of the two approaches?

5-      Construct Buildings: How does he place his buildings? Will he stick them to each other in order to defend them easily? Or will he leave large spaces between them to make it harder for the opponent to destroy them? Or a mix of the two approaches?

6-      Repair: How will he do the repairing? Although it’s a minor thing, but different approaches are used. Will he place a repairing unit near every building in case of having an attack, or will he just order the nearest one to repair the building being attacked? Or a mix of the two approaches?

The HAHRL-RTS System

Since the 6 sub-strategies do not depend on each other (think of it and you’ll find them nearly independent),  So, I will divide the AI system to a hierarchy as shown in figure 1, each child node is by itself a Semi-Marcov decision process (SMDP) where Heuristically Accelerated Reinforcement Learning Techniques will be applied. Each child node will be later divided into other sub-nodes of SMDPs.

So now you’ve understood why it is hierarchical, but you may be wondering why “heuristically accelerated”?

Well, because heuristic algorithms will be applied to guide and accelerate the learning process. I hear you asking what the heuristic algorithms are. But that’s another story will be said later!

Posted in HAHRL-RTS Platform | Tagged: , , , , | Leave a Comment »

2apl Language : A Fast Overview

Posted by merothehero on November 6, 2009

2apl is a agent oriented prolog-like language which helps the programming of the interaction between different BDI ( Belief-Desire-Intention ) agents in a multi-agent system.

The Whole System is specified through a (.mas) file , the code related to each agent is put in (.2apl) files . Agents could inherit from others.

A 2APL agent may have beliefs and goals which change during the agent’s execution.

Basic Actions an agent may do

Belief Update Action

A belief update action updates the belief base of an agent when executed.

Communication Action

A communication action passes a message to another agent

External Action

An external action is supposed to change the external environment in which the agents operate.

Abstract Action (similar to a procedure)

An abstract action is an abstraction mechanism allowing the encapsulation of a plan by a single action.

Belief and Goal Test Actions

A belief test action is to test whether a belief expression is derivable from an agent’s belief base, tests whether the agent has a certain belief.

Goal Dynamics Actions

The adopt goal and drop goal actions are used to adopt and drop a goal to and from the agent’s goal base, respectively.


In order to reach its goals, a 2APL agent adopts plans. A plan consists of basic actions composed by process operators.

Reasoning rules

The 2APL programming language provides constructs to implement practical reasoning rules that can be used to implement the generation of plans.

Planning Goal Rules (PG rules)

A planning goal rule specifies that an agent should generate a plan if it has certain goals and beliefs.

Procedural Rules (PC rules)

Procedural rules generate plans as a response to 1) the reception of messages sent by other agents, 2) events generated by the external environment, and 3) the execution of abstract actions.

Plan Repair Rules (PR rules)

The execution of an agent’s action might fail. To repair such actions 2APL provides so-called plan repair rules.

The Deliberation cycle

The beliefs, goals, plans and reasoning rules form the mental states of the 2APL agent. What the agent should do with these mental attitudes is defined by means of the deliberation cycle. The deliberation cycle states which step the agent should perform next, e.g. execute an action or apply a reasoning rule. The deliberation cycle can thus be viewed as the interpreter of the agent program, as it determines which deliberation steps should be performed in which order. 2APL provides the deliberation cycle as illustrated in this figure :

Posted in 2apl Language | Leave a Comment »

Introduction to Reinforcement Learning

Posted by merothehero on November 3, 2009

Posted in Presentations, Reinforcement Learning | Leave a Comment »

Our Project’s Research Current Situation – 25 October 2009

Posted by merothehero on October 25, 2009

We have come to the most crucial part of our project’s life cycle. It’s This CREATIVE part concerned with creating new techniques or developing existing ones using one or more approaches.

It’s now clear to us that offline Learning (Learning in the company developing the game before releasing the game) is done through Evolutionary Algorithms, also through Cased-based Planning (based on case-based reasoning) through learning from simulation. At the moment we do not wish to enter this field, our main goal is Real-Time Learning.

On the other hand, Online Learning is done through Reinforcement Learning, as well as Case-Based planning as well. Most of the online learning papers make the learning just after the game is finished, but little theses of these are doing Real-Time Learning (learning while playing).

It seems that the most approaches used are Reinforcement Learning and Case-Based Reasoning.  Genetic Algorithms are extremely offline learning.

Since more than a week, we divided ourselves to two groups, me and Saqr studying Reinforcement Learning and trying to implement it, Abdurrahman and A. Atta studying Case-Based Planning and trying to implement it.

I and Saqr are currently:

  • Mastering RL through the ultimate Reinforcement Learning book “RL : An Intro”
  • Understanding Eric Kok’s Paper “Adaptive Reinforcement Learning agents in RTS Games – 2008” as well as its implementation
  • Understanding the paper “Transfer Learning in RTS Games using Hybrid CBR/RL”
  • Implementing a simulation for a Mini-RTS Game to be able to test our Techniques on it.

While Abdurrahman and A. Atta are currently:

  • Understanding more Case-Based Planning.
  • Understanding more CBP papers and their implementation.
  • Starting to think about a design for a platform with CBP to be used with Learning in BosWars (Our Open Source Game)

We have till the 11th of November to complete at least 80% of these Tasks. So may ALLAH be with us!

Posted in Project Plan and Tasks | 3 Comments »

Reinforcement Learning – A Fast Overview

Posted by merothehero on October 25, 2009

Reinforcement Learning is one of the Machine Learning Techniques. It may be considered a combination of supervised and unsupervised learning; it also may be considered an approach to machine intelligence to solve problems using both: Dynamic programming and supervised learning.

Reinforcement learning appeals to many researchers because of its generality. In RL, the computer is simply given a goal to achieve. The computer then learns how to achieve that goal by trial-and-error interactions with its environment. Thus, many researchers are pursuing this form of machine intelligence and are excited about the possibility of solving problems that have been previously unsolvable.

The Reinforcement Learner starts with a certain state, it can choose from a list of actions an action that would transfer it to another state. With each new state acquired the Reinforcement Learner gains a reward or a punishment. The Rewards and Punishments affect the values of the states and the actions. The Aim of the Reinforcement Learner is to achieve the state with the highest value by gaining more rewards on the long run. The Reinforcement Learner uses a certain policy for choosing its actions and assigning values its states.

One of the most crucial problems in RL is the exploration-exploitation problem, whether to try a new unguaranteed action (which could a better one) or to select the best known action (which could not be the REAL best one). There are many algorithms used to solve this.

Another Problem is what’s called the credit assignment problem where we should determine what rewards should be given to which behaviors.

There are 3 main Reinforcement Learning Approaches:

Dynamic Programming

1-Needs a complete model about the environment.

2- Uses bootstrapping (Learn their estimates on the basis of other estimates or learn a guess from a guess!)


1-needs only a sample experience of the model

2-doesn’t use bootstrapping

3-Rewards are given after a complete episode (an episode maybe a game in chess or a lap in a racing game)

Temporal-Difference Learning

1-U need not wait until the end of the episode (this is very useful because some applications have very long episodes (such as our domain) and some has no episodes at all)

2-uses bootstrapping (learns a guess from the next without waiting for an actual income)

3-does not require a model of the environment like Dynamic programming.

4-It’s considered to be a combination of dynamic programming and Monte-Carlo

We will talk later about Monte-Carlo and Temporal-Difference Learning in Detail!

Posted in Reinforcement Learning | 5 Comments »

Learning – Artificial Intelligence for Games – A Summary

Posted by merothehero on July 28, 2009


A summary of selected topics in the chapter

From the Book: Artificial Intelligence for Games – 2006

Chapter 7

Author: Ian Millington

  1. I.            Introduction

Online or Offline

  • Majority of Learning is done offline (either between levels of the game or at the development studio)
  • Online Learning is while the game is playing.

Intra-Behavior Learning Algorithm

  • The simplest kind of learning that changes a small part of the character’s behavior
  • Good Example: learning how to evade, learning to find cover points in a room
  • Bad Example: if a character is trying to jump a wall it won’t tell him to use the stairs instead

Inter-Behavior Learning Algorithm

  • The character learns “new” things (not in a specific scope)
  • Example: A Character that learns to tie a rope across the street to stop an escaping motorbike
  • Unfortunately this kind of AI is almost pure fantasy.
  • It can be implemented by choosing to between a range of different behaviors, but still each distinct behavior will require implementation by the developer
  • It’s not economically good to learn everything, basic movement systems for example is easier and faster to be implemented directly.
  • Developers are experimenting on replacing the ordinary decision making systems (FSM, FUSM, Scripting, Decision Trees, and Markov Systems) with Learning Techniques


  • The biggest problem with learning is reproducibility and quality control.
  • Learning the Wrong thing is a big problem that industrial companies found
  • The more flexible your learning is the less control you have on your game-play
  • Normal solution to this is to limit the kind of things that can be learned in a game

Over-Learning (Over-fitting)

  • When the AI learns only from specific situations or experiences, but we normally want the AI to be able to generalize the learning.
  • Neural Networks for example can over-fit during learning if they are wrongly parameterized or if the network is too large for the learning task at hand. (From my Readings over-fitting happens because of local maximum points)

The Balance of Effort

  • Learning is done to decrease the implementation work ,because you don’t need to handle every event or trick happening to the character, instead it will learn to handle it itself.
  • Unfortunately, Learning often increases the implementation work, because they require a lot of work such as: presenting data in the correct way, making sure the results are valid and testing the AI to avoid it learning the wrong things.
  1. II.            Parameter Modification


  • Not considered a “learning” technique.
  • The simplest Learning technique is to calculate the value of one or more parameters such as: cost functions for path finding and possibilities in decision making.
  • Most commonly learning parameters like these is done offline, but can usually be controlled when performed online.
  • For each value of the parameter there is some energy value (or fitness value) which represents how good the value of the parameter is for the game.
  • Fitness landscape is used when we’re looking for maximum, while Energy Landscape is used when we’re looking for minimum. So they are just different terminologies.

Getting the score out of the parameter values

  • If the value can be generated from some function or formula, if it’s a simple mathematical formula then we can differentiate it so the best values can be found explicitly. In this case there’s no need for parameter optimization.
  • In most cases, this formula doesn’t exist, the only way to find out the value is to try the game and see how it performs and produce a fitness or energy score for the parameter.
  • We should have a mechanism to determine the value for a set of a parameters quickly such as: allowing the game to run at high speed without rendering the screen or use a set of heuristics that generate a value based on some assessment criteria without ever running the game, if there is no way to determine the value for the parameters other than running the game with the player, then all the techniques in the chapter are unlikely to be practical.

Hill Climbing

  • An Algorithm that tries to work out in what direction to change the parameter in order to improve its score.
  • It often fails to get the optimal solution if the score of the parameter values is random.
  • Its Performance is O(n) where n is the number of parameters while it’s O(1) in memory.

Extensions for Hill Climbing

  • There are Several extensions to support hill climbing with more random score:
    • Momentum: if the search is consistently improving in one direction, then it should continue in that direction for a little while, even if it seems that things aren’t improving anymore. It takes much longer to reach the best parameter value, but it doesn’t get stuck easily in a local maximum point on its way to the peak.
  • Adaptive Resolution: when a parameter is long away from the best value, taking small steps in learning is slow but if the steps are large, the best value could never be reached. This problem is solved by Adaptive Resolution, where long jumps are made earlier and small jumps later on, and as long as the score is improving the steps are bigger, when its stops improving the steps become smaller.
  • Multiple Trials: Hill Climbing is very dependent on the initial guess, so trying the algorithm from a different initial starting location will get the optimum score after some trials, However this Extension is not suitable for online learning because the player doesn’t expect that the AI suddenly gets worse (because it starts the hill climbing again with an initial value which can be a low value)
  • In Most Problems, we can’t even recognize the best solution when we find it, for example in an RTS game the best use of resources into construction and research, if we ran 200 trials we can’t guarantee that the best score had been scored in these trials, there’s no formula that tells us that this is the best solution, so we must just choose the most satisfying solution even if it might not be the best result.


  • Similar idea to Hill Climbing, and will be read later due to time constraints.
  1. Action Prediction

N-Gram Predictors

  • N-Grams are used in various statistical techniques and are not limited to prediction , they have applications in analysis of human languages
  • A 3-Gram –for example- will keep track of the number of times each sequence of 3 choices is seen. For prediction, the first 2 choices form the window and the probability is calculated by looking at the proportion of times each window is taken for the third choice.
  • Example :In This string  “LRRLRLLLRRLRLRR” The 3-Gram Probabilities :
Window/3rd choice ..R ..L
LL ½ ½
LR 3/5 2/5
RL ¾ ¼
RR 0/2 2/2
  • Increasing the window size increases the efficiency of the algorithm but decreases the prediction.
  • As the Sequence of Actions gets more random, the window size needs to be reduced.
  • In Predictions with more than 2 choices, the minimum window size will need to be increased a little, also bigger window size get much poorer prediction results.
  • There are mathematic al models that are used to tell you how well an N-gram predictor will predict a sequence and they are used to tune the optimal window size.
  • A Hash Table could be used for better memory performance especially for large number of choices.

Hierarchical N-Grams

  • Several N-Gram algorithms work in parallel, each with increasingly large window sizes, A Hierarchical 3-Gram will have regular 1-Gram, 2-Gram and 3-Gram algorithms working on the same data.
  • A sequence of “LRR” is passed to hierarchical 3-Gram, gets registered in the 3-Gram, the “RR” portion gets registered in the 2-gram, and “R” gets registered in the 1-Gram.
  • When a prediction is requested, the algorithm first looks up the window actions in the 3-Gram, if there have been sufficient examples of the window then it uses the 3-Gram to generate its prediction, otherwise it looks at the 2-Gram, likewise, the prediction may come from 1-Gram. If none of the N-Grams have sufficient Examples, then the algorithm returns no prediction or random prediction.
  • There is no single correct threshold value for the number of entries required for confidence. To some extent it needs to be found by trial and error. In Online learning, it’s common for the AI to make decisions based on very sketchy information, so the confidence threshold  is small (3 or 4)
  • The algorithm is O(n) in memory and O(n) in time where n is the highest number N-Gram used.
  • The most widespread application of N-Gram is in combat games.
  1. Decision Learning

Weak Vs Strong Supervision

  • In order to improve performance, we need to provide feedback to the learning algorithm. This feedback is called “supervision,”
  • Strong supervision takes the form of a set of correct answers; these correct answers are often provided by a human player. The developer may play the game for a while and have the AI watch. The AI keeps track of the sets of observations and the decisions that the human player makes. It can then learn to act in the same way.
  • Weak supervision doesn’t require a set of correct answers. Instead, some feedback is given as to how good its action choices are.
  • Strong supervision is easier to implement and get right, but it is less flexible: it requires somebody to teach the algorithm what is right and wrong. Weak supervision can learn right and wrong for itself, but is much more difficult to get right.
  • Each of the remaining learning algorithms in this chapter works with this kind of model. It has access to observations, and it returns a single action to take next. It is supervised either weakly or strongly.
  1. V.            Decision Tree Learning

ID3 Algorithm

  • Algorithm for generating Decision Trees from a set of rules
  • Example: The Input could be the following :

Healthy – In Cover – With Ammo: Attack

Hurt – In Cover – With Ammo: Attack

Healthy – In Cover – Empty: Defend

Hurt – In Cover – Empty : Defend

Hurt – Exposed – With Ammo: Defend

  • The Output will be the following Decision Tree:

  • The Details of the ID Algorithm will be discussed Later
  1. Reinforcement Learning


  • Reinforcement learning is the name given to a range of techniques for learning based on experience.
  • In its most general form a reinforcement learning algorithm has three components: an exploration strategy for trying out different actions in the game, a reinforcement function that gives feedback on how good each action is, and a learning rule that links the two together. Each element has several different implementations and optimizations, depending on the application.
  • One reinforcement learning technique is Q-learning which is a good start, easy to implement and can be tuned without a deep understanding of its theoretical properties.

Q-Learning – Introduction

  • Q-learning treats the game world as a state machine. At any point in time, the algorithm is in some state. The state should encode all the relevant details about the character’s environment and internal data.
  • In a game the states are made up of many factors: position, proximity of the enemy, health level, and so on. Q-learning doesn’t need to understand the components of a state. As far as the algorithm is concerned they can just be an integer value: the state number.
  • Q-learning is known as a model-free algorithm because it doesn’t try to build a model of how the world works. It simply treats everything as states. Algorithms that are not model-free try to reconstruct what is happening in the game from the states that it visits. Model-free algorithms, such as Q-learning, tend to be significantly easier to implement.
  • These four elements—the start state, the action taken, the reinforcement value, and the resulting state—are called the experience tuple, often written as (s, a, r, s).

How It Works

  • The experience tuple is split into two sections. The first two elements (the state and action) are used to look up a Q-value in the store. The second two elements (the reinforcement value and the new state) are used to update the Q-value based on how good the action was and how good it will be in the next state.
  • The update is handled by the Q-learning rule:

Q(s, a) = (1α) Q(s, a) +α(r +γ max (Q (s, a))),

Where α is the learning rate, and γ is the discount rate. Both are parameters of the algorithm.

  • The first component Q(s, a) is simply the current Q-value for the state and action.
  • The r value is the new reinforcement from the experience tuple
  • γ max(Q(s_, a_)), looks at the new state from the experience tuple. It looks at all possible actions that could be taken from that state and chooses the highest corresponding Q-value. This helps bring the success (i.e., the Q-value) of a later action back to earlier actions: if the next state is a good one, then this state should share some of its glory.
  • The discount parameter controls how much the Q-value of the current state and action depends on the Q-value of the state it leads to. A very high discount will be a large attraction to good states, and a very low discount will only give value to states that are near to success. Discount rates should be in the range [0, 1]. A value greater than 1 can lead to ever-growing Q-values, and the learning algorithm will never converge on the best solution.
  • So, in summary, the Q-value is a blend between its current value and a new value, which combines the reinforcement for the action and the quality of the state the action led to.

Exploration Strategy

  • It’s the policy for selecting which actions to take in any given state.
  • The basic Q-learning exploration strategy is partially random. Most of the time, the algorithm will select the action with the highest Q-value from the current state. The remainder, the algorithm will select a random action. The degree of randomness can be controlled by a parameter.

Convergence and Ending

  • If the problem always stays the same, and rewards are consistent (which they often aren’t if they rely on random events in the game), then the Q-values will eventually converge. Further running of the learning algorithm will not change any of the Q-values. At this point the algorithm has learned the problem completely. For very small toy problems this is achievable in a few thousand iterations, but in real problems it can take a vast number of iterations. In a practical application of Q-learning, there won’t be nearly enough time to reach convergence, so the Q-values will be used before they have settled down. It is common to begin acting under the influence of the learned values before learning is complete.


  • O(i) in time , where i is the number of iterations of learning
  • O(as) in memory where a is the number of actions, and s is the number of states per action, in case of using arrays
  • Using a Hash Table will decrease the order of the memory. However it will increase execution time

Tuning the Parameters

  • Will be discussed later

Choosing Rewards

  • Will be discussed later

Temporal-Difference learning

  • Will be discussed later

Applications and Limits

  • Reinforcement learning is most suitable for offline learning. It works well for problems with lots of different interacting components, such as optimizing the behavior of a group of characters or finding sequences of order-dependent actions. Its main strength is its ability to seamlessly handle uncertainty. This allows us to simplify the states exposed to it; we don’t have to tell the algorithm everything.
  • It is not suitable for problems where there is an easy way to see how close a solution is (we can use some kind of planning here), where there are too many states, or where the strategies that are successful change over time (i.e., it requires a good degree of stability to work).
  • It’s strong in board games and turn-based strategy games
  • It’s possible to use a neural network to act as a storing medium for Q-Values, the author is not aware of any developers who have used this combination successfully.
  1. Artificial Neural Networks


  • Developers who have experimented with neural networks for large-scale behavior control have been left in no doubt of the approaches weaknesses. The combined hype and disappointment has clouded the issue. AI-savvy hobbyists can’t understand why the industry isn’t using them more widely, and developers often see them as being useless and a dead end.
  • It is a huge subject, full of different kinds of network and learning algorithms specialized for very small sets of task. Very little neural network theory is applicable to games, however.
  • The Focus will be on only one type of neural networks, which is the multi-layer perceptron with one particular learning rule, which is the back-propagation algorithm. This type of neural network is the most common one that’s why the author chose it.


  • Will be discussed later


  • Will be discussed later


  • Will be discussed later


  • O (nw) in memory, where n is the number of perceptrons, and w is the number of inputs per perceptron.
  • In time, the performance is also O (nw).

Other Approaches: Radial Basis Function

  • Will be discussed later

Other Approaches: Weakly Supervised Learning

  • Will be discussed later

Other Approaches: Hebbian Learning

  • Will be discussed later

Posted in AI for Games | 2 Comments »

Paper Analysis : Adaptive reinforcement learning agents in RTS games 2008 – 24 June 2009

Posted by merothehero on June 25, 2009

Adaptive reinforcement learning agents in RTS games

-paper url:

-game implemented using:  2apl – agent programming language

-Uses Dynamic Scripting instead of static scripting which is used in most commercial games.

-uses BDI Agents: Beliefs, Desires and Intentions.

-Uses Monte-Carlo Learning Process.

-Implicit adaptation: adaptation from the beginning of the game by including opponent’s data

-Explicit adaptation: adaptation as a result of opponent’s actions.

-Integrates agent technology and reinforcement learning-

Future Research in Reinforcement Learning:

1-Faster Learning algorithms are wanted, in order to make agents learning from the entire environment (not only from a small portion of the environment as implanted).

2-Learning shall cover also low-level actions not only high-level strategy actions.

3-multi-task learning agents aren’t available

4-a strategy visualization tool would be added to our goals.

5-Explicit Player modeling (WHAT WE THOUGHT ABT)

6-The Framework can be a base for research in the following :

  • High-performance reinforcement learning and planning
  • multi-agent communication and negotiation
  • agent societies
  • Data mining and human player modeling.

Posted in Papers Summaries | 1 Comment »