Adaptive AI Engine for RTS Games

Discussing the theory and practice

Archive for the ‘AI for Games’ Category

Summary of Artificial Intelligence for Games

On-line Planning for Resource Production in Real-Time Strategy Games -part5

Posted by ferasferas on November 5, 2010

Heuristic Scheduler

The previous component(sequential planner) finds a sequential plan. However, to accurately estimate the utility of any renewable resources, we need to reschedule actions to allow concurrency and decrease the makespan of the found plan. We do this by using a heuristic scheduling procedure that traverses the found action sequence in order. For each action Ai , the procedure moves the start time of Ai to the earliest possible time such that its preconditions are still satisfied.

Assume that Ai starts at time ts , and the state R+ (ts ) is the resource state at time ts after the effects of all actions that end at time ts are added to the game state, and R− (ts ) is the resource game state before the effects are added.

Obviously, the preconditions of Ai are satisfied by R+ (ts ). If they are also satisfied by R− (ts ), this means the satisfaction of the preconditions of Ai is not due to any of the actions that end at time ts , Then we can now move action Ai to start earlier than ts , to the previous decision epoch (time where an action starts or ends).This is repeated until the preconditions of A are satisfied by some R+ (ts ) but not R− (ts ), i.e., the satisfaction of the preconditions of A is due to the actions that end at time ts .

The plan is now rescheduled such that action A starts at time ts , and we can proceed to attempt to reschedule the next action in our sequential plan. The pseudo-code for this procedure is given in Algorithm 3.

We can show this procedure is sound using the following informal argument. We need to ensure that when we reschedule an action, every action between the new start time and the old start time remains executable.

Now when processing each action, we can always schedule it before a previous action if they do not consume or borrow the same resource:

1- Consider a pair of actions A and B, A before B, in a valid plan that both consume or borrow resource R, and assume we are about to reschedule B. First, if A and B are adjacent, the state before A must have enough R for A and B to execute. This means that at that state, A and B could be executed concurrently, or B could be scheduled before A, if possible.

2- On the other hand, if A and B were separated by any actions that produced R in order to satisfy B’s precondition, then our procedure would not shift B before the effects of those actions. If A and B are separated by actions not producing R, this effectively reduces to the adjacent case. Thus, this procedure is sound.

Future Work

While this procedure is not guaranteed to produce a plan with the optimal makespan, it is fast (at most quadratic in the number of actions) and performs well in practice. Investigating more complex scheduling algorithms for this problem is a topic of future work.

Posted in AI for Games, Papers Summaries, Planning, Technical Documents | Leave a Comment »

On-line Planning for Resource Production in Real-Time Strategy Games -part4

Posted by ferasferas on November 4, 2010

Sequential Planner

The first component of our online planner is a sequential planner which outputs a sequential plan to achieve the goal from the given initial state.

In principle, any off-the-shelf sequential planner can be used in this step. However, given the specific properties of our domain discussed earlier, a simple sequential planner based on means-ends analysis suffices.

MEA operates by selecting a subgoal to solve which will decrease the difference between the initial state and the goal state, and then executing the necessary actions to solve the sub-goal. Then from the new state which satisfies the subgoal the process is recursively applied until we reach the goal state. Notice that if one subgoal becomes unsolved while solving another, it will be resolved later, in the recursive step.


The pseudo-code is given in Algorithm 2. For simplicity this pseudo-code assumes that there is no resource that is both produced and consumed by a single action. It is straightforward to lift this assumption while still maintaining the polynomial-time guarantee outlined below.

First, means-ends analysis repeatedly picks an unsatisfied sub-goal Ri ≥ gi , and constructs a sub-plan Plan which satisfies it.

We will first solves all renewable resource goals before any non-renewable resource goals (because renewable resource goals, once solved, always stay solved using the monotonic increase property of renewable resources). In this ordering, every action added to the plan is necessary to solve the final goal.

Further, if we choose any other ordering of goals that necessitates revisiting previously solved goals, we will only generate permutations of the set of actions produced by the “canonical” ordering. This is because, if we revisit a goal, it must be because some actions used to solve that goal were “used up” by the preconditions of some other goal. Thus, we are effectively permuting the sequence of necessary actions if we choose a different ordering.

Since the plan found by MEA has the minimal set of actions, it consumes the minimal set of resources necessary to reach the goal. Finally, because each step of means-ends analysis adds at least one useful action to the plan, its running time is bounded by the minimum number of actions to the goal.

Notice that if the dependency graph between resources contains cycles, it is possible for means-ends analysis to get stuck in an infinite loop for certain initial states.

For example, in Wargus, collecting gold requires a townhall and borrows a peasant, while building a townhall or a peasant consumes certain amounts of gold. The presence of these cycles means there is a possibility that there is no plan to achieve a goal in some cases.

However, we can easily extend our algorithm to detect such cases if they happen. Further, as we have noted above, if the initial game state contains a peasant and a townhall, we can guarantee that there is always a plan no matter what the goal state is.

Posted in AI for Games, Papers Summaries, Planning, Technical Documents | Leave a Comment »

On-line Planning for Resource Production in Real-Time Strategy Games -part3

Posted by ferasferas on November 4, 2010

Quick Revision

Some key properties of our domain are:

1. Actions have durations.

2. There are multiple units, so actions can be executed concurrently.

3. Units and buildings can be created as the game progresses.

4. Many actions involve numeric fluents.

5. Solution plans typically involve a large number of actions compared to most standard planning benchmarks, and

6. In our setting, the planner must find a plan in real-time.

Planning Architecture

As the game proceeds, the world map and the planner’s action models may change, requiring fast re-planning with respect to the changed environment. In fact, a changing environment may make it impossible for the agent to execute a plan that is constructed offline ahead of time. Therefore, a planner which takes a long time to plan or does not provide a bound on its planning time is undesirable for our domain even if it may otherwise return the optimal plan. Instead, we aim to a planner which finds a good plan quickly while being able to scale with the number of resources, and the number of actions in the plan.

Characteristics of the Online-Planner:

1- To adapt to changing goals and environments, planner re-plans every certain Game Cycle with the current goal and game state.

2- To find a new plan, it carries out a bounded search over possible intermediate goals. The set of possible intermediate goals includes all states that have an extra renewable resource of every type. For each such goal, the planner employs means-ends analysis followed by a heuristic scheduling process to generate a plan to reach the overall goal via the intermediate goal.

3- To select an action to be executed, the planner chooses the plan with the smallest makespan. If this plan has any action that is executable at the current game state, that action (or actions) is started.


Notice that the plans generated by the planner are not usually completely executed when the planner replans at the next decision epoch using the game state at that point, it may not obtain a suffix of the plan it found at the current epoch. However, constructing such plans are valuable because they help in action selection at the current step.


Algorithm :

The pseudo-code of the main loop of the algorithm is shown in Algorithm 1. Every few game “cycles” (the time unit in Stratagus), the client checks if there are some available actions that can be executed. If so, it calls the planner to find plans which satisfy our goal from the current state. One plan will aim to satisfy the overall resource goal without any intermediate goals, while the others will aim to satisfy each of the intermediate goals which explicitly creates additional renewable resources. The plan with the shortest makespan is preferred, and actions in the plan that should start now are executed.

The planner consists of two main components:

A sequential planner that uses means-ends analysis to find a plan from game state S to goal G with the minimum number of actions, M EA(S, G), and

A heuristic scheduler which reorders actions in a sequential plan to allow concurrency and decrease its makespan, Schedule(Plan).

Next, we discuss the sequential planner and the heuristic scheduler components.

Posted in AI for Games, Papers Summaries, Planning, Technical Documents | Leave a Comment »

Introducing – “On-line Planning for Resource Production in RTS”

Posted by ferasferas on October 31, 2010

On-line Planning for Resource Production in Real-Time Strategy Games

Hei Chan, Alan Fern, Soumya Ray, Nick Wilson and Chris Ventura

School of Electrical Engineering and Computer Science

Oregon State University

Corvallis, OR 97330


Goal :

Develop action-selection mechanism in producing certain amount of resources as fast as possible.

Planner :

Computationally efficient “action-selection” mechanism which at each epoch creates a possibly Sub-Optimal concurrent plan from the current state to the goal, then begin executing the set of initial actions.

How it’s done :

formed via a combination of MEA(means-ends-analysis) and Scheduling and Bounded Search over sub-goals that aren’t required for goal achievements but may improve the make span.

Two Key problem domains :

-Resource production and tactical battles.

In Resource Production : player has to produce ( or gather ) varies raw materials, buildings, civilian and military Units to improve their economic and military power.

Tactical Battles : a player uses military units to gain territory and defeat enemy Units ( offensive or defensive actions ).

“Winning the Resource Production race is often a key factor in overall success”.

Uses :

1- either in computer opponent A.I.

2- Human can use it to specify resources needed and the Planner will get the best way to get this “RTS resource production is interesting from a pure A.I. Planning perspective as it encompasses a number of challenging issues”.

First, resource production involves temporal actions with numeric effects.

Second, performing well in this task requires highly concurrent activity.

Third, real-time constraints of the problem require action selection be computational efficient in apractical sense.

Why? :

Most existing planners are :

1- not handling temporal and numerical domains.

2- simply too inefficient to be useful.

3- produce highly Sub-Optimal plans.

The planning component used by online planner is based on a combination of means-ends analysis (MEA) and scheduling.

Given an initial state and resource goal, is used to compute a sequential plan that reaches the goal using the minimum number of actions and resources in sense that any valid plan must include all of the actions in the MEA plan.

Importantly, the special structure of our domain guarantees that MEA will produce such a plan and do so efficiently (linear time in the plan length).

Given the minimum sequential plan, we then reschedule those actions, allowing for concurrency, in an attempt to minimize the make span. This scheduling step is computationally hard, however, we have found that simple worst-case quadratic time heuristic methods work quite well. Thus both the MEA step and scheduling step are both low-order polynomial operations in the minimum number of actions required to achieve the goal, allowing for real-time efficiency.

Refrences :

MEA( means-ends analysis) :

Posted in AI for Games, Case-Based Planning, Case-Based Reasoning, Papers Summaries, Planning | 3 Comments »

Game AI

Posted by Ogail on August 20, 2009

Game AI

  • It is a common mistake to think that the more complex the AI in a game, the better the characters will look to the player. Creating good AI is all about matching the right behaviors to the right algorithms
  • Knowing when to be complex and when to stay simple is the most difficult element of the game AI programmer’s art. The best AI programmers are those who can use a very simple technique to give the illusion of complexity
  • You need to make sure that a characters’ AI matches their purpose in the game and the attention they’ll get from the player
  • Perceptions should only be what exactly AI character need not more
  • Take care of changing behaviors (2 solders conversion example – 63)
  • For in-game AI, behaviorism is the way to go. We are not interested in the nature of reality or mind; we want characters that look right. In most cases, this means starting from human behaviors and trying to work out the easiest way to implement them in software
  • Developing Game AI Could Use:
    • Hacks: ad-hoc solutions and neat effects
    • Heuristics: rules of thumb that only work in most, but not all, case
    • Algorithms (the “proper” stuff)
  • Developers rarely build a great new algorithm and then ask themselves, “So what can I do with this?” Instead, you start with a design for a character and apply the most relevant tool to get the result
  • Some if the AI doesn’t require an algorithm or a technique it only requires a simple bit of code that performs an animation at the right point
  • Human beings use heuristics all the time. We don’t try to work out all the consequences of our actions. Instead, we rely on general principles that we’ve found to work in the past (or that we have been brainwashed with, equally)
  • Problems with heuristics all the time: range units when shooting peons (65)
  • Common Heuristics:
    • Most constrained
      • For example, a group of characters come across an ambush. One of the ambushers is wearing phased force-field armor. Only the new, and rare, laser rifle can penetrate it. One character has this rifle. When they select who to attack, the most constrained heuristic comes into play: it is rare to be able to attack this enemy, so that is the action that should be taken
    • Do the most difficult first
      • For example, an army has two squads with empty slots. The computer schedules the creation of five Orc warriors and a huge Stone Troll. It wants to end up with balanced squads. How should it assign the units to squads? The Stone Troll is the hardest to assign, so it should be done first. If the Orcs were assigned first, they would be balanced between the two squads, leaving room for half a Troll in each squad, but nowhere for the Troll to go
    • Try the most promising first:
  • Hacks and heuristics will get you a long way, but relying on them solely means you’ll have to constantly reinvent the wheel. General bits of AI, such as movement, decision making, and tactical thinking, all benefit from tried and tested methods that can be endlessly reused (use algorithms here)
  • Just remember that for every situation where a complex algorithm is the best way to go, there are likely to be at least five where a simpler hack or heuristic will get the job done
  • One of the major reasons that new AI techniques don’t achieve widespread use is their processing time or memory requirements
  • Processor Issues in Game AI:
    • Complex AI that does work in games needs to be split into bite-size components that can be distributed over multiple frames. The chapter on resource management shows how to accomplish this. Applying these techniques to any AI algorithm can bring it into the realm of usability
    • SIMD:
      • Most modern CPUs have dedicated SIMD processing. SIMD (single instruction multiple data) is a parallel programming technique where a single program is applied to several items of data at the same time
      • Steering algorithms benefit from this feature
    • Multi-Core Processing and Hyper-Threading:
      • Modern processors have several execution paths active at the same time. Code is passed into the processor, dividing into several pipelines which execute in parallel. The results from each pipeline are then recombined into the final result of the original code. When the result of one pipeline depends on the result of another, this can involve backtracking and repeating a set of instructions. There is a set of algorithms on the processor that works out how and where to split the code and predicts the likely outcome of certain dependent operations; this is called branch prediction. This design of processor is called super-scalar
      • Normal threading is the process of allowing different bits of code to process at the same time. Since in a serial computer this is not possible, it is simulated by rapidly switching backward and forward between different parts of the code.
      • A multi-core processor effectively has multiple separate processing systems (each may be super-scalar in addition). Different threads can be assigned to different processor cores
    • Virtual Functions/ Indirection
      • Virtual functions add flexibility to the code but it’s very costly
  • Memory Concerns:
    • Cache:
      • If processors had to rely on the main RAM, they’d be constantly stalled waiting for data
      • All modern processors use at least one level of cache: a copy of the RAM held in the processor that can be very quickly manipulated. Cache is typically fetched in pages; a whole section of main memory is streamed to the processor. It can then be manipulated at will. When the processor has done its work, the cached memory is sent back to the main memory.
      • In my experience (author), dramatic speed ups can be achieved by making sure that all the data needed for one algorithm is kept in the same place
  • PC Constraints:
    • If the AI gets less time to work, how should it respond? It can try to perform less work. This is effectively the same as having more stupid AI and can affect the difficulty level of the game. It is probably not acceptable to your quality assurance (QA) team or publisher to have your game be dramatically easier on lower specification machines
  • Develop your AI code to be an AI Engine that could be reused
  • Try to develop tools that can generate you AI Code ( as in steering behaviors)
  • AI-Implant’s Maya module, for example, exposes complex Boolean conditions, and state machines, through graphical controls


Posted in AI for Games | 1 Comment »


Posted by Ogail on August 19, 2009


  • What is AI?
  • Many of trivial problems ( playing Connect 4) were solved by computers but there are many things that computers aren’t good at which we find trivial: recognizing familiar faces, speaking our own language, deciding what to do next, and being creative. These are the domain of AI: trying to work out what kinds of algorithms are needed to display these properties
  • In academia, some AI researchers are motivated by philosophy: understanding the nature of thought and the nature of intelligence and building software to model how thinking might work. Some are motivated by psychology: understanding the mechanics of the human brain and mental processes. Others are motivated by engineering: building algorithms to perform human-like task. Where game developers concerns with the last motivation
  • History of Academic AI:
    • The Early Days:
      • In the early days (before computers) some questions appeared (in philosophy of mind) as:
        • What produces thought?
        • Could you give life to an inanimate object?
        • What’s the difference between cadavers (جثة) and human it previously was?
      • Pioneers of the field these days were: Alan Turing (father of AI), von-Neumann, Shannon
    • The Symbolic Era:
      • From 1950s till 1980s main thrust in AI research was in “symbolic” systems
      • A symbolic system: is one in which the algorithm is divided into two components (as Expert Systems):
        • Set of knowledge: represented as symbols such as words, numbers, sentences, or pictures
        • Reasoning algorithm: that manipulates those symbols to create new combinations of symbols that hopefully represent problem solutions or new knowledge
      • Other symbolic approaches in games: blackboard architecture, pathfinding, decision trees, state machines, steering algorithms
      • Common disadvantage of symbolic systems: when solving a problem the more knowledge you have, the less work you need to do in reasoning
      • The more knowledge you have, the less searching for an answer you need; the more search you can do (i.e., the faster you can search), the less knowledge you need
    • The Natural Era:
      • From 1980s to 1990s frustration symbolic approaches come into two categories:
        • From engineering point:
          • early success on simple problems didn’t seem to scale to more difficult problems
        • From philosophical point:
          • Symbolic approaches are not biologically plausible (i.e. You can’t understand how a human being plans a route by using a symbolic route planning algorithm)
          • The effect was a move toward natural computing: techniques inspired by biology or other natural systems (like ANN, GA and simulated annealing)
  • The no-free-lunch theorem and subsequent work has shown that, over all problems, no single approach is better than any other
  • The narrower the problem domain you focus on, the easier it will be for the algorithm to shine. Which, in a roundabout way, brings us back to the golden rule of AI: search (trying possible solutions) is the other side of the coin to knowledge (knowledge about the problem is equivalent to narrowing the number of problems your approach is applicable to)
  • Game AI:
  • Till 1990 all computer-controlled characters used FSM
  • In 1997 the new technique included was ability to see colleagues and notify them when killed
  • In mid-1990s RTS games (Warcraft II) was the first time popular game having robust pathfinding implementation
  • The AI in most modern games addresses three basic needs:
    • The ability to move characters,
    • The ability to make decisions about where to move
    • The ability to think tactically or strategically
  • Model of Game AI:

  • Movement:
    • Movement refers to algorithms that turn decisions into some kind of motion
    • Examples (49):
      • Super Mario example when attacking enemies with bullets
      • Guard that want to reach alarm example
  • Decision Making:
    • Involves character working out what to do next
    • Examples: take the decision to attack, defend, patrol…
  • Strategy:
    • To coordinate whole team you need a strategic AI
    • In the context of this book, strategy refers to an overall approach used by a group of characters
    • In this category are AI algorithms that don’t control just one character, but influence the behavior of a whole set of characters
    • Example: surrounding a player in FPS Game
  • Infrastructure:
    • These are Information Gatherer (perception) and execution management issues
  • Agent-Based-AI:
    • agent-based AI is about producing autonomous characters that take in information from the game data, determine what actions to take based on the information, and carry out those actions
  • Techniques in this book are implemented into 3 categories: Algorithms, Data Structures, Game Infrastructure
  • Key elements to know when implementing algorithms:
    • Know the problem the algorithm want to solve
    • A general description of the working mechanism of the algorithm including diagrams
    • A pseudo-code presentation of the algorithm
    • Indication to the data structure used in the algorithm
    • Particular implementation node
    • Analysis of the algorithm performance: execution speed, memory footprint and scalability

Weaknesses in the approach

Posted in AI for Games | Leave a Comment »

Before Chapter # 1

Posted by Ogail on August 18, 2009

Before Chapter # 1

  • This book is an outworking of that experience. It doesn’t tell you how to build a sophisticated AI from the ground up. It gives you a huge range of simple (and not so simple) AI techniques that can be endlessly combined, re-used, and parameterized to generate almost any character behavior that you can conceive


Posted in AI for Games | Leave a Comment »

Learning – Artificial Intelligence for Games – A Summary

Posted by merothehero on July 28, 2009


A summary of selected topics in the chapter

From the Book: Artificial Intelligence for Games – 2006

Chapter 7

Author: Ian Millington

  1. I.            Introduction

Online or Offline

  • Majority of Learning is done offline (either between levels of the game or at the development studio)
  • Online Learning is while the game is playing.

Intra-Behavior Learning Algorithm

  • The simplest kind of learning that changes a small part of the character’s behavior
  • Good Example: learning how to evade, learning to find cover points in a room
  • Bad Example: if a character is trying to jump a wall it won’t tell him to use the stairs instead

Inter-Behavior Learning Algorithm

  • The character learns “new” things (not in a specific scope)
  • Example: A Character that learns to tie a rope across the street to stop an escaping motorbike
  • Unfortunately this kind of AI is almost pure fantasy.
  • It can be implemented by choosing to between a range of different behaviors, but still each distinct behavior will require implementation by the developer
  • It’s not economically good to learn everything, basic movement systems for example is easier and faster to be implemented directly.
  • Developers are experimenting on replacing the ordinary decision making systems (FSM, FUSM, Scripting, Decision Trees, and Markov Systems) with Learning Techniques


  • The biggest problem with learning is reproducibility and quality control.
  • Learning the Wrong thing is a big problem that industrial companies found
  • The more flexible your learning is the less control you have on your game-play
  • Normal solution to this is to limit the kind of things that can be learned in a game

Over-Learning (Over-fitting)

  • When the AI learns only from specific situations or experiences, but we normally want the AI to be able to generalize the learning.
  • Neural Networks for example can over-fit during learning if they are wrongly parameterized or if the network is too large for the learning task at hand. (From my Readings over-fitting happens because of local maximum points)

The Balance of Effort

  • Learning is done to decrease the implementation work ,because you don’t need to handle every event or trick happening to the character, instead it will learn to handle it itself.
  • Unfortunately, Learning often increases the implementation work, because they require a lot of work such as: presenting data in the correct way, making sure the results are valid and testing the AI to avoid it learning the wrong things.
  1. II.            Parameter Modification


  • Not considered a “learning” technique.
  • The simplest Learning technique is to calculate the value of one or more parameters such as: cost functions for path finding and possibilities in decision making.
  • Most commonly learning parameters like these is done offline, but can usually be controlled when performed online.
  • For each value of the parameter there is some energy value (or fitness value) which represents how good the value of the parameter is for the game.
  • Fitness landscape is used when we’re looking for maximum, while Energy Landscape is used when we’re looking for minimum. So they are just different terminologies.

Getting the score out of the parameter values

  • If the value can be generated from some function or formula, if it’s a simple mathematical formula then we can differentiate it so the best values can be found explicitly. In this case there’s no need for parameter optimization.
  • In most cases, this formula doesn’t exist, the only way to find out the value is to try the game and see how it performs and produce a fitness or energy score for the parameter.
  • We should have a mechanism to determine the value for a set of a parameters quickly such as: allowing the game to run at high speed without rendering the screen or use a set of heuristics that generate a value based on some assessment criteria without ever running the game, if there is no way to determine the value for the parameters other than running the game with the player, then all the techniques in the chapter are unlikely to be practical.

Hill Climbing

  • An Algorithm that tries to work out in what direction to change the parameter in order to improve its score.
  • It often fails to get the optimal solution if the score of the parameter values is random.
  • Its Performance is O(n) where n is the number of parameters while it’s O(1) in memory.

Extensions for Hill Climbing

  • There are Several extensions to support hill climbing with more random score:
    • Momentum: if the search is consistently improving in one direction, then it should continue in that direction for a little while, even if it seems that things aren’t improving anymore. It takes much longer to reach the best parameter value, but it doesn’t get stuck easily in a local maximum point on its way to the peak.
  • Adaptive Resolution: when a parameter is long away from the best value, taking small steps in learning is slow but if the steps are large, the best value could never be reached. This problem is solved by Adaptive Resolution, where long jumps are made earlier and small jumps later on, and as long as the score is improving the steps are bigger, when its stops improving the steps become smaller.
  • Multiple Trials: Hill Climbing is very dependent on the initial guess, so trying the algorithm from a different initial starting location will get the optimum score after some trials, However this Extension is not suitable for online learning because the player doesn’t expect that the AI suddenly gets worse (because it starts the hill climbing again with an initial value which can be a low value)
  • In Most Problems, we can’t even recognize the best solution when we find it, for example in an RTS game the best use of resources into construction and research, if we ran 200 trials we can’t guarantee that the best score had been scored in these trials, there’s no formula that tells us that this is the best solution, so we must just choose the most satisfying solution even if it might not be the best result.


  • Similar idea to Hill Climbing, and will be read later due to time constraints.
  1. Action Prediction

N-Gram Predictors

  • N-Grams are used in various statistical techniques and are not limited to prediction , they have applications in analysis of human languages
  • A 3-Gram –for example- will keep track of the number of times each sequence of 3 choices is seen. For prediction, the first 2 choices form the window and the probability is calculated by looking at the proportion of times each window is taken for the third choice.
  • Example :In This string  “LRRLRLLLRRLRLRR” The 3-Gram Probabilities :
Window/3rd choice ..R ..L
LL ½ ½
LR 3/5 2/5
RL ¾ ¼
RR 0/2 2/2
  • Increasing the window size increases the efficiency of the algorithm but decreases the prediction.
  • As the Sequence of Actions gets more random, the window size needs to be reduced.
  • In Predictions with more than 2 choices, the minimum window size will need to be increased a little, also bigger window size get much poorer prediction results.
  • There are mathematic al models that are used to tell you how well an N-gram predictor will predict a sequence and they are used to tune the optimal window size.
  • A Hash Table could be used for better memory performance especially for large number of choices.

Hierarchical N-Grams

  • Several N-Gram algorithms work in parallel, each with increasingly large window sizes, A Hierarchical 3-Gram will have regular 1-Gram, 2-Gram and 3-Gram algorithms working on the same data.
  • A sequence of “LRR” is passed to hierarchical 3-Gram, gets registered in the 3-Gram, the “RR” portion gets registered in the 2-gram, and “R” gets registered in the 1-Gram.
  • When a prediction is requested, the algorithm first looks up the window actions in the 3-Gram, if there have been sufficient examples of the window then it uses the 3-Gram to generate its prediction, otherwise it looks at the 2-Gram, likewise, the prediction may come from 1-Gram. If none of the N-Grams have sufficient Examples, then the algorithm returns no prediction or random prediction.
  • There is no single correct threshold value for the number of entries required for confidence. To some extent it needs to be found by trial and error. In Online learning, it’s common for the AI to make decisions based on very sketchy information, so the confidence threshold  is small (3 or 4)
  • The algorithm is O(n) in memory and O(n) in time where n is the highest number N-Gram used.
  • The most widespread application of N-Gram is in combat games.
  1. Decision Learning

Weak Vs Strong Supervision

  • In order to improve performance, we need to provide feedback to the learning algorithm. This feedback is called “supervision,”
  • Strong supervision takes the form of a set of correct answers; these correct answers are often provided by a human player. The developer may play the game for a while and have the AI watch. The AI keeps track of the sets of observations and the decisions that the human player makes. It can then learn to act in the same way.
  • Weak supervision doesn’t require a set of correct answers. Instead, some feedback is given as to how good its action choices are.
  • Strong supervision is easier to implement and get right, but it is less flexible: it requires somebody to teach the algorithm what is right and wrong. Weak supervision can learn right and wrong for itself, but is much more difficult to get right.
  • Each of the remaining learning algorithms in this chapter works with this kind of model. It has access to observations, and it returns a single action to take next. It is supervised either weakly or strongly.
  1. V.            Decision Tree Learning

ID3 Algorithm

  • Algorithm for generating Decision Trees from a set of rules
  • Example: The Input could be the following :

Healthy – In Cover – With Ammo: Attack

Hurt – In Cover – With Ammo: Attack

Healthy – In Cover – Empty: Defend

Hurt – In Cover – Empty : Defend

Hurt – Exposed – With Ammo: Defend

  • The Output will be the following Decision Tree:

  • The Details of the ID Algorithm will be discussed Later
  1. Reinforcement Learning


  • Reinforcement learning is the name given to a range of techniques for learning based on experience.
  • In its most general form a reinforcement learning algorithm has three components: an exploration strategy for trying out different actions in the game, a reinforcement function that gives feedback on how good each action is, and a learning rule that links the two together. Each element has several different implementations and optimizations, depending on the application.
  • One reinforcement learning technique is Q-learning which is a good start, easy to implement and can be tuned without a deep understanding of its theoretical properties.

Q-Learning – Introduction

  • Q-learning treats the game world as a state machine. At any point in time, the algorithm is in some state. The state should encode all the relevant details about the character’s environment and internal data.
  • In a game the states are made up of many factors: position, proximity of the enemy, health level, and so on. Q-learning doesn’t need to understand the components of a state. As far as the algorithm is concerned they can just be an integer value: the state number.
  • Q-learning is known as a model-free algorithm because it doesn’t try to build a model of how the world works. It simply treats everything as states. Algorithms that are not model-free try to reconstruct what is happening in the game from the states that it visits. Model-free algorithms, such as Q-learning, tend to be significantly easier to implement.
  • These four elements—the start state, the action taken, the reinforcement value, and the resulting state—are called the experience tuple, often written as (s, a, r, s).

How It Works

  • The experience tuple is split into two sections. The first two elements (the state and action) are used to look up a Q-value in the store. The second two elements (the reinforcement value and the new state) are used to update the Q-value based on how good the action was and how good it will be in the next state.
  • The update is handled by the Q-learning rule:

Q(s, a) = (1α) Q(s, a) +α(r +γ max (Q (s, a))),

Where α is the learning rate, and γ is the discount rate. Both are parameters of the algorithm.

  • The first component Q(s, a) is simply the current Q-value for the state and action.
  • The r value is the new reinforcement from the experience tuple
  • γ max(Q(s_, a_)), looks at the new state from the experience tuple. It looks at all possible actions that could be taken from that state and chooses the highest corresponding Q-value. This helps bring the success (i.e., the Q-value) of a later action back to earlier actions: if the next state is a good one, then this state should share some of its glory.
  • The discount parameter controls how much the Q-value of the current state and action depends on the Q-value of the state it leads to. A very high discount will be a large attraction to good states, and a very low discount will only give value to states that are near to success. Discount rates should be in the range [0, 1]. A value greater than 1 can lead to ever-growing Q-values, and the learning algorithm will never converge on the best solution.
  • So, in summary, the Q-value is a blend between its current value and a new value, which combines the reinforcement for the action and the quality of the state the action led to.

Exploration Strategy

  • It’s the policy for selecting which actions to take in any given state.
  • The basic Q-learning exploration strategy is partially random. Most of the time, the algorithm will select the action with the highest Q-value from the current state. The remainder, the algorithm will select a random action. The degree of randomness can be controlled by a parameter.

Convergence and Ending

  • If the problem always stays the same, and rewards are consistent (which they often aren’t if they rely on random events in the game), then the Q-values will eventually converge. Further running of the learning algorithm will not change any of the Q-values. At this point the algorithm has learned the problem completely. For very small toy problems this is achievable in a few thousand iterations, but in real problems it can take a vast number of iterations. In a practical application of Q-learning, there won’t be nearly enough time to reach convergence, so the Q-values will be used before they have settled down. It is common to begin acting under the influence of the learned values before learning is complete.


  • O(i) in time , where i is the number of iterations of learning
  • O(as) in memory where a is the number of actions, and s is the number of states per action, in case of using arrays
  • Using a Hash Table will decrease the order of the memory. However it will increase execution time

Tuning the Parameters

  • Will be discussed later

Choosing Rewards

  • Will be discussed later

Temporal-Difference learning

  • Will be discussed later

Applications and Limits

  • Reinforcement learning is most suitable for offline learning. It works well for problems with lots of different interacting components, such as optimizing the behavior of a group of characters or finding sequences of order-dependent actions. Its main strength is its ability to seamlessly handle uncertainty. This allows us to simplify the states exposed to it; we don’t have to tell the algorithm everything.
  • It is not suitable for problems where there is an easy way to see how close a solution is (we can use some kind of planning here), where there are too many states, or where the strategies that are successful change over time (i.e., it requires a good degree of stability to work).
  • It’s strong in board games and turn-based strategy games
  • It’s possible to use a neural network to act as a storing medium for Q-Values, the author is not aware of any developers who have used this combination successfully.
  1. Artificial Neural Networks


  • Developers who have experimented with neural networks for large-scale behavior control have been left in no doubt of the approaches weaknesses. The combined hype and disappointment has clouded the issue. AI-savvy hobbyists can’t understand why the industry isn’t using them more widely, and developers often see them as being useless and a dead end.
  • It is a huge subject, full of different kinds of network and learning algorithms specialized for very small sets of task. Very little neural network theory is applicable to games, however.
  • The Focus will be on only one type of neural networks, which is the multi-layer perceptron with one particular learning rule, which is the back-propagation algorithm. This type of neural network is the most common one that’s why the author chose it.


  • Will be discussed later


  • Will be discussed later


  • Will be discussed later


  • O (nw) in memory, where n is the number of perceptrons, and w is the number of inputs per perceptron.
  • In time, the performance is also O (nw).

Other Approaches: Radial Basis Function

  • Will be discussed later

Other Approaches: Weakly Supervised Learning

  • Will be discussed later

Other Approaches: Hebbian Learning

  • Will be discussed later

Posted in AI for Games | 2 Comments »