Search and Planning#

  • Author: Johannes Maucher

  • Last Update: 10.10.2025

Introduction#

As described in the previous section, a rational agent perceives at each time step a state of the environment and has to decide which action to take in this state, such that in the long term its performance measure is maximized.

When the best action to take is not immediately obvious, the agent must plan ahead: for a given current state and a targeted goal state, the agent has to plan a sequence of actions which leads most efficiently from the current state to the goal state. This process is called planning, and the corresponding agent is a planning agent.

Two variants of planning agents are goal-based agents and utility-based agents.

For planning agents a solution is a sequence of actions, which leads from the perceived start-state to a goal-state.

In this section, in addition to planning, the related task of optimization is also discussed. In contrast to planning, the solution of an optimization task is just a good or even the best state; the path to this state is not relevant.

Goal-based Agent#

For a goal-based agent one or more goals are given, and the agent’s task is to plan a sequence of actions from the perceived current state to one of the goal states. For example, in navigation the start location is the current state, the target location is the goal state, and a path from the former to the latter must be calculated.

agentGoal.png


For goal-based agents, uninformed and informed algorithms are distinguished. In the uninformed case the agent knows in each state only

  • the state itself and

  • the actions available in this state.

In the informed case the agent can estimate the proximity of a state to the goal state. This estimate is also called a heuristic, and the corresponding algorithms are called heuristic search algorithms.

Utility-based Agent#

For utility-based agents a function to measure the utility of states must be available. The benefits of such a measure are:

  1. Sometimes it is not possible to define a concrete goal state. If a utility function is available, the search target can then be to find a good state, i.e. a state with a high utility, or even the best state (the state with maximum utility).

  2. Sometimes different goals are possible. If a utility function is available in this case, planning can be focused on finding the goal state with the highest utility.

  3. In optimization problems the task is not to find the best sequence of actions from a current state to a goal state; instead, one just has to determine a good state. The utility function is used to measure the goodness of a state.


Applications#

Navigation: \(\Rightarrow\) Uniform-Cost- / A*-Algorithm

https://maucher.home.hdm-stuttgart.de/Pics/bingsuche.PNG

Initial State: City (location) to start from.

Task: Find the best route from start- to destination city (location)


General Planning Problems:

https://maucher.home.hdm-stuttgart.de/Pics/screenshotStirbLangsam.png

Image from the action film Die Hard 3

Initial State: Jugs have a volume of \((5,3)\) and an initial fill level of \((v_1,v_2)=(0,0)\)

Task: How to decant jugs, such that their fill level is \((v_1^*,v_2^*)=(4,0)\)?
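The jug problem above can be solved with uninformed breadth-first search over the fill-level states. Below is a minimal sketch (the function and action names are my own, not from the text); BFS guarantees a shortest action sequence:

```python
from collections import deque

def jug_solver(capacities=(5, 3), start=(0, 0), goal=(4, 0)):
    """Breadth-first search over jug fill-levels (v1, v2)."""
    def successors(state):
        a, b = state
        ca, cb = capacities
        yield ("fill A", (ca, b))
        yield ("fill B", (a, cb))
        yield ("empty A", (0, b))
        yield ("empty B", (a, 0))
        pour = min(a, cb - b)                 # pour A -> B until B full or A empty
        yield ("pour A->B", (a - pour, b + pour))
        pour = min(b, ca - a)                 # pour B -> A until A full or B empty
        yield ("pour B->A", (a + pour, b - pour))

    frontier = deque([(start, [])])           # FIFO queue: (state, action plan)
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        for action, nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None                               # goal unreachable
```

`jug_solver()` returns the shortest plan reaching \((4,0)\), e.g. fill A, pour A->B, empty B, pour A->B, fill A, pour A->B, empty B.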


8-Puzzle: \(\Rightarrow\) Breadth-First-Algorithm

https://maucher.home.hdm-stuttgart.de/Pics/8puzzleExample.png

Initial State: Puzzle with arbitrarily ordered tiles.

Task: Move the tiles such that their numbers are in increasing order


Boardgames (Chess) \(\Rightarrow\) MinMax-Algorithm

https://maucher.home.hdm-stuttgart.de/Pics/ibmDeepBlue.PNG

Initial State: Current configuration on the chess board

Task: Plan actions, such that you win.


Go: \(\Rightarrow\) Monte-Carlo-Tree-Search

https://maucher.home.hdm-stuttgart.de/Pics/goGame.png

Initial State: Current configuration on the Go board

Task: Plan actions, such that you win.


Constraint Satisfaction Problem:

Coloring Problem:

https://maucher.home.hdm-stuttgart.de/Pics/colormapTask.png

Task: Color regions with either red, green or blue, such that no pair of neighboring regions have the same color.


Sudoku: \(\Rightarrow\) Constraint-Satisfaction Problem (Backtracking)

https://maucher.home.hdm-stuttgart.de/Pics/sudoku.gif

Task: Fill the empty cells with integers between 1 and 9, such that no column, no row and no block contains any number more than once.


Optimize Configurations:

https://maucher.home.hdm-stuttgart.de/Pics/PrintedCircuitBoard.jpeg

Task: Find a printed circuit board design such that the number of intersections is minimal.


General Logistics and Scheduling Problems: \(\Rightarrow\) Constraint Satisfaction Problem / Genetic Algorithm

https://maucher.home.hdm-stuttgart.de/Pics/genAlgLogistikLager.PNG https://maucher.home.hdm-stuttgart.de/Pics/genAlgLogistikPackung.PNG

Task: Load the containers such that in the end the empty space is as small as possible and a set of constraints is fulfilled.

Requirements#

  • The algorithms discussed in this notebook require an environment which is

    • Fully observable: The current state of the environment is completely observable

    • Deterministic: Given a state \(s_t\) and a possible action \(a_t\), the successor state \(s_{t+1}\) is unique.

    • Static: Environment does not change during agent’s thinking.

    • Discrete: State-space, action-space and time are discrete.

The algorithms described here are offline and open-loop. Offline means that the agent executes actions only after the complete solution has been found. Open-loop means that all actions of the solution are executed blindly, without regarding any feedback that may become available during action execution.

Problem Specification#

For each problem, which shall be solved by an approach of this notebook, the following must be specified:

  • What is a state?

  • What is the initial state? What is the goal-state?

  • If the initial and/or goal state is hard to define, a utility function may do the job.

  • What is an action? Which actions are available in which states?

  • What are the costs of actions?

  • For heuristic algorithms: How can the utility of an arbitrary state be estimated?

Performance Metrics for Search- and Planning-Algorithms#

  • Completeness: Algorithm guarantees to find a solution.

  • Optimal: Algorithm guarantees to find optimal solution.

  • Complexity: In terms of time and memory

  • Global: Solution is a path from initial- to goal-state

  • Local: Solution is a good state

Algorithm Categories#

  • uninformed

  • heuristic (informed)

  • global

  • local

  • goal-based

  • utility-based

2-Player Games#

The algorithm described in this section is applicable for all games with the following characteristics:

  • 2 players

  • fully observable

  • deterministic

  • Zero-Sum

Examples: Checkers, Chess, Go, Reversi, Backgammon, …

MinMax-Algorithm#

Evaluation of states in level \(L\) (planning horizon) https://maucher.home.hdm-stuttgart.de/Pics/tree-0.png

  • Root node of the tree is the current state of the game. Player Max is in turn.

  • Max has two possible actions, the corresponding successor states are the states for which the other player, Min, must find a good action.

  • In each of the two states in Min’s level, two actions are available. Again, all successor states can be generated for the next level, which belongs to Max.

In this way a tree can be calculated down to a level \(L\). This level \(L\) constitutes the planning horizon and depends on the affordable complexity.

Once all nodes in level \(L\) are generated a utility function is applied to all states in this level.

If level \(L-1\) belongs to Min, in each state he will choose the action which leads to a successor of minimum utility, since minimum utility for Max is maximum utility for Min. Therefore, each node in level \(L-1\) is assigned the minimum utility of its successor nodes.

In the same way, each node in level \(L-2\) is assigned the maximum utility of its successor nodes in level \(L-1\).

This process is repeated up to the root node. Player Max then knows his next action: the one which leads to the successor with maximum utility value.

Player Min selects actions to nodes with minimum utility for Player Max: https://maucher.home.hdm-stuttgart.de/Pics/tree-1.png

Player Max selects actions to nodes with maximum utility: https://maucher.home.hdm-stuttgart.de/Pics/tree-2.png

Player Min selects actions to nodes with minimum utility for Player Max: https://maucher.home.hdm-stuttgart.de/Pics/tree-3.png

Player Max selects actions to nodes with maximum utility: https://maucher.home.hdm-stuttgart.de/Pics/tree-4.png

The Tic-Tac-Toe example below actually allows planning all the way down to the terminal states. In this case the heuristic state-evaluation function is replaced by a function which assigns \(1\) to a win and \(-1\) to a loss.

https://maucher.home.hdm-stuttgart.de/Pics/tttBewertet2.PNG
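The level-by-level backup described above can be sketched as a recursive function. This is a generic sketch, not an implementation from the text; the callbacks `successors`, `evaluate` and `is_terminal` are hypothetical problem-specific hooks:

```python
def minimax(state, depth, maximizing, successors, evaluate, is_terminal):
    """Backs up utilities level by level: Min nodes take the minimum,
    Max nodes the maximum of their successors' values."""
    if depth == 0 or is_terminal(state):
        return evaluate(state)              # utility at the planning horizon
    values = [minimax(s, depth - 1, not maximizing,
                      successors, evaluate, is_terminal)
              for s in successors(state)]
    return max(values) if maximizing else min(values)
```

For a toy tree given as nested lists, e.g. `[[3, 5], [2, 9]]` with Max to move at the root, Min backs up 3 and 2, and Max chooses the branch with value 3.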

Alpha-Beta Pruning#

Alpha-Beta Pruning is an improvement of the MinMax algorithm in the sense that it reduces the number of nodes that have to be generated and evaluated, by applying a simple logic. The process is as follows:

  • Create the tree with limited depth, searching from left to right

  • As in MinMax-search evaluate nodes in the deepest level by applying a domain-specific evaluation function

  • For each max-node store the maximum value of the underlying subtree traversed so far in the variable \(\alpha\).

  • For each min-node store the minimum value of the underlying subtree traversed so far in the variable \(\beta\).

  • If at a min-node k the current value \(\beta\) is smaller than \(\alpha\), the search below k can be terminated. Here \(\alpha\) is the largest value of a max-node on the path from the root to k.

  • If at a max-node k the current value \(\alpha\) is larger than \(\beta\), the search below k can be terminated. Here \(\beta\) is the smallest value of a min-node on the path from the root to k.
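The cutoff rules above extend the MinMax recursion with the two bookkeeping variables \(\alpha\) and \(\beta\). Again a generic sketch with hypothetical callback names, not the text’s own implementation:

```python
def alphabeta(state, depth, alpha, beta, maximizing,
              successors, evaluate, is_terminal):
    """MinMax with alpha-beta cutoffs; callbacks are problem-specific."""
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for s in successors(state):
            value = max(value, alphabeta(s, depth - 1, alpha, beta, False,
                                         successors, evaluate, is_terminal))
            alpha = max(alpha, value)       # best value for Max on this path
            if alpha >= beta:               # Min above would never allow this branch
                break
        return value
    value = float("inf")
    for s in successors(state):
        value = min(value, alphabeta(s, depth - 1, alpha, beta, True,
                                     successors, evaluate, is_terminal))
        beta = min(beta, value)             # best value for Min on this path
        if beta <= alpha:                   # Max above would never allow this branch
            break
    return value
```

On the toy tree `[[3, 5], [2, 9]]` this returns the same root value as plain MinMax (3), but the leaf 9 is never evaluated: after Min finds 2 in the right subtree, \(\beta=2 \leq \alpha=3\) triggers a cutoff.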

Demo of MinMax- and AlphaBeta-Pruning-Algorithm

A nice simulation of MinMax- and AlphaBeta-Pruning can be found here: https://raphsilva.github.io/utilities/minimax_simulator/#

https://maucher.home.hdm-stuttgart.de/Pics/alphabetademo.png

Some remarks on the complexity of real boardgames:

In chess:

  • the effective mean branching factor is \(b=35\)

  • the mean number of turns is 40, i.e. \(d=80\) halfturns

  • the number of leaf-nodes of the complete tree would then be \(35^{80} \approx 10^{123}\).

Typically, for chess AlphaBeta-pruning with a depth of 8 halfturns is applied. Of course, the quality of the algorithm strongly depends on the evaluation function. Good evaluation functions for chess can be found here: https://www.chessprogramming.org/Evaluation.

For the board game Go it’s even worse:

  • the effective mean branching factor is \(b=250\)

  • the mean number of turns is 75, i.e. \(d=150\) halfturns

  • it’s even more challenging to find a good evaluation function.

Complexity-estimates for other popular boardgames can be found here: https://de.wikipedia.org/wiki/Spiel-Komplexität.

Monte-Carlo Tree Search (MCTS)#

  • The concept of a limited planning horizon, as applied in the MinMax algorithm, is one way to handle complex problems which cannot be planned through to the goal state in one step.

  • Another concept is to construct a tree down to a final state in such a way that in each state only promising actions and the corresponding successors are generated. This concept is applied by Monte Carlo Tree Search.

Monte-Carlo Tree Search:

https://maucher.home.hdm-stuttgart.de/Pics/mcts.png

Monte Carlo Tree Search:

  • Selection: Starting from the root node, apply the Tree Policy in order to select the next action. The Tree Policy exploits node statistics.

  • Expansion: At a leaf node, select an arbitrary action and generate the corresponding new child node.

  • Simulation: Simulate the game, starting from the new child node. Actions during simulation are selected according to the Default Policy, e.g.:

    • randomly (pure Monte Carlo), or

    • favoring actions with a higher estimated chance of success, if corresponding heuristics are available.

  • At the end of each simulation the game is won or lost.

  • Backpropagation: Adapt the node statistics of all tree nodes on the path from the root to the leaf node from which the previous simulation started.

When the computational budget is exhausted, MCTS returns the best action \(a\) for the given root node.
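The node statistics exploited by the Tree Policy are commonly combined via the UCB1 rule (a standard choice; the concrete policy is not specified above). From a given node, select the child \(a\) maximizing

\[ \frac{w_a}{n_a} + c \sqrt{\frac{\ln N}{n_a}} \]

where \(w_a\) is the number of won simulations through child \(a\), \(n_a\) its visit count, \(N\) the visit count of the parent node, and \(c\) an exploration constant (often \(c=\sqrt{2}\)). The first term favors exploitation of well-performing actions, the second exploration of rarely tried ones.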

Characteristics of MCTS:

  • Applicable even for games with very large branching factor

  • Basic Algorithm easy to implement

  • Configurable stop-time (Longer time yields better game play)

  • Doesn’t require domain knowledge

AlphaGo and AlphaZero#

  • AlphaGo is MCTS which applies (deep) neural networks in order to calculate good heuristics for

    • the next action to choose in the selection phase,

    • the leaf node to expand in the expansion phase,

    • the actions to choose in the simulation phase.

  • In AlphaGo, the networks for the heuristics in the selection and simulation phases are trained from expert moves.

  • In AlphaZero no Domain-knowledge (database of expert moves) is required.

A nice summary of MCTS and AlphaGo can be found here: https://www.analyticsvidhya.com/blog/2019/01/monte-carlo-tree-search-introduction-algorithm-deepmind-alphago/

Constraint Satisfaction Problems (CSP)#

Problem Specification#

  • Set of variables:

\[ \mathcal{X}=\lbrace X_1, X_2, \ldots,X_n \rbrace \]
  • Domain \(D_i\) of a variable \(X_i\) is the set of possible values for \(X_i\).

  • Set of constraints:

\[ \mathcal{C}=\lbrace C_1, C_2, \ldots,C_m \rbrace \]
  • Each constraint refers to a set of variables from \(\mathcal{X}\)

Problem Specification (continued):

  • State is a concrete assignment of values to a set of variables from \(\mathcal{X}\)

  • An assignment of values, which does not violate any constraint is called consistent.

  • Complete Assignment: If values are assigned to all variables

  • Solution of CSP: If a complete assignment is consistent.

  • Sometimes solutions that maximize a utility function are required.

Backtracking Algorithm#

  • Backtracking is a variant of depth-first search which assigns values to only one variable per level.

  • Values are assigned to variables such that the assignment remains consistent.

  • Goal test: Check whether the assignment is complete and consistent.
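The steps above can be sketched as a recursive function (a minimal sketch; the function and parameter names are my own). It assigns one variable per recursion level and only tries values that keep the partial assignment consistent:

```python
def backtrack(assignment, variables, domains, neighbors):
    """Depth-first backtracking search for a binary CSP given as
    variables, per-variable domains, and a conflict (neighbor) graph."""
    if len(assignment) == len(variables):
        return assignment                   # complete and consistent: a solution
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        # only consistent assignments: no neighbor may already have this value
        if all(assignment.get(n) != value for n in neighbors[var]):
            assignment[var] = value
            result = backtrack(assignment, variables, domains, neighbors)
            if result is not None:
                return result
            del assignment[var]             # dead end: undo and backtrack
    return None
```

For example, the coloring problem from the applications section can be encoded by listing the regions as variables, `["red", "green", "blue"]` as each domain, and the adjacency of regions as `neighbors`.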

Local Search / Optimisation Algorithms#

Genetic Algorithm#

  • Realizes concept of Darwin’s Theory of Evolution

    • Population of individuals

    • Individuals are selected (randomly).

    • Selection probability depends on fitness of individuals

    • Selected individuals generate new individuals by crossover (randomly)

    • There may be random mutations in new individuals

    • Each individual has a fitness

    • Only the fittest survive

https://maucher.home.hdm-stuttgart.de/Pics/AblaufGenAlgEng.png https://maucher.home.hdm-stuttgart.de/Pics/GeneticAlg.png

Problem Specification:#

  • What is an individual (a state)?

  • Define random selection

  • Define function for crossover

  • Define function for mutation

  • Define fitness-function
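Putting the components above together, a genetic algorithm can be sketched as follows for bit-string individuals. This is a minimal illustrative sketch with hypothetical parameter choices, not a production implementation; it uses fitness-proportional selection, one-point crossover, bit-flip mutation, and elitism so that the fittest individual survives:

```python
import random

def genetic_algorithm(fitness, n_bits=20, pop_size=30, generations=100,
                      p_mut=0.02, seed=0):
    """Evolve a population of bit strings towards high fitness."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(ind) + 1e-9 for ind in pop]  # selection prob. ~ fitness
        elite = max(pop, key=fitness)
        new_pop = [elite]                               # elitism: fittest survives
        while len(new_pop) < pop_size:
            p1, p2 = rng.choices(pop, weights=weights, k=2)   # random selection
            cut = rng.randrange(1, n_bits)                    # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b     # random mutation
                     for b in child]
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)
```

With `fitness=sum` (the "OneMax" toy problem: maximize the number of ones) the population quickly converges towards the all-ones string.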

Applications#

  • Genetic Algorithms are applied for a wide range of search- and optimisation problems.

  • Examples:

    • Best solution in the Daimler container packing project

    • Wireless Network optimisation

    • Neural network optimisation

    • Logistics and scheduling problems of all types

    • Find optimal parameters for complex control systems (motor-control)

    • Gaming AI

      • Evolution of artificial agents

      • Procedural Content Generation

    • Traveling Salesman Problem

    • Vehicle Configuration Demo.

Exercises#

  1. Demonstrate how the \(A^*\)-algorithm finds the best path from Frankfurt to Ulm, given the map and the line-of-sight distances below:

https://maucher.home.hdm-stuttgart.de/Pics/AsternnaviTask.png

Solutions#