Minimax with Alpha-Beta Pruning in Python

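Introduction

Back in the late 1920s, John von Neumann formulated the central problem of game theory, one that remains relevant today:

Players s_1, s_2, ..., s_n are playing a given game G. Which moves should player s_m make to achieve the best possible outcome?

Shortly after, problems of this kind grew into a challenge of great significance for the development of one of today's most popular fields of computer science: artificial intelligence. Some of the greatest accomplishments in artificial intelligence have come in strategy games. World champions in various strategy games have already been beaten by computers, e.g. in chess, checkers, backgammon, and most recently (2016) even Go.

Although these programs are very successful, their way of making decisions is very different from that of humans. Most of them are based on efficient search algorithms and, more recently, on machine learning as well.

The Minimax algorithm is a relatively simple algorithm used for optimal decision-making in game theory and artificial intelligence. Since these algorithms rely heavily on being efficient, the performance of the vanilla algorithm can be improved dramatically with alpha-beta pruning. We'll cover both in this article.

Although we won't analyze each game individually, we'll briefly explain some general concepts that are relevant for two-player, non-cooperative, zero-sum, symmetrical games with perfect information: chess, Go, tic-tac-toe, backgammon, Reversi, checkers, Mancala, 4 in a row, etc.

As you have probably noticed, none of these are games in which, for example, a player doesn't know which cards the opponent holds, or in which a player has to guess at certain information.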

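Defining Terms

The rules of many of these games are defined by legal positions (or legal states) and the legal moves for every legal position. For every legal position it is possible to effectively determine all the legal moves. Some of the legal positions are starting positions and some are ending positions.

The best way to describe these terms is with a tree graph whose nodes are legal positions and whose edges are legal moves. The graph is directed, because a move cannot necessarily be undone to return to the exact position we came from; in chess, for example, a pawn can only move forward. This graph is called a game tree. Moving down the game tree represents one of the players making a move, with the game state changing from one legal position to another.

Here's an illustration of a game tree for a tic-tac-toe game:

[Figure: game tree for tic-tac-toe]

Grids colored blue are player X's turns, and grids colored red are player O's turns. An ending position (a leaf of the tree) is any grid in which one of the players has won, or in which the board is full and there is no winner.

The complete game tree is a game tree whose root is the starting position and whose leaves are all ending positions. It contains a node for every position that can arise from every legal sequence of moves. It is easy to see that even for a small game like tic-tac-toe the complete game tree is huge. For that reason, it is not good practice to explicitly build the whole game tree as a data structure when writing a program that is supposed to predict the best move at any moment. Instead, the nodes should be created implicitly while they are being visited.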
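To make the idea of implicit node creation concrete, here is a minimal sketch. It is not the implementation used later in this article: the board is assumed to be a 9-character string, and the names `winner`, `children` and `count_leaves` are hypothetical helpers introduced only for this illustration. Positions are produced lazily by a generator and discarded as soon as they have been visited, so the tree is never stored.

```python
# Hypothetical sketch: the board is a 9-character string, '.' marks an empty cell.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def children(board, player):
    """Lazily yield every position reachable from `board` by one move of `player`."""
    for i, cell in enumerate(board):
        if cell == '.':
            yield board[:i] + player + board[i + 1:]


def count_leaves(board='.' * 9, player='X'):
    """Count the ending positions below `board` without ever storing the tree."""
    if winner(board) is not None or '.' not in board:
        return 1
    other = 'O' if player == 'X' else 'X'
    return sum(count_leaves(child, other) for child in children(board, player))
```

Calling `count_leaves()` from the empty board walks every complete game of tic-tac-toe (there are 255168 of them) while holding only the current path in memory.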

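We'll define the state-space complexity of a game as the number of legal game positions reachable from the starting position, and the branching factor as the number of children at each node (if that number isn't constant, it's common practice to use an average).

For tic-tac-toe, an upper bound for the size of the state space is 3^9 = 19683. Imagine that number for a game like chess! Searching through the whole tree to find our best move every time it is our turn would therefore be extremely inefficient and slow.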
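The gap between that upper bound and the real state space can be checked with a short sketch. The string board representation and the helper names (`winner`, `reachable_positions`) are assumptions made for this illustration only; play is assumed to stop as soon as someone wins.

```python
# Hypothetical sketch: enumerate every legal tic-tac-toe position reachable
# from the empty board. The board is a 9-character string, '.' is an empty cell.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def reachable_positions():
    """Return the set of all positions reachable by legal play, X moving first."""
    start = '.' * 9
    seen = {start}
    stack = [(start, 'X')]
    while stack:
        board, player = stack.pop()
        # Ending positions have no successors.
        if winner(board) is not None or '.' not in board:
            continue
        other = 'O' if player == 'X' else 'X'
        for i, cell in enumerate(board):
            if cell == '.':
                child = board[:i] + player + board[i + 1:]
                if child not in seen:
                    seen.add(child)
                    stack.append((child, other))
    return seen


print(3 ** 9)                      # 19683, the upper bound
print(len(reachable_positions()))  # 5478 positions are actually reachable
```

The upper bound counts every way of filling nine cells with three symbols, including boards that could never occur in a real game, which is why the reachable count is so much smaller.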

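This is why Minimax is of such great significance in game theory.

Theory Behind Minimax

The Minimax algorithm relies on systematic searching or, more accurately, on brute force and a simple evaluation function. Let's assume that every time we decide on the next move we search through the whole tree, all the way down to the leaves. Effectively, we would look into all the possible outcomes, and every time we would be able to determine the best possible move.

However, for non-trivial games that practice is inapplicable. Even searching to a certain depth sometimes takes an unacceptable amount of time. Therefore, Minimax applies the search to a fairly low tree depth, aided by appropriate heuristics and a well-designed yet simple evaluation function.

With this approach we lose the certainty of finding the best possible move, but in most cases the decision that minimax makes is still a strong one, often better than a human's.

Now, let's take a closer look at the evaluation function mentioned above. In order to determine a good (not necessarily the best) move for a certain player, we have to evaluate nodes (positions) somehow so that we can compare them by quality.

The evaluation function assigns a static number to each node (position), based on the characteristics of the game itself.

It is important to mention that the evaluation function must not rely on the search of previous nodes, nor of the following ones. It should simply analyze the game state and the circumstances that both players are in.

The evaluation function should contain as much relevant information as possible; on the other hand, since it is calculated many times, it needs to be simple.

Usually it maps the set of all possible positions onto a symmetrical segment: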

$$
\mathcal{F} : \mathcal{P} \rightarrow [-M, M]
$$

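The value M is assigned only to leaves where the winner is the first player, and the value -M only to leaves where the winner is the second player.

In zero-sum games, the value of the evaluation function has opposite meanings for the two players: what is better for the first player is worse for the second, and vice versa. Hence, the values of symmetric positions (positions in which the players switch roles) should differ only in sign.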
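As an illustration of these properties, here is a minimal sketch of a heuristic evaluation function for tic-tac-toe. It is not the evaluation used later in this article: the bound `M = 100`, the string board representation and the "open lines" heuristic are all assumptions chosen for the example. The score is given from the point of view of X, taken here to be the first player.

```python
# Hypothetical sketch of a static evaluation function mapping positions to [-M, M].
M = 100  # assumed bound; any value larger than the number of lines (8) works

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def evaluate(board):
    """Score a 9-character board for X: M for an X win, -M for an O win,
    otherwise (lines still open for X) - (lines still open for O)."""
    open_x = open_o = 0
    for a, b, c in LINES:
        cells = board[a] + board[b] + board[c]
        if cells == 'XXX':
            return M
        if cells == 'OOO':
            return -M
        if 'O' not in cells:
            open_x += 1
        if 'X' not in cells:
            open_o += 1
    return open_x - open_o


def swap_roles(board):
    """Return the symmetric position in which X and O have switched roles."""
    return board.translate(str.maketrans('XO', 'OX'))


center = '....X....'
print(evaluate(center))              # 4
print(evaluate(swap_roles(center)))  # -4
```

The function looks only at the current position, never at earlier or later nodes, and swapping the players' roles flips the sign of the result, as required for a zero-sum game.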

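A common practice is to adjust the evaluation of a leaf by subtracting the depth of that leaf, so that out of all the moves that lead to victory the algorithm picks the one that wins in the smallest number of steps (or picks the move that postpones the loss, if a loss is inevitable).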
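A minimal sketch of that adjustment could look as follows. The function name `leaf_score`, the bound `M = 10` and the convention that 'O' is the maximizing player are assumptions made for this example; the implementation later in this article uses plain -1, 0 and 1 without the depth term.

```python
# Hypothetical sketch: depth-adjusted leaf evaluation.
M = 10  # assumed bound; it must exceed the maximum depth (9 plies in tic-tac-toe)


def leaf_score(result, depth):
    """Score an ending position for the maximizing player 'O'.

    result: 'O' if the maximizer won, 'X' if the minimizer won, '.' for a tie
    depth:  number of moves made from the root to reach this leaf
    """
    if result == 'O':
        return M - depth      # the sooner the win, the higher the score
    if result == 'X':
        return depth - M      # the later the loss, the less negative the score
    return 0


print(leaf_score('O', 1), leaf_score('O', 3))  # 9 7
print(leaf_score('X', 1), leaf_score('X', 5))  # -9 -5
```

With this scoring, a win in one move (9) beats a win in three moves (7), and a loss after five moves (-5) is preferred to a loss after one move (-9).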

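Here's a simple illustration of Minimax's steps. In this case, we're looking for the minimum value.

The green layer calls the Max() method on its child nodes, and the red layer calls the Min() method on its child nodes.

Evaluating leaves:

[Figure: evaluation of the leaves]

Deciding the best move for the green player using depth 3:

[Figure: choosing the best move for the green player at depth 3]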
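The alternation between maximizing and minimizing layers can be sketched on a small hand-made tree. Everything in this example is an assumption made for illustration: the tree values are invented (they are not the values from the figures), the helper names are hypothetical, and the placeholder evaluation simply returns 0 for positions that are cut off before a leaf is reached.

```python
# Hypothetical sketch: depth-limited minimax over an explicit toy tree.
def minimax(node, depth, maximizing, children, evaluate):
    """Return the minimax value of `node`, searching at most `depth` plies."""
    kids = children(node)
    if depth == 0 or not kids:
        # Leaf or depth limit reached: fall back to the static evaluation.
        return evaluate(node)
    values = [minimax(kid, depth - 1, not maximizing, children, evaluate)
              for kid in kids]
    return max(values) if maximizing else min(values)


def toy_children(node):
    """Inner nodes are lists of children; leaves are plain integers."""
    return node if isinstance(node, list) else []


def toy_evaluate(node):
    """Leaves carry their own score; cut-off inner nodes get a neutral 0."""
    return node if isinstance(node, int) else 0


# Root (Max) -> two Min nodes -> four Max nodes -> eight leaves.
TREE = [[[3, 5], [6, 9]], [[1, 2], [0, -1]]]

print(minimax(TREE, 3, True, toy_children, toy_evaluate))  # 5
print(minimax(TREE, 1, True, toy_children, toy_evaluate))  # 0
```

With the full depth of 3 the maximizer can guarantee a value of 5. With a depth limit of 1 the search never reaches a leaf and has to rely entirely on the (here deliberately uninformative) evaluation function, which shows why the quality of that function matters for shallow searches.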

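On the other hand, if we take a look at chess, we quickly realize how impractical it is to solve chess by brute-forcing through the whole game tree. To demonstrate this, Claude Shannon calculated a lower bound on the game-tree complexity of chess, arriving at about 10^120 possible games.

Just how big is that number? For reference, if we compared the mass of an electron (10^-30 kg) to the mass of the entire known universe (10^50 to 10^60 kg), the ratio would be on the order of 10^80 to 10^90.

Even that ratio is only about 10^-40 to 10^-30 of the Shannon number.

Imagine tasking an algorithm with going through every single one of those combinations just to make a single decision. It is practically impossible.

Even after ten moves, the number of possible games is still tremendously large.

Let's transfer this to a tic-tac-toe game. As you probably already know, the best-known strategy for player X is to start in any of the corners, which gives player O the most opportunities to make a mistake. If player O plays anything besides the center and X continues with the initial strategy, it's a guaranteed win for X. Opening books are exactly that: a collection of good ways to trick an opponent at the very beginning in order to gain an advantage or, in the best case, a win.

To simplify the code and get to the core of the algorithm, in the example in the next chapter we won't bother using opening books or any mind tricks. We'll let minimax search from the very start, so don't expect it to prefer the corner opening for strategic reasons: with perfect play every opening leads to a tie, and the algorithm simply returns the first move it finds with the best value.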

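Minimax Implementation in Python

In the code below, we will use an evaluation function that is fairly simple and common to all games in which it is possible to search the whole tree, all the way down to the leaves.

It has 3 possible values:

-1 if the player that seeks the minimum wins

0 if it's a tie

1 if the player that seeks the maximum wins

Since we'll be implementing this through a tic-tac-toe game, let's go through the building blocks. First, let's make a constructor and draw the board: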

# We'll use the time module to measure the time of evaluating
# game tree in every move. It's a nice way to show the
# distinction between the basic Minimax and Minimax with
# alpha-beta pruning :)
import time

class Game:
    def __init__(self):
        self.initialize_game()

    def initialize_game(self):
        self.current_state = [['.','.','.'],
                              ['.','.','.'],
                              ['.','.','.']]

        # Player X always plays first
        self.player_turn = 'X'

    def draw_board(self):
        for i in range(0, 3):
            for j in range(0, 3):
                # A space after each cell matches the board layout shown in the sample output
                print('{}|'.format(self.current_state[i][j]), end=" ")
            print()
        print()

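We've already talked about legal moves at the beginning of the article. To make sure we abide by the rules, we need a way to check whether a move is legal: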

    # Determines if the made move is a legal move
    def is_valid(self, px, py):
        if px < 0 or px > 2 or py < 0 or py > 2:
            return False
        elif self.current_state[px][py] != '.':
            return False
        else:
            return True

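Next, we need a simple way to check whether the game has ended. In tic-tac-toe, a player wins by placing three of their symbols in a row on a horizontal, vertical, or diagonal line: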

    # Checks if the game has ended and returns the winner in each case
    def is_end(self):
        # Vertical win
        for i in range(0, 3):
            if (self.current_state[0][i] != '.' and
                self.current_state[0][i] == self.current_state[1][i] and
                self.current_state[1][i] == self.current_state[2][i]):
                return self.current_state[0][i]

        # Horizontal win
        for i in range(0, 3):
            if (self.current_state[i] == ['X', 'X', 'X']):
                return 'X'
            elif (self.current_state[i] == ['O', 'O', 'O']):
                return 'O'

        # Main diagonal win
        if (self.current_state[0][0] != '.' and
            self.current_state[0][0] == self.current_state[1][1] and
            self.current_state[0][0] == self.current_state[2][2]):
            return self.current_state[0][0]

        # Second diagonal win
        if (self.current_state[0][2] != '.' and
            self.current_state[0][2] == self.current_state[1][1] and
            self.current_state[0][2] == self.current_state[2][0]):
            return self.current_state[0][2]

        # Is whole board full?
        for i in range(0, 3):
            for j in range(0, 3):
                # There's an empty field, we continue the game
                if (self.current_state[i][j] == '.'):
                    return None

        # It's a tie!
        return '.'
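The same check can also be written in a table-driven way. The sketch below is an alternative formulation, not part of the Game class above: the standalone function `game_result` and the `LINES` table are hypothetical names, and the function is assumed to receive the same 3x3 list-of-lists board and to return the same values as is_end() ('X', 'O', '.' for a tie, or None while the game is still running).

```python
# Hypothetical sketch: table-driven end-of-game check for a 3x3 board.
# All eight winning lines as (row, column) triples: 3 rows, 3 columns, 2 diagonals.
LINES = (
    [[(r, c) for c in range(3)] for r in range(3)]
    + [[(r, c) for r in range(3)] for c in range(3)]
    + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
)


def game_result(state):
    """Return 'X' or 'O' for a win, '.' for a tie, None if the game continues."""
    for (r1, c1), (r2, c2), (r3, c3) in LINES:
        first = state[r1][c1]
        if first != '.' and first == state[r2][c2] == state[r3][c3]:
            return first
    # No winner: the game continues as long as any field is still empty.
    if any('.' in row for row in state):
        return None
    return '.'


board = [['X', 'X', 'O'],
         ['O', 'O', 'X'],
         ['X', 'O', 'X']]
print(game_result(board))  # .
```

Listing the lines once keeps the win condition in a single place, which can make it easier to adapt the check to other board sizes.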

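The AI we play against is seeking two things: to maximize its own score and to minimize ours. To do that, we'll provide a max() method that the AI uses to make optimal decisions.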

    # Player 'O' is max, in this case AI
    def max(self):

        # Possible values for maxv are:
        # -1 - loss
        # 0  - a tie
        # 1  - win

        # We're initially setting it to -2 as worse than the worst case:
        maxv = -2

        px = None
        py = None

        result = self.is_end()

        # If the game came to an end, the function needs to return
        # the evaluation function of the end. That can be:
        # -1 - loss
        # 0  - a tie
        # 1  - win
        if result == 'X':
            return (-1, 0, 0)
        elif result == 'O':
            return (1, 0, 0)
        elif result == '.':
            return (0, 0, 0)

        for i in range(0, 3):
            for j in range(0, 3):
                if self.current_state[i][j] == '.':
                    # On the empty field player 'O' makes a move and calls Min
                    # That's one branch of the game tree.
                    self.current_state[i][j] = 'O'
                    (m, min_i, min_j) = self.min()
                    # Fixing the maxv value if needed
                    if m > maxv:
                        maxv = m
                        px = i
                        py = j
                    # Setting back the field to empty
                    self.current_state[i][j] = '.'
        return (maxv, px, py)

However, we will also include a min() method that serves as our helper for minimizing the AI's score:

    # Player 'X' is min, in this case human
    def min(self):

        # Possible values for minv are:
        # -1 - win
        # 0  - a tie
        # 1  - loss

        # We're initially setting it to 2 as worse than the worst case:
        minv = 2

        qx = None
        qy = None

        result = self.is_end()

        if result == 'X':
            return (-1, 0, 0)
        elif result == 'O':
            return (1, 0, 0)
        elif result == '.':
            return (0, 0, 0)

        for i in range(0, 3):
            for j in range(0, 3):
                if self.current_state[i][j] == '.':
                    self.current_state[i][j] = 'X'
                    (m, max_i, max_j) = self.max()
                    if m < minv:
                        minv = m
                        qx = i
                        qy = j
                    self.current_state[i][j] = '.'

        return (minv, qx, qy)
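The max() and min() methods above mirror each other almost line for line. In a zero-sum game that symmetry can be exploited with the negamax formulation, a different technique from the one used in this article: each player maximizes the negated value of the opponent's best reply. The sketch below is only an illustration of that variant. The string board representation and the names `winner` and `negamax` are assumptions, and the value is expressed from the point of view of the player to move rather than always from O's side.

```python
# Hypothetical sketch: negamax, a single-function variant of minimax.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def negamax(board, player):
    """Return (value, move) for `player`, who is about to move.

    value is 1 for a forced win, 0 for a tie, -1 for a forced loss;
    move is the index (0-8) of the first best cell, or None at an ending position.
    """
    w = winner(board)
    if w is not None:
        return (1 if w == player else -1), None
    if '.' not in board:
        return 0, None

    other = 'O' if player == 'X' else 'X'
    best_value, best_move = -2, None  # -2 is worse than the worst case
    for i, cell in enumerate(board):
        if cell == '.':
            child = board[:i] + player + board[i + 1:]
            # What is good for the opponent is equally bad for us.
            value = -negamax(child, other)[0]
            if value > best_value:
                best_value, best_move = value, i
    return best_value, best_move


print(negamax('XX.OO....', 'X'))  # (1, 2)
```

In the example position X completes the top row by playing cell 2, so the value is 1. Keeping separate max() and min() methods, as this article does, is more verbose but makes each player's role explicit.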

Finally, let's write a game loop that lets us play against the AI:

    def play(self):
        while True:
            self.draw_board()
            self.result = self.is_end()

            # Printing the appropriate message if the game has ended
            if self.result is not None:
                if self.result == 'X':
                    print('The winner is X!')
                elif self.result == 'O':
                    print('The winner is O!')
                elif self.result == '.':
                    print("It's a tie!")

                self.initialize_game()
                return

            # If it's player's turn
            if self.player_turn == 'X':

                while True:

                    start = time.time()
                    (m, qx, qy) = self.min()
                    end = time.time()
                    print('Evaluation time: {}s'.format(round(end - start, 7)))
                    print('Recommended move: X = {}, Y = {}'.format(qx, qy))

                    px = int(input('Insert the X coordinate: '))
                    py = int(input('Insert the Y coordinate: '))

                    if self.is_valid(px, py):
                        self.current_state[px][py] = 'X'
                        self.player_turn = 'O'
                        break
                    else:
                        print('The move is not valid! Try again.')

            # If it's AI's turn
            else:
                (m, px, py) = self.max()
                self.current_state[px][py] = 'O'
                self.player_turn = 'X'

Let's start the game!

def main():
    g = Game()
    g.play()

if __name__ == "__main__":
    main()

Now let's take a look at what happens when we follow the recommended sequence of moves, i.e. when we play optimally:

.| .| .|
.| .| .|
.| .| .|

Evaluation time: 5.0726919s
Recommended move: X = 0, Y = 0
Insert the X coordinate: 0
Insert the Y coordinate: 0
X| .| .|
.| .| .|
.| .| .|

X| .| .|
.| O| .|
.| .| .|

Evaluation time: 0.06496s
Recommended move: X = 0, Y = 1
Insert the X coordinate: 0
Insert the Y coordinate: 1
X| X| .|
.| O| .|
.| .| .|

X| X| O|
.| O| .|
.| .| .|

Evaluation time: 0.0020001s
Recommended move: X = 2, Y = 0
Insert the X coordinate: 2
Insert the Y coordinate: 0
X| X| O|
.| O| .|
X| .| .|

X| X| O|
O| O| .|
X| .| .|

Evaluation time: 0.0s
Recommended move: X = 1, Y = 2
Insert the X coordinate: 1
Insert the Y coordinate: 2
X| X| O|
O| O| X|
X| .| .|

X| X| O|
O| O| X|
X| O| .|

Evaluation time: 0.0s
Recommended move: X = 2, Y = 2
Insert the X coordinate: 2
Insert the Y coordinate: 2
X| X| O|
O| O| X|
X| O| X|

It's a tie!

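As you've noticed, winning against this kind of AI is impossible. If we assume that both the player and the AI play optimally, the game will always end in a tie. Since the AI always plays optimally, we lose as soon as we slip up.

Take a close look at the evaluation times, as we will compare them with the improved version of the algorithm in the next example.

Alpha-Beta Pruning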

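The alpha-beta (α-β) algorithm was discovered independently by several researchers in the mid-20th century. Alpha-beta is essentially minimax improved with a heuristic: it stops evaluating a move as soon as it has established that the move is worse than one examined earlier. Such moves need not be evaluated further.

When added to a simple minimax algorithm, it gives the same output but cuts off branches that cannot possibly affect the final decision, which improves performance dramatically.

The main concept is to maintain two values throughout the whole search:

Alpha: the best already explored option for player Max

Beta: the best already explored option for player Min

Initially, alpha is negative infinity and beta is positive infinity. In our code we'll instead use the worst possible score for each player (-2 and 2).

Let's see how the previous tree looks if we apply the alpha-beta method:

[Figure: the previous game tree with alpha-beta pruning applied; pruned branches are shown in grey]

When the search reaches the first grey area (8), it checks the current best (minimum-valued) already explored option along the path for the minimizer, which at that moment is 7. Since 8 is greater than 7, we are allowed to cut off all further children of the node we're at (in this case there aren't any): if we played that move, the opponent would answer with a move of value 8, which is worse for us than anything the opponent could have done had we made a different move.

A better example is the next grey area. Note the node with the value -9. At that point, the best (maximum-valued) explored option along the path for the maximizer is -4. Since -9 is less than -4, we can cut off all the other children of the node we're at.

This method allows us to ignore many branches whose values would neither help our decision nor affect it in any way.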
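The cut-off rule can be traced on a small explicit tree. This is a minimal sketch with invented values (not the values from the figure above); the function name `alphabeta` and the optional `visited` list, used only to record which leaves are actually evaluated, are assumptions made for this illustration.

```python
# Hypothetical sketch: alpha-beta pruning over a toy tree of nested lists.
import math


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Return the minimax value of `node`, skipping branches that cannot matter."""
    if not isinstance(node, list):      # a leaf carries its own value
        if visited is not None:
            visited.append(node)
        return node

    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, visited))
            alpha = max(alpha, best)
            if alpha >= beta:           # Min already has a better option elsewhere
                break
        return best

    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, visited))
        beta = min(beta, best)
        if alpha >= beta:               # Max already has a better option elsewhere
            break
    return best


TREE = [[3, 5], [2, 9], [0, 7]]   # root is a Max node, its children are Min nodes
seen = []
print(alphabeta(TREE, True, visited=seen))  # 3
print(seen)                                 # [3, 5, 2, 0]
```

After the first Min node yields 3, the maximizer's alpha is 3. In the second Min node the leaf 2 already proves that this branch is worth at most 2, so the leaf 9 is never evaluated; the same happens with 7 in the third branch. The result is the same as plain minimax, with fewer leaves visited.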


With that in mind, let's modify the min() and max() methods from before:

    def max_alpha_beta(self, alpha, beta):
        maxv = -2
        px = None
        py = None

        result = self.is_end()

        if result == 'X':
            return (-1, 0, 0)
        elif result == 'O':
            return (1, 0, 0)
        elif result == '.':
            return (0, 0, 0)

        for i in range(0, 3):
            for j in range(0, 3):
                if self.current_state[i][j] == '.':
                    self.current_state[i][j] = 'O'
                    (m, min_i, min_j) = self.min_alpha_beta(alpha, beta)
                    if m > maxv:
                        maxv = m
                        px = i
                        py = j
                    self.current_state[i][j] = '.'

                    # The next two ifs in Max and Min are the only difference
                    # between plain minimax and minimax with alpha-beta pruning
                    if maxv >= beta:
                        return (maxv, px, py)

                    if maxv > alpha:
                        alpha = maxv

        return (maxv, px, py)

    def min_alpha_beta(self, alpha, beta):

        minv = 2

        qx = None
        qy = None

        result = self.is_end()

        if result == 'X':
            return (-1, 0, 0)
        elif result == 'O':
            return (1, 0, 0)
        elif result == '.':
            return (0, 0, 0)

        for i in range(0, 3):
            for j in range(0, 3):
                if self.current_state[i][j] == '.':
                    self.current_state[i][j] = 'X'
                    (m, max_i, max_j) = self.max_alpha_beta(alpha, beta)
                    if m < minv:
                        minv = m
                        qx = i
                        qy = j
                    self.current_state[i][j] = '.'

                    if minv <= alpha:
                        return (minv, qx, qy)

                    if minv < beta:
                        beta = minv

        return (minv, qx, qy)

And the game loop:

    def play_alpha_beta(self):
        while True:
            self.draw_board()
            self.result = self.is_end()

            if self.result is not None:
                if self.result == 'X':
                    print('The winner is X!')
                elif self.result == 'O':
                    print('The winner is O!')
                elif self.result == '.':
                    print("It's a tie!")

                self.initialize_game()
                return

            if self.player_turn == 'X':

                while True:
                    start = time.time()
                    (m, qx, qy) = self.min_alpha_beta(-2, 2)
                    end = time.time()
                    print('Evaluation time: {}s'.format(round(end - start, 7)))
                    print('Recommended move: X = {}, Y = {}'.format(qx, qy))

                    px = int(input('Insert the X coordinate: '))
                    py = int(input('Insert the Y coordinate: '))

                    if self.is_valid(px, py):
                        self.current_state[px][py] = 'X'
                        self.player_turn = 'O'
                        break
                    else:
                        print('The move is not valid! Try again.')

            else:
                (m, px, py) = self.max_alpha_beta(-2, 2)
                self.current_state[px][py] = 'O'
                self.player_turn = 'X'

Playing the game is the same as before, but if we look at the time it takes the AI to find the optimal solution, there's a big difference:

.| .| .|
.| .| .|
.| .| .|

Evaluation time: 0.1688969s
Recommended move: X = 0, Y = 0

Evaluation time: 0.0069957s
Recommended move: X = 0, Y = 1

Evaluation time: 0.0009975s
Recommended move: X = 2, Y = 0

Evaluation time: 0.0s
Recommended move: X = 1, Y = 2

Evaluation time: 0.0s
Recommended move: X = 2, Y = 2

It's a tie!

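After testing and restarting the program a few times, the comparison is clear. In the sample runs shown above, evaluating the first move took about 5.07 s with plain minimax and about 0.17 s with alpha-beta pruning, roughly a 30x speedup.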
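Timings depend on the machine, so a complementary way to compare the two versions is to count how many nodes each one visits. The sketch below is a standalone re-implementation written for this purpose, not the Game class from above: the string board, the single `search` function with a `prune` switch and the one-element list used as a counter are all assumptions made for the illustration. As in the article, 'O' is the maximizer, 'X' moves first, and -2 and 2 stand in for negative and positive infinity.

```python
# Hypothetical sketch: count visited nodes with and without alpha-beta pruning.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def search(board, player, alpha, beta, prune, counter):
    """Return the game value for 'O' (1 win, 0 tie, -1 loss).

    counter is a one-element list; counter[0] is incremented for every node visited.
    """
    counter[0] += 1
    w = winner(board)
    if w is not None:
        return 1 if w == 'O' else -1
    if '.' not in board:
        return 0

    maximizing = player == 'O'
    other = 'X' if maximizing else 'O'
    best = -2 if maximizing else 2
    for i, cell in enumerate(board):
        if cell != '.':
            continue
        child = board[:i] + player + board[i + 1:]
        value = search(child, other, alpha, beta, prune, counter)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, best)
        else:
            best = min(best, value)
            beta = min(beta, best)
        if prune and alpha >= beta:
            break   # the remaining siblings cannot change the decision
    return best


def count_nodes(prune):
    """Search from the empty board and return (game value, nodes visited)."""
    counter = [0]
    value = search('.' * 9, 'X', -2, 2, prune, counter)
    return value, counter[0]
```

Calling `count_nodes(False)` visits all 549946 nodes of the complete game tree and returns a value of 0 (a tie). `count_nodes(True)` returns the same value after visiting only a small fraction of those nodes; the exact count depends on the order in which moves are tried.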

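Conclusion

Alpha-beta pruning makes a major difference when evaluating large and complex game trees. Even though tic-tac-toe is a simple game, we can still see that without the alpha-beta heuristic the algorithm takes substantially longer to recommend the first move.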