The mechanism for finding an equilibrium solution. Duality in linear programming. Property of mutually dual problems

20.06.2020

Optimal strategies in conflict theory are considered to be those that lead players to stable equilibria, i.e. certain situations that satisfy all players.

The optimality of a solution in game theory is based on the concept of an equilibrium situation:

1) it is not beneficial for any player to deviate from the equilibrium situation if all the others remain in it;

2) when the game is repeated many times, the players arrive at the equilibrium situation regardless of the strategic situation in which they started.

In each interaction, the following types of equilibria can exist:

1. Equilibrium in cautious strategies. It is determined by strategies that provide players with a guaranteed result;

2. Equilibrium in dominant strategies.

A dominant strategy is a plan of action that gives a participant the maximum payoff regardless of the actions of the other participant. The equilibrium in dominant strategies is therefore the intersection of the dominant strategies of both participants in the game.

If the players' optimal strategies dominate all their other strategies, then the game has an equilibrium in dominant strategies. In the prisoners' dilemma, the Nash equilibrium strategy profile is ("confess", "confess"). Moreover, for both player A and player B, "confess" is the dominant strategy and "do not confess" is the dominated one;

3. Nash equilibrium. A Nash equilibrium is a solution of a game of two or more players in which no participant can increase his payoff by changing his decision unilaterally while the other participants keep their decisions unchanged.

Let $(S, H)$ be a game of $n$ persons in normal form, where $S$ is the set of pure strategy profiles and $H$ is the set of payoffs.

When each player $i \in \{1, \dots, n\}$ selects a strategy $x_i$ in the strategy profile $x = (x_1, \dots, x_n)$, player $i$ receives the payoff $H_i(x)$. The payoff depends on the entire strategy profile: not only on the strategy chosen by the player himself, but also on the strategies of the other players. A strategy profile $x^* \in S$ is a Nash equilibrium if changing one's strategy from $x_i^*$ to $x_i$ is not beneficial to any player $i$, that is, for any $i$

$$H_i(x^*) \ge H_i(x_i, x^*_{-i}).$$

A game can have a Nash equilibrium in both pure strategies and mixed ones.

Nash proved that, if mixed strategies are allowed, every finite game of n players has at least one Nash equilibrium.

In a Nash equilibrium situation, each player's strategy provides him with the best response to the other players' strategies;

4. Stackelberg equilibrium. The Stackelberg model is a game-theoretic model of an oligopolistic market with information asymmetry. In this model the behavior of firms is described by a dynamic game with complete and perfect information, in contrast to the Cournot model, in which the behavior of firms is modeled by a static game with complete information. The main feature of the game is the presence of a leader firm, which is the first to set its volume of production, while the remaining firms base their calculations on it. Basic assumptions of the game:

· the industry produces a homogeneous product: the differences between the products of different companies are negligible, which means that the buyer, when choosing which company to buy from, is guided only by price;

· there are a small number of firms operating in the industry;

· firms set the quantity of products produced, and the price for it is determined based on demand;

· there is a so-called leader firm, whose production volume the other firms take as given in their calculations.

Thus, the Stackelberg model is used to find the optimal solution in dynamic games and corresponds to the maximum payoff of the players, given the conditions that arise after one or more players have already made their choice. A Stackelberg equilibrium is a situation in which no player can increase his payoff unilaterally, and decisions are made first by one player and become known to the second player. In the prisoners' dilemma, the Stackelberg equilibrium is reached in cell (1;1), where both criminals confess;

5. Pareto optimality is a state of the system in which no individual criterion describing the state can be improved without worsening the position of other players.

The Pareto principle states: “Any change that does not cause loss, but which brings benefit to some people (in their own estimation), is an improvement.” Thus, the right to all changes that do not cause additional harm to anyone is recognized.

The set of Pareto optimal states of a system is called the “Pareto set”, “the set of Pareto optimal alternatives”, or the “set of optimal alternatives”.

The situation when Pareto efficiency is achieved is a situation when all the benefits from the exchange have been exhausted.

Pareto efficiency is one of the central concepts of modern economics. The first and second fundamental theorems of welfare economics are built on this concept.

One of the applications of Pareto optimality is the Pareto allocation of resources (labor and capital) in international economic integration, i.e. the economic unification of two or more states. Interestingly, the Pareto allocation before and after international economic integration has been described mathematically (Dalimov R.T., 2008). The analysis showed that the value added of sectors and the income of labor resources move in opposite directions in accordance with the well-known heat conduction equation, similarly to a gas or liquid in space, which makes it possible to apply the methods of analysis used in physics to economic problems of the migration of economic parameters.

The Pareto optimum criterion states that the welfare of society reaches its maximum, and the distribution of resources becomes optimal, if any change in this distribution worsens the welfare of at least one subject of the economic system.

A Pareto-optimal market state is a situation in which it is impossible to improve the position of any participant in the economic process without simultaneously reducing the well-being of at least one of the others.

According to the Pareto criterion (a criterion for the growth of social welfare), movement towards the optimum is possible only with such a distribution of resources that increases the welfare of at least one person without harming anyone else.

A situation S* is said to Pareto-dominate a situation S if:

· for every player, his payoff in S is no greater than his payoff in S*;

· there is at least one player whose payoff in S* is strictly greater than in S.

In the prisoners' dilemma, the Pareto equilibrium, in which it is impossible to improve the position of one player without worsening the position of the other, corresponds to cell (2;2).
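The equilibrium types listed above can be checked mechanically on a prisoners' dilemma. The following is only a sketch: the text gives no payoff matrix, so the numbers below (minus years in prison) are an assumption chosen for illustration, and the function names are hypothetical. Strategy 1 is "confess" and strategy 2 is "do not confess", matching the cell numbering (1;1) and (2;2) used in the text.

```python
from itertools import product

# Assumed payoffs (minus years in prison); index 0 = "confess", 1 = "do not confess".
# PAYOFF[i][j] = (payoff of A, payoff of B) when A plays i and B plays j.
PAYOFF = [
    [(-5, -5), (0, -10)],
    [(-10, 0), (-1, -1)],
]
CELLS = list(product(range(2), range(2)))


def dominant_strategy(player):
    """Return the strictly dominant strategy of a player (0 = A, 1 = B), or None."""
    def pay(own, other):
        cell = PAYOFF[own][other] if player == 0 else PAYOFF[other][own]
        return cell[player]
    for s in range(2):
        if all(pay(s, o) > pay(t, o) for t in range(2) if t != s for o in range(2)):
            return s
    return None


def is_nash(i, j):
    """Neither player gains by a unilateral deviation from cell (i, j)."""
    a_ok = all(PAYOFF[i][j][0] >= PAYOFF[k][j][0] for k in range(2))
    b_ok = all(PAYOFF[i][j][1] >= PAYOFF[i][k][1] for k in range(2))
    return a_ok and b_ok


def is_pareto_optimal(i, j):
    """No other cell is at least as good for both players and better for one."""
    u = PAYOFF[i][j]
    for k, l in CELLS:
        v = PAYOFF[k][l]
        if v[0] >= u[0] and v[1] >= u[1] and v != u:
            return False
    return True


# Report cells with the 1-based numbering used in the text.
nash_cells = [(i + 1, j + 1) for i, j in CELLS if is_nash(i, j)]
pareto_cells = [(i + 1, j + 1) for i, j in CELLS if is_pareto_optimal(i, j)]
print("dominant strategies:", dominant_strategy(0), dominant_strategy(1))
print("Nash cells:", nash_cells)
print("Pareto-optimal cells:", pareto_cells)
```

Under these assumed numbers "confess" is dominant for both players and (1;1) is the only Nash cell. The cells (1;2) and (2;1) also pass the Pareto test, but (2;2) is the one that Pareto-dominates the equilibrium (1;1).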

Let's consider example 1:

Equilibrium in dominant strategies: none.

Nash equilibria: (5, 5) and (4, 4), since it is unprofitable for either player to deviate individually from the chosen strategy.

Pareto optimum: (5, 5), since the payoffs of the players under these strategies are greater than under any other strategies.

Stackelberg equilibrium:

Player A makes the first move.

If A selects his first strategy, B chooses his first strategy and A gets 5.

If A chooses his second strategy, B chooses his second strategy and A gets 4.

5 > 4 => A chooses his first strategy.

Player B makes the first move.

If B selects his first strategy, A chooses his first strategy and B gets 5.

If B chooses his second strategy, A chooses his second strategy and B gets 4.

5 > 4 => the Stackelberg equilibrium is (5, 5).
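The backward-induction reasoning of example 1 can be sketched in code. The payoff table of the example is not reproduced in the text, so the matrix below is an assumption: only the diagonal cells (5, 5) and (4, 4) come from the example, while the off-diagonal (0, 0) cells and the function name `stackelberg` are hypothetical.

```python
# Assumed payoff matrix for example 1: only the diagonal cells (5, 5) and (4, 4)
# come from the text; the off-diagonal (0, 0) cells are a hypothetical filler.
GAME = [
    [(5, 5), (0, 0)],
    [(0, 0), (4, 4)],
]


def stackelberg(game, leader):
    """Backward induction for a two-player matrix game.

    leader = 0: player A (rows) moves first; leader = 1: player B (columns).
    Returns (leader_strategy, follower_strategy, payoff_pair), 0-based.
    Ties are broken in favour of the lowest strategy index.
    """
    n_rows, n_cols = len(game), len(game[0])
    follower = 1 - leader
    n_lead = n_rows if leader == 0 else n_cols
    n_follow = n_cols if leader == 0 else n_rows

    def cell(lead, follow):
        return game[lead][follow] if leader == 0 else game[follow][lead]

    best = None
    for lead in range(n_lead):
        # The follower observes the leader's move and best-responds to it.
        reply = max(range(n_follow), key=lambda f: cell(lead, f)[follower])
        outcome = cell(lead, reply)
        # The leader keeps the move whose anticipated outcome pays him most.
        if best is None or outcome[leader] > best[2][leader]:
            best = (lead, reply, outcome)
    return best


a_first = stackelberg(GAME, leader=0)
b_first = stackelberg(GAME, leader=1)
print(a_first)  # (0, 0, (5, 5))
print(b_first)  # (0, 0, (5, 5))
```

With either player moving first, the leader anticipates the follower's reply and ends up in the cell (5, 5), which agrees with the conclusion of the example.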

Example 2. Modeling a duopoly.

Let's consider the essence of this model:

Let there be an industry with two firms, one of which is a “leader firm”, the other is a “follower firm”. Let the product price be a linear function of the total supply Q:

$$P(Q) = a - bQ.$$

Let us also assume that the firms' costs per unit of output are constant and equal to $c_1$ and $c_2$ respectively. Then the profit of the first firm is given by the formula

$$\Pi_1 = P(Q_1 + Q_2)\, Q_1 - c_1 Q_1,$$

and the profit of the second firm, correspondingly, by

$$\Pi_2 = P(Q_1 + Q_2)\, Q_2 - c_2 Q_2.$$

In accordance with the Stackelberg model, the first firm (the leader) sets its output $Q_1$ at the first step. The second firm (the follower) then observes the leader's action and determines its output $Q_2$. The goal of each firm is to maximize its payoff function.

The Nash equilibrium in this game is found by backward induction. Consider the last stage of the game, the move of the second firm. At this stage firm 2 knows the output $Q_1$ chosen by the first firm, so determining the optimal output $Q_2^*$ reduces to finding the maximum of the second firm's payoff function. Maximizing $\Pi_2$ with respect to $Q_2$, treating $Q_1$ as given, we find that the optimal output of the second firm is

$$Q_2^*(Q_1) = \frac{a - bQ_1 - c_2}{2b}.$$

This is the follower firm's best response to the leader firm's choice of output $Q_1$. The leader firm can maximize its payoff function taking the form of $Q_2^*(Q_1)$ into account. After substituting $Q_2^*(Q_1)$, the maximum of $\Pi_1$ with respect to $Q_1$ is attained at

$$Q_1^* = \frac{a + c_2 - 2c_1}{2b}.$$

Substituting this into the expression for $Q_2^*$, we get

$$Q_2^* = \frac{a + 2c_1 - 3c_2}{4b}.$$

Thus, when the firms have equal unit costs ($c_1 = c_2$), in equilibrium the leader firm produces twice as much output as the follower firm.
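A numerical cross-check of this derivation may be useful. The sketch below takes the profit functions defined above, uses the closed-form outputs that follow from their first-order conditions, and compares the leader's output with a brute-force search. The parameter values and the function names are assumptions made for illustration only.

```python
def follower_best_response(q1, a, b, c2):
    """Output of firm 2 that maximizes its profit for a given q1 (zero if negative)."""
    return max((a - b * q1 - c2) / (2 * b), 0.0)


def leader_profit(q1, a, b, c1, c2):
    """Profit of firm 1 when firm 2 replies with its best response."""
    q2 = follower_best_response(q1, a, b, c2)
    price = a - b * (q1 + q2)
    return price * q1 - c1 * q1


def stackelberg_outputs(a, b, c1, c2):
    """Closed-form outputs obtained from the first-order conditions."""
    q1 = (a + c2 - 2 * c1) / (2 * b)
    q2 = (a + 2 * c1 - 3 * c2) / (4 * b)
    return q1, q2


# Hypothetical parameters: P(Q) = 100 - 2Q, equal unit costs of 20.
a, b, c1, c2 = 100.0, 2.0, 20.0, 20.0
q1_star, q2_star = stackelberg_outputs(a, b, c1, c2)

# Cross-check the leader's optimum by brute force on a grid of outputs.
grid = [i / 100 for i in range(0, 5001)]
q1_grid = max(grid, key=lambda q: leader_profit(q, a, b, c1, c2))

print(q1_star, q2_star)  # 20.0 10.0
print(q1_grid)
```

With equal unit costs the leader's output (20) is twice the follower's (10), and the grid search lands on the same leader output as the formula.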

Let us consider the mechanism that restores market equilibrium when, under the influence of changes in demand or supply factors, the market leaves this state. There are two main types of imbalance between supply and demand: a surplus and a shortage of goods.

A surplus (excess) of a product is a market situation in which the quantity supplied at a given price exceeds the quantity demanded. In this case competition arises between producers, who struggle for buyers. The winner is the one who offers more favorable terms of sale. Thus, the market tends to return to a state of equilibrium.

A shortage of a product is a situation in which the quantity demanded at a given price exceeds the quantity supplied. In this situation competition arises between buyers for the opportunity to purchase the scarce product. The one who offers the highest price wins. The higher price attracts the attention of producers, who begin to expand production, thereby increasing the supply of the product. As a result, the system returns to a state of equilibrium.

Thus, the price performs a balancing function: it stimulates the expansion of production and supply during shortages, and it restrains supply during surpluses, ridding the market of them.

The balancing role of price is manifested through both demand and supply.

Suppose that the equilibrium established in our market has been disrupted: under the influence of some factor (for example, income growth) demand has increased, so that the demand curve has shifted from D1 to D2 (Fig. 4.3 a), while supply has remained unchanged.

If the price of the product does not change immediately after the shift of the demand curve, then the increase in demand creates a situation in which, at the same price P1, the quantity of the product that buyers now wish to purchase (QD) exceeds the quantity that producers can offer at that price (QS). The quantity demanded now exceeds the quantity supplied, which means that a shortage of the product of size Df = QD – QS appears in this market.

A shortage, as we already know, leads to competition between buyers for the opportunity to purchase the product, which raises the market price. According to the law of supply, sellers respond to the higher price by increasing the quantity supplied. On the graph this is shown by the movement of the market equilibrium point E1 along the supply curve until it intersects the new demand curve D2, where a new equilibrium E2 of this market is reached, with equilibrium quantity Q2 and equilibrium price P2.

Fig. 4.3. Shift of the equilibrium price point.

Let's consider a situation where the equilibrium state is disrupted on the supply side.

Suppose that under the influence of some factors supply has increased, so that the supply curve has shifted to the right from position S1 to S2, while demand has remained unchanged (Fig. 4.3 b).

If the market price stays at the same level (P1), the increase in supply leads to a surplus of goods of size Sp = QS – QD. As a result, competition between sellers arises, which lowers the market price (from P1 to P2) and increases the volume of goods sold. On the graph this is shown by the movement of the market equilibrium point E1 along the demand curve until it intersects the new supply curve, which establishes a new equilibrium E2 with parameters Q2 and P2.
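The adjustment described above (the price rises under a shortage and falls under a surplus until the two quantities match) can be sketched numerically. The linear demand and supply curves, their coefficients, the adjustment speed and all function names below are assumptions made for illustration; the text itself specifies no functional forms.

```python
def demand(price, a, b):
    """Quantity demanded on a linear demand curve QD = a - b*P (assumed form)."""
    return a - b * price


def supply(price, c, d):
    """Quantity supplied on a linear supply curve QS = c + d*P (assumed form)."""
    return c + d * price


def adjust_price(price, a, b, c, d, speed=0.1, steps=200):
    """Raise the price under a shortage and lower it under a surplus."""
    for _ in range(steps):
        excess_demand = demand(price, a, b) - supply(price, c, d)
        price += speed * excess_demand
    return price


# Hypothetical numbers: demand shifts from D1 (a = 100) to D2 (a = 130).
b, c, d = 2.0, 10.0, 1.0
p1 = (100.0 - c) / (b + d)                          # old equilibrium price P1
shortage = demand(p1, 130.0, b) - supply(p1, c, d)  # Df = QD - QS at price P1
p2 = adjust_price(p1, 130.0, b, c, d)               # price after the adjustment
q2 = supply(p2, c, d)

print(p1, shortage)                # 30.0 30.0
print(round(p2, 6), round(q2, 6))  # 40.0 50.0
```

In this sketch the shortage of 30 units at the old price P1 = 30 pushes the price up until it settles near P2 = 40, where the quantity sold is Q2 = 50. A supply shift can be explored the same way by changing `c` instead of `a`.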

Similarly, one can identify the effect on the equilibrium price and equilibrium quantity of goods of a decrease in demand and a decrease in supply.

In the educational literature, four rules for the interaction of supply and demand are formulated.

1. An increase in demand causes an increase in the equilibrium price and equilibrium quantity of goods.

2. A decrease in demand causes a fall in both the equilibrium price and the equilibrium quantity of goods.

3. An increase in supply entails a decrease in the equilibrium price and an increase in the equilibrium quantity of goods.

4. A decrease in supply entails an increase in the equilibrium price and a decrease in the equilibrium quantity of goods.

Using these rules, you can find the equilibrium point for any changes in supply and demand.
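The four rules can be verified on a simple numerical model. The sketch below assumes linear curves QD = a - b*P and QS = c + d*P with hypothetical coefficients, shifts demand (through `a`) or supply (through `c`), and records the direction in which the equilibrium price and quantity move.

```python
def equilibrium(a, b, c, d):
    """Equilibrium of QD = a - b*P and QS = c + d*P: returns (price, quantity)."""
    price = (a - c) / (b + d)
    return price, a - b * price


def sign(x):
    """Return 1 for an increase, -1 for a decrease, 0 for no change."""
    return (x > 0) - (x < 0)


# Hypothetical baseline and four shifted markets.
base = equilibrium(a=100, b=2, c=10, d=1)
shifts = {
    "demand up": equilibrium(a=130, b=2, c=10, d=1),
    "demand down": equilibrium(a=70, b=2, c=10, d=1),
    "supply up": equilibrium(a=100, b=2, c=40, d=1),
    "supply down": equilibrium(a=100, b=2, c=-20, d=1),
}

# For each shift: (direction of price change, direction of quantity change).
directions = {
    name: (sign(p - base[0]), sign(q - base[1]))
    for name, (p, q) in shifts.items()
}
print(base)
print(directions)
```

Each entry of `directions` reproduces one of the rules: for example, "supply up" gives (-1, 1), a lower equilibrium price and a larger equilibrium quantity.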

The return of prices to the market equilibrium level can mainly be hampered by the following circumstances:

1) administrative regulation of prices;

2) the monopoly power of a producer or consumer, which allows them to maintain a monopoly price that may be either artificially high or artificially low.


Topic 4. Game theory and interaction modeling.

1. Basic concepts of game theory.

2. Types of equilibrium: Nash equilibrium, Stackelberg equilibrium, Pareto-optimal equilibrium, equilibrium of dominant strategies.

3. Basic models of game theory.

Basic concepts of game theory.

The use of mathematical methods, which include game theory, in the analysis of economic processes makes it possible to identify trends and relationships that remain hidden under other methods, and even to obtain quite unexpected results.

Note that game theory is one of the youngest mathematical disciplines. Its emergence as an independent branch of mathematics dates back to the mid-1940s, when the famous monograph by J. von Neumann and O. Morgenstern, “Theory of Games and Economic Behavior” (1944), was published. The origins of game theory are associated with the works of E. Borel (1921).

By now, game theory has turned into a whole mathematical field, rich in interesting results and having a large number of practical recommendations and applications.

Let's consider the basic assumptions and concepts of the game model of interhuman interactions.

1. The number of interacting individuals is two. The individuals are called players. The concept of a player makes it possible to model the social roles of an individual: seller, buyer, husband, wife, etc. A game is a simplified representation of the interactions of two individuals who have different or similar social roles, for example buyer and seller, seller and seller, etc.

2. Each individual has a fixed set of behavior options, or alternatives. The number of behavior options for different players may not be the same.

3. An interpersonal interaction is considered to have taken place if both players simultaneously choose their behavior options and act in accordance with them. A single act of interaction is called a move of the game. The duration of an act of interaction is assumed to be zero.

4. A move of the game is specified by two integers: the number of the behavior option chosen by the first player and the number of the behavior option chosen by the second player. The maximum possible number of different moves in the game equals the product of the number of options of the first player and the number of options of the second player.

5. Each interaction between the individuals, or game move, receives its own serial number: 1, 2, 3, etc. The concepts of a “game move” (a pair of numbers) and a “game move number” (a single number) should not be confused. Interactions are assumed to occur at regular intervals, so the game move number indicates how long the given individuals have been interacting with each other.

6. Each player strives to achieve the maximum value of some target indicator, which is called utility, or winnings. Thus, the player has the traits of an “economic man”. The player's payoff can be either positive or negative. A negative gain is also called a loss.

7. Each move of the game (a pair of alternatives chosen by the players) corresponds to a single pair of payoffs. The dependence of the players' payoffs on the moves they choose is described by the game matrix, or payoff matrix. The rows of this matrix correspond to the alternatives (moves) of the first player, and the columns to the alternatives (moves) of the second player. The elements of the game matrix are the pairs of payoffs for the given row and column. The payoff of the first player (the first number in a cell of the game matrix) depends not only on his own move (row number) but also on the move of the second player (column number). Therefore, before the interaction takes place, the individual does not know the exact size of his payoff. In other words, the player chooses his behavior under uncertainty, i.e. the player has the traits of an “institutional person”.

8. A player's strategy is a habitual pattern of behavior that the player follows when choosing among alternatives over a certain period of time. The strategy is determined by the probabilities (or frequencies) with which each possible behavior option is chosen. In other words, the player's strategy is a vector whose number of coordinates equals the total number of possible alternatives, and whose i-th coordinate equals the probability (frequency) of choosing the i-th alternative. Clearly, the coordinates of this vector sum to one.

If a player chooses only one behavior option during the period considered, the player's strategy is called pure.

All coordinates of the corresponding pure strategy vector are equal to zero, except one, which is equal to one.

A strategy that is not pure is called mixed.

In this case, the player's strategy vector has at least two non-zero coordinates. They correspond to the active behavior options. A player following a mixed strategy alternates the active behavior options in accordance with the given probabilities (frequencies) of choice. In what follows, for simplicity of presentation, we will assume that the player always follows some pure strategy, i.e. during the period under consideration he invariably chooses a single behavior option from the given set of alternatives.

An institutional person is characterized by the variability of his behavior, which depends on his internal state, life experience, external social environment, etc. Within the game approach to the study of institutions, this property of an institutional person is expressed in the possibility of a player changing his strategy. If among the player's strategies there were always an objectively best one, he would invariably follow it and changing the strategy would be meaningless. But in real life a person usually considers several behavioral strategies, and it is impossible to single out the best of them objectively. The game model of interhuman interactions makes it possible to study this feature of institutional behavior, since it covers a number of behavioral strategies that are not mutually exclusive and reflect various aspects of the behavior of an institutional person. Let's look at these behavior patterns.
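The vector description of pure and mixed strategies can be illustrated with a minimal sketch. The function names and the example vectors are hypothetical; only the definitions (coordinates are probabilities that sum to one, a pure strategy has a single coordinate equal to one) come from the text.

```python
def is_strategy(vector, tol=1e-9):
    """A strategy is a vector of non-negative probabilities that sum to one."""
    return all(p >= 0 for p in vector) and abs(sum(vector) - 1.0) <= tol


def is_pure(vector):
    """A pure strategy puts probability one on a single behavior option."""
    return is_strategy(vector) and sorted(vector) == [0] * (len(vector) - 1) + [1]


def active_options(vector):
    """Numbers (1-based) of the options chosen with non-zero probability."""
    return [i + 1 for i, p in enumerate(vector) if p > 0]


pure = [0, 1, 0]          # always choose the second option
mixed = [0.5, 0.0, 0.5]   # alternate between the first and third options

print(is_pure(pure), active_options(pure))    # True [2]
print(is_pure(mixed), active_options(mixed))  # False [1, 3]
```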

Game matrix (each cell gives the payoff of the first player; the payoff of the second player)

First player \ Second player	1	2	3
1	6; 15	2; 13	3; 11
2	1; 10	5; 14	4; 12
3	4; 12	4; 13	3; 13

A distinction is made between solidary and non-solidarity behavior strategies. The former are most characteristic of the “institutional man”, the latter of the “economic man”.

Non-solidarity behavior strategies are characterized by the fact that the individual chooses his behavior independently: he either ignores the behavior of the other individual altogether or, based on past experience, assumes a possible variant of that behavior.

The main types of non-solidarity behavior are the following: irrational, cautious, optimizing, deviant and innovative.

1) Irrational behavior. Let us denote the two strategies of the first player by A and B, respectively. Strategy A is said to be dominant with respect to strategy B if, for any move of the second player, the payoff of the first player corresponding to strategy A is greater than his payoff corresponding to strategy B. Thus, strategy B is objectively worse with respect to strategy A.

If strategy A can always be freely chosen by the player, then strategy B should never be chosen at all. If, nevertheless, strategy B is chosen by the first player, then his behavior in this case is called irrational. To identify a player’s irrational behavior, it is enough to analyze his payoff matrix: the payoff matrix of the other player is not used.

Note that the term “irrational behavior” is borrowed from neoclassical theory. It only means that the choice of this strategy is certainly not the best in a situation where both players are in an antagonistic confrontation, which is characteristic of an “economic man”. But for an “institutional person” who enters into interpersonal interactions with other people, irrational behavior is not only possible but may turn out to be the most reasonable course of action. An example of this is the prisoners' dilemma game.
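The dominance test described in this item uses only the first player's own payoffs, so it is easy to sketch. The code below reads the first player's payoffs from the game matrix given earlier, assuming that rows are his options 1 to 3 and columns are the options of the second player. The helper names and the second, modified matrix are hypothetical.

```python
# Payoffs of the first player from the game matrix above
# (rows: his options 1-3, columns: options 1-3 of the second player).
A = [
    [6, 2, 3],
    [1, 5, 4],
    [4, 4, 3],
]


def dominates(payoffs, s, t):
    """True if row s gives a strictly larger payoff than row t in every column."""
    return all(x > y for x, y in zip(payoffs[s], payoffs[t]))


def dominated_rows(payoffs):
    """1-based numbers of rows that are strictly dominated by some other row."""
    n = len(payoffs)
    return [t + 1 for t in range(n)
            if any(dominates(payoffs, s, t) for s in range(n) if s != t)]


print(dominated_rows(A))  # []

# A hypothetical variant in which the third option is worse than the first
# for every move of the opponent, so choosing it would be "irrational".
A_variant = [[6, 5, 4], [1, 5, 4], [4, 4, 3]]
print(dominated_rows(A_variant))  # [3]
```

Under this reading of the matrix, none of the first player's options is strictly dominated, so none of his choices would count as irrational in the sense defined above.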

2) Cautious behavior. “Institutional man”, unlike “economic man”, is not perfectly rational, i.e. he does not always choose the best behavior, the one that maximizes his payoff. The bounded rationality of “institutional man” is expressed in his inability to choose the best behavior option because of the large number of alternatives, the complexity of the algorithm for determining the optimal alternative, limited decision-making time, etc. At the same time, the concept of bounded rationality assumes that, despite all the complexities of choice, a person is able to choose a fairly good alternative.

In the game approach to the study of institutions, the bounded rationality of the individual is illustrated by the cautious behavior of the player.

A strategy of cautious behavior is a player's strategy that guarantees him a certain payoff regardless of the choice (move) of the other player. The cautious strategy is also called maximin, because it is calculated by finding the maximum of several minimum values.

The first player's cautious strategy is defined as follows. In each row of his payoff matrix the minimum element is found, and then the maximum of these minima, the maximin of the first player, is selected. The row of the game matrix that contains the first player's maximin corresponds to his cautious strategy. The cautious strategy of the second player is found similarly. In each column of his payoff matrix the minimum element is found, and then the maximum of these minima is determined. The column of the game matrix that contains the second player's maximin corresponds to his cautious strategy. Each player may have several cautious strategies, but they all have the same value of the maximin, or guaranteed payoff. Cautious strategies exist in any matrix game. To identify a player's cautious strategy it is enough to analyze his own payoff matrix, without using the payoff matrix of the other player. This feature is common to irrational and cautious behavior.
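The maximin procedure can be sketched on the game matrix given earlier. The split of each cell into the first player's payoff (matrix `A`) and the second player's payoff (matrix `B`) follows the "first; second" reading of the cells; the function names are hypothetical.

```python
# Payoffs from the game matrix above: A[i][j] for the first player,
# B[i][j] for the second player (row i, column j).
A = [[6, 2, 3], [1, 5, 4], [4, 4, 3]]
B = [[15, 13, 11], [10, 14, 12], [12, 13, 13]]


def cautious_rows(payoffs):
    """Maximin of the row player: (guaranteed payoff, 1-based cautious rows)."""
    row_minima = [min(row) for row in payoffs]
    maximin = max(row_minima)
    return maximin, [i + 1 for i, m in enumerate(row_minima) if m == maximin]


def cautious_columns(payoffs):
    """Maximin of the column player, computed over the columns of his matrix."""
    columns = [list(col) for col in zip(*payoffs)]
    return cautious_rows(columns)


print(cautious_rows(A))     # (3, [3])
print(cautious_columns(B))  # (13, [2])
```

For this matrix the first player's cautious strategy is his third option, with a guaranteed payoff of 3, and the second player's cautious strategy is his second option, with a guaranteed payoff of 13.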

3) Optimizing behavior. In economic practice, situations often arise when economic agents (for example, a seller and a regular buyer), in the course of long-term interaction with each other, find strategies of behavior that suit both parties, and therefore are used by the “players” for a long period of time. In the game approach to the study of institutions, the described situation is modeled using the concept of equilibrium strategies. A pair of such strategies is characterized by the following property: if the first player deviates from his equilibrium strategy (chooses some other one), and the second continues to follow his equilibrium strategy, then the first player suffers damage in the form of a decrease in the amount of winnings. The cell of the game matrix located at the intersection of a row and a column corresponding to a pair of equilibrium strategies is called an equilibrium point. The game matrix may have several equilibrium points, or may not have them at all.

The behavior of a player following the equilibrium strategy is called optimizing (minimax behavior, or minimax strategy).

It differs from maximizing behavior. First, the player's equilibrium payoff is not necessarily the maximum of all possible payoffs: it corresponds to a local optimum rather than a global maximum, just as the global maximum of a function defined on a numerical interval is at least as large as each of its local maxima. Second, by following the equilibrium strategy a player reaches this local maximum only if the other player also keeps to the equilibrium strategy. If the second player deviates from the equilibrium strategy, then the first player's continued use of the equilibrium strategy will not give him a maximizing effect.

Equilibrium strategies are determined by the following rule: a cell of the game matrix is considered equilibrium if the corresponding payoff of the first player is the maximum in the column, and the corresponding payoff of the second player is the maximum in the row. Thus, the algorithm for finding equilibrium strategies uses the payoff matrices of both players, and not one of them, as in the cases of irrational and cautious behavior.
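The cell rule stated above translates directly into code. The sketch below applies it to the game matrix given earlier, again reading each cell as "first player's payoff; second player's payoff"; the function name is hypothetical.

```python
A = [[6, 2, 3], [1, 5, 4], [4, 4, 3]]           # payoffs of the first player
B = [[15, 13, 11], [10, 14, 12], [12, 13, 13]]  # payoffs of the second player


def equilibrium_points(a, b):
    """Cells (row, column), 1-based, satisfying the rule stated above."""
    points = []
    for i in range(len(a)):
        for j in range(len(a[0])):
            # First player's payoff must be the largest in its column.
            best_in_column = a[i][j] == max(a[k][j] for k in range(len(a)))
            # Second player's payoff must be the largest in its row.
            best_in_row = b[i][j] == max(b[i])
            if best_in_column and best_in_row:
                points.append((i + 1, j + 1))
    return points


points = equilibrium_points(A, B)
print(points)  # [(1, 1), (2, 2)]
```

Under this reading the matrix has two equilibrium points, with payoffs (6; 15) and (5; 14), which illustrates the remark that a game matrix may have several equilibrium points.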

4) Deviant behavior. The institutionalization of an equilibrium strategy as a basic norm of behavior occurs as a result of a person's generalization of his experience of interpersonal interactions, including the experience of deviant behavior. A person's awareness of the negative consequences of such behavior, which is based on the choice of non-equilibrium alternatives, is the decisive argument in choosing an optimizing behavior strategy. Thus, deviant behavior is an integral component of the life experience of an “institutional person” and serves as an empirical justification for optimizing behavior. The experience of deviant behavior gives a person confidence that the other participant in the game will invariably adhere to the equilibrium strategy. Such experience therefore serves as proof of the rationality of the other player's behavior and of the predictability of future interactions with him.

5) Innovative behavior. Above, deviant behavior was considered, the main purpose of which is to empirically substantiate and consolidate the original equilibrium strategy. However, the purpose of deviation from the equilibrium strategy may be fundamentally different. Innovative behavior is a systematic deviation from the usual equilibrium strategy in order to find another equilibrium state that is more profitable for the innovator.

Within the framework of the game model of interhuman interactions, the goal of innovative behavior can be achieved if the game matrix has a different equilibrium point, in which the payoff of the innovator player is greater than in the initial equilibrium state. If there is no such point, then innovative behavior will most likely be doomed to failure, and the innovator will return to the original equilibrium strategy. Moreover, his losses from the innovation experiment will be equal to the total effect of the deviation for the entire period of the experiment.

In real life, interacting individuals often agree to follow certain behavioral strategies in the future. In this case, the behavior of the players is called solidary.

The main reasons for solidarity behavior:

a) the benefit of solidarity behavior for both players. Within the framework of the game model of interaction, this situation is illustrated by a game matrix, in one cell of which the payoffs of both players are maximum, but at the same time it is not equilibrium and does not correspond to a pair of cautious strategies of the players. Strategies that correspond to this cell are unlikely to be chosen by players who implement non-solidarity models of behavior. But if the players come to an agreement on the choice of appropriate solidary strategies, then subsequently it will be unprofitable for them to violate the agreement, and it will be carried out automatically;

b) the ethics of solidarity behavior often serves as an “internal” mechanism to ensure compliance with the agreement. The moral costs in the form of social condemnation that an individual will incur if he violates an agreement may have an impact on him higher value than the increase in winnings achieved. The ethical factor plays an important role in the behavior of “institutional man,” but it is not actually taken into account in the game model of interhuman interactions;

c) enforcement of solidarity behavior serves as an “external” mechanism to ensure compliance with the agreement. This factor of institutional behavior is also not adequately reflected in the game model of interactions.

Types of equilibrium: Nash equilibrium, Stackelberg equilibrium, Pareto-optimal equilibrium, equilibrium of dominant strategies.

In every interaction there may exist different kinds of equilibria: equilibrium of dominant strategies, Nash equilibrium, Stackelberg equilibrium and Pareto equilibrium. A dominant strategy is a plan of action that provides a participant with maximum utility regardless of the actions of the other participant. Accordingly, the equilibrium of dominant strategies is the intersection of the dominant strategies of both participants in the game. Nash equilibrium is a situation in which each player's strategy is the best response to the other player's actions. In other words, this equilibrium provides each player with maximum utility given the actions of the other player. Stackelberg equilibrium occurs when there is a time lag in the decision-making of the participants in the game: one of them makes decisions already knowing what the other did. Thus, the Stackelberg equilibrium corresponds to the maximum utility of the players when their decisions are not made simultaneously. Unlike the equilibrium of dominant strategies and the Nash equilibrium, this type of equilibrium always exists. Finally, Pareto equilibrium exists under the condition that it is impossible to increase the utility of both players at the same time. Let us use one example to show how equilibria of all four types are found.

Dominant strategy: a plan of action that provides the participant with maximum utility, regardless of the actions of the other participant.

Nash equilibrium: a situation in which none of the players can increase their winnings unilaterally by changing their plan of action.

Stackelberg equilibrium: a situation where none of the players can increase their winnings unilaterally, and decisions are made first by one player and become known to the second player.

Pareto equilibrium: a situation where it is impossible to improve the position of any of the players without worsening the position of the other and without reducing the total winnings of the players.

Let firm A seek to break the monopoly of firm B on the production of a certain product. Firm A decides whether it should enter the market, and firm B decides whether it should reduce output if A decides to enter. In the case of constant output at firm B, both firms are losers, but if firm B decides to reduce output, then it “shares” its profit with A.

Equilibrium of dominant strategies. Firm A compares its payoffs under both scenarios: -3 and 0 if B decides to start a price war by keeping output unchanged, and 4 and 0 if B decides to reduce output. It has no strategy that ensures the maximum gain regardless of B's actions: 0 > -3 => “not enter the market” if B leaves output at the same level, 4 > 0 => “enter” if B reduces output (see solid arrows). Although firm A does not have a dominant strategy, firm B does: it is interested in reducing output regardless of A's actions (4 > -2, 10 = 10, see dotted arrows). Since only one of the firms has a dominant strategy, there is no equilibrium of dominant strategies.
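The comparison above can be sketched in a few lines of Python. This is only an illustration: the payoff pairs are the ones quoted in the text, while the move labels (`enter`, `stay_out`, `keep`, `reduce`) and the function name are assumptions introduced here, and dominance is checked in the weak sense (ties such as 10 = 10 are allowed), as the text does for firm B.

```python
# Payoff pairs (firm A, firm B) quoted in the text; the move labels are illustrative.
PAYOFFS = {
    ("enter", "keep"): (-3, -2),
    ("enter", "reduce"): (4, 4),
    ("stay_out", "keep"): (0, 10),
    ("stay_out", "reduce"): (0, 10),
}
A_MOVES = ("enter", "stay_out")
B_MOVES = ("keep", "reduce")


def dominant_moves(player):
    """Moves of `player` (0 = firm A, 1 = firm B) that are at least as good as
    every alternative against each move of the opponent (weak dominance)."""
    own, other = (A_MOVES, B_MOVES) if player == 0 else (B_MOVES, A_MOVES)

    def payoff(mine, theirs):
        cell = (mine, theirs) if player == 0 else (theirs, mine)
        return PAYOFFS[cell][player]

    return [
        m for m in own
        if all(payoff(m, t) >= payoff(alt, t) for t in other for alt in own)
    ]


print("Firm A:", dominant_moves(0))
print("Firm B:", dominant_moves(1))
```

An empty list for firm A together with a single move for firm B corresponds to the conclusion that no equilibrium of dominant strategies exists.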

Nash equilibrium. The best response of firm A to firm B's decision to leave output the same is not to enter, and its best response to the decision to reduce output is to enter. The best response of firm B to firm A's decision to enter the market is to reduce output; if A decides not to enter, both of B's strategies are equivalent. Therefore, there are two Nash equilibria (N1, N2), located at points (4, 4) and (0, 10): A enters and B reduces output, or A does not enter and B does not reduce output. This is easy to verify, since at these points neither participant is interested in changing its strategy.
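The best-response check can also be done mechanically. The sketch below enumerates all four cells and keeps those from which neither firm gains by a unilateral change; the payoffs are those of the text, while the move labels and the function name `pure_nash` are illustrative assumptions.

```python
from itertools import product

# Payoff pairs (firm A, firm B) quoted in the text; the move labels are illustrative.
PAYOFFS = {
    ("enter", "keep"): (-3, -2),
    ("enter", "reduce"): (4, 4),
    ("stay_out", "keep"): (0, 10),
    ("stay_out", "reduce"): (0, 10),
}
A_MOVES = ("enter", "stay_out")
B_MOVES = ("keep", "reduce")


def pure_nash(payoffs, a_moves, b_moves):
    """Cells in which no player can strictly gain by deviating alone."""
    found = []
    for a, b in product(a_moves, b_moves):
        pay_a, pay_b = payoffs[(a, b)]
        a_content = all(payoffs[(alt, b)][0] <= pay_a for alt in a_moves)
        b_content = all(payoffs[(a, alt)][1] <= pay_b for alt in b_moves)
        if a_content and b_content:
            found.append((a, b))
    return found


equilibria = pure_nash(PAYOFFS, A_MOVES, B_MOVES)
for cell in equilibria:
    print(cell, PAYOFFS[cell])
```

Note that the cell in which A stays out and B reduces output is rejected even though it also pays (0, 10): from it, firm A would gain by entering.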

Stackelberg equilibrium. Let's assume that firm A makes the first decision. If it chooses to enter the market, it will ultimately end up at point (4, 4): firm B's choice is clear in this situation, 4 > -2. If it decides to refrain from entering the market, the result will be one of the two cells with payoffs (0, 10): firm B's preferences allow for both options. Knowing this, firm A maximizes its payoff over points (4, 4) and (0, 10), comparing 4 and 0. Its preferences are unambiguous, and the first Stackelberg equilibrium StA will be at point (4, 4). Similarly, the Stackelberg equilibrium StB, when firm B makes the first decision, will be at point (0, 10).
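The same reasoning is a two-step backward induction, sketched below. The payoffs come from the text; the move labels, the function name `stackelberg` and the tie-breaking rule are assumptions. When the follower is indifferent, the sketch takes the reply that is worst for the leader, which makes no difference in this game because both of B's replies to "stay out" pay (0, 10).

```python
# Payoff pairs (firm A, firm B) quoted in the text; the move labels are illustrative.
PAYOFFS = {
    ("enter", "keep"): (-3, -2),
    ("enter", "reduce"): (4, 4),
    ("stay_out", "keep"): (0, 10),
    ("stay_out", "reduce"): (0, 10),
}
MOVES = (("enter", "stay_out"), ("keep", "reduce"))  # moves of firm A, moves of firm B


def stackelberg(payoffs, leader):
    """Payoff pair reached when `leader` (0 = A, 1 = B) moves first."""
    follower = 1 - leader

    def cell(lead_move, follow_move):
        # Dictionary keys are always ordered (A's move, B's move).
        return (lead_move, follow_move) if leader == 0 else (follow_move, lead_move)

    best = None
    for lead_move in MOVES[leader]:
        # Step 1: the follower's best replies to this move of the leader.
        top = max(payoffs[cell(lead_move, f)][follower] for f in MOVES[follower])
        replies = [f for f in MOVES[follower]
                   if payoffs[cell(lead_move, f)][follower] == top]
        # Assumed tie-breaking: the leader expects the least favourable best reply.
        outcome = min((payoffs[cell(lead_move, f)] for f in replies),
                      key=lambda pair: pair[leader])
        # Step 2: the leader keeps the move with the highest resulting payoff.
        if best is None or outcome[leader] > best[leader]:
            best = outcome
    return best


print("A moves first:", stackelberg(PAYOFFS, leader=0))
print("B moves first:", stackelberg(PAYOFFS, leader=1))
```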

Pareto equilibrium. To determine the Pareto optimum, we must go through all four outcomes of the game in turn, answering the question: “Does switching to some other outcome of the game increase utility simultaneously for both participants?” For example, from outcome (-3, -2) we can move to any other outcome and satisfy this condition, so it is not Pareto-optimal. From the outcome (4, 4) we cannot move anywhere without reducing the utility of one of the players; this is the Pareto equilibrium, P. Strictly speaking, the same test is also passed by the outcome (0, 10), since any move away from it lowers firm B's payoff.

In an antagonistic game, it is natural to consider the optimal outcome to be one in which it is unprofitable for either player to deviate from it. Such an outcome (x*,y*) is called an equilibrium situation, and the principle of optimality, based on finding an equilibrium situation, is called the equilibrium principle.

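Definition. In a matrix game with a matrix $A = (a_{ij})$ of dimensions $m \times n$, the outcome $(i^*, j^*)$ is called an equilibrium situation or a saddle point if

$$a_{ij^*} \le a_{i^*j^*} \le a_{i^*j} \quad \text{for all } i = 1, \dots, m,\ j = 1, \dots, n. \qquad (2.6)$$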

At a saddle point, a matrix element is both a minimum in its row and a maximum in its column. In the game from example 2, element $a_{33}$ is a saddle point. The optimal strategies in this game are the third ones for both players. If the first player deviates from the third strategy, then he begins to win less than $a_{33}$. If the second player deviates from the third strategy, then he begins to lose more than $a_{33}$. Thus, there is nothing better for both players than to consistently stick to the third strategy.

Principle of optimal behavior: if there is a saddle point in a matrix game, then the optimal choice is the strategy corresponding to the saddle point. What happens if there is more than one saddle point in the game?

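Theorem. Let $(i_1, j_1)$ and $(i_2, j_2)$ be two arbitrary saddle points in a matrix game. Then:

$$a_{i_1j_1} = a_{i_2j_2} = a_{i_1j_2} = a_{i_2j_1}. \qquad (2.7)$$

Proof. From the definition of an equilibrium situation we have, for all $i$ and $j$:

$$a_{ij_1} \le a_{i_1j_1} \le a_{i_1j}, \qquad (2.8)$$

$$a_{ij_2} \le a_{i_2j_2} \le a_{i_2j}. \qquad (2.9)$$

Let us substitute $i = i_2$ into the left side of inequality (2.8) and $j = j_2$ into the right side, and $i = i_1$ into the left side of inequality (2.9) and $j = j_1$ into the right side. Then we get:

$$a_{i_2j_1} \le a_{i_1j_1} \le a_{i_1j_2} \le a_{i_2j_2} \le a_{i_2j_1}.$$

The chain starts and ends with the same element, so every inequality in it is an equality. This implies the equality (2.7).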

It follows from the theorem that the payoff function takes the same value in all equilibrium situations. That is why the number $a_{i^*j^*}$ is called the value of the game, and the strategies corresponding to any of the saddle points are called optimal strategies of players 1 and 2, respectively. By virtue of (2.7), all optimal strategies of a player are interchangeable.

The optimal behavior of players will not change if the set of strategies in the game remains the same, and the payoff function is multiplied by a positive constant (or a constant number is added to it).

Theorem. For the existence of a saddle point (i*,j*) in a matrix game, it is necessary and sufficient that the maximin be equal to the minimax:

$$\max_i \min_j a_{ij} = \min_j \max_i a_{ij} \qquad (2.10)$$

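Proof. Necessity. If (i*,j*) is a saddle point, then, according to (2.6):

$$\max_i a_{ij^*} \le a_{i^*j^*} \qquad (2.11)$$

At the same time we have:

$$\min_j \max_i a_{ij} \le \max_i a_{ij^*} \qquad (2.12)$$

From (2.11) and (2.12) we obtain:

$$\min_j \max_i a_{ij} \le a_{i^*j^*} \qquad (2.13)$$

Reasoning similarly, we arrive at the inequalities:

$$a_{i^*j^*} \le \min_j a_{i^*j} \le \max_i \min_j a_{ij} \qquad (2.14)$$

Thus,

$$\min_j \max_i a_{ij} \le \max_i \min_j a_{ij}.$$

On the other hand, the inverse inequality (2.5) always holds, so (2.10) turns out to be valid.

Sufficiency. Let (2.10) be true. Let us prove the existence of a saddle point. Let $i^*$ be a row on which the maximin is attained and $j^*$ a column on which the minimax is attained. We have:

$$\max_i \min_j a_{ij} = \min_j a_{i^*j} \le a_{i^*j^*} \qquad (2.15)$$

$$a_{i^*j^*} \le \max_i a_{ij^*} = \min_j \max_i a_{ij} \qquad (2.16)$$

According to equality (2.10), inequalities (2.15) and (2.16) turn into equalities. Then we have:

$$a_{ij^*} \le \max_k a_{kj^*} = a_{i^*j^*} = \min_l a_{i^*l} \le a_{i^*j} \quad \text{for all } i, j,$$

that is, $(i^*, j^*)$ is a saddle point.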

The theorem has been proven. Along the way, it was proven that the common value of the maximin and the minimax is equal to the value of the game.
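The criterion is easy to check numerically. The sketch below computes the maximin, the minimax and the list of saddle points for two matrices. The first matrix is hypothetical and chosen only because it has two saddle points with the same payoff; the second uses the payoffs 2, -1, -4, 7 that appear in Example 5. Function names are illustrative, and saddle points are reported as 1-based (row, column) pairs.

```python
def maximin(a):
    """Largest row minimum: the guaranteed payoff of player 1."""
    return max(min(row) for row in a)


def minimax(a):
    """Smallest column maximum: the guaranteed loss ceiling of player 2."""
    return min(max(col) for col in zip(*a))


def saddle_points(a):
    """All (i, j), 1-based, whose element is minimal in its row and maximal in its column."""
    columns = list(zip(*a))
    return [
        (i + 1, j + 1)
        for i, row in enumerate(a)
        for j, value in enumerate(row)
        if value == min(row) and value == max(columns[j])
    ]


with_saddle = [[4, 2, 5], [3, 1, 0], [6, 2, 7]]  # hypothetical matrix
without_saddle = [[2, -1], [-4, 7]]              # payoffs of Example 5

for a in (with_saddle, without_saddle):
    print(maximin(a), minimax(a), saddle_points(a))
```

In the first matrix the maximin equals the minimax and both saddle points carry the same payoff; in the second the maximin is strictly smaller than the minimax and the list of saddle points is empty.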

Mixed extension of the game

Consider a matrix game G. If there is an equilibrium situation in it, then the minimax is equal to the maximin. Moreover, each player can tell the other player his optimal strategy: the opponent will not be able to derive any additional benefit from this information. Now suppose that there is no equilibrium situation in the game G. Then:

$$\max_i \min_j a_{ij} < \min_j \max_i a_{ij} \qquad (2.17)$$

In this case, the minimax and maximin strategies are not stable. Players may have an incentive to deviate from their cautious strategies in the hope of winning more, but at the risk of losing, that is, of receiving a smaller payoff than a cautious strategy guarantees. When risky strategies are used, leaking information about them to the opponent is damaging: the player automatically receives a smaller payoff than when using a cautious strategy.

Example 3. Let the game matrix have the form:

For such a matrix the maximin is less than the minimax, i.e. there is no equilibrium situation. The cautious strategies of the players are i*=1, j*=2. Let player 2 follow strategy j*=2, and player 1 choose strategy i=2. Then the latter will receive payoff 3, which is two units more than the maximin. If, however, player 2 guesses player 1's plans, he will change his strategy to j=1, and then the first player will receive a payoff of 0, that is, less than his maximin. Similar reasoning can be carried out for the second player. In general, we can conclude that an adventurous strategy can bring more than the guaranteed result in a single game, but its use involves risk. The question arises: is it possible to combine a reliable cautious strategy with an adventurous one in such a way as to increase the average winnings? Essentially, the question is how the gap between the maximin and the minimax in (2.17) is to be divided between the players.

It turns out that a reasonable solution is to use a mixed strategy, that is, a random selection of pure strategies. Recall that a strategy of player 1 is called mixed if he selects the i-th row with a certain probability $p_i$. Such a strategy can be identified with a probability distribution on the set of rows. Suppose that the first player has m pure strategies, and the second player has n pure strategies. Then their mixed strategies are probability vectors:

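$$X = (p_1, \dots, p_m),\ p_i \ge 0,\ \sum_{i=1}^{m} p_i = 1; \qquad Y = (q_1, \dots, q_n),\ q_j \ge 0,\ \sum_{j=1}^{n} q_j = 1 \qquad (2.18)$$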

Consider two possible mixed strategies for the first player from Example 3: (1/2, 1/2) and (3/7, 4/7). These strategies differ in how probability is distributed between the pure strategies: in the first case the rows of the matrix are chosen with equal probabilities, in the second with different ones. When we talk about a mixed strategy, random selection does not mean an arbitrary choice “at random”, but a choice made by a random mechanism that produces the probability distribution we need. Thus, tossing a coin is well suited for implementing the first of the mixed strategies. The player chooses the first row or the second depending on how the coin lands. On average, the player will choose the first row and the second equally often, but the choice at a particular iteration of the game is not subject to any fixed rule and has the maximum degree of secrecy: until the random mechanism is run, it is unknown even to the first player himself. Drawing lots is well suited for implementing the second mixed strategy. The player takes seven identical pieces of paper, marks three of them with a cross, and throws them into a hat. Then he pulls out one of them at random. According to classical probability theory, he will pull out a piece of paper with a cross with a probability of 3/7, and a blank piece of paper with a probability of 4/7. Such a drawing mechanism is capable of realizing any rational probabilities.
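On a computer, the hat can be replaced by a pseudo-random generator. The sketch below is only an illustration of such a mechanism: the function name `draw_row` is hypothetical, and the assumption that a paper with a cross means "first row" is made here for definiteness. It uses the standard `random.choices` function with weights 3/7 and 4/7 and then checks how often the first row was drawn.

```python
import random


def draw_row(probabilities, rng):
    """Pick a row number (1-based) according to the given mixed strategy."""
    rows = list(range(1, len(probabilities) + 1))
    return rng.choices(rows, weights=probabilities, k=1)[0]


rng = random.Random(0)  # fixed seed only to make the run repeatable
draws = [draw_row([3 / 7, 4 / 7], rng) for _ in range(100_000)]
share_first = draws.count(1) / len(draws)

print("share of first row:", round(share_first, 3))
print("target probability:", round(3 / 7, 3))
```

Each individual draw stays unpredictable, while the long-run share of the first row settles near 3/7, which is exactly the property the text asks of a random mechanism.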

Let the players follow mixed strategies (2.18). Then the payoff of the first player at a particular iteration of the game is a random variable v(X,Y). Since the players choose strategies independently of each other, by the probability multiplication theorem the probability of outcome (i, j) with payoff $a_{ij}$ is equal to the product $p_i q_j$ of the probabilities with which player 1 chooses row i and player 2 chooses column j. The distribution law of the random variable v(X,Y) is then given by the following table:

v(X,Y)	a_11	a_12	...	a_mn
p	p_1 q_1	p_1 q_2	...	p_m q_n

Now let the game play out indefinitely. Then the average payoff in such a game is equal to the mathematical expectation of the value v(X,Y).

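$$\mathrm{E}\,v(X,Y) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}\, p_i q_j \qquad (2.19)$$

where $p_i$ and $q_j$ are the probabilities with which the players choose row i and column j.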

For a finite but sufficiently large number of iterations of the game, the average payoff will differ only slightly from the value (2.19).
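Both statements can be tried out numerically. The sketch below computes the expectation (2.19) exactly and compares it with the average over many simulated rounds. The matrix is hypothetical: it was chosen only to agree with the payoffs quoted in Example 3 (maximin 1, payoffs 0 and 3 in the second row) and is not claimed to be the matrix of that example; the strategies X and Y and the function names are likewise assumptions.

```python
import random
from fractions import Fraction

A = [[4, 1], [0, 3]]                  # hypothetical payoff matrix
X = [Fraction(1, 2), Fraction(1, 2)]  # assumed mixed strategy of player 1
Y = [Fraction(1, 3), Fraction(2, 3)]  # assumed mixed strategy of player 2


def average_payoff(a, x, y):
    """Expectation (2.19): the sum of a_ij * p_i * q_j over all cells."""
    return sum(a[i][j] * x[i] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def simulated_average(a, x, y, rounds, seed=0):
    """Average payoff over `rounds` independent plays of the mixed strategies."""
    rng = random.Random(seed)
    rows = rng.choices(list(range(len(x))), weights=[float(p) for p in x], k=rounds)
    cols = rng.choices(list(range(len(y))), weights=[float(q) for q in y], k=rounds)
    return sum(a[i][j] for i, j in zip(rows, cols)) / rounds


exact = average_payoff(A, X, Y)
approx = simulated_average(A, X, Y, rounds=200_000)
print("exact expectation:", exact)
print("simulated average:", round(approx, 3))
```

For this hypothetical matrix the exact expectation lies between the maximin 1 and the minimax 3, and the simulated average stays close to it.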

Example 4. Calculate the average payoff (2.19) for the game from example 3 when players use the following strategies: . The payoff matrix and probability matrix look like this:

Let's find the average:

Thus, the average payoff (2.20) is intermediate between maximin and minimax.

Since the average payoff can be calculated for any pair of mixed strategies X and Y, the problem arises of finding the optimal strategy. It is natural to start by exploring cautious strategies. The first player's cautious strategy provides him with the maximin. The cautious strategy of the second player does not allow the first to win more than the minimax. The most significant result in the theory of games with opposing interests is the following:

Theorem. Every matrix game has an equilibrium situation in mixed strategies. Proving this theorem is not easy. It is omitted in this course.

Corollaries: The existence of an equilibrium situation means that the maximin is equal to the minimax, and therefore any matrix game has a value. The optimal strategy for the first player is the maximin strategy. The optimal strategy for the second one is the minimax strategy. Since the problem of finding optimal strategies has been solved, we say that any matrix game is solvable in the set of mixed strategies.

Solution to 2x2 game

Example 5. Solve the game with the payoff matrix $A = \begin{pmatrix} 2 & -1 \\ -4 & 7 \end{pmatrix}$. It is not difficult to verify that there is no saddle point. Let us denote the optimal strategy of the first player by (x, 1-x); strictly speaking it is a column vector, but for convenience we write it as a row. Let us denote the optimal strategy of the second player by (y, 1-y).

The payoff of the first player is a random variable with the following distribution:

v(x,y)	2	-1	-4	7
p	xy	x(1-y)	(1-x)y	(1-x)(1-y)

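We find the average payoff of the first player per iteration, that is, the mathematical expectation of the random variable v(x,y):

$$\mathrm{E}\,v(x,y) = 2xy - x(1-y) - 4(1-x)y + 7(1-x)(1-y).$$

Let's transform this expression:

$$\mathrm{E}\,v(x,y) = 14xy - 8x - 11y + 7 = 14\left(x - \tfrac{11}{14}\right)\left(y - \tfrac{8}{14}\right) + \tfrac{5}{7}.$$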

This mathematical expectation consists of a constant (5/7) and a variable part: 14(x-11/14)(y-8/14). If the value of y differs from 8/14, then the first player can always choose x in such a way as to make the variable part positive, increasing his winnings. If the value of x differs from 11/14, then the second player can always choose y in such a way as to make the variable part negative, reducing the first player's payoff. Thus, the saddle point is determined by the equalities x*=11/14, y*=8/14, and the value of the game is 5/7.
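The same factorization works for any 2x2 matrix without a saddle point, which gives closed-form expressions for x*, y* and the value of the game. The sketch below is an illustration under that assumption; the function name `solve_2x2` is hypothetical, and exact fractions are used so that the result can be compared directly with the numbers above.

```python
from fractions import Fraction


def solve_2x2(a):
    """Mixed-strategy solution (x*, y*, value) of a 2x2 game without a saddle point.

    Writing the expectation as d*(x - x*)*(y - y*) + value with
    d = a11 - a12 - a21 + a22 gives the formulas used below.
    """
    (a11, a12), (a21, a22) = a
    lower = max(min(a11, a12), min(a21, a22))  # maximin
    upper = min(max(a11, a21), max(a12, a22))  # minimax
    if lower == upper:
        raise ValueError("the game has a saddle point; use pure strategies")
    d = a11 - a12 - a21 + a22
    x = Fraction(a22 - a21, d)
    y = Fraction(a22 - a12, d)
    value = Fraction(a11 * a22 - a12 * a21, d)
    return x, y, value


x_opt, y_opt, value = solve_2x2([[2, -1], [-4, 7]])
print(x_opt, y_opt, value)
```

For the matrix of Example 5 this reproduces x* = 11/14 and the value 5/7; y* is printed in lowest terms as 4/7, which equals 8/14.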

2.5 Game solving

We will show how to solve such games using an example.

Example 6. Solve the game . We make sure that there is no saddle point. Let us denote the mixed strategy of the first player by X=(x, 1-x); strictly speaking it is a column vector, but for convenience we write it as a row.

Let the first player use strategy X, and the second player use his j-th pure strategy. Let us denote the average payoff of the first player in this situation as . We have:

Let us depict the graphs of functions (2.21) on the segment [0, 1].

The ordinate of a point located on any of the straight line segments corresponds to the winnings of the first player in a situation where he uses the mixed strategy (x, 1-x) and the second player uses the corresponding pure strategy. The guaranteed result of the first player is the lower envelope of the family of straight lines (the broken line ABC). The highest point of this broken line (point B) is the maximum guaranteed result of player 1. The abscissa of point B corresponds to the optimal strategy of the first player.

Since the desired point B is the intersection of the lines corresponding to the first and third pure strategies of the second player, its abscissa can be found as a solution to the equation:

Thus, the optimal mixed strategy of the first player is (5/9, 4/9). The ordinate of point B is the value of the game. It is equal to:

(2.22)
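The graphical construction can be sketched in code: the lower envelope of the lines is concave and piecewise linear, so its highest point lies either at an end of the segment [0, 1] or at an intersection of two lines. The matrix below is hypothetical and is not the matrix of Example 6; it was merely chosen so that the optimal strategy also comes out as (5/9, 4/9). The function name `solve_2xn` is likewise an assumption.

```python
from fractions import Fraction
from itertools import combinations


def solve_2xn(a):
    """Optimal x, guaranteed value and active columns (1-based) for player 1 in a 2xn game."""
    top, bottom = a
    n = len(top)

    def line(j, x):
        # Average payoff when player 1 plays (x, 1 - x) against pure column j.
        return top[j] * x + bottom[j] * (1 - x)

    def guaranteed(x):
        # Lower envelope: the worst case over the columns of player 2.
        return min(line(j, x) for j in range(n))

    candidates = {Fraction(0), Fraction(1)}
    for j, k in combinations(range(n), 2):
        slope_diff = (top[j] - bottom[j]) - (top[k] - bottom[k])
        if slope_diff != 0:  # parallel lines do not intersect
            x = Fraction(bottom[k] - bottom[j], slope_diff)
            if 0 <= x <= 1:
                candidates.add(x)

    best_x = max(candidates, key=guaranteed)
    value = guaranteed(best_x)
    active = [j + 1 for j in range(n) if line(j, best_x) == value]
    return best_x, value, active


best_x, value, active = solve_2xn([[5, 4, 1], [0, 3, 5]])  # hypothetical matrix
print(best_x, value, active)
```

For this matrix the envelope peaks where the lines of columns 1 and 3 cross, and the line of column 2 passes above that point, mirroring the picture described in the text.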

Note that the line corresponding to the second strategy of the second player passes above point B. This means that if the first player uses his optimal strategy and player 2 uses his second strategy, then the loss of the second player is larger than with strategies 1 or 3. Thus, the second strategy should not take part in the optimal strategy of the second player, which should therefore have the form (y, 0, 1-y). Pure strategies 1 and 3 of the second player, which have non-zero components in the optimal strategy, are usually called essential; strategy 2 is called inessential. From the figure above, as well as from equality (2.22), it is clear that when the first player uses his optimal strategy, the payoff does not depend on which of his essential strategies the second player uses. He can also apply any mixed strategy composed of essential ones (in particular, the optimal one), and the payoff will not change. A completely similar statement is true for the opposite case: if the second player applies his optimal strategy, then the payoff of the first player does not depend on which of his essential strategies he uses and is equal to the value of the game. Using this statement, we find the optimal strategy of the second player.