Arbitrable stochastic games: three variations of the heads-or-tails game for bitcoin

Cyril Grunspan; Ricardo Pérez-Marco

doi:10.1017/S0269964826100308

Part of: Stochastic processes Game theory Computer system organization

Published online by Cambridge University Press: 06 May 2026

Cyril Grunspan*: Affiliation:
De Vinci Higher Education, De Vinci Research Center, Paris, France
Ricardo Pérez-Marco: Affiliation:
CNRS, IMJ-PRG, University Paris Cité, Paris, France
*: Corresponding author: Cyril Grunspan; Email: cyril.grunspan@devinci.fr

Abstract

We introduce the concept of arbitrable stochastic games, which appears to be new. To do so, we consider a reward criterion different from the standard gamma-weighted criterion. This allows us to define the fair price to play a non-competitive stochastic game. We then illustrate the concept through three variations of the classical coin-toss game with chips, providing proofs via Doob’s theorem for supermartingales and practical algorithms. These examples deepen our understanding of the Bitcoin protocol.

Keywords

arbitrable stochastic games; Bitcoin; blockchain; Markov decision processes; Nakamoto consensus; non-competitive games

MSC classification

Primary: 68M01: General

Secondary: 60G40: Stopping times; optimal stopping problems; gambling theory. 91A60: Probabilistic games; gambling

Information

Type: Research Article
Information: Probability in the Engineering and Informational Sciences, First View, pp. 1-20

DOI: https://doi.org/10.1017/S0269964826100308
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2026. Published by Cambridge University Press.

1. Introduction

1.1. General introduction

Markov decision problems, central to decision theory, constitute a fundamental mathematical framework for modeling sequential decision-making under uncertainty. These models find widespread applications across diverse fields, including dynamic programming, quantitative finance, artificial intelligence, and robotics. Typically, these problems involve identifying an optimal policy—a decision strategy that maximizes a given performance criterion.

Often, this criterion is the so-called “gamma-weighted” criterion, which balances immediate and future rewards, and appears in disciplines such as econometrics (utility discounting models), finance (discounted cash flows), robotics, and reinforcement learning. However, the framework considered herein diverges from the classical optimization viewpoint. Instead of seeking an optimal policy, we focus on analyzing the structural properties of stochastic games—specifically, whether such games exhibit bias or balanced outcomes according to formal properties. In other words, our goal is not so much to optimize a given strategy, but rather to understand the intrinsic behavior of the game and identify potential biases that may advantage or disadvantage the player.

To illustrate this concept, we present three examples related to Bitcoin. First, by showing that a specific stochastic game is arbitrable, we resolve a concrete problem: determining the relative hash power threshold beyond which an otherwise honest miner has an incentive to mine on an accidental fork. Our analysis estimates this threshold at approximately 42.91%, a figure not clearly documented in existing Bitcoin literature. Second, we analyze a biased variation of the classical Heads or Tails game that sheds light on the vulnerability of the Bitcoin difficulty adjustment formula to certain attacks on Nakamoto’s consensus protocol. We derive, in a straightforward manner—without recourse to Markov decision process solvers—the threshold beyond which a rational miner with connectivity constraints benefits from deviating from honest mining behavior. Third, we prove that a certain stochastic game is non-arbitrable, demonstrating that the aforementioned problem can be addressed by modifying Bitcoin’s difficulty adjustment formula to factor in orphan blocks.

It is unsurprising that game theory has been employed to analyze the Bitcoin protocol. Indeed, Satoshi Nakamoto himself was the first to do so in his seminal paper, where he computed the success probability of a double-spending attack using the classical gambler’s ruin formula, which is a simple example of a casino game. Since then, numerous studies have advanced the understanding of such attacks [6, 11, 12]. For a broader context, readers are referred to a recent comprehensive book by M. Warren on Bitcoin and game theory [26]. An alternative approach applying mean field games to Bitcoin mining also exists [3]. The remainder of this paper begins by recalling the formal definition of stochastic games.

1.2. Arbitrable stochastic game

A non-competitive stochastic game, also called a Markov decision problem (MDP), is a single-player, discrete-time game with full observability. It can be described as a set of Markovian transitions (the actions) on a set of states, which we assume here to be countable. Below, we recall the definition of a Markov decision problem and introduce the notation that will be used throughout this article. We also provide elementary proofs of general results.

Definition 1.1 (Non-competitive stochastic game)

A non-competitive stochastic game is defined by a quadruple $(S, A, P, R)$ and an initial state $s \in S$ where $S$ is a (countable) set of states, $A$ is a set of actions, $P$ is a transition probability, and $R$ is a reward function taking values in the real numbers $\mathbb{R}$.

Note that every transition gives a reward that can be positive or negative: $R\in\mathbb{R}$. More precisely, for each $s \in S$, the player has a subset $A(s)$ of the set $A$ of all possible actions. In what follows, actions will be denoted by Greek letters to distinguish them from states, which are denoted by Latin letters. When the player is in state $s \in S$ and chooses action $\alpha \in A(s)$, they reach state $s' \in S$ with probability $P(s' \mid s, \alpha)$ and receive a deterministic reward $R(s' \mid s, \alpha)$. Note that the reward function is slightly different from that found, for example, in [Reference Filar and Vrieze10] since it depends not only on the state $s$ and the choice of action $\alpha$ but also on the outcome $s'$ of the action $\alpha$.

We assume that $A(s)$ is measurable for every $s \in S$, and define a strategy $\mathfrak{f}$ as a family $(\mathfrak{f}_s)_{s \in S}$ where each $\mathfrak{f}_s$ is a probability measure on $A(s)$. We denote by $\mathcal{F}$ the set of all possible strategies. In other words, according to the strategy $\mathfrak{f} \in \mathcal{F}$, if the player is in state $s$, they randomly choose an action in $A(s)$ according to the probability measure $\mathfrak{f}_s$. For $\alpha \in A(s)$, we write

\begin{equation*} \mathfrak{f}(\alpha \mid s) = \mathfrak{f}_s(\alpha). \end{equation*}

This is the probability of choosing action $\alpha$ when in state $s$ following strategy $\mathfrak{f}$.

The choice of a strategy $\mathfrak{f}$ defines a Markov chain $\mathfrak{X}$ on $S$. The sequence $(\tilde{R}_t)$ denotes the sequence of rewards obtained by the player as a result of state transitions: for each $t \in \mathbb{N}$, $\tilde{R}_t$ is the reward earned during the period $[t, t+1[$, and we write

\begin{equation*} \mathbb{E}_{s, \mathfrak{f}}[\tilde{R}_t] = \mathbb{E}_{\mathfrak{f}}[\tilde{R}_t \mid \mathfrak{X}_0 = s]. \end{equation*}

For every integer $n \in \mathbb{N}$, we denote by

\begin{equation*} G_n = \sum_{t=0}^{n-1} \tilde{R}_t \end{equation*}

the cumulative gain of the player, corresponding to the wealth at step $n$.

A classical problem consists in finding the best strategy $\mathfrak{f}$ that maximizes a certain utility function depending on the rewards $(\tilde{R}_t)$, such as

\begin{equation*} \sum_{t=0}^{\infty} \beta^t \mathbb{E}_{s,\mathfrak{f}}[\tilde{R}_t] \quad \text{with } \beta \in (0,1). \end{equation*}

However, for the problems that interest us, we consider another reward criterion associated to any strategy $\mathfrak{f} \in \mathcal{F}$.

Definition 1.2. Let $\mathcal{T}$ denote the set of stopping times and $\mathcal{T} \cap L^1$ the set of integrable stopping times. Let $s \in S$ and $n \in \mathbb{N}$. For any strategy $\mathfrak{f} \in \mathcal{F}$, define:

\begin{equation*} E_{\mathfrak{f}}(s) = \sup_{\tau \in \mathcal{T} \cap L^1} \left\{\sum_{t=0}^{\tau-1} \mathbb{E}_{s,\mathfrak{f}}[\tilde{R}_t] \right\} \in \overline{\mathbb{R}}, \end{equation*}

\begin{equation*} E_{\mathfrak{f}, n}(s) = \sup_{\substack{\tau \leq n \\ \tau \in \mathcal{T}}} \left\{\sum_{t=0}^{\tau-1} \mathbb{E}_{s,\mathfrak{f}}[\tilde{R}_t] \right\} \in \mathbb{R}. \end{equation*}

In the first case, the sup is taken over all integrable stopping times $\tau$, and in the second case, the sup is taken over all finite stopping times $\tau$ such that $\tau \leq n$.

Definition 1.3. Let $s \in S$ and $(\mathfrak{f},\mathfrak{g}) \in \mathcal{F}^2$. We say that $\mathfrak{f}$ is less profitable than $\mathfrak{g}$, and we write $\mathfrak{f} \prec \mathfrak{g}$, if $E_{\mathfrak{f}}(s)\leq E_{\mathfrak{g}}(s)$.

The goal is to maximize $E_{\mathfrak{f}}(s)$ or sometimes $E_{\mathfrak{f}, n}(s)$.

Definition 1.4. Let $s \in S$ and $n \in \mathbb{N}$. We set:

\begin{equation*} E(s) = \sup_{\mathfrak{f} \in \mathcal{F}} E_{\mathfrak{f}}(s) \in \overline{\mathbb{R}}, \end{equation*}

\begin{equation*} E_n(s) = \sup_{\mathfrak{f} \in \mathcal{F}} E_{\mathfrak{f}, n}(s) \in \mathbb{R}. \end{equation*}

It is clear that

\begin{equation*} E_n(s) \leq E(s) \end{equation*}

for every state $s$. Moreover, if the player chooses to quit immediately and stop playing ($\tau=0$), his payoff is 0. So,

\begin{equation*} 0 \leq E_n(s) \leq E(s). \end{equation*}

The value $E(s)$ represents the maximum expected gain achievable when playing the stochastic game, where the game terminates at a random time $\tau$ that depends on the system’s positions before the final time. In contrast, $E_n(s)$ denotes the maximum expected gain when the game is constrained to end after at most $n$ actions. Another way of looking at it is to say that $E(s)$ is the fair price to pay for playing the game, starting at state $s \in S$. Note that $E(s)$ can be infinite. The following proposition describes a classical situation in which this happens.

Proposition 1.1. If the starting state $s \in S$ of a non-competitive stochastic game is a recurrent state for the Markov chain $\mathfrak{X}$ associated with a certain strategy $\mathfrak{f} \in \mathcal{F}$ and if for a return time $\tau$ to $s$ (i.e., $\mathfrak{X}_{\tau} = s$) we have

\begin{equation*} \sum_{t=0}^{\tau-1} \mathbb{E}_{s,\mathfrak{f}}[\tilde{R}_t] \gt 0, \end{equation*}

then $E(s) = +\infty$.

Proof. For any $n \in \mathbb{N}$, it suffices to consider the stopping time

\begin{equation*} \tau^{(n)} = \sum_{i=1}^n \tau_i \end{equation*}

where the $\tau_i$ are i.i.d. with the same distribution as $\tau$. Then,

\begin{equation*} \sum_{t=0}^{\tau^{(n)} -1} \mathbb{E}_{s,\mathfrak{f}}[\tilde{R}_t] = n \cdot \sum_{t=0}^{\tau -1} \mathbb{E}_{s,\mathfrak{f}}[\tilde{R}_t], \end{equation*}

which tends to infinity as $n \to \infty$. Hence, we get the result.

We now give the definition of an arbitrable stochastic game.

Definition 1.5. A non-competitive stochastic game starting from state $s \in S$ is said to be arbitrable or biased if

\begin{equation*} E(s) \gt 0. \end{equation*}

It is generally difficult to express $E(s)$ explicitly, but $E_n(s)$ can sometimes be computed straightforwardly by induction on $n \in \mathbb{N}$.

Proposition 1.2. Let $s \in S$ and $n \in \mathbb{N}^*$. Let us assume that for all $s \in S$, $|A(s)|$ is finite. Then,

\begin{equation*} E_n(s) = \max_{\alpha \in A(s)} \left\{\sum_{s' \in S} (E_{n-1}(s') + R(s' \mid s, \alpha)) \cdot \mathbb{P}(s' \mid s, \alpha) \right\}. \end{equation*}

Proof. Starting from state $s$, if the player chooses action $\alpha \in A(s)$, then they transition to state $s' \in S$ with probability $P(s' \mid s, \alpha)$ and receive a reward of $R(s' \mid s, \alpha)$. Therefore, the expected maximal wealth by choosing action $\alpha$ is

\begin{equation*} \sum_{s' \in S} \left( E_{n-1}(s') + R(s' \mid s,\alpha) \right) \cdot P(s' \mid s,\alpha). \end{equation*}

Assume now $|A(s)|$ is finite. If the action $\alpha$ is chosen randomly with probability $\mathfrak{f}_s(\alpha)$, the player obtains on average at most

\begin{equation*} \sum_{\alpha \in A(s)} \mathfrak{f}_s(\alpha) \cdot \left( \sum_{s' \in S} \left( E_{n-1}(s') + R(s' \mid s,\alpha) \right) \cdot \mathbb{P}(s' \mid s,\alpha) \right). \end{equation*}

We have a finite number of real numbers, explicitly given by $\sum_{s' \in S} \left( E_{n-1}(s') + R(s' \mid s,\alpha)\right) \cdot \mathbb{P}(s' \mid s,\alpha)$ for $\alpha\in A(s)$, and we are looking for the supremum of a convex combination of these real numbers. The supremum is necessarily attained at one of these real numbers.

Thus,

\begin{align*} E_n(s) &= \sup_{\mathfrak{f}_s} \left\{\sum_{\alpha \in A(s)} \mathfrak{f}_s(\alpha) \cdot \left( \sum_{s' \in S} \left( E_{n-1}(s') + R(s' \mid s,\alpha) \right) \cdot \mathbb{P}(s' \mid s,\alpha) \right) \right\}, \\ &= \max_{\alpha \in A(s)} \left\{\sum_{s' \in S} \left( E_{n-1}(s') + R(s' \mid s, \alpha) \right) \cdot \mathbb{P}(s' \mid s, \alpha) \right\}. \end{align*}
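The induction of Proposition 1.2 translates directly into a memoized recursion. The following Python sketch is only an illustration under stated assumptions: the helper `make_value`, the callbacks `actions` and `outcomes`, and the two-state toy game are hypothetical names introduced here, and the state space is assumed to be explored lazily through these callbacks with finitely many outcomes per action.

```python
from functools import lru_cache
from typing import Callable, Hashable, Iterable, Tuple

State = Hashable
Action = Hashable
# One possible outcome of an action: (next_state, probability, reward).
Outcome = Tuple[State, float, float]


def make_value(
    actions: Callable[[State], Iterable[Action]],
    outcomes: Callable[[State, Action], Iterable[Outcome]],
) -> Callable[[int, State], float]:
    """Return a memoized function value(n, s) following Proposition 1.2."""

    @lru_cache(maxsize=None)
    def value(n: int, s: State) -> float:
        if n == 0:
            return 0.0  # E_0(s) = 0: no action can be played
        # Maximum over actions of the expected (reward + continuation value).
        return max(
            sum(prob * (value(n - 1, nxt) + reward)
                for nxt, prob, reward in outcomes(s, alpha))
            for alpha in actions(s)
        )

    return value


# Hypothetical toy game: from "start", "play" wins 1 and stays with
# probability 1/2, or loses 1/2 and ends; "quit" ends the game for free.
def toy_actions(s):
    return ("quit", "play") if s == "start" else ("quit",)


def toy_outcomes(s, alpha):
    if alpha == "play":
        return (("start", 0.5, 1.0), ("end", 0.5, -0.5))
    return (("end", 1.0, 0.0),)


toy_value = make_value(toy_actions, toy_outcomes)
```

In this toy game the recursion reads $E_n(\text{start}) = \max\{0,\ \tfrac12 E_{n-1}(\text{start}) + \tfrac14\}$, so `toy_value(1, "start")`, `toy_value(2, "start")` and `toy_value(3, "start")` evaluate to 0.25, 0.375 and 0.4375.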

2. Three examples of the coin toss game

We illustrate the previous section with three variations of the classical coin toss game played with chips. The set $S$ of all possible states is always $\mathbb{N}^2$, and for every $s \in S$, the cardinality of $A(s)$ equals $2$. The $i$-th game starting in the state $s$ is denoted by $HT^i(s)$ (HT stands for “Heads-or-Tails”). During these games, both the player and the casino accumulate chips that can be converted into real money under certain conditions. In each case, we denote by $E^i(s)$ the maximum expected gain of the game $HT^i(s)$ starting from state $s$, according to the notation in the previous section, and by $E_n^i(s)$ the maximum expected gain under the constraint that the game ends after at most $n$ actions with $n \in \mathbb{N}$.

2.1. Classic coin toss game with chips

Let $q \in [0, 1/2)$ and $p = 1 - q$. The stochastic game is defined as follows. We have $S = \mathbb{N} \times \mathbb{N}$ and for all $s \in S$, $|A(s)| = 2$. If $s = (a,h)$ with $a \le h$, then $A(s) = \{\text{abandon}, \text{toss}\}$ and if $a \gt h$, $A(s) = \{\text{crush}, \text{toss}\}$. The functions $P$ and $R$ are defined by:

• For all $s' \in S$ and $s = (a,h)$,
\begin{equation*} P(s' \mid s, \text{toss}) = \begin{cases} p &\text{if } s' = (a, h+1)\\ q &\text{if } s' = (a+1, h)\\ 0 &\text{otherwise} \end{cases} \quad \text{and} \quad R(s' \mid s, \text{toss}) = -q. \end{equation*}
• For all $s' \in S$ and $s = (a,h)$ with $a \le h$,
\begin{equation*} P(s' \mid s, \text{abandon}) = \begin{cases} 1 &\text{if } s' = (0,0)\\ 0 &\text{otherwise} \end{cases} \quad \text{and} \quad R(s' \mid s, \text{abandon}) = 0. \end{equation*}
• For all $s' \in S$ and $s = (a,h)$ with $a \gt h$,
\begin{equation*} P(s' \mid s, \text{crush}) = \begin{cases} 1 &\text{if } s' = (a - h - 1, 0)\\ 0 &\text{otherwise} \end{cases} \quad \text{and} \quad R(s' \mid s, \text{crush}) = h + 1. \end{equation*}

Definition 2.1. We denote by $HT^1(a,h)$ the game described above, which begins with the initial state $(a,h)$.

This is a classic version of the game of heads or tails, but with chips. At any given moment, the player has two of the following actions available:

Toss: A croupier tosses a coin rigged in favor of the casino. The probability of getting Tails is $q$. This action costs the player $q$ euros whatever the result.
• If the result is Heads, the casino wins a chip.
• If the result is Tails, the player wins a chip.
Crush: This action is only possible if the player (resp. the casino) has $a$ (resp. $h$) chips with $a \gt h$. In this case, the casino loses all its chips, the player loses $h+1$ chips but gains $h+1$ euros. This action costs the player nothing.
Abandon: The casino and the player lose all their chips. This action costs the player nothing.

Theorem 2.1. Let $(a,h) \in \mathbb{N}^2$. Then,

\begin{equation*} E^1(a,h) \leq a. \end{equation*}

Proof. The initial state is $(a,h)$. Let $\mathfrak{f}$ be an arbitrary strategy and $\tau$ an integrable stopping time. The strategy $\mathfrak{f}$ defines a Markov chain $\mathfrak{X}$ on $\mathbb{N}^2$. For $n \in \mathbb{N}$, let $A_n$ denote the first coordinate of $\mathfrak{X}_n \in \mathbb{N}^2$. This represents the number of chips the player holds at step $n$, and let

\begin{equation*} G_n = \sum_{t=0}^{n-1} \tilde{R}_t \end{equation*}

be the cumulative gains in fiat money collected by the player throughout the game.

Note that, regardless of the strategy, the player’s gain is at most equal to the total number of chips received over the course of the game, including the chips the player has at the beginning of the game. This number increases by at most one at each step; thus,

\begin{equation*} G_n \leq n + a \quad \text{for all } n. \end{equation*}

The same argument shows that

\begin{equation*} \sum_{t=0}^{n-1} \max(0, \tilde{R}_t) \leq n + a. \end{equation*}

Similarly, the player loses at most $q$ at each step, so

\begin{equation*} \sum_{t=0}^{n-1} \min(0, \tilde{R}_t) \geq -q n. \end{equation*}

It follows that

\begin{equation*} \sum_{t=0}^{n-1} |\tilde{R}_t| = \sum_{t=0}^{n-1} \max(0, \tilde{R}_t) - \sum_{t=0}^{n-1} \min(0, \tilde{R}_t) \leq (1 + q) n + a. \end{equation*}

By assumption, $\tau$ is integrable. This ensures that the sequence $(\sum_{t=0}^{n \wedge (\tau - 1)} |\tilde{R}_t|)_n$ is bounded by $(1+q)\tau + 1 + q + a$ and so converges in $L^1$ as $n \to \infty$ as does $\sum_{t=0}^{n \wedge (\tau - 1)} \tilde{R}_t$. Therefore,

\begin{equation} \mathbb{E}_{s,\mathfrak{f}} \left[ \sum_{t=0}^{\tau -1} \tilde{R}_t \right] = \sum_{t=0}^{\tau -1} \mathbb{E}_{s,\mathfrak{f}} [\tilde{R}_t]. \tag{1} \end{equation}

Let $n \in \mathbb{N}$ and $X \in \{\text{Abandon}, \text{Toss}, \text{Crush}\}$. By definition, the event $\{f_n = X\}$ means that action $X$ was chosen at step $n$. If Toss is chosen at step $n$, then necessarily $G_{n+1} = G_n - q$, and

\begin{equation*} \mathbb{E}_{s,\mathfrak{f}}[A_{n+1} \mid f_n = \text{Toss}] = p A_n + q (A_n + 1). \end{equation*}
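Combined with $G_{n+1} = G_n - q$ and $p + q = 1$, this shows that a toss leaves $A_n + G_n$ unchanged on average, which is the identity used in the two computations that follow:

\begin{equation*} \mathbb{E}_{s,\mathfrak{f}}[A_{n+1} + G_{n+1} \mid f_n = \text{Toss}] = p A_n + q (A_n + 1) + G_n - q = (p+q) A_n + G_n = A_n + G_n. \end{equation*}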

First, suppose $\mathfrak{X}_n = (a,h)$ with $a \leq h$. If Abandon is chosen at step $n$, then $A_{n+1} = 0$ and $G_{n+1} = G_n$. Thus,

\begin{align*} \mathbb{E}_{s,\mathfrak{f}}[A_{n+1} + G_{n+1}] &= \mathbb{E}_{s,\mathfrak{f}}[A_{n+1} + G_{n+1} \mid f_n = \text{Toss}] \mathbb{P}[f_n = \text{Toss}] \\ &\quad + \mathbb{E}_{s,\mathfrak{f}}[A_{n+1} + G_{n+1} \mid f_n = \text{Abandon}] \mathbb{P}[f_n = \text{Abandon}] \\ &= (p A_n + q (A_n + 1) + G_n - q) \mathbb{P}[f_n = \text{Toss}] + G_n \mathbb{P}[f_n = \text{Abandon}] \\ &\leq (A_n + G_n)(\mathbb{P}[f_n = \text{Toss}] + \mathbb{P}[f_n = \text{Abandon}]) \\ &\leq A_n + G_n. \end{align*}

Similarly, in the case $a \gt h$, if Crush is chosen at step $n$, then $A_{n+1} = A_n - h - 1$ and $G_{n+1} = G_n + h + 1$, so

\begin{equation*} \mathbb{E}_{s,\mathfrak{f}}[A_{n+1} + G_{n+1} \mid f_n = \text{Crush}] = A_n + G_n, \end{equation*}

and

\begin{align*} \mathbb{E}_{s,\mathfrak{f}}[A_{n+1} + G_{n+1}] &= \mathbb{E}_{s,\mathfrak{f}}[A_{n+1} + G_{n+1} \mid f_n = \text{Toss}] \mathbb{P}[f_n=\text{Toss}] \\ &\quad + \mathbb{E}_{s,\mathfrak{f}}[A_{n+1}+G_{n+1} \mid f_n = \text{Crush}] \mathbb{P}[f_n=\text{Crush}] \\ &= (p A_n + q (A_n + 1) + G_n - q) \mathbb{P}[f_n=\text{Toss}] + (A_n + G_n) \mathbb{P}[f_n=\text{Crush}] \\ &= (A_n + G_n)(\mathbb{P}[f_n=\text{Toss}] + \mathbb{P}[f_n=\text{Crush}]) = A_n + G_n. \end{align*}

It follows that, regardless of the chosen strategy, $(A_n + G_n)$ is a supermartingale. The stopping time $\tau \wedge n$ is bounded, thus by Doob’s optional stopping theorem,

\begin{equation*} \mathbb{E}_{s,\mathfrak{f}}[A_{\tau \wedge n} + G_{\tau \wedge n}] \leq A_0 + G_0. \end{equation*}

Therefore,

\begin{equation*} \mathbb{E}_{s,\mathfrak{f}}[G_{\tau \wedge n}] \leq A_0 + G_0 = a, \end{equation*}

since the initial state is $(a,h)$ and $G_0 = 0$. By (1), $\mathbb{E}_{s,\mathfrak{f}}[G_{\tau \wedge n}]$ converges to $\sum_{t=0}^{\tau - 1} \mathbb{E}_{s,\mathfrak{f}}[\tilde{R}_t]$. Therefore,

\begin{equation*} \sum_{t=0}^{\tau -1} \mathbb{E}_{s,\mathfrak{f}}[\tilde{R}_t] \leq a \end{equation*}

for any strategy $\mathfrak{f}$ and any stopping time $\tau$. So,

\begin{equation*} E^1(a,h) \leq a. \end{equation*}

Corollary 2.1. For all $h \geq 0$, the stochastic game $HT^1(0,h)$ is non-arbitrable.

Proof. By Theorem 2.1, $E^1(0,h) \leq 0$. But of course, if the player chooses Abandon immediately and leaves the game, his payoff is 0. So, also $E^1(0,h) \geq 0$. Hence, we get $E^1(0,h) = 0$ and $HT^1(0,h)$ is non-arbitrable.

Corollary 2.2. For all $n \in \mathbb{N}$, we have $E_n^1(0,0)=0$.

Corollary 2.3. Let $(a,h) \in \mathbb{N}^2$ with $a \gt h$. Then, $E^1(a,h) = a$.

Proof. Starting from $(a,h) \in \mathbb{N}^2$ with $a \gt h$, if the player repeatedly chooses the crush action until they have used up all their chips, then they get $a$. So, $E^1(a,h) \geq a$ and also $E^1(a,h) \leq a$ by Theorem 2.1. Hence, we get the result.
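In more detail, the first Crush from $(a,h)$ pays $h+1$ and leads to $(a-h-1,0)$; each of the $a-h-1$ subsequent Crush actions is played from a state $(k,0)$ with $k \geq 1$, pays $1$, and leads to $(k-1,0)$. The total collected after these $a-h$ actions is therefore

\begin{equation*} (h+1) + \underbrace{1 + \cdots + 1}_{a-h-1 \text{ times}} = a. \end{equation*}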

This last result can be interpreted as meaning that if, during the game, the player manages to obtain more chips than the casino, then their best strategy is to use the Crush action and end the game.

Note, however, that for $(a,h) \in \mathbb{N}^2$, the game $HT^1(a,h)$ may be arbitrable when $0 \lt a \lt h$.

Proposition 2.1. The stochastic game $HT^1(1,2)$ is arbitrable for $q \gt 0.429056$.

Proof. By Proposition 1.2 and Corollary 2.2, for $0 \leq a \leq h$, we have:

\begin{align*} E_n^1(a,h) &= \max \left\{E_{n-1}^1(0,0), \ p E_{n-1}^1(a,h+1) + q E_{n-1}^1(a+1,h) - q \right\} \\ &= \max \left\{0, \ p E_{n-1}^1(a,h+1) + q E_{n-1}^1(a+1,h) - q \right\} \end{align*}

Moreover, by Proposition 1.2, for $a \gt h$,

\begin{equation*} E_n^1(a,h) = \max \{h+1 + E_{n-1}^1(a - h-1, 0), \ p E_{n-1}^1(a,h+1) + q E_{n-1}^1(a+1,h) - q \}. \end{equation*}

On the other hand, for $(a,h) \in \mathbb{N}^2$, $E_0^1(a,h) = 0$. Previous formulas allow $E_n^1(a,h)$ to be calculated by induction on $n$. Below is a simple pseudo-code that accurately gives the average maximum gain $E_n^1(a,h)$. We use the memoization principle for the sake of efficiency [Reference Cormen, Leiserson, Rivest and Stein7].
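A minimal Python sketch of this recursion is given here for illustration; it is not the authors' original listing. The names `make_ht1_value` and `e1` are hypothetical, memoization is delegated to `functools.lru_cache`, and the Abandon branch is set to $0$ on the strength of Corollary 2.2.

```python
from functools import lru_cache


def make_ht1_value(q: float):
    """Return a memoized function e1(n, a, h) approximating E_n^1(a, h)."""
    p = 1.0 - q

    @lru_cache(maxsize=None)
    def e1(n: int, a: int, h: int) -> float:
        if n == 0:
            return 0.0  # E_0^1(a, h) = 0
        # Toss: pay q, then the casino (prob. p) or the player (prob. q)
        # wins a chip.
        toss = p * e1(n - 1, a, h + 1) + q * e1(n - 1, a + 1, h) - q
        if a > h:
            # Crush: collect h + 1 euros and move to (a - h - 1, 0).
            other = h + 1 + e1(n - 1, a - h - 1, 0)
        else:
            # Abandon: move to (0, 0), where E_{n-1}^1(0, 0) = 0.
            other = 0.0
        return max(other, toss)

    return e1
```

A call such as `make_ht1_value(0.45)(75, 1, 2)` then evaluates $E^1_{75}(1,2)$ in floating-point arithmetic. The number of memoized triples grows roughly like $n^3$ and the recursion depth like $n$, so large horizons may require raising Python's recursion limit or switching to an iterative table.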

We observe that $E^1_n(1, 2) = 4.050134694288943 \times 10^{-8} \gt 0$ for $q=0.429056$ and $n=75$. In other words, if $q \gt 42.91\%$, the game $HT^1(1, 2)$ is arbitrable.

Note, however, that we are unable to find $n\in\mathbb{N}$ such that $E^1_n(1, 2) \gt 0$ for $q\leq 0.429055$.

2.2. Optimal strategy

As noted after Corollary 2.3, the optimal action is Crush whenever $a \gt h$. Furthermore, when $E^1(a,h) \gt 0$ and $a \leq h$, the optimal action must be Toss rather than Abandon, since Abandon leads to the state $(0,0)$, and $E^1(0,0) = 0$ by Corollary 2.2. The table below details the optimal action when the game begins in state $(1,2)$. Each time the player selects the Crush action, the game terminates because the next state is $(0,0)$ and $E^1(0,0) = 0$, as stated in Corollary 2.2. The function $E^1(a,h)$ was computed using the preceding algorithm with $n = 500$ and $q=0.45$. See Table 1.

Table 1.

Optimal strategy for the game $HT^1(1,2)$ when $q=0.45$.

The player leaves the game as soon as he chooses the Abandon action, denoted by A, or the Crush action, denoted by C. The Toss action is denoted by T. An asterisk (*) indicates that the state is unreachable.

We observe that when the player is two chips behind the casino but has at least three chips, continuing to play is advantageous (with $q=0.45$ as before).
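A table of this kind can be approximated numerically by recording which branch of the recursion for $E_n^1$ attains the maximum. The sketch below is an illustration under assumptions, not the procedure used to produce Table 1: the names `make_ht1_policy`, `solve` and `policy_table` are hypothetical, ties (up to a rounding tolerance `tol`) are resolved in favor of stopping, and every state is evaluated with the same horizon $n$ without tracking which states are reachable.

```python
from functools import lru_cache


def make_ht1_policy(q: float, tol: float = 1e-12):
    """Return solve(n, a, h) -> (value, action) for the game HT^1."""
    p = 1.0 - q

    @lru_cache(maxsize=None)
    def solve(n: int, a: int, h: int):
        if n == 0:
            return 0.0, "*"  # horizon exhausted: no action is played
        toss = (p * solve(n - 1, a, h + 1)[0]
                + q * solve(n - 1, a + 1, h)[0] - q)
        if a > h:
            # Crush: collect h + 1 euros and move to (a - h - 1, 0).
            stop, label = h + 1 + solve(n - 1, a - h - 1, 0)[0], "C"
        else:
            # Abandon: move to (0, 0), whose value is 0 (Corollary 2.2).
            stop, label = 0.0, "A"
        # Toss is selected only if it beats stopping by more than tol.
        if toss > stop + tol:
            return toss, "T"
        return stop, label

    return solve


def policy_table(q: float, n: int, size: int):
    """Rows indexed by h, columns by a, for 0 <= a, h < size."""
    solve = make_ht1_policy(q)
    return ["".join(solve(n, a, h)[1] for a in range(size))
            for h in range(size)]
```

For example, `policy_table(0.45, 60, 6)` returns six strings of six letters each, one row per value of $h$. The horizon 60 is an arbitrary choice for this sketch and is much smaller than the $n = 500$ used for Table 1, so entries near the boundary of the Toss region may differ.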

2.3. Modified coin toss game with chips. Second version

The game is similar to the previous one, except that the player does not pay $q$ every time they use the Toss action. They only pay when the result of this action is unfavorable to them (when the result is Heads and the casino wins a chip). As compensation, the Crush action yields slightly less, because each time the player uses it, they must pay $q$. We will see that this compensation is not sufficient and that the game is biased if $q$ is large enough. To be concrete, the stochastic game is defined as follows. We have $S = \mathbb{N} \times \mathbb{N}$ and for all $s \in S$, $|A(s)| = 2$. If $s = (a,h)$ with $a \leq h$, $A(s) = \{\text{abandon}, \text{toss}\}$ and if $a \gt h$, $A(s) = \{\text{crush}, \text{toss}\}$.

The functions $P$ and $R$ are defined by:

• For all $s' \in S$ and $s = (a,h)$,
\begin{equation*} P(s' \mid s, \text{toss}) = \begin{cases} p & \text{if } s' = (a,h+1), \\ q & \text{if } s' = (a+1,h), \\ 0 & \text{otherwise}, \end{cases} \quad R(s' \mid s, \text{toss}) = \begin{cases} -q & \text{if } s' = (a,h+1), \\ 0 & \text{otherwise}. \end{cases} \end{equation*}
• For all $s' \in S$ and $s = (a,h)$ with $a \leq h$,
\begin{equation*} P(s' \mid s, \text{abandon}) = \begin{cases} 1 & \text{if } s' = (0,0), \\ 0 & \text{otherwise}, \end{cases} \quad R(s' \mid s, \text{abandon}) = 0. \end{equation*}
• For all $s' \in S$ and $s = (a,h)$ with $a \gt h$,
\begin{equation*} P(s' \mid s, \text{crush}) = \begin{cases} 1 & \text{if } s' = (a - h - 1, 0), \\ 0 & \text{otherwise}, \end{cases} \quad R(s' \mid s, \text{crush}) = h + 1-q. \end{equation*}

We denote by $HT^2(a,h)$ the game described above, which begins with the initial state $(a,h)$. In other words, during the course of the game, the player regularly accumulates chips and can, under certain constraints, convert them into cash (euros, let’s say). At any given moment, the player has two of the following three actions available.

Toss: A croupier tosses a coin rigged in favor of the casino. The probability of getting Tails is $q$.
• If the result is Tails, the player wins a chip and pays nothing.
• If the result is Heads, the casino wins a chip and the player pays $q$ to the dealer.
Crush: This action is only possible if the player (resp. the casino) has $a$ (resp. $h$) chips with $a \gt h$. In this case, the casino loses all its chips, the player loses $h+1$ chips but wins $h+1$ euros and also gives $q$ euros to the dealer. His net result is therefore $h+1-q$ euros.
Abandon: The casino and the player lose all their chips. This action costs the player nothing.

Theorem 2.2. The stochastic game $HT^2(0,0)$ is arbitrable for $q \gt 0.329393$.

Proof. We use Proposition 1.2 to calculate $E_n^2(a,h)$ by induction on $n$. For any $n \in \mathbb{N}$ and $a \gt h$, if the player decides to use the Crush action, then the state of the game changes from $(a,h)$ to $(a - h - 1, 0)$ and the game continues. Thus, for $a \gt h$, we have

\begin{equation*} E_n^2(a,h) = \max \big\{h + 1 - q + E_{n-1}^2(a - h - 1, 0), q E_{n-1}^2(a + 1, h) + p \big( E_{n-1}^2(a, h + 1) - q \big) \big\}. \end{equation*}

In the case where $a \leq h$, the player has basically the choice between Toss or Abandon. Therefore,

\begin{equation*} E_n^2(a,h) = \max \big\{E_{n-1}^2(0,0), q E_{n-1}^2(a + 1, h) + p \big( E_{n-1}^2(a, h + 1) - q \big) \big\}. \end{equation*}

Below is a very simple pseudo-code that precisely provides the maximal expected gain $E_n^2(a,h)$ through memoization.
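A minimal Python sketch of the two recursions above is given here for illustration; it is not the authors' original listing. The names `make_ht2_value` and `e2` are hypothetical. Unlike in $HT^1$, the Abandon branch cannot be replaced by $0$, since $E_{n-1}^2(0,0)$ may be positive.

```python
from functools import lru_cache


def make_ht2_value(q: float):
    """Return a memoized function e2(n, a, h) approximating E_n^2(a, h)."""
    p = 1.0 - q

    @lru_cache(maxsize=None)
    def e2(n: int, a: int, h: int) -> float:
        if n == 0:
            return 0.0  # E_0^2(a, h) = 0
        # Toss: the player pays q only when the casino wins the chip.
        toss = q * e2(n - 1, a + 1, h) + p * (e2(n - 1, a, h + 1) - q)
        if a > h:
            # Crush: net reward h + 1 - q, next state (a - h - 1, 0).
            other = h + 1 - q + e2(n - 1, a - h - 1, 0)
        else:
            # Abandon: free move to (0, 0); the game then continues.
            other = e2(n - 1, 0, 0)
        return max(other, toss)

    return e2
```

For instance, with $q = 0.45$ the value `make_ht2_value(0.45)(6, 0, 0)` is already strictly positive: tossing up to three times, crushing whenever the casino catches up to one chip behind and abandoning from behind gives an expected gain of at least $0.025$ within six actions. Locating the threshold near $q = 0.329393$ requires much longer horizons, for which the roughly cubic growth of the memo table in $n$ becomes noticeable.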

We note that for $q = 0.329393$ and $n=146$,

\begin{equation*} E^2_{n}(0,0) = 4.4530581139179404 \times 10^{-8}. \end{equation*}

Since this value is positive and $E^2_n(0,0) \leq E^2(0,0)$, the game $HT^2(0,0)$ is biased.

2.4. The optimal strategy

By modifying the previous code and imposing certain conditions for different values of $a$ and $h$, we can obtain the optimal strategy. It is given in Table 2.

Table 2.

Optimal strategy for $q \gt 0.329393$. The parameter $a$ (resp. $h$) is represented horizontally (resp. vertically) from $0$ to $8$.

For each value of $a$ and $h$ an action is selected from $\{A,T,C\}$ ( $A=$ Abandon, $T=$ Toss, $C=$ Crush). The * symbol means that the action is irrelevant.

In comparison, the honest strategy is just (all other states $(a,h)$ for $a\geq 2$ or $h\geq 2$ are irrelevant)

We observe that the optimal strategy is both conservative and aggressive. It is conservative because in a situation which is favorable to the player, he does not take the risk of being caught and losing all his advantage, except in the case where $a=1$ and $h=0$. In all other cases where $a=h+1$, he uses the Crush action. The optimal strategy is also aggressive because, in an unfavorable situation where $a \lt h$, the more chips the player has, the less likely he is to give up. For example, in a situation where $a\geq 5$, the player does not give up if $h\leq a+2$, and similarly if $a\geq 10$, he continues to play (Toss action) if $h\leq a+3$.

2.5. A third coin toss game

During this game, the player regularly accumulates chips that, under certain constraints, they can convert into cash (let’s say euros). At any given moment, the player has two of the following three actions available:

Toss: A croupier tosses a coin rigged in favor of the casino. The probability of getting Tails is $q$.
• If the result is Tails, the player wins a chip and pays nothing.
• If the result is Heads, the casino wins a chip and the player pays $q$ to the dealer.
Crush: This action is only possible if the player (resp. the casino) has $a$ (resp. $h$) chips with $a \gt h$. In this case, the casino loses its $h$ chips, the player loses $h+1$ chips and wins $h+1$ euros, but he must also give the dealer $q\,(h+1)$ euros. Hence, his net result is $p\,(h+1)$ euros.
Abandon: The casino and the player lose all their chips. This action costs the player nothing.

The only difference between $HT^2$ and $HT^3$ is the consequence of the Crush action. In $HT^3$, the player earns at most as much as in $HT^2$, since $p\,(h+1) \leq h+1-q$. To be concrete, $HT^3$ is defined as follows. Let $S = \mathbb{N} \times \mathbb{N}$ and for all $s \in S$, $|A(s)| = 2$. If $s = (a,h)$ with $a \leq h$, then $A(s) = \{\text{abandon}, \text{toss}\}$, and if $a \gt h$, then $A(s) = \{\text{crush}, \text{toss}\}$. The functions $P$ and $R$ are defined by:

• For all $s' \in S$ and $s = (a,h)$,
\begin{equation*} P(s' \mid s, \text{toss}) = \begin{cases} p & \text{if } s' = (a, h+1) \\ q & \text{if } s' = (a+1, h) \\ 0 & \text{otherwise} \end{cases} , \quad R(s' \mid s, \text{toss}) = \begin{cases} -q & \text{if } s' = (a, h+1) \\ 0 & \text{otherwise} \end{cases} \end{equation*}
• For all $s' \in S$ and $s = (a,h)$ with $a \leq h$,
\begin{equation*} P(s' \mid s, \text{abandon}) = \begin{cases} 1 & \text{if } s' = (0,0) \\ 0 & \text{otherwise} \end{cases} , \quad R(s' \mid s, \text{abandon}) = 0. \end{equation*}
• For all $s' \in S$ and $s = (a,h)$ with $a \gt h$,
\begin{equation*} P(s' \mid s, \text{crush}) = \begin{cases} 1 & \text{if } s' = (a - h - 1, 0) \\ 0 & \text{otherwise} \end{cases} , \quad R(s' \mid s, \text{crush}) = p (h+1). \end{equation*}

Definition 2.2. We denote by $HT^3(a,h)$ the game described above, where the player starts from a position with $a$ chips against $h$ chips for the casino, that is, $(a,h)\in\mathbb{N}^2$.
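Before turning to the proof, the bound $E_n^3(a,h) \leq p\,a$ can be checked numerically on small horizons by applying Proposition 1.2 with the rewards above. The Python sketch below is only an illustration; the names `make_ht3_value` and `e3` are hypothetical, and floating-point rounding means the bound can only be tested up to a small tolerance.

```python
from functools import lru_cache


def make_ht3_value(q: float):
    """Return a memoized function e3(n, a, h) approximating E_n^3(a, h)."""
    p = 1.0 - q

    @lru_cache(maxsize=None)
    def e3(n: int, a: int, h: int) -> float:
        if n == 0:
            return 0.0  # E_0^3(a, h) = 0
        # Toss: the player pays q only when the casino wins the chip.
        toss = q * e3(n - 1, a + 1, h) + p * (e3(n - 1, a, h + 1) - q)
        if a > h:
            # Crush: net reward p * (h + 1), next state (a - h - 1, 0).
            other = p * (h + 1) + e3(n - 1, a - h - 1, 0)
        else:
            # Abandon: free move to (0, 0).
            other = e3(n - 1, 0, 0)
        return max(other, toss)

    return e3


def check_bound(q: float, n: int, size: int, tol: float = 1e-9) -> bool:
    """Check e3(n, a, h) <= p * a + tol for all 0 <= a, h < size."""
    e3 = make_ht3_value(q)
    p = 1.0 - q
    return all(e3(n, a, h) <= p * a + tol
               for a in range(size) for h in range(size))
```

A call such as `check_bound(0.45, 20, 4)` returns `True`. Such a check is only a sanity test on finitely many states and horizons and does not replace the supermartingale argument.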

We prove that $HT^3(0,0)$ is not arbitrable.

Theorem 2.3. Let $(a,h)\in\mathbb{N}^2$. Then, $E^3(a,h)\leq p\, a$.

Proof. The proof is analogous to that of Theorem 2.1. In particular, Equation (1) holds. This time, the sequence $(p A_n + G_n)_n$ is always a supermartingale regardless of the player’s strategy. Indeed, denoting by $\{f_n = X\}$ the event that action $X$ was chosen at step $n$, for an integer $n$, we have

\begin{equation*} \mathbb{E}_{s,\mathfrak{f}}[A_{n+1} \mid f_n = \text{Toss}] = p A_n + q (A_n + 1), \end{equation*}

\begin{equation*} \mathbb{E}_{s,\mathfrak{f}}[G_{n+1} \mid f_n = \text{Toss}] = p (G_n - q) + q G_n. \end{equation*}

Hence,

\begin{equation*} \mathbb{E}_{s,\mathfrak{f}}[p A_{n+1} + G_{n+1} \mid f_n = \text{Toss}] = p (p A_n + q (A_n + 1)) + p (G_n - q) + q G_n = p A_n + G_n. \end{equation*}

Suppose first that $\mathfrak{X}_n = (a,h)$ with $a \leq h$. If Abandon was chosen at step $n$, then $A_{n+1} = 0$ and $G_{n+1} = G_n$. Thus,

\begin{align*} \mathbb{E}_{s,\mathfrak{f}}[p A_{n+1} + G_{n+1}] &= \mathbb{E}_{s,\mathfrak{f}}[p A_{n+1} + G_{n+1} \mid f_n = \text{Toss}] \, \mathbb{P}[f_n = \text{Toss}] \\ &\quad + \mathbb{E}_{s,\mathfrak{f}}[p A_{n+1} + G_{n+1} \mid f_n = \text{Abandon}] \, \mathbb{P}[f_n = \text{Abandon}] \\ &= (p A_n + G_n) \, \mathbb{P}[f_n = \text{Toss}] + G_n \, \mathbb{P}[f_n = \text{Abandon}] \\ &\leq (p A_n + G_n) \left(\mathbb{P}[f_n = \text{Toss}] + \mathbb{P}[f_n = \text{Abandon}]\right) \\ &\leq p A_n + G_n. \end{align*}

Similarly, in the case $a \gt h$, if Crush was chosen at step $n$, then $A_{n+1} = A_n - h - 1$ and $G_{n+1} = G_n + p (h+1)$. Hence,

\begin{equation*} \mathbb{E}_{s,\mathfrak{f}}[p A_{n+1} + G_{n+1} \mid f_n = \text{Crush}] = p A_n + G_n, \end{equation*}

and

\begin{equation*} \begin{aligned} \mathbb{E}_{s,\mathfrak{f}}[p A_{n+1} + G_{n+1}] &= \mathbb{E}_{s,\mathfrak{f}}[p A_{n+1} + G_{n+1} \mid f_n = \text{Toss}] \, \mathbb{P}[f_n = \text{Toss}] \\ &\quad + \mathbb{E}_{s,\mathfrak{f}}[p A_{n+1} + G_{n+1} \mid f_n = \text{Crush}] \, \mathbb{P}[f_n = \text{Crush}] \\ &= (p A_n + G_n) \bigl(\mathbb{P}[f_n = \text{Toss}] + \mathbb{P}[f_n = \text{Crush}]\bigr) \\ &= p A_n + G_n. \end{aligned} \end{equation*}

It follows that, whatever the strategy chosen, $(p A_n + G_n)$ is a supermartingale. The stopping time $\tau \wedge n$ is bounded, so by Doob’s theorem,

\begin{equation*} \mathbb{E}_{s,\mathfrak{f}}[p A_{\tau \wedge n} + G_{\tau \wedge n}] \leq p A_0 + G_0. \end{equation*}

Hence,

\begin{equation*} \mathbb{E}_{s,\mathfrak{f}}[G_{\tau \wedge n}] \leq p\, a \end{equation*}

since the initial state is $(a,h)$ by hypothesis and $G_0=0$. Moreover, by Equation (1), $\mathbb{E}_{s,\mathfrak{f}}[G_{\tau \wedge n}]$ converges to $\sum_{t=0}^{\tau - 1} \mathbb{E}_{s,\mathfrak{f}}[\tilde{R}_t]$. Therefore,

\begin{equation*} \sum_{t=0}^{\tau - 1} \mathbb{E}_{s,\mathfrak{f}}[\tilde{R}_t] \leq p\, a \end{equation*}

which completes the proof.

By taking $(a,h)=(0,0)$, we deduce the following corollary.

Corollary 2.4. The game $HT^3(0,0)$ is not arbitrable.
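The bound of Theorem 2.3 can be checked numerically on a finite horizon. The sketch below is not the authors’ algorithm: it assumes that the player may stop at any time (contributing value $0$), and the name `ht3_value` is hypothetical. It computes by backward recursion the best expected reward over at most $n$ actions and compares it with $p\,a$.

```python
from functools import lru_cache

def ht3_value(n, a, h, q):
    """Best expected reward over at most n actions in HT^3 from (a, h).

    Assumption of this sketch: stopping is always allowed and is worth 0.
    """
    p = 1.0 - q

    @lru_cache(maxsize=None)
    def value(k, a, h):
        if k == 0:
            return 0.0
        # Toss: Heads (prob. p) costs q, Tails (prob. q) is free.
        toss = p * (-q + value(k - 1, a, h + 1)) + q * value(k - 1, a + 1, h)
        if a > h:
            # Crush: reward p * (h + 1), then continue from (a - h - 1, 0).
            other = p * (h + 1) + value(k - 1, a - h - 1, 0)
        else:
            # Abandon: no reward, restart from (0, 0).
            other = value(k - 1, 0, 0)
        return max(0.0, toss, other)

    return value(n, a, h)

q = 0.45  # illustrative value only
for a, h in [(0, 0), (1, 0), (1, 2), (3, 1)]:
    v = ht3_value(12, a, h, q)
    print((a, h), round(v, 6), "bound p*a =", (1 - q) * a)
```

In this finite-horizon model the printed values never exceed $p\,a$, and the value from $(0,0)$ is zero, which is consistent with Theorem 2.3 and Corollary 2.4.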

3. Application to bitcoin

We now turn to applications to the Bitcoin protocol [Reference Antonopoulos1, Reference Nakamoto21]. We present three mining problems on the Bitcoin network, all of which we solve using the stochastic games studied previously. The first variant models temporary Bitcoin forks, determining the threshold at which an honest but temporarily “Byzantine” miner continues mining on their fork to recover orphaned blocks. The second, biased variant explains vulnerabilities in the difficulty adjustment formula, providing a simple derivation—without Markov decision solvers—of when a miner lacking connectivity benefits from a deviant strategy. The unbiased third variant shows that this difficulty adjustment issue can be theoretically fully corrected. Our results align with the literature and clarify it quantitatively and qualitatively. In each case, the random event corresponding to the creation of a block is modeled as a coin toss: if Heads occurs, the honest miners find a block; if Tails occurs, the attacker does.

It is the intrinsic nature of the game—namely, the fact that it can be biased in favor of the player according to the definition given in the previous section—that allows us to understand why the honest strategy is not always optimal.

In the first case (Sections 2.1 and 3.1), it is in the miner-player’s best interest to maintain their fork because if they succeed, they stand to gain significantly; the risk is therefore justified. Consequently, if the miner-player starts with a slight advantage ($a \gt 0$), the game $HT^1(a,h)$ can be arbitrable even when $h \gt a$. See Proposition 2.1.

In the second case (Sections 2.3 and 3.2), the effect of the difficulty adjustment is reflected in the cost incurred by the miner-player when new blocks are added. Only the height of the official blockchain is considered in the difficulty adjustment, meaning only unfavorable outcomes for the miner (i.e., when Heads occurs) impose a cost. When the miner discovers a block (i.e., when Tails occurs), the official blockchain height does not increase. From this, we can see that the game is biased in favor of the miner-player even when $a=0$ (no blocks at the start). See Theorem 2.2. The fact that the game $HT^2(0,0)$ is arbitrable highlights a flaw in Bitcoin’s difficulty adjustment mechanism.

Finally, in the third case (Sections 2.5 and 3.3), when the miner overwrites the official blockchain, they orphan all the honest miners’ blocks that they replace. As a result, their net gain is reduced, since these orphaned blocks are publicly known and the miner now incurs a cost for each orphaned block created. The fact that the $HT^3(0,0)$ game is not arbitrable reflects the correction made to the difficulty adjustment formula.

3.1. Temporarily Byzantine by “force of circumstance”

When analyzing the security of certain systems, it is common practice in computer science to consider two very distinct categories of actors: honest participants, who faithfully follow the rules of the protocol, and attackers. Following the terminology introduced by [Reference Lamport, Shostak and Pease19] in the study of distributed systems, the latter are called “Byzantines.” In general, an actor does not change categories. Nevertheless, depending on the circumstances, one may occasionally be led to do so, like a person who is fundamentally honest but finds a large sum of money on the street and decides to keep it for himself, without any effort to find the rightful owner. We consider a simple situation where an honest miner on the Bitcoin network can be tricked into not respecting the rules of the protocol: the creation of a temporary fork. This is a relatively rare occurrence, but not an extremely rare one. According to statistical analyses carried out between 18/03/2014 and 14/06/2017, the rate of orphan block creation was 0.31% for this period, and it is likely that this rate is even lower today thanks to the new versions of Bitcoin Core [4, 8]. We consider the case where two “honest” miners, each mining on the official blockchain, find a block at almost the same time. “Honest” means here, as elsewhere in the article, that the miner always mines on the last block of the official blockchain and always immediately makes his discoveries public. In general, the first block discovered takes precedence and the second is considered “orphaned,” although this terminology is imprecise. The Bitcoin wiki site prefers to speak of a “stale block” [5]. The miner who mined the second block is then drawn into a deviant posture. It is clearly in his interest to mine on top of his orphan block rather than the last official block, because if he manages to mine a new block before the rest of the network, he will earn the reward contained in two blocks rather than just one.
But then imagine that the other miners discover a block before he does. He must now not only catch up with the official blockchain, but also mine an additional block to gain the upper hand. Should he continue mining on his fork, or return to mining on the last block of the official blockchain, as stipulated by the Bitcoin protocol? This is an unprecedented situation for the miner, who eventually becomes “Byzantine” by “force of circumstances.” The situation is indeed unprecedented, since it is perhaps the first time that it has been imposed on the miner, and it is unlikely that he will find himself in the same situation twice in a row in the future in the course of his activity. Furthermore, being fundamentally “honest,” the miner, if he manages to catch up with and surpass the official blockchain, will benefit immediately. In particular, he will not engage in a block-withholding attack. The miner may thus become temporarily Byzantine. His attack starts when he is one block behind the official blockchain and ends as soon as he gives up or manages to catch up and exceed the official blockchain by one block. In both cases, he resumes his position as an honest miner. The natural question is: Is it in his interest to continue mining on his fork, or should he abandon it and return to mining on the official blockchain? What is the threshold in terms of relative hash power at which an a priori “honest” miner has an interest in stubbornly mining on his fork when he is one block behind on the official blockchain? This question can be resolved using the classic coin toss game seen in Section 2.1.

• The Toss action corresponds to the fact that the miner persists in mining on his fork despite a delay on the official blockchain.
• The Crush action corresponds to the fact that the miner has a secret fork enabling him to gain an advantage over the official blockchain. He then decides to make it public and pocket all the rewards it contains; he then naturally resumes his position as an honest miner.
• The Abandon action means that the miner returns to mining on the last block of the official blockchain, like any honest miner.

Remark 3.1. In fact, given that the miner overwrites the official blockchain whenever they have the opportunity, the associated game is actually only a subgame of the first game described in Section 2.1, in the sense of Definition 3.1 below. For $a \gt h$, we have $A(s) = \{\text{Crush}\} \subset \{\text{Crush}, \text{Toss}\}$. However, it was also shown in this section that the optimal strategy for this game is to stop playing as soon as the Abandon or Crush action is used once.

Definition 3.1 (Sub-game)

We say that a non-competitive stochastic game $(S, A, P, R)$ is a sub-game of another non-competitive stochastic game $(S', A', P', R')$ if the following conditions are satisfied: $S\subset S'$; for any $s\in S$, $A(s)\subset A'(s)$; and for any $(s,s')\in S^2$ and any action $\alpha\in A(s)$, we have $P(s' \mid s, \alpha)=P'(s' \mid s, \alpha)$ and $R(s' \mid s, \alpha)=R'(s' \mid s, \alpha)$.
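For games with finitely many states, the three conditions of Definition 3.1 can be checked mechanically. The sketch below is purely illustrative: the dictionary layout and the name `is_subgame` are assumptions made for this example, and the two toy games are hypothetical rather than taken from the paper.

```python
def is_subgame(game, big):
    """Check Definition 3.1 for finite games given as dictionaries.

    A game is a dict with keys:
      "S": set of states,
      "A": dict state -> set of actions,
      "P": dict (s, action, s_next) -> probability (missing key means 0),
      "R": dict (s, action, s_next) -> reward (missing key means 0).
    """
    if not game["S"] <= big["S"]:
        return False
    for s in game["S"]:
        if not game["A"][s] <= big["A"][s]:
            return False
        for alpha in game["A"][s]:
            for s_next in game["S"]:
                key = (s, alpha, s_next)
                if game["P"].get(key, 0) != big["P"].get(key, 0):
                    return False
                if game["R"].get(key, 0) != big["R"].get(key, 0):
                    return False
    return True

# Toy example: the small game forbids action "y" in state 0.
big = {
    "S": {0, 1},
    "A": {0: {"x", "y"}, 1: {"x"}},
    "P": {(0, "x", 1): 1, (0, "y", 0): 1, (1, "x", 1): 1},
    "R": {(0, "x", 1): 2},
}
small = {
    "S": {0, 1},
    "A": {0: {"x"}, 1: {"x"}},
    "P": {(0, "x", 1): 1, (1, "x", 1): 1},
    "R": {(0, "x", 1): 2},
}
print(is_subgame(small, big), is_subgame(big, small))
```

Restricting the action sets, as in Remark 3.1 where only Crush is kept when $a \gt h$, is the typical way a sub-game arises.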

In the temporary fork situation we are considering, a mining strategy is just the stopping time $\tau$ that specifies the first instant when the miner returns to mine on the official blockchain. We denote by $R(\tau)$ the income earned by the miner following this strategy. We need to compare $R(\tau)$ with the income the miner would have earned during $\tau$ if he had mined honestly all along. Given that the miner’s relative hashing power is $q$, the latter quantity is worth on average $q b\frac{\mathbb{E} [\tau]}{\tau_0}$ with ${\tau_0}=10$ minutes and $b=3.125$ BTC (current value of a coinbase) plus the average value of transaction fees present in a block. So the key quantity for choosing the $\tau$ strategy over the honest one is $\mathbb{E} [R(\tau)]-q b \frac{\mathbb{E} [\tau]}{\tau_0}$. The $\tau$ strategy is preferable to the honest strategy if $\mathbb{E} [R(\tau)]-q b \frac{\mathbb{E} [\tau]}{\tau_0} \gt 0$. The second term $-q b \frac{\mathbb{E} [\tau]}{\tau_0}$ is then interpreted as a cost. In this expression, everything happens as if the miner were paying $q b$ every time a block is discovered. This explains why the player-miner pays a fixed cost $q$ to the croupier, whatever the result of the coin toss (the unit of wealth is $b$). The parameter $q$ is the probability that the coin will land on Tails, which corresponds to the miner finding a block before the honest miners.
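To make the change of unit explicit (a restatement of the argument above, not an additional result), divide the key quantity by $b$:

\begin{equation*} \frac{1}{b}\left(\mathbb{E}[R(\tau)] - q\, b\, \frac{\mathbb{E}[\tau]}{\tau_0}\right) = \frac{\mathbb{E}[R(\tau)]}{b} - q\, \frac{\mathbb{E}[\tau]}{\tau_0}. \end{equation*}

Since $\frac{\mathbb{E}[\tau]}{\tau_0}$ is the average number of blocks discovered by the whole network during $\tau$, the right-hand side is the expected number of block rewards won minus a cost $q$ per coin toss. As a purely illustrative order of magnitude, with the hypothetical values $q = 0.43$ and $\mathbb{E}[\tau] = 30$ minutes, and ignoring transaction fees, the honest benchmark is $0.43 \times 3.125 \times 3 \approx 4.03$ BTC.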

As observed in Section 2.1, for $q=0.429056$, we have $E^1_{75} (1, 2) \gt 0$. However, we are unable to find $n$ such that $E^1_{n} (1, 2) \gt 0$ for $q\leq 0.429055$. We can therefore state the following.

Proposition 3.1. In the case of a temporary fork, the minimum threshold of relative hash power beyond which an a priori honest miner has an interest in persisting in mining on his fork, even though he is one block behind the official blockchain, is about $42.91\%$.

The result is in line with the $36.1$% and $45.5$% bounds obtained in [Reference Kiayias, Koutsoupias, Kyropoulou and Tselekounis17] with calculus in their section “The Immediate-Release Game.” Moreover, in a presentation of this article given at the University of Crete in $2019$ the speaker (who is also one of the authors of the article) is more precise and states that the threshold lies between $42$% and $43$% (the lower bound $42$% is also in the paper) [Reference Kiayias, Koutsoupias, Kyropoulou and Tselekounis17, Reference Koutsoupias18]. An a priori honest miner therefore has a temporary interest in holding on to his fork as soon as he has more than $42.91\%$ of relative hash power. Note, however, that if the honest miners find a second block again (so that $a=1$ and $h=3$) then the miner has an interest in giving up and returning to mine on the last block of the official blockchain. Indeed, we are unable to find a value of $n\in\mathbb{N}$ and $q \lt \frac{1}{2}$ such that $E^1_{n}(1,3) \gt 0$.

3.2. To be or not to be totally Byzantine?

We now consider another mining problem. At what relative hash power $q$ does it no longer make sense for a miner to be honest? For a miner, being honest means always mining on the last block of the official blockchain and always making any blocks discovered public. Not being honest means the opposite: keeping discovered blocks secret or not mining on the last block of the official blockchain. The problem under consideration is fundamentally different from the one previously considered. The miner is not an honest miner who momentarily becomes “Byzantine” by force of circumstances. On the contrary, he chooses his camp from the outset (honest or Byzantine) and never leaves it [Reference Rosenfeld23]. What is more, his mining strategy is not limited in time. On the contrary, it is unlimited and repetitive. This problem has certainly already been solved in the general case [Reference Sapirshtein, Sompolinsky and Zohar24]. The authors recognize a Markov decision problem, which they solve using a solver. They are then confronted with technical issues, as a priori, such a solver can only be used in the case where the number of system states is finite. We show that it is possible to simplify this problem in the case where the connectivity of the miner is zero, and that it then resembles the second non-competitive stochastic game $HT^2$. Recall that miner connectivity is a parameter introduced by [Reference Eyal and Sirer9] and picked up by many authors since then. This parameter, often denoted by the Greek letter $\gamma$, measures the attacker’s ability to react in the case where it possesses a hidden block on one of its computers that has the same height as the latest block in the official blockchain. If it is well-connected, it will quickly learn of the existence of a new block before the others and announce the existence of its own hidden block to the rest of the network. This is a measure of its ability to create confusion in the network.

Since the network is constantly evolving, it is an illusion to believe that $\gamma$ remains constant over time. However, this is an assumption often made when assessing the profitability of mining strategies. In itself, connectivity is an attack vector that was not imagined by Satoshi Nakamoto, since it does not feature in his founding paper. With $\gamma=1$, a miner has no incentive to be honest. He has no interest in publishing a block he has just discovered. He can simply wait for another block to be discovered and react then. It is interesting to set $\gamma=0$ to understand how Nakamoto’s consensus can naturally be faulted without adding this attack vector. We therefore formulate this hypothesis ($\gamma=0$) and, within this framework, we seek to find out whether a miner has an incentive to behave honestly or whether there is a more profitable mining strategy. A mining strategy specifies the action to be taken by the miner depending on the state of the network. The chosen mining strategy, whether honest or not, is repetitive. In due course, the miner will almost certainly return to his starting point and mine on the last block of the official blockchain. When this happens, the miner is said to have completed a cycle. During this cycle, we denote by $R$ the number of blocks added by the attacker to the official blockchain and $H$ the progression of the height of the official blockchain. In concrete terms, at any given moment, the attacker has the choice between mining on his secret fork, overriding the official blockchain when he has the means to do so, or giving up and returning to mine on the last block of the official blockchain. In reality, in the general case, he has an additional action at his disposal: the action called “Match” by the authors [Reference Sapirshtein, Sompolinsky and Zohar24], which consists in revealing a block already mined but kept secret by the attacker. However, under our assumption $\gamma=0$, this action is not possible.
Furthermore, the state of the network is normally modeled by a triplet $(a,h,f)$ where $a$ (resp. $h$) designates the number of blocks mined by the attacker (resp. honest miners) on the last fork created and $f$ designates the possibility of using the Match action or the fact that it is already activated. In the case of $\gamma=0$, the latter parameter is irrelevant. The state of the network is simply modeled by a pair $(a,h)$.

The effect of the difficulty adjustment formula. PnL (Profit and Loss) per unit of time is the only quantity that makes economic sense. The cost of mining per unit of time is independent of the mining strategy chosen (keeping blocks secret may have an impact on the miner’s income, but not on his cost of mining). So, the quantity that allows us to compare different full-time mining strategies is the revenue per unit of time. This key observation was made in [Reference Grunspan and Pérez-Marco13]; before that, only the relative proportion of mined blocks in the official blockchain was used, without justification, as a benchmark of the profitability of a strategy. The two are equivalent only in the long run, after difficulty adjustments. A difficulty adjustment occurs each time the official blockchain grows by $2016$ blocks. This has the effect of maintaining an average duration of $10$ minutes each time the height of the official blockchain increases by $1$. In these circumstances, and only in the long run, the percentage of blocks mined by the attacker present in the official blockchain gives the long-run revenue. In concrete terms, when a given mining strategy, or “mining policy,” is modeled by a Markov chain as in [Reference Sapirshtein, Sompolinsky and Zohar24], the measure of revenue per unit of time is given by $\frac{\mathbb{E}[R]}{\mathbb{E}[H]}$ only when the interblock time stabilizes, for integrable strategies (i.e., those whose cycles have finite expected duration, so that $\mathbb{E}[H] \lt \infty$). When the miner mines honestly, this quantity is equal to the miner’s relative hash power, which we have always denoted $q$. This leads to the following proposition (Corollary 10.1 of [Reference Sapirshtein, Sompolinsky and Zohar24], which is a corollary of the more general Proposition 3.6 of [Reference Grunspan and Pérez-Marco13]).

Proposition 3.2. An admissible mining strategy is more profitable than the honest strategy if and only if $\mathbb{E}[R-q H] \gt 0$.
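The criterion can be read off from the revenue ratio of the preceding paragraph. As a short sketch, assuming $0 \lt \mathbb{E}[H] \lt \infty$ and taking the honest ratio $q$ as the benchmark:

\begin{equation*} \frac{\mathbb{E}[R]}{\mathbb{E}[H]} \gt q \iff \mathbb{E}[R] - q\,\mathbb{E}[H] \gt 0 \iff \mathbb{E}[R - q H] \gt 0. \end{equation*}

For the honest strategy, a cycle ends with the first block discovered, so $H = 1$ and $R = 1$ with probability $q$ (and $R = 0$ otherwise), which gives $\mathbb{E}[R - qH] = q - q = 0$.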

As in the previous section, we can interpret the second term (here $-q H$) as a cost. But unlike in the previous section, the miner no longer pays $q$ each time a block is discovered (by him or the rest of the network), but only each time the function $H$ increases by $1$, i.e., each time the official blockchain progresses. So, we are led to the game $HT^2(0,0)$.

• A coin toss by the dealer is equivalent to the discovery of a block, either by the miner or by the honest miners.
• The Toss action is equivalent to the miner choosing to mine secretly and wait for a block to be discovered. If the result is Heads (probability $p$), then the official blockchain advances by one block. The height function, therefore, increases by $1$, resulting in a cost $q$ paid by the player in this case. If, on the other hand, the result is Tails (this event occurs with probability $q$), the block discovered by the miner is kept secret. So, the height of the official blockchain does not increase. Hence, a zero cost since the function $H$ has not increased.
• The Crush action means that the miner replaces the last $h$ blocks of the official blockchain with his own. For this action to be possible, the miner must reveal one more block ($h+1$ in all), which increases the height of the official blockchain by $1$. The miner then gains the reward contained in $h+1$ blocks, and at the same time, the height $H$ of the official blockchain increases by $1$. Hence, a net gain of $h+1-q$.
• The Abandon action is equivalent to the miner dropping his secret fork and returning to mine on the last block of the official blockchain. He neither gains nor loses anything with this action, as the height of the official blockchain remains unchanged.

Proposition 3.3. The threshold beyond which a miner with zero connectivity is incentivized to choose a deviant strategy is approximately $32.94\%$.

This result coincides with that obtained with the Python implementation of the article [Reference Sapirshtein, Sompolinsky and Zohar24], see [20]. Note that this threshold is not far from the $\frac{1}{3}$ threshold for the classical selfish mining strategy. However, the optimal strategy differs slightly from the selfish mining approach. In certain scenarios, the miner might find it advantageous to persist with their attack even when they are behind the official blockchain, see [Reference Nayak, Shi, Kumar and Miller22] and [Reference Grunspan and Pérez-Marco14, Reference Grunspan and Pérez-Marco15] for the analysis of these Stubborn mining strategies.
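The correspondence with $HT^2(0,0)$ can be explored numerically on a finite horizon. The sketch below is not the solver of the cited works and does not claim to reproduce the threshold of Proposition 3.3: it assumes that the player may stop at any time (value $0$) and that Crush moves $(a,h)$ to $(a-h-1,0)$ with reward $h+1-q$, as described above. The names `ht2_value` and `smallest_arbitrable_q` are hypothetical.

```python
from functools import lru_cache

def ht2_value(n, q):
    """Best expected reward from (0, 0) over at most n actions in HT^2.

    Assumptions of this sketch: stopping is always allowed (value 0), and
    Crush moves (a, h) to (a - h - 1, 0) with reward h + 1 - q.
    """
    p = 1.0 - q

    @lru_cache(maxsize=None)
    def value(k, a, h):
        if k == 0:
            return 0.0
        # Toss: Heads (prob. p) raises the official height and costs q.
        toss = p * (-q + value(k - 1, a, h + 1)) + q * value(k - 1, a + 1, h)
        if a > h:
            other = (h + 1 - q) + value(k - 1, a - h - 1, 0)  # Crush
        else:
            other = value(k - 1, 0, 0)  # Abandon
        return max(0.0, toss, other)

    return value(n, 0, 0)

def smallest_arbitrable_q(n, grid):
    """First q of the increasing grid with a positive n-step value, or None."""
    for q in grid:
        if ht2_value(n, q) > 1e-12:
            return q
    return None

grid = [0.30 + 0.005 * i for i in range(31)]  # 0.30, 0.305, ..., 0.45
for n in (6, 12, 24):
    print(n, smallest_arbitrable_q(n, grid))
```

Because the horizon is finite, the value found for each $n$ can only overestimate the true threshold; longer horizons give the player more room to profit from a deviant strategy.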

3.3. Honesty is the best policy

In this section, we consider the case where the Bitcoin difficulty adjustment formula has been modified, and we theoretically demonstrate that the best strategy is still the honest strategy when $\gamma = 0$. The general result without assumptions on $\gamma$ has also been proven in [Reference Grunspan and Pérez-Marco16].

A new difficulty adjustment formula. Today, the nodes in the Bitcoin network do not transmit orphaned blocks. But let us imagine that they could. Let us even imagine that miners are incentivized to do so by modifying the rule that governs the official blockchain. In the event of a tie between two blockchains, we should select the one that contains the most proof of work, taking into account orphaned blocks as well, provided that they have an ancestor in the considered blockchain. This would be a kind of reinforcement of Satoshi Nakamoto’s rule. In case of another tie between two blockchains with the same characteristics, a node would select, as it does today, the one that was transmitted to it first.

In this case, the key quantity for comparing two mining strategies would no longer be $\frac{\mathbb{E}[R]}{\mathbb{E}[H]}$ but $\frac{\mathbb{E}[R]}{\mathbb{E}[D]}$ where $R$ would still represent the number of blocks added to the official blockchain by the miner during a cycle and $D$ would represent the progression of the so-called difficulty function during the same period (the authors of [Reference Bar-Zur, Eyal and Tamar2] introduced the concept of “difficulty contribution,” which they denote by $D_t$ in their article). We would have $D=H+U$ where $U$ is the number of recorded orphan blocks mined during a cycle. The effect of the new difficulty adjustment algorithm would be to impose a duration of $10$ minutes on each progression of the difficulty function (instead of the height function as now). As before, this would mean that a mining strategy would be more profitable than the honest strategy if and only if $\mathbb{E}[R-q D] \gt 0$ where $D$ here represents the progression of the difficulty function over a cycle, which leads us to the third game $HT^3(0,0)$.

• A coin toss by the dealer is equivalent to the discovery of a block, either by the miner or by the honest miners.
• The Toss action is equivalent to the miner choosing to mine secretly and wait for a block to be discovered. If the result is Heads (probability $p$), then the difficulty function $D$ advances by $1$, resulting in a cost $q$ paid by the player in this case. If, on the other hand, the result is Tails (this event occurs with probability $q$), the block discovered by the miner is kept secret. Hence, a zero cost since the function $D$ has not increased.
• The Crush action means that the miner replaces the last $h$ blocks of the official blockchain with his own. For this action to be possible, the miner must reveal one more block ($h+1$ in all), which increases the height of the official blockchain by $1$ and also orphans the last $h$ blocks of the official blockchain. The miner then gains the reward contained in $h+1$ blocks, and at the same time, the difficulty function $D$ increases by $h+1$. Hence, a net gain of $h+1-q(h+1)=p(h+1)$.
• The Abandon action is equivalent to the miner dropping his secret fork and returning to mine on the last block of the official blockchain. He neither gains nor loses anything with this action, as the difficulty function remains unchanged.
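The Crush payoffs of the second and third games can be compared directly. Using only $p = 1 - q$:

\begin{equation*} (h+1-q) - p\,(h+1) = (h+1-q) - (h+1) + q\,(h+1) = q\,h \geq 0. \end{equation*}

In other words, under the modified rule the miner is charged exactly $q$ for each of the $h$ honest blocks that he orphans, which is the only difference between the two games.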

Proposition 3.4. When the production of orphan blocks by honest miners is incorporated into the difficulty adjustment formula, the optimal strategy remains the honest one originally advocated by Satoshi Nakamoto.

This result coincides with [Reference Grunspan and Pérez-Marco16], which was obtained in the more general case (for any $\gamma$, not necessarily equal to $0$). Let us recall here the conditions under which the attacker’s activity can be modeled using this third game:

(1) The attacker’s connectivity is zero.
(2) All orphaned blocks mined by honest miners are reported and recorded in the blockchain (no assumption is made about orphaned blocks mined by the attacker; in particular, it is not assumed that they are reported and recorded in the official blockchain).
(3) A difficulty adjustment is regularly made to take into account the production of orphan blocks recorded in the official blockchain.

In particular, if for some technical reason the orphan blocks mined by honest miners cannot all be recorded and reported in the official blockchain, the conclusion of the last result may no longer be valid [Reference Sarenche, Zhang, Nikova and Preneel25].

Acknowledgements

The authors would like to thank the Associate Editor and the referee for their careful reading and valuable suggestions, which helped to improve both the presentation and the content of the paper. They would also like to thank Martino Grasselli and Joris Van der Hoeven for valuable discussions.

Competing interests

The authors have no competing interests to declare.

References

[1] Antonopoulos, A. (2014). Mastering Bitcoin: Unlocking Digital Cryptocurrencies. Sebastopol, CA, USA: O’Reilly Media, Inc.

[2] Bar-Zur, R., Eyal, I., & Tamar, A. (2020). Efficient MDP analysis for selfish-mining in blockchains. Proceedings of the 2nd ACM Conference on Advances in Financial Technologies, pp. 113–131.

[3] Bertucci, C., Bertucci, L., Lasry, J.M., & Lions, P.L. (2024). A mean field game approach to bitcoin mining. SIAM Journal on Financial Mathematics 15(3): 960–987.

[4] Bitcoin Stackexchange. What are orphaned and stale. https://bitcoin.stackexchange.com/questions/5859/what-are-orphaned-and-stale-blocks

[5] Bitcoin Wiki. Orphan Block. https://en.bitcoin.it/wiki/Orphan_Block

[6] Brown, M., Peköz, E., & Ross, S. (2021). Blockchain double-spend attack duration. Probability in the Engineering and Informational Sciences 35(4): 858–866.

[7] Cormen, T.H., Leiserson, C.E., Rivest, R.L., & Stein, C. (2009). Introduction to algorithms, 3rd edn. Cambridge, MA, USA: MIT Press.

[8] Decentralized Systems and Network Services Research Group - KASTEL. Bitcoin Monitoring. https://www.dsn.kastel.kit.edu/bitcoin

[9] Eyal, I., & Sirer, E. (2014). Majority is not enough: bitcoin mining is vulnerable. International Conference on Financial Cryptography and Data Security 8439: 436–454.

[10] Filar, T., & Vrieze, K. (1997). Competitive markov decision processes. New York, NY, USA: Springer.

[11] Georgiadis, E., & Zeilberger, D. (2019). A combinatorial-probabilistic analysis of Bitcoin attacks. Journal of Difference Equations and Applications 25: 56–63.

[12] Grunspan, C., & Pérez-Marco, R. (2018). Double spend races. International Journal of Theoretical and Applied Finance 21(8): 1850053.

[13] Grunspan, C., & Pérez-Marco, R. (2018). On profitability of selfish mining. arXiv:1805.08281.

[14] Grunspan, C., & Pérez-Marco, R. (2018). On profitability of trailing mining. arXiv:1811.09322.

[15] Grunspan, C., & Pérez-Marco, R. (2020). The mathematics of Bitcoin. Newsletter of the European Mathematical Society 115: 31–37.

[16] Grunspan, C., & Pérez-Marco, R. (2025). Block withholding resilience. Digital Finance 7(1): 43–60.

[17] Kiayias, A., Koutsoupias, E., Kyropoulou, M., & Tselekounis, Y. (2016). Blockchain mining games. Proceedings of the 2016 ACM Conference on Economics and Computation, pp. 365–382.

[18] Koutsoupias, E. (2019). Talk by Elias Koutsoupias at ECE TUC (see between 28’ and 30’). https://www.youtube.com/watch?v=K6iTBLhsFA0

[19] Lamport, L., Shostak, R., & Pease, M. (1982). The byzantine generals problem. ACM Transactions on Programming Languages and Systems 4(3): 382–401.

[20] Mitsuhamizu. A Python implementation for solving the MDP in Optimal selfish mining. https://github.com/Mitsuhamizu/Optimal-selfish-mining

[21] Nakamoto, S. (2008). Bitcoin: A Peer-To-Peer Electronic Cash System. Bitcoin.org/bitcoin.pdf

[22] Nayak, K., Shi, E., Kumar, S., & Miller, A. (2016). Stubborn mining: generalizing selfish mining and combining with an eclipse attack. IEEE European Symp. Security and Privacy, pp. 305–320.

[23] Rosenfeld, M. (2009). Analysis of hashrate-based double spending. arXiv:1402.2009.

[24] Sapirshtein, A., Sompolinsky, Y., & Zohar, A. (2017). Optimal selfish mining strategies in Bitcoin. International Conference on Financial Cryptography and Data Security, pp. 515–532.

[25] Sarenche, R., Zhang, R., Nikova, S., & Preneel, B. (2024). Time-averaged analysis of selfish mining in bitcoin: is orphan reporting an effective countermeasure? IEEE Transactions on Information Forensics and Security 20: 449–464.

[26] Warren, M. (2023). Bitcoin: A game-theoretic analysis. Berlin, Germany: De Gruyter Graduate.

Table 1. Optimal strategy for the game $E^1(1,2)$ when $q=0.45$.

Table 2. Optimal strategy for $q \gt 0.329393$. The parameter $a$ (resp. $h$) is represented horizontally (resp. vertically) from $0$ to $8$.
