
General distributions of number representation elements

Published online by Cambridge University Press: 07 February 2024

Félix Balado*
Affiliation:
School of Computer Science, University College Dublin, Dublin, Ireland
Guénolé C. M. Silvestre
Affiliation:
School of Computer Science, University College Dublin, Dublin, Ireland
*Corresponding author: F. Balado; Email: felix@ucd.ie

Abstract

We provide general expressions for the joint distributions of the k most significant b-ary digits and of the k leading continued fraction (CF) coefficients of outcomes of arbitrary continuous random variables. Our analysis highlights the connections between the two problems. In particular, we give the general convergence law of the distribution of the jth significant digit, which is the counterpart of the general convergence law of the distribution of the jth CF coefficient (Gauss-Kuz’min law). We also particularise our general results for Benford and Pareto random variables. The former particularisation allows us to show the central role played by Benford variables in the asymptotics of the general expressions, among several other results, including the analogue of Benford’s law for CFs. The particularisation for Pareto variables—which include Benford variables as a special case—is especially relevant in the context of pervasive scale-invariant phenomena, where Pareto variables occur much more frequently than Benford variables. This suggests that the Pareto expressions that we produce have wider applicability than their Benford counterparts in modelling most significant digits and leading CF coefficients of real data. Our results may find practical application in all areas where Benford’s law has been previously used.
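As a point of orientation for the two classical laws named in the abstract, the following minimal Python sketch evaluates their standard textbook forms: Benford's law for the k most significant b-ary digits, and the Gauss-Kuz'min limiting law for CF coefficients. It is an illustration under stated assumptions, not a reproduction of the article's general expressions or numbered equations. The function names are hypothetical, and the third pmf relies on the standard fact that the fractional part of $\log_b X$ is uniform on [0, 1) when X is Benford.

```python
import math


def benford_leading_pmf(d: int, b: int = 10) -> float:
    """Benford probability that the k leading base-b digits, read as the
    integer d with b**(k-1) <= d < b**k, occur: log_b(1 + 1/d)."""
    if d < 1:
        raise ValueError("d must be a positive integer")
    return math.log(1 + 1 / d, b)


def gauss_kuzmin_pmf(a: int) -> float:
    """Gauss-Kuz'min limiting probability that a CF coefficient equals a:
    -log2(1 - 1/(a + 1)**2)."""
    if a < 1:
        raise ValueError("a must be a positive integer")
    return -math.log2(1 - 1 / (a + 1) ** 2)


def first_cf_pmf_uniform(a: int) -> float:
    """Probability that the first CF coefficient of U equals a when U is
    uniform on (0, 1): P(1/(a+1) < U <= 1/a) = 1/(a*(a+1))."""
    if a < 1:
        raise ValueError("a must be a positive integer")
    return 1 / (a * (a + 1))


def leading_digits(x: float, k: int = 1, b: int = 10) -> int:
    """k most significant base-b digits of x > 0, read as an integer.
    Uses floor(b**({log_b x} + k - 1)); floating-point rounding can
    misplace values that sit extremely close to a digit boundary."""
    if x <= 0:
        raise ValueError("x must be positive")
    log_x = math.log(x, b)
    frac = log_x - math.floor(log_x)  # fractional part {log_b x}
    return math.floor(b ** (frac + k - 1))


if __name__ == "__main__":
    # First-digit Benford pmf in base 10.
    for d in range(1, 10):
        print(d, round(benford_leading_pmf(d), 4))
    # Compare the first-coefficient pmf (uniform case) with Gauss-Kuz'min.
    for a in range(1, 6):
        print(a, round(first_cf_pmf_uniform(a), 4), round(gauss_kuzmin_pmf(a), 4))
```

Comparing `first_cf_pmf_uniform` with `gauss_kuzmin_pmf` shows numerically that the first-coefficient distribution in the uniform case differs from the limiting law, which is the kind of gap between finite j and the asymptotic regime that the abstract refers to.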

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (http://creativecommons.org/licenses/by-nc-nd/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided that no alterations are made and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use and/or adaptation of the article.
Copyright
© The Author(s), 2024. Published by Cambridge University Press.

Figure 1. Illustration of the asymptotic sum-invariance property of a Benford variable for b = 10.

Figure 2. Theoretical distribution of the two most significant decimal digits of Pareto X (4.24) versus theoretical Benford-based asymptotic approximation (4.20). The lines join adjacent probability mass points for clarity.

Figure 3. Theoretical joint pmf of the first two CF coefficients of $\log_{10}X$ for Pareto X with s = 1 and ρ = 0.3 [solid lines, (4.27)] versus theoretical Benford-based asymptotic approximation [dashed lines, (4.21)]. The lines join adjacent probability mass points corresponding to equal $a_2$ for clarity.

Figure 4. Distributions of the most significant decimal digit of X. The theoretical pmf’s (solid and dashed lines) are (4.2) and (4.24), and the empirical frequencies (symbols) correspond to $p=10^7$ pseudorandom outcomes in each case.

Figure 5. Distributions of the two most significant decimal digits of X. The theoretical pmf’s (solid and dotted lines) are (4.2) and (4.24), and the empirical frequencies (symbols) correspond to $p=10^7$ pseudorandom outcomes in each case.

Figure 6. Distributions of the most significant decimal digit of X for real Benfordian and Paretian datasets. The theoretical pmf’s (solid lines) are (4.2) and (4.24), and the empirical frequencies (symbols) are joined through dotted lines for clarity.

Figure 7. Distributions of the jth most significant decimal digit of X. The theoretical pmf’s (solid and dashed lines) are (4.5) and (4.25), and the empirical frequencies (symbols) correspond to $p=5\times 10^7$ pseudorandom outcomes in each case.

Figure 8. Joint distribution of the first two CF coefficients of $\log_{10} X$ for Benford X. The theoretical joint pmf (dashed lines) is (4.7) and the empirical frequencies (symbols) correspond to $p=10^8$ pseudorandom outcomes.

Figure 9. Joint distribution of the first two CF coefficients of $\log_{10} X$ for Pareto X, s = 1.5, ρ = 0.48. The theoretical joint pmf (dashed lines) is (4.27) and the empirical frequencies (symbols) correspond to $p=10^8$ pseudorandom outcomes.

Figure 10. Distributions of the jth CF coefficient of $\log_{10} X$. The theoretical pmf’s (solid, dashed, dash-dotted and dotted lines) are (3.11), (4.10), (4.11) and (4.27) [with k = 1]. The empirical frequencies (symbols) correspond to $p=10^8$ pseudorandom outcomes in each of the cases.

Figure 11. Joint distribution of the first two CF coefficients of $\log_{10} X$ for a real Benfordian dataset (US total income per ZIP code, National Bureau of Economic Research, 2016, $p=159,928$). The theoretical joint pmf (dashed lines) is (4.7), whereas the symbols represent empirical frequencies.

Figure 12. Joint distribution of the first two CF coefficients of $\log_{10} X$ for a real Paretian dataset (diameter of Lunar craters ≥1 km, $p=1,296,796$ [31]). The theoretical joint pmf (dashed lines) is (4.27), driven by $\hat{s}=1.59$ and $\hat{\rho}=0.00$, whereas the symbols represent empirical frequencies.

Figure 13. Mappings between the range of $\{\log_{b} x\}$ and the supports of a) $A_{(1)}$ and b) $A_1$.