
Gaussian Distribution of Trie Depth for Strongly Tame Sources

Published online by Cambridge University Press: 11 December 2014

EDA CESARATTO
Affiliation:
CONICET and Instituto de Desarrollo Humano, Universidad Nacional de General Sarmiento, Buenos Aires, Argentina (e-mail: ecesarat@ungs.edu.ar)
BRIGITTE VALLÉE
Affiliation:
Laboratoire GREYC, Université de Caen/ENSICAEN/CNRS, F-14032 Caen, France (e-mail: brigitte.vallee@unicaen.fr)

Abstract

The depth of a trie has been studied extensively when the source that produces the words is a simple source (a memoryless source or a Markov chain). When a source is simple but not an unbiased memoryless source, the expectation and the variance of the depth are both of logarithmic order, and their dominant terms involve characteristic objects of the source, for instance the entropy. Moreover, there is an asymptotic Gaussian law, even though the speed of convergence toward the Gaussian law has not yet been precisely estimated. The present paper describes a ‘natural’ class of general sources, which does not contain any simple source, where the depth of a random trie, built on a set of words independently drawn from the source, has the same type of probabilistic behaviour as for simple sources: the expectation and the variance are both of logarithmic order and there is an asymptotic Gaussian law. There are precise asymptotic expansions for the expectation and the variance, and the speed of convergence toward the Gaussian law is optimal. The paper first provides analytical conditions on the Dirichlet series of probabilities of a general source under which this Gaussian law can be derived: a pole-free region where the series is of polynomial growth. In a second step, the paper focuses on sources associated with dynamical systems, called dynamical sources, where the Dirichlet series of probabilities is expressed with the transfer operator of the dynamical system. The paper then extends results due to Dolgopyat, already generalized by Baladi and Vallée, and shows that the previous analytical conditions are fulfilled for ‘most’ dynamical sources, provided that they ‘strongly differ’ from simple sources. In summary, the paper describes a class of sources not containing any simple source, where the trie depth has the same type of probabilistic behaviour as for simple sources, with even more precise estimates.
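As an informal illustration of the simple-source case recalled in the abstract, the following minimal sketch simulates the depth of a random leaf in a trie built on n words drawn from a biased binary memoryless source. It is not taken from the paper: the function names, the parameter values (n, p, the number of trials) and the truncation of words to 64 symbols are all assumptions made for this example. The sketch uses the fact that the leaf of a word sits at depth one plus the length of its longest common prefix with any other word, and prints the empirical mean and variance next to the first-order term ln(n)/h, where h is the entropy of the source.

```python
import math
import random
from statistics import mean, pvariance


def random_word(rng, p, length):
    """Draw `length` binary symbols; each symbol is 0 with probability p."""
    return tuple(0 if rng.random() < p else 1 for _ in range(length))


def lcp(u, v):
    """Length of the longest common prefix of two words."""
    k = 0
    for a, b in zip(u, v):
        if a != b:
            break
        k += 1
    return k


def leaf_depth(words, i):
    """Depth of the leaf of words[i]: the shortest prefix shared with no other word."""
    if len(words) < 2:
        return 0  # a trie on a single word is reduced to its root
    return 1 + max(lcp(words[i], w) for j, w in enumerate(words) if j != i)


def sample_depths(n, p, trials, seed=0, length=64):
    """Sample the depth of one leaf in `trials` independent random tries.

    Words are truncated to `length` symbols (an assumption of this sketch);
    for moderate n, two words agreeing on all 64 symbols is extremely unlikely.
    """
    rng = random.Random(seed)
    depths = []
    for _ in range(trials):
        words = [random_word(rng, p, length) for _ in range(n)]
        # The words are exchangeable, so words[0] plays the role of a random leaf.
        depths.append(leaf_depth(words, 0))
    return depths


if __name__ == "__main__":
    n, p = 500, 0.3  # hypothetical parameters
    entropy = -(p * math.log(p) + (1 - p) * math.log(1 - p))
    depths = sample_depths(n, p, trials=2000)
    print("empirical mean    :", mean(depths))
    print("empirical variance:", pvariance(depths))
    print("ln(n) / entropy   :", math.log(n) / entropy)
```

The empirical mean is expected to exceed ln(n)/h by a bounded amount, since only the dominant logarithmic term is compared here; a histogram of `depths` for increasing n gives a rough visual check of the Gaussian shape.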

Type
Paper
Copyright
Copyright © Cambridge University Press 2014 

References

[1] Baladi, V. (2000) Positive Transfer Operators and Decay of Correlations, Advanced Series in Nonlinear Dynamics, World Scientific.
[2] Baladi, V. and Vallée, B. (2005) Euclidean algorithms are Gaussian. J. Number Theory 110 331–386.
[3] Bourdon, J. (2001) Size and path length of Patricia tries: Dynamical sources context. Random Struct. Alg. 19 289–315.
[4] Broise, A. (1996) Transformations dilatantes de l'intervalle et théorèmes limites. Astérisque 238 5–109.
[5] Cesaratto, E. and Vallée, B. (2007) Distribution of the average external depth for tries in dynamical sources context. In Proc. Logic Computability and Randomness 2007, pp. 33–34.
[6] Chazal, F., Maume-Deschamps, V. and Vallée, B. (2004) Erratum to ‘Dynamical sources in information theory: Fundamental intervals and word prefixes’. Algorithmica 38 591–596.
[7] Clément, J., Fill, J. A., Nguyen Thi, T. and Vallée, B. Towards a realistic analysis of the QuickSelect algorithm. In Theory of Computing Systems, special issue for STACS 2013, to appear.
[8] Clément, J., Flajolet, P. and Vallée, B. (2001) Dynamical sources in information theory: A general analysis of trie structures. Algorithmica 29 307–369.
[9] Clément, J., Nguyen Thi, T. H. and Vallée, B. Towards a realistic analysis of some popular sorting algorithms. Combin. Probab. Comput.
[10] Dolgopyat, D. (1998) On decay of correlations in Anosov flows. Ann. of Math. 147 357–390.
[11] Dolgopyat, D. (1998) Prevalence of rapid mixing in hyperbolic flows. Ergod. Theory Dynam. Systems 18 1097–1114.
[12] Fayolle, G., Flajolet, P. and Hofri, M. (1986) On a functional equation arising in the analysis of a protocol for a multiaccess broadcast channel. Adv. Appl. Probab. 18 441–472.
[13] Flajolet, P. (2006) The ubiquitous digital tree. In Proc. 23rd Annual Symposium on Theoretical Aspects of Computer Science: STACS 2006, Vol. 3884 of Lecture Notes in Computer Science, Springer, pp. 1–22.
[14] Flajolet, P., Roux, M. and Vallée, B. (2010) Digital trees and memoryless sources: From arithmetics to analysis. In Proc. AofA'10, DMTCS Proc. AM, pp. 231–258.
[15] Flajolet, P. and Sedgewick, R. (1986) Digital search trees revisited. SIAM J. Comput. 15 748–767.
[16] Flajolet, P. and Sedgewick, R. (1995) Mellin transforms and asymptotics: Finite differences and Rice's integrals. Theoret. Comput. Sci. 144 101–124.
[17] Flajolet, P. and Vallée, B. (2000) Continued fractions, comparison algorithms, and fine structure constants. In Constructive, Experimental, and Nonlinear Analysis (Thera, M., ed.), Vol. 27 of CMS Conference Proceedings, Canadian Mathematical Society, pp. 55–82.
[18] Hennion, H. (1993) Sur un théorème spectral et son application aux noyaux lipchitziens. Proc. Amer. Math. Soc. 118 627–634.
[19] Hun, K. (2014) Analysis of depth of digital trees built on general sources. PhD thesis, University of Caen.
[20] Hun, K. and Vallée, B. (2014) Typical depth of a digital search tree built on a general source. In Proc. ANALCO'14, SIAM, pp. 1–15.
[21] Hwang, H. (1998) On convergence rates in the central limit theorems for combinatorial structures. European J. Combin. 19 329–343.
[22] Jacquet, P. and Régnier, M. (1986) Trie partitioning process: Limiting distributions. In Proc. 11th Colloquium on Trees in Algebra and Programming: CAAP '86 (P. Franchi-Zannettacci, ed.), Vol. 214 of Lecture Notes in Computer Science, Springer, pp. 196–210.
[23] Jacquet, P. and Szpankowski, W. (1991) Analysis of digital tries with Markovian dependency. IEEE Trans. Inform. Theory 37 1470–1475.
[24] Jacquet, P. and Szpankowski, W. (1995) Asymptotic behavior of the Lempel–Ziv parsing scheme and digital search trees. Theoret. Comput. Sci. 144 161–197.
[25] Jacquet, P., Szpankowski, W. and Tang, J. (2001) Average profile of the Lempel–Ziv parsing scheme for a Markovian source. Algorithmica 31 318–360.
[26] Kato, T. (1980) Perturbation Theory for Linear Operators, Springer.
[27] Knuth, D. E. (1998) The Art of Computer Programming: Sorting and Searching, Vol. 3, third edition, Addison-Wesley.
[28] Lapidus, M. and van Frankenhuijsen, M. (2006) Fractal Geometry, Complex Dimensions and Zeta Functions: Geometry and Spectra of Fractal Strings, Springer.
[29] Louchard, G. and Szpankowski, W. (1995) Average profile and limiting distribution for a phrase size in the Lempel–Ziv parsing algorithm. IEEE Trans. Inform. Theory 41 478–488.
[30] Louchard, G. and Szpankowski, W. (1997) On the average redundancy rate of the Lempel–Ziv code. IEEE Trans. Inform. Theory 43 2–8.
[31] Nörlund, N. E. (1929) Leçons sur les équations linéaires aux différences finies. In Collection de Monographies sur la Théorie des Fonctions, Gauthier-Villars.
[32] Nörlund, N. E. (1954) Vorlesungen über Differenzenrechnung, Chelsea Publishing Company.
[33] Roux, M. (2011) Séries de Dirichlet, théorie de l'information, et analyse d'algorithmes. PhD thesis, University of Caen.
[34] Roux, M. and Vallée, B. (2011) Information theory: Sources, Dirichlet series, and realistic analyses of data structures. In Proc. 8th International Conference: WORDS 2011, Vol. 63 of Electronic Proceedings in Theoretical Computer Science, pp. 199–214.
[35] Ruelle, D. (1978) Thermodynamic Formalism, Addison-Wesley.
[36] Schachinger, W. (2000) Limiting distributions for the costs of partial match retrievals in multidimensional tries. Random Struct. Alg. 17 428–459.
[37] Szpankowski, W. (2001) Average Case Analysis of Algorithms on Sequences, Wiley.
[38] Vallée, B. (1997) Opérateurs de Ruelle–Mayer généralisés et analyse en moyenne des algorithmes de Gauss et d'Euclide. Acta Arith. 81 101–144.
[39] Vallée, B. (2001) Dynamical sources in information theory: Fundamental intervals and word prefixes. Algorithmica 29 262–306.
[40] Vallée, B., Clément, J., Fill, J. A. and Flajolet, P. (2009) The number of symbol comparisons in QuickSort and QuickSelect. In Proc. ICALP 2009, part I, Vol. 5555 of Lecture Notes in Computer Science, Springer, pp. 750–763.