Path reversal, islands, and the gapped alignment of random sequences

Published online by Cambridge University Press: 14 July 2016

John L. Spouge*
Affiliation: National Library of Medicine
* Postal address: National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD 20894, USA. Email address: spouge@ncbi.nlm.nih.gov

Abstract

In bioinformatics, the notion of an ‘island’ enhances the efficient simulation of gapped local alignment statistics. This paper generalizes several results relevant to gapless local alignment statistics from one to higher dimensions, with a particular eye to applications in gapped alignment statistics. For example, reversal of paths (rather than of discrete time) generalizes a distributional equality, from queueing theory, between the Lindley (local sum) and maximum processes. Systematic investigation of an ‘ownership’ relationship among vertices in ℤ² formalizes the notion of an island as a set of vertices having a common owner. Predictably, islands possess some stochastic ordering and spatial averaging properties. Moreover, however, the average number of vertices in a subcritical stationary island is 1, generalizing a theorem of Kac about stationary point processes. The generalization leads to alternative ways of simulating some island statistics.
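
The classical one-dimensional equality mentioned in the abstract can be checked numerically. The sketch below is only an illustration of that textbook case, not the paper's path-reversal construction in higher dimensions. It assumes the standard definitions: the Lindley process W_k = max(0, W_{k-1} + X_k) with W_0 = 0, and the maximum process M_k = max(0, S_1, ..., S_k) of the partial sums S_j. The function names (`lindley`, `running_max`, `check_reversal`) and the integer step distribution are hypothetical choices made for this example. Reversing the order of the increments turns W_n into M_n path by path, which gives the equality in distribution when the increments are i.i.d.

```python
import random

def lindley(increments):
    """Return W_0..W_n, where W_0 = 0 and W_k = max(0, W_{k-1} + x_k)."""
    w = [0]
    for x in increments:
        w.append(max(0, w[-1] + x))
    return w

def running_max(increments):
    """Return M_0..M_n, where M_k = max(0, S_1, ..., S_k) for partial sums S_j."""
    m = [0]
    s = 0
    for x in increments:
        s += x
        m.append(max(m[-1], s))
    return m

def check_reversal(n=50, trials=200, seed=0):
    """Check W_n(x_1..x_n) == M_n(x_n..x_1) on random integer increments.

    The step distribution {-2, -1, 1} is an arbitrary choice with
    negative mean; integers avoid floating-point comparison issues.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.choice([-2, -1, 1]) for _ in range(n)]
        if lindley(xs)[-1] != running_max(xs[::-1])[-1]:
            return False
    return True

if __name__ == "__main__":
    print(lindley([1, -2, 3]))
    print(running_max([3, -2, 1]))
    print(check_reversal())
```

The identity holds at the final time n for each individual path; the two processes are not equal as whole trajectories, which is why the statement in the abstract is a distributional one.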

Type: Research Papers
Copyright © Applied Probability Trust 2004

References

Altschul, S. F., Bundschuh, R., Olsen, R., and Hwa, T. (2001). The estimation of statistical parameters for local alignment score distributions. Nucleic Acids Res. 29, 351–361.
Altschul, S. F. et al. (1990). Basic local alignment search tool. J. Molec. Biol. 215, 403–410.
Altschul, S. F. et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Arratia, R., and Waterman, M. S. (1985). Critical phenomena in sequence matching. Ann. Prob. 13, 1236–1249.
Arratia, R., and Waterman, M. S. (1994). A phase transition for the score in matching random sequences allowing deletions. Ann. Appl. Prob. 4, 200–225.
Asmussen, S. (1987). Applied Probability and Queues (Wiley Ser. Probab. Math. Statist. Appl. Probab. Statist.). John Wiley, New York.
Breiman, L. (1992). Probability. SIAM, Philadelphia, PA.
Bundschuh, R. (2002). Asymmetric exclusion process and extremal statistics of random sequences. Phys. Rev. E 65, 031911.
Bundschuh, R. (2002). Rapid significance estimation in local sequence alignment with gaps. J. Comput. Biol. 9, 243–260.
Bundschuh, R., and Hwa, T. (2000). An analytic study of the phase transition line in local sequence alignment with gaps. Discrete Appl. Math. 104, 113–142.
Dembo, A., and Karlin, S. (1993). Central limit theorems of partial sums for large segmental values. Stoch. Process. Appl. 45, 259–271.
Dembo, A., Karlin, S., and Zeitouni, O. (1994). Limit distributions of maximal non-aligned two-sequence segmental score. Ann. Prob. 22, 2022–2039.
Dunford, N., and Schwartz, J. T. (1958). Linear Operators. I. General theory. Interscience, New York.
Durrett, R. (1984). Oriented percolation in two dimensions. Ann. Prob. 12, 999–1040.
Kac, M. (1947). On the notion of recurrence in discrete stochastic processes. Bull. Amer. Math. Soc. 53, 1002–1010.
Karlin, S., and Altschul, S. F. (1990). Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proc. Nat. Acad. Sci. USA 87, 2264–2268.
Karlin, S., and Dembo, A. (1992). Limit distributions of maximal segmental score among Markov-dependent partial-sums. Adv. Appl. Prob. 24, 113–140.
Karlin, S., and Taylor, H. M. (1975). A First Course in Stochastic Processes. Academic Press, New York.
Mott, R., and Tribe, R. (1999). Approximate statistics of gapped alignments. J. Comput. Biol. 6, 91–112.
Needleman, S. B., and Wunsch, C. D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Molec. Biol. 48, 443–453.
Olsen, R., Bundschuh, R., and Hwa, T. (1999). Rapid assessment of extremal statistics for local alignment with gaps. In Proc. Seventh Internat. Conf. Intelligent Systems Molec. Biol., eds Lengauer, T. et al., AAAI Press, Menlo Park, CA, pp. 211–222.
Schuler, G. D., Altschul, S. F., and Lipman, D. J. (1991). A workbench for multiple alignment construction and analysis. Proteins: Structure, Function, and Genetics 9, 180–190.
Siegmund, D., and Yakir, B. (2000). Approximate p-values for local sequence alignments. Ann. Statist. 28, 657–680. (Correction: 31 (2003), 1027–1031.)
Smith, T. F., and Waterman, M. S. (1981). Identification of common molecular subsequences. J. Molec. Biol. 147, 195–197.
Storey, J. D., and Siegmund, D. (2001). Approximate p-values for local sequence alignments: numerical studies. J. Comput. Biol. 8, 549–556.
Tempel′man, A. A. (1972). Ergodic theorems for general dynamical systems. Trans. Moscow Math. Soc. 26, 94–132 (in Russian).
Waterman, M. S., and Vingron, M. (1994). Rapid and accurate estimates of statistical significance for sequence data base searches. Proc. Nat. Acad. Sci. USA 91, 4625–4628.
Waterman, M. S., and Vingron, M. (1994). Sequence comparison significance and Poisson approximation. Statist. Sci. 9, 367–381.
Waterman, M. S., Gordon, L., and Arratia, R. (1987). Phase transitions in sequence matches and nucleic acid structure. Proc. Nat. Acad. Sci. USA 84, 1239–1243.
Williams, D. (1997). Probability with Martingales. Cambridge University Press.
Yu, Y. K., Bundschuh, R., and Hwa, T. (2002). Hybrid alignment: high-performance with universal statistics. Bioinformatics 18, 864–872.