Hostname: page-component-76fb5796d-x4r87 Total loading time: 0 Render date: 2024-04-26T22:26:16.831Z Has data issue: false hasContentIssue false

Checkpointing for the RESTART problem in Markov networks

Published online by Cambridge University Press:  14 July 2016

Lester Lipsky
Affiliation:
University of Connecticut, Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road, Storrs, CT 06269-2155, USA. Email address: lester@engr.uconn.edu
Derek Doran
Affiliation:
University of Connecticut, Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road, Storrs, CT 06269-2155, USA. Email address: lester@engr.uconn.edu
Swapna Gokhale
Affiliation:
University of Connecticut, Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Road, Storrs, CT 06269-2155, USA. Email address: lester@engr.uconn.edu
Rights & Permissions [Opens in a new window]

Abstract

Core share and HTML view are not available for this content. However, as you have access to this content, a full PDF is available via the ‘Save PDF’ action button.

We apply the known formulae of the RESTART problem to Markov models of software (and many other) systems, and derive new equations. We show how checkpoints might be included, with their resultant performance under RESTART. The result is a complete procedure for finding the mean, variance, and tail behavior of the job completion time as a function of the failure rate. We also provide a detailed example.

Type
Part 4. Simulation
Copyright
Copyright © Applied Probability Trust 2011 

References

[1] Asmussen, S., (2003). Applied Probability and Queues, 2nd edn. Springer, New York.Google Scholar
[2] Asmussen, S. and Albrecher, H., (2010). Ruin Probabilities, 2nd edn. World Scientific, River Edge, NJ.Google Scholar
[3] Asmussen, S. et al., (2008). Asymptotic behavior of total times for jobs that must start over if failure occurs. Math. Operat. Res. 33, 932944.CrossRefGoogle Scholar
[4] Cheung, R. C., (1980). A user-oriented software reliability model. IEEE Trans. Software Eng. 6, 118125.Google Scholar
[5] Chimento, P. F. Jr. and Trivedi, K. S., (1993). The completion time of programs on processors subject to failure and repair. IEEE Trans. Comput. 42, 11841194.Google Scholar
[6] Gokhale, S. S. and Trivedi, K. S., (2006). Analytical models for architecture-based software reliability analysis: a unification framework. IEEE Trans. Reliab. 55, 281292.Google Scholar
[7] Gut, A., (1988). Stopped Random Walks. Springer, New York.Google Scholar
[8] Jelenković, P. R. and Tan, J., (2007). Can retransmissions of superexponential documents cause subexponential delays? In Proc. INFOCOM '07 (Anchorage, 2007), pp. 892900.CrossRefGoogle Scholar
[9] Kulkarni, V., Nicola, V. and Trivedi, K., (1986). On modeling the performance and reliability of multimode systems. J. Systems Software 6, 175183.CrossRefGoogle Scholar
[10] Lipsky, L., (2009). Queueing Theory: A Linear Algebraic Approach, 2nd edn. Springer, New York.Google Scholar
[11] Littlewood, B., (1975). A reliability model for Markov structured software. In Proc. Internat. Conf. Reliab. Software (Los Angeles, 1975), Association for Computing Machinery, New York, pp. 204207.Google Scholar
[12] Neuts, M. F., (1981). Matrix-Geometric Solutions in Stochastic Models. Johns Hopkins University Press, Baltimore, MD.Google Scholar
[13] Sheahan, R., Lipsky, L., Fiorini, P. and Asmussen, S., (2006). On the completion time distribution for tasks that must restart from the beginning if a failure occurs. In ACM SIGMETRICS Performance Evaluation Review, Association for Computing Machinery, New York, pp. 2426. (Expanded version: (2007), Tech. Rep.)Google Scholar
[14] Sholl, H. A. and Booth, T. L., (1975). Software performance modeling using computation structures. IEEE Trans. Software Eng. 1, 414420.Google Scholar
[15] Van de Liefvoort, A. H. A., (1982). An algebraic approach to the steady-state solution of G/G/1//N-type loops. , University of Nebraska, Lincoln.Google Scholar