
Twenty Questions with Noise: Bayes Optimal Policies for Entropy Loss

Published online by Cambridge University Press:  04 February 2016

Bruno Jedynak∗
Affiliation: Johns Hopkins University
Peter I. Frazier∗∗
Affiliation: Cornell University
Raphael Sznitman∗∗∗
Affiliation: Johns Hopkins University
∗ Postal address: Department of Applied Mathematics and Statistics, Johns Hopkins University, Whitehead 208-B, 3400 North Charles Street, Baltimore, MD 21218, USA. Email address: bruno.jedynak@jhu.edu
∗∗ Postal address: School of Operations Research and Industrial Engineering, Cornell University, 232 Rhodes Hall, Ithaca, NY 14850, USA.
∗∗∗ Postal address: Johns Hopkins University, Hackerman Hall, 3400 North Charles Street, Baltimore, MD 21218, USA.

Abstract

We consider the problem of twenty questions with noisy answers, in which we seek to find a target by repeatedly choosing a set, asking an oracle whether the target lies in this set, and obtaining an answer corrupted by noise. Starting with a prior distribution on the target's location, we seek to minimize the expected entropy of the posterior distribution. We formulate this problem as a dynamic program and show that any policy optimizing the one-step expected reduction in entropy is also optimal over the full horizon. Two such Bayes optimal policies are presented: one generalizes the probabilistic bisection policy due to Horstein and the other asks a deterministic set of questions. We study the structural properties of the latter, and illustrate its use in a computer vision application.
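To make the setting concrete, here is a minimal sketch of a probabilistic-bisection-style policy on a finite grid. It is an illustration under simplifying assumptions, not the authors' implementation: the target lives on `n` grid points, each question asks whether the target index is at or below the posterior median, and the oracle's answer is flipped independently with probability `eps` (a symmetric noise model chosen here for simplicity). The names `entropy_bits`, `bisection_step`, `run`, and `eps` are hypothetical.

```python
import math
import random


def entropy_bits(p):
    """Shannon entropy (in bits) of a discrete distribution given as a list."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def bisection_step(posterior, target, eps, rng):
    """Ask 'is the target index <= m?' at the posterior median m, then update.

    Assumption: the answer is flipped with probability eps (symmetric noise).
    """
    # Smallest m whose cumulative posterior mass reaches 1/2.
    total, m = 0.0, 0
    for m, q in enumerate(posterior):
        total += q
        if total >= 0.5:
            break
    truth = target <= m
    answer = truth if rng.random() >= eps else (not truth)
    # Bayes update: locations consistent with the answer get weight 1 - eps,
    # the others get weight eps.
    weighted = [
        q * ((1.0 - eps) if ((i <= m) == answer) else eps)
        for i, q in enumerate(posterior)
    ]
    z = sum(weighted)
    return [w / z for w in weighted]


def run(n=1024, eps=0.2, steps=60, seed=0):
    """Simulate the policy from a uniform prior; return target, posterior, entropies."""
    rng = random.Random(seed)
    target = rng.randrange(n)
    p = [1.0 / n] * n
    history = [entropy_bits(p)]
    for _ in range(steps):
        p = bisection_step(p, target, eps, rng)
        history.append(entropy_bits(p))
    return target, p, history


if __name__ == "__main__":
    target, p, history = run()
    guess = max(range(len(p)), key=p.__getitem__)
    print("target:", target, "posterior mode:", guess)
    print("entropy (bits): start", history[0], "end", history[-1])
```

With `eps=0.0` the sketch reduces to ordinary bisection, and the posterior entropy falls by exactly one bit per question. With noise, the entropy of a single run can rise after an unlucky answer; the abstract's optimality claim concerns the expected entropy, not each sample path.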

Type
Research Article
Copyright
© Applied Probability Trust 

Footnotes

Research supported in part by AFOSR YIP FA9550-11-1-0083.

Research supported in part by NIH grant R01 EB 007969-01.

References

Ben-Or, M. and Hassidim, A. (2008). The Bayesian learner is optimal for noisy binary search (and pretty good for quantum as well). In 2008 49th Ann. IEEE Symp. Foundations of Computer Science. IEEE Computer Society Press, Washington, DC, pp. 221–230.
Berry, D. A. and Fristedt, B. (1985). Bandit Problems. Chapman & Hall, London.
Blum, J. R. (1954). Multidimensional stochastic approximation methods. Ann. Math. Statist. 25, 737–744.
Burnašev, M. V. and Zigangirov, K. Š. (1974). A certain problem of interval estimation in observation control. Problemy Peredachi Informatsii 10, 51–61.
Castro, R. and Nowak, R. (2008). Active learning and sampling. In Foundations and Applications of Sensor Management, Springer, pp. 177–200.
Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. John Wiley, New York.
DeGroot, M. H. (1970). Optimal Statistical Decisions. McGraw Hill, New York.
Dynkin, E. B. and Yushkevich, A. A. (1979). Controlled Markov Processes. Springer, New York.
Frazier, P. I., Powell, W. B. and Dayanik, S. (2008). A knowledge-gradient policy for sequential information collection. SIAM J. Control Optimization 47, 2410–2439.
Geman, D. and Jedynak, B. (1996). An active testing model for tracking roads in satellite images. IEEE Trans. Pattern Anal. Machine Intelligence 18, 1–14.
Gittins, J. C. (1989). Multi-Armed Bandit Allocation Indices. John Wiley, Chichester.
Horstein, M. (1963). Sequential decoding using noiseless feedback. IEEE Trans. Inf. Theory 9, 136–143.
Horstein, M. (2002). Sequential transmission using noiseless feedback. IEEE Trans. Inf. Theory 9, 136–143.
Karp, R. M. and Kleinberg, R. (2007). Noisy binary search and its applications. In Proc. 18th Ann. ACM-SIAM Symp. Discrete Algorithms, ACM, New York, pp. 881–890.
Kushner, H. J. and Yin, G. G. (2003). Stochastic Approximation and Recursive Algorithms and Applications, 2nd edn. Springer, New York.
Lai, T. L. and Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6, 4–22.
Lampert, C. H., Blaschko, M. B. and Hofmann, T. (2009). Efficient subwindow search: a branch and bound framework for object localization. IEEE Trans. Pattern Anal. Machine Intelligence 31, 2129–2142.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. Internat. J. Comput. Vision 60, 91–110.
Nowak, R. (2008). Generalized binary search. In 2008 46th Ann. Allerton Conf. Commun., Control, and Computing, pp. 568–574.
Nowak, R. (2009). Noisy generalized binary search. Adv. Neural Inf. Processing Systems 22, 1366–1374.
Pelc, A. (2002). Searching games with errors—fifty years of coping with liars. Theoret. Comput. Sci. 270, 71–109.
Polyak, B. T. (1990). A new method of stochastic approximation type. Automat. Remote Control 51, 937–946.
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58, 527–535.
Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math. Statist. 22, 400–407.
Ruppert, D. (1988). Efficient estimators from a slowly convergent Robbins-Monro procedure. Tech. Rep. 781, School of Operations Research and Industrial Engineering, Cornell University.
Schapire, R. E. (1990). The strength of weak learnability. Machine Learning 5, 197–227.
Sznitman, R. and Jedynak, B. (2010). Active testing for face detection and localization. IEEE Trans. Pattern Anal. Machine Intelligence 32, 1914–1920.
Vapnik, V. N. (1995). The Nature of Statistical Learning Theory. Springer, New York.
Vedaldi, A., Gulshan, V., Varma, M. and Zisserman, A. (2009). Multiple kernels for object detection. In Proc. Internat. Conf. Computer Vision, pp. 606–613.
Viola, P. and Jones, M. J. (2004). Robust real-time face detection. Internat. J. Comput. Vision 57, 137–154.
Waeber, R., Frazier, P. I. and Henderson, S. G. (2011). A Bayesian approach to stochastic root finding. In Proc. 2011 Winter Simulation Conference, eds Jain, S. et al., IEEE.
Whittle, P. (1981). Arm-acquiring bandits. Ann. Prob. 9, 284–292.
Whittle, P. (1988). Restless bandits: activity allocation in a changing world. In A Celebration of Applied Probability (J. Appl. Prob. Spec. Vol. 25A), ed. Gani, J., Applied Probability Trust, Sheffield, pp. 287–298.