
Covariance selection quality through detection problem and AUC bounds

Published online by Cambridge University Press:  11 December 2018

Navid Tafaghodi Khajavi*
Affiliation:
Department of Electrical Engineering, University of Hawaii, Honolulu, HI 96822, USA
Anthony Kuh
Affiliation:
Department of Electrical Engineering, University of Hawaii, Honolulu, HI 96822, USA
*Corresponding author: Navid Tafaghodi Khajavi. Email: navidt@hawaii.edu

Abstract

Graphical models are increasingly used in complex engineering problems to model the dynamics between states of the graph. These graphs are often very large, and approximation models are needed to reduce the computational complexity. This paper considers the problem of quantifying the quality of an approximation model for a graphical model (the model selection problem). Model selection often uses a distance measure such as the Kullback–Leibler (KL) divergence between the original distribution and the model distribution to quantify the quality of the approximation. We extend this body of research by formulating the model approximation as a detection problem between the original distribution and the model distribution. We focus on Gaussian random vectors, introduce the Correlation Approximation Matrix (CAM), and use the Area Under the Curve (AUC) for the formulated detection problem. Closeness measures such as the KL divergence, the log-likelihood ratio, and the AUC are functions of the eigenvalues of the CAM. Easily computable upper and lower bounds are found for the AUC. The paper concludes by computing these measures for real and synthetic simulation data. Both tree approximations and more complex graphical models are considered as approximation models.
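The claim that the KL divergence depends only on the eigenvalues of the CAM can be illustrated with a short Python sketch. It assumes zero-mean Gaussian vectors and takes the CAM to be the original covariance multiplied by the inverse of the model covariance. That definition, the function names, and the 3×3 example matrices are assumptions made here for illustration and are not given in the text above. The model covariance in the example is a chain (tree) approximation that keeps the pairwise correlations on the edges 1-2 and 2-3.

```python
import numpy as np


def cam_eigenvalues(sigma, sigma_model):
    """Eigenvalues of sigma @ inv(sigma_model), the assumed CAM."""
    # Solve sigma_model @ M = sigma, i.e. M = inv(sigma_model) @ sigma.
    # M has the same eigenvalues as sigma @ inv(sigma_model).
    m = np.linalg.solve(sigma_model, sigma)
    # The eigenvalues are real and positive for positive-definite inputs;
    # .real discards any round-off imaginary part.
    return np.sort(np.linalg.eigvals(m).real)


def kl_from_eigenvalues(lam):
    """KL divergence N(0, sigma) || N(0, sigma_model) from CAM eigenvalues."""
    return 0.5 * float(np.sum(lam - 1.0 - np.log(lam)))


def kl_direct(sigma, sigma_model):
    """Same KL divergence from the standard trace / log-determinant formula."""
    n = sigma.shape[0]
    m = np.linalg.solve(sigma_model, sigma)
    _, logdet = np.linalg.slogdet(m)
    return 0.5 * float(np.trace(m) - n - logdet)


# Hypothetical 3-variable correlation matrix (illustrative values only).
sigma = np.array([[1.0, 0.8, 0.5],
                  [0.8, 1.0, 0.6],
                  [0.5, 0.6, 1.0]])

# Chain approximation 1-2-3: the (1,3) correlation becomes r12 * r23.
sigma_model = sigma.copy()
sigma_model[0, 2] = sigma_model[2, 0] = 0.8 * 0.6

lam = cam_eigenvalues(sigma, sigma_model)
print("CAM eigenvalues:", lam)
print("KL from eigenvalues:", kl_from_eigenvalues(lam))
print("KL from trace/logdet:", kl_direct(sigma, sigma_model))
```

Both KL computations should agree up to round-off, since the trace and the log-determinant of a matrix are the sum and the log-product of its eigenvalues.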

Information

Type
Original Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Authors, 2018

Fig. 1. (a) The complete graph; (b) The tree approximation of the complete graph.

Fig. 2. The ROC curve and the area under the ROC curve. Each point on the ROC curve indicates a detector with given detection and false-alarm probabilities.

Fig. 3. Feasible region for the AUC and the KL divergence pair over all possible detectors or, equivalently, all possible ROC curves. The KL divergence is between the LLRT statistics under different hypotheses, i.e. ${\cal D}(f_{L_0}(l) || f_{L_1}(l))$ or ${\cal D}(f_{L_1}(l) || f_{L_0}(l))$.

Fig. 4. Log-scale plot of the feasible region and its asymptotic behavior (straight line) for the AUC and the KL divergence pair over all possible detectors or, equivalently, all possible ROC curves. The KL divergence is between the LLRT statistics under different hypotheses, i.e. ${\cal D}(f_{L_1}(l) || f_{L_0}(l))$ or ${\cal D}(f_{L_0}(l) || f_{L_1}(l))$. The close-up shows the non-linear behavior of the feasible region around one.

Fig. 5. Boundaries of the feasible region and their asymptotic behavior for the AUC and the KL divergence pair over all possible detectors or, equivalently, all possible ROC curves. The KL divergence is between the LLRT statistics under different hypotheses, i.e. ${\cal D}(f_{L_0}(l) || f_{L_1}(l))$ or ${\cal D}(f_{L_1}(l) || f_{L_0}(l))$.

Fig. 6. 1 − AUC versus the dimension of the graph, n, for the star approximation of the Toeplitz example with ρ = 0.1 (left) and ρ = 0.9 (right). In both panels, the numerically evaluated AUC is compared with its bounds.

Fig. 7. 1 − AUC versus the dimension of the graph, n, for the chain approximation of the Toeplitz example with ρ = 0.1 (left) and ρ = 0.9 (right). In both panels, the numerically evaluated AUC is compared with its bounds.

Fig. 8. Normalized histograms of the trees generated using MCMC for the Oahu solar measurement grid dataset in the summer season at 12:00 PM. Left: distribution versus the KL divergence. Right: distribution versus $\log _{10} (1-\hbox{AUC})$.

Fig. 9. Normalized histograms of all trees for the Colorado dataset in the summer season at 12:00 PM. Left: distribution versus the KL divergence. Right: distribution versus the AUC.

Fig. 10. Normalized histograms of the trees generated using MCMC for the 2D sensor network example with 20 sensors and σ = 1. Left: distribution versus the KL divergence. Right: distribution versus $\log _{10} (1-\hbox{AUC})$.

Fig. 11. 1 − AUC and its bounds versus the dimension of the graph, n, for σ = 1.3 (left) and σ = 1.8 (right), averaged over 1000 randomly generated sensor networks.