
Bibliography

Published online by Cambridge University Press:  05 March 2012

Masashi Sugiyama, Tokyo Institute of Technology
Taiji Suzuki, University of Tokyo
Takafumi Kanamori, Nagoya University, Japan

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2012


References

Agakov, F., and Barber, D. 2006. Kernelized Infomax Clustering. Pages 17—24 of: Weiss, Y., Schölkopf, B., and Platt, J. (eds), Advances in Neural Information Processing Systems 18.Cambridge, MA: MIT Press.Google Scholar
Aggarwal, C. C., and Yu, P. S. (eds). 2008. Privacy-Preserving Data Mining: Models and Algorithms.New York: Springer.CrossRefGoogle Scholar
Akaike, H. 1970. Statistical Predictor Identification. Annals of the Institute of Statistical Mathematics, 22, 203–217.CrossRef
Akaike, H. 1974. A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, AC-19(6), 716–723.CrossRef
Akaike, H. 1980. Likelihood and the Bayes Procedure. Pages 141—166 of: Bernardo, J. M., DeGroot, M. H., Lindley, D. V., and Smith, A. F. M. (eds), Bayesian Statistics.Valencia, Spain: Valencia University Press.Google Scholar
Akiyama, T., Hachiya, H., and Sugiyama, M. 2010. Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning. Neural Networks, 23(5), 639—648.CrossRef
Ali, S. M., and Silvey, S. D. 1966. A General Class of Coefficients of Divergence of One Distribution from Another. Journal of the Royal Statistical Society, Series B, 28(1), 131—142.
Amari, S. 1967. Theory of Adaptive Pattern Classifiers. IEEE Transactions on Electronic Computers, EC-16(3), 299–307.
Amari, S. 1998. Natural Gradient Works Efficiently in Learning. Neural Computation, 10(2), 251–276.
Amari, S. 2000. Estimating Functions of Independent Component Analysis for Temporally Correlated Signals. Neural Computation, 12(9), 2083–2107.
Amari, S., and Nagaoka, H. 2000. Methods of Information Geometry.Providence, RI: Oxford University Press.Google Scholar
Amari, S., Fujita, N., and Shinomoto, S. 1992. Four Types of Learning Curves. Neural Computation, 4(4), 605–618.CrossRef
Amari, S., Cichocki, A., and Yang, H. H. 1996. A New Learning Algorithm for Blind Signal Separation. Pages 757–763 of: Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E. (eds), Advances in Neural Information Processing Systems 8. Cambridge, MA: MIT Press.
Anderson, N., Hall, P., and Titterington, D. 1994. Two-Sample Test Statistics for Measuring Discrepancies between Two Multivariate Probability Density Functions Using Kernel-based Density Estimates. Journal of Multivariate Analysis, 50, 41–54.CrossRef
Ando, R. K., and Zhang, T. 2005. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. Journal of Machine Learning Research, 6, 1817—1853.
Antoniak, C. 1974. Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems. The Annals of Statistics, 2(6), 1152–1174.CrossRef
Aronszajn, N. 1950. Theory of Reproducing Kernels. Transactions of the American Mathematical Society, 68, 337–404.CrossRef
Bach, F., and Harchaoui, Z. 2008. DIFFRAC: A Discriminative and Flexible Framework for Clustering. Pages 49—56 of: Platt, J. C., Koller, D., Singer, Y., and Roweis, S. (eds), Advances in Neural Information Processing Systems 20.Cambridge, MA: MIT Press.Google Scholar
Bach, F., and Jordan, M. I. 2002. Kernel Independent Component Analysis. Journal of Machine Learning Research, 3, 1–48.
Bach, F., and Jordan, M. I. 2006. Learning Spectral Clustering, with Application to Speech Separation. Journal of Machine Learning Research, 7, 1963—2001.
Bachman, G., and Narici, L. 2000. Functional Analysis.Mineola, NY: Dover Publications.Google Scholar
Bakker, B., and Heskes, T. 2003. Task Clustering and Gating for Bayesian Multitask Learning. Journal of Machine Learning Research, 4, 83—99.
Bartlett, P., Bousquet, O., and Mendelson, S. 2005. Local Rademacher Complexities. The Annals of Statistics, 33, 1487–1537.CrossRef
Basseville, M., and Nikiforov, V. 1993. Detection of Abrupt Changes: Theory and Application. Englewood Cliffs, NJ: Prentice-Hall, Inc.Google Scholar
Basu, A., Harris, I. R., Hjort, N. L., and Jones, M. C. 1998. Robust and Efficient Estimation by Minimising a Density Power Divergence. Biometrika, 85(3), 549–559.CrossRef
Baxter, J. 1997. A Bayesian/Information Theoretic Model of Learning to Learn via Multiple Task Sampling. Machine Learning, 28, 7–39.
Baxter, J. 2000. A Model of Inductive Bias Learning. Journal of Artificial Intelligence Research, 12, 149–198.
Belkin, M., and Niyogi, P. 2003. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation, 15(6), 1373—1396.CrossRef
Bellman, R. 1961. Adaptive Control Processes: A Guided Tour.Princeton, NJ: Princeton University Press.CrossRefGoogle Scholar
Ben-David, S., and Schuller, R. 2003. Exploiting Task Relatedness for Multiple Task Learning. Pages 567-580 of: Proceedings of the Sixteenth Annual Conference on Learning Theory (COLT2003).
Ben-David, S., Gehrke, J., and Schuller, R. 2002. A Theoretical Framework for Learning from a Pool of Disparate Data Sources. Pages 443–49 of: Proceedings of The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2002).
Bensaid, N., and Fabre, J. P. 2007. Optimal Asymptotic Quadratic Error of Kernel Estimators of Radon-Nikodym Derivatives for Strong Mixing Data. Journal of Nonparametric Statistics, 19(2), 77–88.CrossRef
Bertsekas, D., Nedic, A., and Ozdaglar, A. 2003. Convex Analysis and Optimization.Belmont, MA: Athena Scientific.Google Scholar
Best, M. J. 1982. An Algorithm for the Solution of the Parametric Quadratic Programming Problem. Tech. rept. 82–24. Faculty of Mathematics, University of Waterloo.Google Scholar
Biau, G., and Györfi, L. 2005. On the Asymptotic Properties of a Nonparametric l1-test Statistic of Homogeneity. IEEE Transactions on Information Theory, 51(11), 3965—3973.CrossRef
Bickel, P. 1969. A Distribution Free Version of the Smirnov Two Sample Test in the p-variate Case. The Annals of Mathematical Statistics, 40(1), 1–23.CrossRef
Bickel, S., Brückner, M., and Scheffer, T. 2007. Discriminative Learning for Differing Training and Test Distributions. Pages 81—88 of: Proceedings of the 24th International Conference on Machine Learning (ICML2007).
Bickel, S., Bogojeska, J., Lengauer, T., and Scheffer, T. 2008. Multi-Task Learning for HIV Therapy Screening. Pages 56-63 of: McCallum, A., and Roweis, S. (eds), Proceedings of 25th Annual International Conference on Machine Learning (ICML2008).
Bishop, C. M. 1995. Neural Networks for Pattern Recognition.Oxford, UK: Clarendon Press.Google Scholar
Bishop, C. M. 2006. Pattern Recognition and Machine Learning.New York: Springer.Google Scholar
Blanchard, G., Kawanabe, M., Sugiyama, M., Spokoiny, V., and Müller, K.-R. 2006. In Search of Non-Gaussian Components of a High-dimensional Distribution. Journal of Machine Learning Research, 7(Feb.), 247–282.
Blei, D.M., and Jordan, M.I. 2006. Variational Inference for Dirichlet Process Mixtures. Bayesian Analysis, 1(1), 121–144.CrossRef
Bolton, R. J., and Hand, D. J. 2002. Statistical Fraud Detection: A Review. Statistical Science, 17(3), 235–255.
Bonilla, E., Chai, K. M., and Williams, C. 2008. Multi-Task Gaussian Process Prediction. Pages 153–160 of: Platt, J. C., Koller, D., Singer, Y., and Roweis, S. (eds), Advances in Neural Information Processing Systems 20. Cambridge, MA: MIT Press.
Borgwardt, K. M., Gretton, A., Rasch, M. J., Kriegel, H.-P., Schölkopf, B., and Smola, A. J. 2006. Integrating Structured Biological Data by Kernel Maximum Mean Discrepancy. Bioinformatics, 22(14), e49-e57.CrossRef
Bousquet, O. 2002. A Bennett Concentration Inequality and its Application to Suprema of Empirical Process. Note aux Compte Rendus de l'Académie des Sciences de Paris, 334, 495–500.
Boyd, S., and Vandenberghe, L. 2004. Convex Optimization.Cambridge, UK: Cambridge University Press.CrossRefGoogle Scholar
Bradley, A. P. 1997. The Use of the Area under the ROC Curve in the Evaluation of Machine Learning Algorithms. Pattern Recognition, 30(7), 1145—1159.CrossRef
Bregman, L. M. 1967. The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming. USSR Computational Mathematics and Mathematical Physics, 7, 200—217.CrossRef
Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. 2000. LOF: Identifying Density-Based Local Outliers. Pages 93-104 of: Chen, W., Naughton, J. F., and Bernstein, P. A. (eds), Proceedings of the ACM SIGMOD International Conference on Management of Data.
Brodsky, B., and Darkhovsky, B. 1993. Nonparametric Methods in Change-Point Problems. Dordrecht, the Netherlands: Kluwer Academic Publishers.
Broniatowski, M., and Keziou, A. 2009. Parametric Estimation and Tests through Divergences and the Duality Technique. Journal of Multivariate Analysis, 100, 16–26.
Buhmann, J. M. 1995. Data Clustering and Learning. Pages 278-281 of: Arbib, M. A. (ed), The Handbook of Brain Theory and Neural Networks.Cambridge, MA: MIT Press.Google Scholar
Bura, E., and Cook, R. D. 2001. Extending Sliced Inverse Regression. Journal of the American Statistical Association, 96(455), 996–1003.CrossRef
Caponnetto, A., and de Vito, E. 2007. Optimal Rates for Regularized Least-Squares Algorithm. Foundations of Computational Mathematics, 7(3), 331—368.CrossRef
Cardoso, J.-F. 1999. High-Order Contrasts for Independent Component Analysis. Neural Computation, 11(1), 157–192.CrossRef
Cardoso, J.-F., and Souloumiac, A. 1993. Blind Beamforming for Non-Gaussian Signals. Radar and Signal Processing, IEE Proceedings-F, 140(6), 362–370.
Caruana, R., Pratt, L., and Thrun, S. 1997. Multitask Learning. Machine Learning, 28, 41–75.CrossRef
Cesa-Bianchi, N., and Lugosi, G. 2006. Prediction, Learning, and Games.Cambridge, UK: Cambridge University Press.CrossRefGoogle Scholar
Chan, J., Bailey, J., and Leckie, C. 2008. Discovering Correlated Spatio-Temporal Changes in Evolving Graphs. Knowledge and Information Systems, 16(1), 53–96.CrossRef
Chang, C. C., and Lin, C. J. 2001. LIBSVM: A Library for Support Vector Machines. Tech. rept. Department of Computer Science, National Taiwan University. http://www.csie.ntu.edu.tw/∼cjlin/libsvm/.
Chapelle, O., Schölkopf, B., and Zien, A. (eds). 2006. Semi-Supervised Learning.Cambridge, MA: MIT Press.CrossRefGoogle Scholar
Chawla, N. V., Japkowicz, N., and Kotcz, A. 2004. Editorial: Special Issue on Learning from Imbalanced Data Sets. ACM SIGKDD Explorations Newsletter, 6(1), 1–6.
Chen, S.-M., Hsu, Y.-S., and Liaw, J.-T. 2009. On Kernel Estimators of Density Ratio. Statistics, 43(5), 463–79.
Chen, S. S., Donoho, D. L., and Saunders, M. A. 1998. Atomic Decomposition by Basis Pursuit. SIAM Journal on Scientific Computing, 20(1), 33—61.CrossRef
Cheng, K. F., and Chu, C. K. 2004. Semiparametric Density Estimation under a Two-sample Density Ratio Model. Bernoulli, 10(4), 583–604.CrossRef
Chiaromonte, F., and Cook, R. D. 2002. Sufficient Dimension Reduction and Graphics in Regression. Annals of the Institute of Statistical Mathematics, 54(4), 768—795.CrossRef
Cichocki, A., and Amari, S. 2003. Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications. New York: Wiley.
Cohn, D. A., Ghahramani, Z., and Jordan, M. I. 1996. Active Learning with Statistical Models. Journal of Artificial Intelligence Research, 4, 129—145.
Collobert, R., and Bengio, S. 2001. SVMTorch: Support Vector Machines for Large-Scale Regression Problems. Journal of Machine Learning Research, 1, 143–160.
Comon, P. 1994. Independent Component Analysis, A New Concept? Signal Processing, 36(3), 287–314.CrossRef
Cook, R. D. 1998a. Principal Hessian Directions Revisited. Journal of the American Statistical Association, 93(441), 84–100.CrossRef
Cook, R. D. 1998b. Regression Graphics: Ideas for Studying Regressions through Graphics.New York: Wiley.CrossRefGoogle Scholar
Cook, R. D. 2000. SAVE: A Method for Dimension Reduction and Graphics in Regression. Communications in Statistics-Theory and Methods, 29(9), 2109–2121.CrossRef
Cook, R. D., and Forzani, L. 2009. Likelihood-Based Sufficient Dimension Reduction. Journal of the American Statistical Association, 104(485), 197–208.CrossRef
Cook, R. D., and Ni, L. 2005. Sufficient Dimension Reduction via Inverse Regression. Journal of the American Statistical Association, 100(470), 410–428.CrossRef
Cortes, C., and Vapnik, V. 1995. Support-Vector Networks. Machine Learning, 20, 273–297.
Cover, T. M., and Thomas, J. A. 2006. Elements of Information Theory. 2nd edn. Hoboken, NJ: Wiley.Google Scholar
Cramér, H. 1946. Mathematical Methods of Statistics.Princeton, NJ: Princeton University Press.Google Scholar
Craven, P., and Wahba, G. 1979. Smoothing Noisy Data with Spline Functions: Estimating the Correct Degree of Smoothing by the Method of Generalized Cross-Validation. Numerische Mathematik, 31, 377–403.CrossRef
Csiszár, I. 1967. Information-Type Measures of Difference of Probability Distributions and Indirect Observation. Studia Scientiarum Mathematicarum Hungarica, 2, 229–318.
Cwik, J., and Mielniczuk, J. 1989. Estimating Density Ratio with Application to Discriminant Analysis. Communications in Statistics: Theory and Methods, 18(8), 3057—3069.CrossRef
Darbellay, G. A., and Vajda, I. 1999. Estimation of the Information by an Adaptive Partitioning of the Observation Space. IEEE Transactions on Information Theory, 45(4), 1315–1321.CrossRef
Davis, J., Kulis, B., Jain, P., Sra, S., and Dhillon, I. 2007. Information-Theoretic Metric Learning. Pages 209—216 of: Ghahramani, Z. (ed), Proceedings of the 24th Annual International Conference on Machine Learning (ICML2007).
Demmel, J. W.1997. Applied Numerical Linear Algebra.Philadelphia, PA: Society for Industrial and Applied Mathematics.
Dempster, A. P., Laird, N. M., and Rubin, D. B. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, series B, 39(1), 1—38.
Dhillon, I. S., Guan, Y., and Kulis, B. 2004. Kernel K-Means, Spectral Clustering and Normalized Cuts. Pages 551-556 of: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York: ACM Press.Google Scholar
Donoho, D. L., and Grimes, C. E. 2003. Hessian Eigenmaps: Locally Linear Embedding Techniques for High-Dimensional Data. Pages 5591–5596 of: Proceedings of the National Academy of Sciences.
Duda, R. O., Hart, P. E., and Stork, D. G. 2001. Pattern Classification. 2nd edn. New York: Wiley.Google Scholar
Duffy, N., and Collins, M. 2002. Convolution Kernels for Natural Language. Pages 625-632 of: Dietterich, T. G., Becker, S., and Ghahramani, Z. (eds), Advances in Neural Information Processing Systems 14.Cambridge, MA: MIT Press.Google Scholar
Durand, J., and Sabatier, R. 1997. Additive Splines for Partial Least Squares Regression. Journal of the American Statistical Association, 92(440), 1546–1554.CrossRef
Edelman, A. 1988. Eigenvalues and Condition Numbers of Random Matrices. SIAM Journal on Matrix Analysis and Applications, 9(4), 543—560.CrossRef
Edelman, A., and Sutton, B. D. 2005. Tails of Condition Number Distributions. SIAM Journal on Matrix Analysis and Applications, 27(2), 547–560.CrossRef
Edelman, A., Arias, T. A., and Smith, S. T. 1998. The Geometry of Algorithms with Orthogonality Constraints. SIAM Journal on Matrix Analysis and Applications, 20(2), 303—353.CrossRef
Efron, B. 1975. The Efficiency of Logistic Regression Compared to Normal Discriminant Analysis. Journal of the American Statistical Association, 70(352), 892–898.CrossRef
Efron, B., and Tibshirani, R. J. 1993. An Introduction to the Bootstrap.New York: Chapman & Hall/CRC.CrossRefGoogle Scholar
Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. 2004. Least Angle Regression. The Annals of Statistics, 32(2), 407–499.
Elkan, C. 2011. Privacy-Preserving Data Mining via Importance Weighting. Pages 15–21 of: Dimitrakakis, C., Gkoulalas-Divanis, A., Mitrokotsa, A., Verykios, V. S., and Saygin, Y. (eds), Privacy and Security Issues in Data Mining and Machine Learning. Berlin: Springer.
Evgeniou, T., and Pontil, M. 2004. Regularized Multi-Task Learning. Pages 109–117 of: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2004).
Faivishevsky, L., and Goldberger, J. 2009. ICA based on a Smooth Estimation of the Differential Entropy. Pages 433-440 of: Koller, D., Schuurmans, D., Bengio, Y., and Bottou, L. (eds), Advances in Neural Information Processing Systems 21.Cambridge, MA: MIT Press.Google Scholar
Faivishevsky, L., and Goldberger, J. 2010 (Jun. 21–25). A Nonparametric Information Theoretic Clustering Algorithm. Pages 351–358 of: Joachims, T., and Fürnkranz, J. (eds), Proceedings of 27th International Conference on Machine Learning (ICML2010).
Fan, H., Zaïane, O. R., Foss, A., and Wu, J. 2009. Resolution-Based Outlier Factor: Detecting the Top-n Most Outlying Data Points in Engineering Data. Knowledge and Information Systems, 19(1), 31–51.
Fan, J., Yao, Q., and Tong, H. 1996. Estimation of Conditional Densities and Sensitivity Measures in Nonlinear Dynamical Systems. Biometrika, 83(1), 189–206.CrossRef
Fan, R. -E., Chen, P.-H., and Lin, C.-J. 2005. Working Set Selection Using Second Order Information for Training SVM. Journal of Machine Learning Research, 6, 1889—1918.
Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., and Lin, C.-J. 2008. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research, 9, 1871–1874.
Fedorov, V. V. 1972. Theory of Optimal Experiments. New York: Academic Press.
Fernandez, E. A. 2005. The dprep Package. Tech. rept. University of Puerto Rico.
Feuerverger, A. 1993. A Consistent Test for Bivariate Dependence. International Statistical Review, 61(3), 419–433.CrossRef
Fisher, R. A. 1936. The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7(2), 179–188.CrossRef
Fishman, G. S. 1996. Monte Carlo: Concepts, Algorithms, and Applications.Berlin, Germany: Springer-Verlag.CrossRefGoogle Scholar
Fokianos, K., Kedem, B., Qin, J., and Short, D. A. 2001. A Semiparametric Approach to the One-Way Layout. Technometrics, 43, 56—64.CrossRef
Franc, V., and Sonnenburg, S. 2009. Optimized Cutting Plane Algorithm for Large-Scale Risk Minimization. Journal of Machine Learning Research, 10, 2157–2192.
Fraser, A. M., and Swinney, H. L. 1986. Independent Coordinates for Strange Attractors from Mutual Information. Physical Review A, 33(2), 1134–1140.CrossRef
Friedman, J., and Rafsky, L. 1979. Multivariate Generalizations of the Wald-Wolfowitz and Smirnov Two-Sample Tests. The Annals of Statistics, 7(4), 697–717.CrossRef
Friedman, J. H. 1987. Exploratory Projection Pursuit. Journal of the American Statistical Association, 82(397), 249–266.CrossRef
Friedman, J. H., and Tukey, J. W. 1974. A Projection Pursuit Algorithm for Exploratory Data Analysis. IEEE Transactions on Computers, C-23(9), 881–890.CrossRef
Fujimaki, R., Yairi, T., and Machida, K. 2005. An Approach to Spacecraft Anomaly Detection Problem Using Kernel Feature Space. Pages 401–410 of: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2005).
Fujisawa, H., and Eguchi, S. 2008. Robust Parameter Estimation with a Small Bias against Heavy Contamination. Journal of Multivariate Analysis, 99(9), 2053–2081.CrossRef
Fukumizu, K. 2000. Statistical Active Learning in Multilayer Perceptrons. IEEE Transactions on Neural Networks, 11(1), 17–26.CrossRef
Fukumizu, K., Bach, F. R., and Jordan, M. I. 2004. Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces. Journal of Machine Learning Research, 5(Jan), 73–99.
Fukumizu, K., Bach, F. R., and Jordan, M. I. 2009. Kernel Dimension Reduction in Regression. The Annals of Statistics, 37(4), 1871–1905.CrossRef
Fukunaga, K. 1990. Introduction to Statistical Pattern Recognition. 2nd edn. Boston, MA: Academic Press, Inc.Google Scholar
Fung, G. M., and Mangasarian, O. L. 2005. Multicategory Proximal Support Vector Machine Classifiers. Machine Learning, 59(1-2), 77–97.CrossRef
Gao, J., Cheng, H., and Tan, P.-N. 2006a. A Novel Framework for Incorporating Labeled Examples into Anomaly Detection. Pages 593–597 of: Proceedings of the 2006 SIAM International Conference on Data Mining.
Gao, J., Cheng, H., and Tan, P.-N. 2006b. Semi-Supervised Outlier Detection. Pages 635–636 of: Proceedings of the 2006 ACM Symposium on Applied Computing.
Gärtner, T. 2003. A Survey of Kernels for Structured Data. SIGKDD Explorations, 5(1), S268–S275.
Gärtner, T., Flach, P., and Wrobel, S. 2003. On Graph Kernels: Hardness Results and Efficient Alternatives. Pages 129-143 of: Schölkopf, B., and Warmuth, M. (eds), Proceedings of the Sixteenth Annual Conference on Computational Learning Theory.
Ghosal, S., and van der Vaart, A. W. 2001. Entropies and Rates of Convergence for Maximum Likelihood and Bayes Estimation for Mixtures of Normal Densities. Annals of Statistics, 29, 1233–1263.CrossRef
Globerson, A., and Roweis, S. 2006. Metric Learning by Collapsing Classes. Pages 451-58 of: Weiss, Y., Schölkopf, B., and Platt, J. (eds), Advances in Neural Information Processing Systems 18.Cambridge, MA: MIT Press.Google Scholar
Godambe, V. P. 1960. An Optimum Property of Regular Maximum Likelihood Estimation. Annals of Mathematical Statistics, 31, 1208–1211.CrossRef
Goldberger, J., Roweis, S., Hinton, G., and Salakhutdinov, R. 2005. Neighbourhood Components Analysis. Pages 513—520 of: Saul, L. K., Weiss, Y., and Bottou, L. (eds), Advances in Neural Information Processing Systems 17.Cambridge, MA: MIT Press.Google Scholar
Golub, G. H., and Loan, C. F. Van. 1996. Matrix Computations.Baltimore, MD: Johns Hopkins University Press.Google Scholar
Gomes, R., Krause, A., and Perona, P. 2010. Discriminative Clustering by Regularized Information Maximization. Pages 766–774 of: Lafferty, J., Williams, C. K. I., Zemel, R., Shawe-Taylor, J., and Culotta, A. (eds), Advances in Neural Information Processing Systems 23. Cambridge, MA: MIT Press.
Goutis, C., and Fearn, T. 1996. Partial Least Squares Regression on Smooth Factors. Journal of the American Statistical Association, 91(434), 627–632.CrossRef
Graham, D. B., and Allinson, N. M. 1998. Characterizing Virtual Eigensignatures for General Purpose Face Recognition. Pages 446-56 of: Computer and Systems Sciences. NATO ASI Series F, vol. 163. Berlin, Germany: Springer.Google Scholar
Gretton, A., Bousquet, O., Smola, A., and Schölkopf, B. 2005. Measuring Statistical Dependence with Hilbert-Schmidt Norms. Pages 63-77 of: Jain, S., Simon, H. U., and Tomita, E. (eds), Algorithmic Learning Theory. Lecture Notes in Artificial Intelligence. Berlin, Germany: Springer-Verlag.Google Scholar
Gretton, A., Borgwardt, K. M., Rasch, M., Schölkopf, B., and Smola, A. J. 2007. A Kernel Method for the Two-Sample-Problem. Pages 513–520 of: Schölkopf, B., Platt, J., and Hoffman, T. (eds), Advances in Neural Information Processing Systems 19. Cambridge, MA: MIT Press.
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., and Smola, A. 2008. A Kernel Statistical Test of Independence. Pages 585–592 of: Platt, J. C., Koller, D., Singer, Y., and Roweis, S. (eds), Advances in Neural Information Processing Systems 20. Cambridge, MA: MIT Press.
Gretton, A., Smola, A., Huang, J., Schmittfull, M., Borgwardt, K., and Schölkopf, B. 2009. Covariate Shift by Kernel Mean Matching. Chap. 8, pages 131—160 of: Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., and Lawrence, N. (eds), Dataset Shift in Machine Learning.Cambridge, MA: MIT Press.Google Scholar
Guralnik, V., and Srivastava, J. 1999. Event Detection from Time Series Data. Pages 33–42 of: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD1999).
Gustafsson, F. 2000. Adaptive Filtering and Change Detection.Chichester, UK: Wiley.Google Scholar
Guyon, I., and Elisseeff, A. 2003. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 3, 1157–1182.
Hachiya, H., Akiyama, T., Sugiyama, M., and Peters, J. 2009. Adaptive Importance Sampling for Value Function Approximation in Off-policy Reinforcement Learning. Neural Networks, 22(10), 1399–1410.CrossRef
Hachiya, H., Sugiyama, M., and Ueda, N. 2011a. Importance-Weighted Least-Squares Probabilistic Classifier for Covariate Shift Adaptation with Application to Human Activity Recognition. Neurocomputing. To appear.
Hachiya, H., Peters, J., and Sugiyama, M. 2011b. Reward Weighted Regression with Sample Reuse. Neural Computation, 23(11), 2798–2832.
Hall, P., and Tajvidi, N. 2002. Permutation Tests for Equality of Distributions in High-dimensional Settings. Biometrika, 89(2), 359–374.
Härdle, W., Müller, M., Sperlich, S., and Werwatz, A. 2004. Nonparametric and Semiparametric Models.Berlin, Germany: Springer.CrossRefGoogle Scholar
Hartigan, J. A. 1975. Clustering Algorithms.New York: Wiley.Google Scholar
Hastie, T., and Tibshirani, R. 1996a. Discriminant Adaptive Nearest Neighbor Classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6), 607–615.CrossRef
Hastie, T., and Tibshirani, R. 1996b. Discriminant Analysis by Gaussian mixtures. Journal of the Royal Statistical Society, Series B, 58(1), 155–176.
Hastie, T., Tibshirani, R., and Friedman, J. 2001. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.New York: Springer.CrossRefGoogle Scholar
Hastie, T., Rosset, S., Tibshirani, R., and Zhu, J. 2004. The Entire Regularization Path for the Support Vector Machine. Journal of Machine Learning Research, 5, 1391—1415.
He, X., and Niyogi, P. 2004. Locality Preserving Projections. Pages 153-160 of: Thrun, S., Saul, L., and Schölkopf, B. (eds), Advances in Neural Information Processing Systems 16.Cambridge, MA: MIT Press.Google Scholar
Heckman, J. J. 1979. Sample Selection Bias as a Specification Error. Econometrica, 47(1), 153–161.CrossRef
Henkel, R. E. 1976. Tests of Significance.Beverly Hills, CA: Sage.CrossRefGoogle Scholar
Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., and Kanamori, T. 2011. Statistical Outlier Detection Using Direct Density Ratio Estimation. Knowledge and Information Systems, 26(2), 309–336.CrossRef
Hinton, G. E., and Salakhutdinov, R. R. 2006. Reducing the Dimensionality of Data with Neural Networks. Science, 313(5786), 504–507.CrossRef
Hodge, V., and Austin, J. 2004. A Survey of Outlier Detection Methodologies. Artificial Intelligence Review, 22(2), 85–126.CrossRef
Hoerl, A. E., and Kennard, R. W. 1970. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 12(3), 55—67.CrossRef
Horn, R., and Johnson, C. 1985. Matrix Analysis. Cambridge, UK: Cambridge University Press.
Hotelling, H. 1936. Relations between Two Sets of Variates. Biometrika, 28(3-4), 321–377.
Hotelling, H. 1951. A Generalized T Test and Measure of Multivariate Dispersion. Pages 23-41 of: Proceedings of the 2nd Berkeley Symposium on Mathematical Statistics and Probability.Berkeley: University of California Press.Google Scholar
Hoyer, P. O., Janzing, D., Mooij, J. M., Peters, J., and Schölkopf, B. 2009. Nonlinear Causal Discovery with Additive Noise Models. Pages 689-696 of: Koller, D., Schuurmans, D., Bengio, Y., and Bottou, L. (eds), Advances in Neural Information Processing Systems 21.Cambridge, MA: MIT Press.Google Scholar
Huang, J., Smola, A., Gretton, A., Borgwardt, K. M., and Schölkopf, B. 2007. Correcting Sample Selection Bias by Unlabeled Data. Pages 601–608 of: Schölkopf, B., Platt, J., and Hoffman, T. (eds), Advances in Neural Information Processing Systems 19. Cambridge, MA: MIT Press.
Huber, P. J. 1985. Projection Pursuit. The Annals of Statistics, 13(2), 435–75.
Hulle, M. M. Van. 2005. Edgeworth Approximation of Multivariate Differential Entropy. Neural Computation, 17(9), 1903–1910.CrossRef
Hulle, M. M. Van. 2008. Sequential Fixed-Point ICA Based on Mutual Information Minimization. Neural Computation, 20(5), 1344–1365.
Hyvärinen, A. 1999. Fast and Robust Fixed-Point Algorithms for Independent Component Analysis. IEEE Transactions on Neural Networks, 10(3), 626.
Hyvärinen, A., Karhunen, J., and Oja, E. 2001. Independent Component Analysis. New York: Wiley.
Idé, T., and Kashima, H. 2004. Eigenspace-Based Anomaly Detection in Computer Systems. Pages 440–449 of: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2004).
Ishiguro, M., Sakamoto, Y., and Kitagawa, G. 1997. Bootstrapping Log Likelihood and EIC, an Extension of AIC. Annals of the Institute of Statistical Mathematics, 49, 411—434.CrossRef
Jacob, P., and Oliveira, P. E. 1997. Kernel Estimators of General Radon-Nikodym Derivatives. Statistics, 30, 25–46.
Jain, A. K., and Dubes, R. C. 1988. Algorithms for Clustering Data.Englewood Cliffs, NJ: Prentice Hall.Google Scholar
Jaynes, E. T. 1957. Information Theory and Statistical Mechanics. Physical Review, 106(4), 620–630.CrossRef
Jebara, T. 2004. Kernelized Sorting, Permutation and Alignment for Minimum Volume PCA. Pages 609—623 of: 17th Annual Conference on Learning Theory (COLT2004).
Jiang, X., and Zhu, X. 2009. vEye: Behavioral Footprinting for Self-Propagating Worm Detection and Profiling. Knowledge and Information Systems, 18(2), 231–262.
Joachims, T. 1999. Making Large-Scale SVM Learning Practical. Pages 169-184 of: Schölkopf, B., Burges, C. J. C., and Smola, A. J. (eds), Advances in Kernel Methods—Support Vector Learning.Cambridge, MA: MIT Press.Google Scholar
Joachims, T. 2006. Training Linear SVMs in Linear Time. Pages 217-226 of: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2006).
Jolliffe, I. T. 1986. Principal Component Analysis.New York: Springer-Verlag.CrossRefGoogle Scholar
Jones, M. C., Hjort, N.L., Harris, I. R., and Basu, A. 2001. A Comparison of Related Density-based Minimum Divergence Estimators. Biometrika, 88, 865–873.CrossRef
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., and Saul, L. K. 1999. An Introduction to Variational Methods for Graphical Models. Machine Learning, 37(2), 183.CrossRef
Jutten, C., and Herault, J. 1991. Blind Separation of Sources, Part I: An Adaptive algorithm Based on Neuromimetic Architecture. Signal Processing, 24(1), 1–10.CrossRef
Kanamori, T. 2007. Pool-Based Active Learning with Optimal Sampling Distribution and its Information Geometrical Interpretation. Neurocomputing, 71(1—3), 353—362.CrossRef
Kanamori, T., and Shimodaira, H. 2003. Active Learning Algorithm Using the Maximum Weighted Log-Likelihood Estimator. Journal of Statistical Planning and Inference, 116(1), 149–162.CrossRef
Kanamori, T., Hido, S., and Sugiyama, M. 2009. A Least-squares Approach to Direct Importance Estimation. Journal of Machine Learning Research, 10(Jul.), 1391—1445.
Kanamori, T., Suzuki, T., and Sugiyama, M. 2010. Theoretical Analysis of Density Ratio Estimation. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E93-A(4), 787–798.CrossRef
Kanamori, T., Suzuki, T., and Sugiyama, M. 2011a. f-Divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models. IEEE Transactions on Information Theory. To appear.
Kanamori, T., Suzuki, T., and Sugiyama, M. 2011b. Statistical Analysis of Kernel-Based Least-Squares Density-Ratio Estimation. Machine Learning. To appear.
Kanamori, T., Suzuki, T., and Sugiyama, M. 2011c. Kernel-Based Least-Squares Density-Ratio Estimation II. Condition Number Analysis. Machine Learning. submitted.
Kankainen, A. 1995. Consistent Testing of Total Independence Based on the Empirical Characteristic Function. Ph.D. thesis, University of Jyväskylä, Jyväskylä, Finland.
Karatzoglou, A., Smola, A., Hornik, K., and Zeileis, A. 2004. kernlab - An S4 Package for Kernel Methods in R. Journal of Statistical Software, 11(9), 1–20.
Kashima, H., and Koyanagi, T. 2002. Kernels for Semi-Structured Data. Pages 291–298 of: Proceedings of the Nineteenth International Conference on Machine Learning.
Kashima, H., Tsuda, K., and Inokuchi, A. 2003. Marginalized Kernels between Labeled Graphs. Pages 321–328 of: Proceedings of the Twentieth International Conference on Machine Learning.
Kato, T., Kashima, H., Sugiyama, M., and Asai, K. 2010. Conic Programming for Multi-Task Learning. IEEE Transactions on Knowledge and Data Engineering, 22(7), 957—968.CrossRef
Kawahara, Y., and Sugiyama, M. 2011. Sequential Change-Point Detection Based on Direct Density-Ratio Estimation. Statistical Analysis and Data Mining. To appear.
Kawanabe, M., Sugiyama, M., Blanchard, G., and Müller, K.-R. 2007. A New Algorithm of Non-Gaussian Component Analysis with Radial Kernel Functions. Annals of the Institute of Statistical Mathematics, 59(1), 57–75.
Ke, Y., Sukthankar, R., and Hebert, M. 2007. Event Detection in Crowded Videos. Pages 1-8 of: Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV2007).
Keziou, A. 2003a. Dual Representation of ϕ-Divergences and Applications. Comptes Rendus Mathématique, 336(10), 857–862.CrossRef
Keziou, A. 2003b. Utilisation Des Divergences Entre Mesures en Statistique Inferentielle. Ph.D. thesis, UPMC University. in French.
Keziou, A., and Leoni-Aubin, S. 2005. Test of Homogeneity in Semiparametric Two-sample Density Ratio Models. Comptes Rendus Mathématique, 340(12), 905—910.CrossRef
Keziou, A., and Leoni-Aubin, S. 2008. On Empirical Likelihood for Semiparametric Two-Sample Density Ratio Models. Journal of Statistical Planning and Inference, 138(4), 915–928.CrossRef
Khan, S., Bandyopadhyay, S., Ganguly, A., and Saigal, S. 2007. Relative Performance of Mutual Information Estimation Methods for Quantifying the Dependence among Short and Noisy Data. Physical Review E, 76, 026209.CrossRef
Kifer, D., Ben-David, S., and Gehrke, J. 2004. Detecting Change in Data Streams. Pages 180–191 of: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB2004).
Kimeldorf, G. S., and Wahba, G. 1971. Some Results on Tchebycheffian Spline Functions. Journal of Mathematical Analysis and Applications, 33(1), 82–95.CrossRef
Kimura, M., and Sugiyama, M. 2011. Dependence-Maximization Clustering with Least-Squares Mutual Information. Journal of Advanced Computational Intelligence and Intelligent Informatics, 15(7), 800–805.CrossRef
Koh, K., Kim, S.-J., and Boyd, S. P. 2007. An Interior-point Method for Large-Scale l1-Regularized Logistic Regression. Journal of Machine Learning Research, 8, 1519–1555.
Kohonen, T. 1988. Learning Vector Quantization. Neural Networks, 1 (Supplementary 1), 303.
Kohonen, T. 1995. Self-Organizing Maps.Berlin, Germany: Springer.CrossRefGoogle Scholar
Koltchinskii, V. 2006. Local Rademacher Complexities and Oracle Inequalities in Risk Minimization. The Annals of Statistics, 34, 2593–2656.
Kondor, R. I., and Lafferty, J. 2002. Diffusion Kernels on Graphs and Other Discrete Input Spaces. Pages 315–322 of: Proceedings of the Nineteenth International Conference on Machine Learning.
Konishi, S., and Kitagawa, G. 1996. Generalized Information Criteria in Model Selection. Biometrika, 83(4), 875–890.
Korostelëv, A. P., and Tsybakov, A. B. 1993. Minimax Theory of Image Reconstruction.New York: Springer.CrossRefGoogle Scholar
Kraskov, A., Stögbauer, H., and Grassberger, P. 2004. Estimating Mutual Information. Physical Review E, 69(6), 066138.CrossRef
Kullback, S. 1959. Information Theory and Statistics.New York: Wiley.Google Scholar
Kullback, S., and Leibler, R. A. 1951. On Information and Sufficiency. Annals of Mathematical Statistics, 22, 79–86.CrossRef
Kurihara, N., Sugiyama, M., Ogawa, H., Kitagawa, K., and Suzuki, K. 2010. Iteratively-Reweighted Local Model Fitting Method for Adaptive and Accurate Single-Shot Surface Profiling. Applied Optics, 49(22), 4270–4277.CrossRef
Lafferty, J., McCallum, A., and Pereira, F. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Pages 282—289 of: Proceedings of the 18th International Conference on Machine Learning.
Lagoudakis, M. G., and Parr, R. 2003. Least-Squares Policy Iteration. Journal of Machine Learning Research, 4, 1107—1149.Google Scholar
Lapedriza, À., Masip, D., and Vitrià, J. 2007. A Hierarchical Approach for Multi-task Logistic Regression. Pages 258–265 of: Martí, J., Benedí, J. M., Mendonça, A. M., and Serrat, J. (eds), Proceedings of the 3rd Iberian Conference on Pattern Recognition and Image Analysis, Part II. Lecture Notes in Computer Science, vol. 4478. Berlin, Germany: Springer-Verlag.
Larsen, J., and Hansen, L. K. 1996. Linear Unlearning for Cross-Validation. Advances in Computational Mathematics, 5, 269–280.
Latecki, L. J., Lazarevic, A., and Pokrajac, D. 2007. Outlier Detection with Kernel Density Functions. Pages 61–75 of: Proceedings of the 5th International Conference on Machine Learning and Data Mining in Pattern Recognition.
Lee, T.-W., Girolami, M., and Sejnowski, T. J. 1999. Independent Component Analysis Using an Extended Infomax Algorithm for Mixed Subgaussian and Supergaussian Sources. Neural Computation, 11(2), 417–441.
Lehmann, E. L. 1986. Testing Statistical Hypotheses. 2nd edn. New York: Wiley.CrossRefGoogle Scholar
Lehmann, E. L., and Casella, G. 1998. Theory of Point Estimation. 2nd edn. New York: Springer.Google Scholar
Li, K. 1991. Sliced Inverse Regression for Dimension Reduction. Journal of the American Statistical Association, 86(414), 316–342.
Li, K. 1992. On Principal Hessian Directions for Data Visualization and Dimension Reduction: Another Application of Stein's Lemma. Journal of the American Statistical Association, 87(420), 1025–1039.CrossRef
Li, K. C., Lue, H. H., and Chen, C. H. 2000. Interactive Tree-structured Regression via Principal Hessian Directions. Journal of the American Statistical Association, 95(450), 547—560.CrossRef
Li, L., and Lu, W. 2008. Sufficient Dimension Reduction with Missing Predictors. Journal of the American Statistical Association, 103(482), 822–831.CrossRef
Li, Q. 1996. Nonparametric Testing of Closeness between Two Unknown Distribution Functions. Econometric Reviews, 15(3), 261—274.CrossRef
Li, Y., Liu, Y., and Zhu, J. 2007. Quantile Regression in Reproducing Kernel Hilbert Spaces. Journal of the American Statistical Association, 102(477), 255–268.CrossRef
Li, Y., Kambara, H., Koike, Y., and Sugiyama, M. 2010. Application of Covariate Shift Adaptation Techniques in Brain Computer Interfaces. IEEE Transactions on Biomedical Engineering, 57(6), 1318–1324.CrossRef
Lin, Y. 2002. Support Vector Machines and the Bayes Rule in Classification. Data Mining and Knowledge Discovery, 6(3), 259–275.
Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., and Watkins, C. 2002. Text Classification Using String Kernels. Journal of Machine Learning Research, 2, 419–44.
Luenberger, D., and Ye, Y. 2008. Linear and Nonlinear Programming.Reading, MA: Springer.Google Scholar
Luntz, A., and Brailovsky, V. 1969. On Estimation of Characters Obtained in Statistical Procedure of Recognition. Technicheskaya Kibernetica, 3. in Russian.
MacKay, D. J. C. 2003. Information Theory, Inference, and Learning Algorithms.Cambridge, UK: Cambridge University Press.Google Scholar
MacQueen, J. B. 1967. Some Methods for Classification and Analysis of Multivariate Observations. Pages 281-297 of: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. Berkeley: University of California Press.Google Scholar
Mallows, C. L. 1973. Some Comments on CP. Technometrics, 15(4), 661—675.
Manevitz, L. M., and Yousef, M. 2002. One-Class SVMs for Document Classification. Journal of Machine Learning Research, 2, 139–154.
Meila, M., and Heckerman, D. 2001. An Experimental Comparison of Model-Based Clustering Methods. Machine Learning, 42(1/2), 9.CrossRef
Mendelson, S. 2002. Improving the Sample Complexity Using Global Data. IEEE Transactions on Information Theory, 48(7), 1977–1991.CrossRef
Mercer, J. 1909. Functions of Positive and Negative Type and Their Connection with the Theory of Integral Equations. Philosophical Transactions of the Royal Society of London, A-209, 415–46.CrossRef
Micchelli, C. A., and Pontil, M. 2005. Kernels for Multi-Task Learning. Pages 921-928 of: Saul, L. K., Weiss, Y., and Bottou, L. (eds), Advances in Neural Information Processing Systems 17.Cambridge, MA: MIT Press.Google Scholar
Minka, T. P. 2007. A Comparison of Numerical Optimizers for Logistic Regression. Tech. rept. Microsoft Research.
Moré, J. J., and Sorensen, D. C. 1984. Newton's Method. In: Golub, G. H. (ed), Studies in Numerical Analysis.Washington, DC: Mathematical Association of America.Google Scholar
Mori, S., Sugiyama, M., Ogawa, H., Kitagawa, K., and Irie, K. 2011. Automatic Parameter Optimization of the Local Model Fitting Method for Single-shot Surface Profiling. Applied Optics, 50(21), 3773–3780.CrossRef
Müller, A. 1997. Integral Probability Metrics and Their Generating Classes of Functions. Advances in Applied Probability, 29, 429—443.CrossRef
Murad, U., and Pinkas, G. 1999. Unsupervised Profiling for Identifying Superimposed Fraud. Pages 251-261 of: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD1999).
Murata, N., Yoshizawa, S., and Amari, S. 1994. Network Information Criterion — Determining the Number of Hidden Units for an Artificial Neural Network Model. IEEE Transactions on Neural Networks, 5(6), 865–872.CrossRef
Ng, A. Y., Jordan, M. I., and Weiss, Y. 2002. On Spectral Clustering: Analysis and an Algorithm. Pages 849–856 of: Dietterich, T. G., Becker, S., and Ghahramani, Z. (eds), Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press.
Nguyen, X., Wainwright, M. J., and Jordan, M. I. 2010. Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization. IEEE Transactions on Information Theory, 56(11), 5847–5861.CrossRef
Nishimori, Y., and Akaho, S. 2005. Learning Algorithms Utilizing Quasi-geodesic Flows on the Stiefel Manifold. Neurocomputing, 67, 106–135.CrossRef
Oja, E. 1982. A Simplified Neuron Model as a Principal Component Analyzer. Journal of Mathematical Biology, 15(3), 267–273.CrossRef
Oja, E. 1989. Neural Networks, Principal Components and Subspaces. International Journal of Neural Systems, 1, 61–68.CrossRef
Patriksson, M. 1999. Nonlinear Programming and Variational Inequality Problems. Dordrecht, the Netherlands: Kluwer Academic.
Pearl, J. 2000. Causality: Models, Reasoning and Inference. New York: Cambridge University Press.
Pearson, K. 1900. On the Criterion That a Given System of Deviations from the Probable in the Case of a Correlated System of Variables Is Such That It Can Be Reasonably Supposed to Have Arisen from Random Sampling. Philosophical Magazine Series 5, 50(302), 157–175.CrossRef
Pérez-Cruz, F. 2008. Kullback-Leibler Divergence Estimation of Continuous Distributions. Pages 1666—1670 of: Proceedings of IEEE International Symposium on Information Theory.
Platt, J. 1999. Fast Training of Support Vector Machines Using Sequential Minimal Optimization. Pages 169-184 of: Schölkopf, B., Burges, C. J. C., and Smola, A. J. (eds), Advances in Kernel Methods—Support Vector Learning.Cambridge, MA: MIT Press.Google Scholar
Platt, J. 2000. Probabilities for SV Machines. In: Smola, A. J., Bartlett, P. L., Schölkopf, B., and Schuurmans, D. (eds), Advances in Large Margin Classifiers.Cambridge, MA: MIT Press.Google Scholar
Plumbley, M. D. 2005. Geometrical Methods for Non-Negative ICA: Manifolds, Lie Groups and Toral Subalgebras. Neurocomputing, 67(Aug.), 161–197.
Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. 1992. Numerical Recipes in C. 2nd edn. Cambridge, UK: Cambridge University Press.
Pukelsheim, F. 1993. Optimal Design of Experiments.New York: Wiley.Google Scholar
Qin, J. 1998. Inferences for Case-control and Semiparametric Two-sample Density Ratio Models. Biometrika, 85(3), 619–630.CrossRef
Qing, W., Kulkarni, S. R., and Verdu, S. 2006. A Nearest-Neighbor Approach to Estimating Divergence between Continuous Random Vectors. Pages 242-246 of: Proceedings of IEEE International Symposium on Information Theory.
Quadrianto, N., Smola, A. J., Song, L., and Tuytelaars, T. 2010. Kernelized Sorting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1809—1821.CrossRef
Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., and Lawrence, N. (eds). 2009. Dataset Shift in Machine Learning.Cambridge, MA: MIT Press.Google Scholar
R Development Core Team. 2009. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.r-project.org.
Rao, C. 1945. Information and the Accuracy Attainable in the Estimation of Statistical Parameters. Bulletin of the Calcutta Mathematics Society, 37, 81–89.
Rasmussen, C. E., and Williams, C. K. I. 2006. Gaussian Processes for Machine Learning.Cambridge, MA: MIT Press.Google Scholar
Rätsch, G., Onoda, T., and Müller, K.-R. 2001. Soft Margins for AdaBoost. Machine Learning, 42(3), 287–320.
Reiss, P. T., and Ogden, R. T. 2007. Functional Principal Component Regression and Functional Partial Least Squares. Journal of the American Statistical Association, 102(479), 984—996.CrossRef
Rifkin, R., Yeo, G., and Poggio, T. 2003. Regularized Least-Squares Classification. Pages 131—154 of: Suykens, J. A. K., Horvath, G., Basu, S., Micchelli, C., and Vandewalle, J. (eds), Advances in Learning Theory: Methods, Models and Applications. NATO Science Series III: Computer & Systems Sciences, vol. 190. Amsterdam, the Netherlands: IOS Press.Google Scholar
Rissanen, J. 1978. Modeling by Shortest Data Description. Automatica, 14(5), 465–471.
Rissanen, J. 1987. Stochastic Complexity. Journal of the Royal Statistical Society, Series B, 49(3), 223–239.
Rockafellar, R. T. 1970. Convex Analysis.Princeton, NJ: Princeton University Press.CrossRefGoogle Scholar
Rosenblatt, M. 1956. Remarks on some nonparametric estimates of a density function. Annals of Mathematical Statistics, 27, 832–837.CrossRef
Roweis, S., and Saul, L. 2000. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 290(5500), 2323–2326.CrossRef
Sankar, A., Spielman, D. A., and Teng, S.-H. 2006. Smoothed Analysis of the Condition Numbers and Growth Factors of Matrices. SIAM Journal on Matrix Analysis and Applications, 28(2), 446–476.
Saul, L. K., and Roweis, S. T. 2003. Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds. Journal of Machine Learning Research, 4(Jun), 119—155.
Schapire, R., Freund, Y., Bartlett, P., and Lee, W. Sun. 1998. Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods. Annals of Statistics, 26, 1651–1686.
Scheinberg, K. 2006. An Efficient Implementation of an Active Set Method for SVMs. Journal of Machine Learning Research, 7, 2237–2257.
Schmidt, M. 2005. minFunc. http://people.cs.ubc.ca/∼schmidtm/Software/minFunc.html.
Schölkopf, B., and Smola, A. J. 2002. Learning with Kernels.Cambridge, MA: MIT Press.Google Scholar
Schölkopf, B., Smola, A., and Müller, K.-R. 1998. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10(5), 1299—1319.CrossRef
Schölkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J., and Williamson, R. C. 2001. Estimating the Support of a High-Dimensional Distribution. Neural Computation, 13(7), 1443–1471.CrossRef
Schwarz, G. 1978. Estimating the Dimension of a Model. The Annals of Statistics, 6, 461–464.
Shi, J., and Malik, J. 2000. Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Shibata, R. 1981. An Optimal Selection of Regression Variables. Biometrika, 68(1), 45–54.CrossRef
Shibata, R. 1989. Statistical Aspects of Model Selection. Pages 215-240 of: Willems, J. C. (ed), From Data to Model.New York: Springer-Verlag.Google Scholar
Shimodaira, H. 2000. Improving Predictive Inference under Covariate Shift by Weighting the Log-Likelihood Function. Journal of Statistical Planning and Inference, 90(2), 227–244.CrossRef
Silva, J., and Narayanan, S. 2007. Universal Consistency of Data-Driven Partitions for Divergence Estimation. Pages 2021–2025 of: Proceedings of IEEE International Symposium on Information Theory.
Simm, J., Sugiyama, M., and Kato, T. 2011. Computationally Efficient Multi-task Learning with Least-Squares Probabilistic Classifiers. IPSJ Transactions on Computer Vision and Applications, 3, 1–8.
Smola, A., Song, L., and Teo, C. H. 2009. Relative Novelty Detection. Pages 536—543 of: van Dyk, D., and Welling, M. (eds), Proceedings of Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS2009). JMLR Workshop and Conference Proceedings, vol. 5.
Song, L., Smola, A., Gretton, A., and Borgwardt, K. 2007a. A Dependence Maximization View of Clustering. Pages 815-822 of: Ghahramani, Z. (ed), Proceedings of the 24th Annual International Conference on Machine Learning (ICML2007).
Song, L., Smola, A., Gretton, A., Borgwardt, K. M., and Bedo, J. 2007b. Supervised Feature Selection via Dependence Estimation. Pages 823—830 of: Ghahramani, Z. (ed), Proceedings of the 24th Annual International Conference on Machine Learning (ICML2007).
Spielman, D. A., and Teng, S.-H. 2004. Smoothed Analysis of Algorithms: Why the Simplex Algorithm Usually Takes Polynomial Time. Journal of the ACM, 51(3), 385–463.CrossRef
Sriperumbudur, B., Fukumizu, K., Gretton, A., Lanckriet, G., and Schölkopf, B. 2009. Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions. Pages 1750—1758 of: Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C. K. I., and Culotta, A. (eds), Advances in Neural Information Processing Systems 22.Cambridge, MA: MIT Press.Google Scholar
Steinwart, I.2001. On the Influence of the Kernel on the Consistency of Support Vector Machines. Journal of Machine Learning Research, 2, 67—93.
Steinwart, I., Hush, D., and Scovel, C. 2009. Optimal Rates for Regularized Least Squares Regression. Pages 79-93 of: Proceedings of the Annual Conference on Learning Theory.
Stone, M. 1974. Cross-validatory Choice and Assessment of Statistical Predictions. Journal of the Royal Statistical Society, Series B, 36, 111–147.
Storkey, A., and Sugiyama, M. 2007. Mixture Regression for Covariate Shift. Pages 1337-1344 of: Schölkopf, B., Platt, J. C., and Hoffmann, T. (eds), Advances in Neural Information Processing Systems 19.Cambridge, MA: MIT Press.Google Scholar
Student. 1908. The Probable Error of a Mean. Biometrika, 6, 1–25.
Sugiyama, M. 2006. Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error. Journal of Machine Learning Research, 7(Jan.), 141–166.
Sugiyama, M. 2007. Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis. Journal of Machine Learning Research, 8(May), 1027—1061.
Sugiyama, M. 2009. On Computational Issues of Semi-supervised Local Fisher Discriminant Analysis. IEICE Transactions on Information and Systems, E92-D(5), 1204–1208.CrossRef
Sugiyama, M. 2010. Superfast-Trainable Multi-Class Probabilistic Classifier by Least-Squares Posterior Fitting. IEICE Transactions on Information and Systems, E93-D(10), 2690—2701.CrossRef
Sugiyama, M., and Kawanabe, M. 2012. Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation. Cambridge, MA: MIT Press. To appear.
Sugiyama, M., and Müller, K.-R. 2002. The Subspace Information Criterion for Infinite Dimensional Hypothesis Spaces. Journal of Machine Learning Research, 3(Nov.), 323–359.
Sugiyama, M., and Müller, K.-R. 2005. Input-Dependent Estimation of Generalization Error under Covariate Shift. Statistics & Decisions, 23(4), 249–279.CrossRef
Sugiyama, M., and Nakajima, S. 2009. Pool-based Active Learning in Approximate Linear Regression. Machine Learning, 75(3), 249—274.CrossRef
Sugiyama, M., and Ogawa, H. 2000. Incremental Active Learning for Optimal Generalization. Neural Computation, 12(12), 2909–2940.CrossRef
Sugiyama, M., and Ogawa, H. 2001a. Active Learning for Optimal Generalization in Trigonometric Polynomial Models. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E84-A(9), 2319–2329.
Sugiyama, M., and Ogawa, H. 2001b. Subspace Information Criterion for Model Selection. Neural Computation, 13(8), 1863–1889.CrossRef
Sugiyama, M., and Ogawa, H. 2003. Active Learning with Model Selection—Simultaneous Optimization of Sample Points and Models for Trigonometric Polynomial Models. IEICE Transactions on Information and Systems, E86-D(12), 2753–2763.
Sugiyama, M., and Rubens, N. 2008. A Batch Ensemble Approach to Active Learning with Model Selection. Neural Networks, 21(9), 1278–1286.CrossRef
Sugiyama, M., and Suzuki, T. 2011. Least-Squares Independence Test. IEICE Transactions on Information and Systems, E94-D(6), 1333–1336.CrossRef
Sugiyama, M., Kawanabe, M., and Müller, K.-R. 2004. Trading Variance Reduction with Unbiasedness: The Regularized Subspace Information Criterion for Robust Model Selection in Kernel Regression. Neural Computation, 16(5), 1077–1104.CrossRef
Sugiyama, M., Ogawa, H., Kitagawa, K., and Suzuki, K. 2006. Single-shot Surface Profiling by Local Model Fitting. Applied Optics, 45(31), 7999–8005.CrossRef
Sugiyama, M., Krauledat, M., and Müller, K.-R. 2007. Covariate Shift Adaptation by Importance Weighted Cross Validation. Journal of Machine Learning Research, 8(May), 985–1005.
Sugiyama, M., Suzuki, T., Nakajima, S., Kashima, H., von Bünau, P., and Kawanabe, M. 2008. Direct Importance Estimation for Covariate Shift Adaptation. Annals of the Institute of Statistical Mathematics, 60(4), 699–746.CrossRef
Sugiyama, M., Kanamori, T., Suzuki, T., Hido, S., Sese, J., Takeuchi, I., and Wang, L. 2009. A Density-ratio Framework for Statistical Data Processing. IPSJ Transactions on Computer Vision and Applications, 1, 183–208.
Sugiyama, M., Kawanabe, M., and Chui, P. L. 2010a. Dimensionality Reduction for Density Ratio Estimation in High-dimensional Spaces. Neural Networks, 23(1), 44–59.
Sugiyama, M., Takeuchi, I., Suzuki, T., Kanamori, T., Hachiya, H., and Okanohara, D. 2010b. Least-squares Conditional Density Estimation. IEICE Transactions on Information and Systems, E93-D(3), 583–594.CrossRef
Sugiyama, M., Idé, T., Nakajima, S., and Sese, J. 2010c. Semi-supervised Local Fisher Discriminant Analysis for Dimensionality Reduction. Machine Learning, 78(1-2), 35–61.CrossRef
Sugiyama, M., Suzuki, T., and Kanamori, T. 2011a. Density Ratio Matching under the Bregman Divergence: A Unified Framework of Density Ratio Estimation. Annals of the Institute of Statistical Mathematics. To appear.
Sugiyama, M., Yamada, M., von Bünau, P., Suzuki, T., Kanamori, T., and Kawanabe, M. 2011b. Direct Density-ratio Estimation with Dimensionality Reduction via Least-squares Hetero-distributional Subspace Search. Neural Networks, 24(2), 183–198.CrossRef
Sugiyama, M., Suzuki, T., Itoh, Y., Kanamori, T., and Kimura, M. 2011c. Least-Squares Two-Sample Test. Neural Networks, 24(7), 735–751.CrossRef
Sugiyama, M., Yamada, M., Kimura, M., and Hachiya, H. 2011d. On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution. In: Proceedings of 28th International Conference on Machine Learning (ICML2011),65–72.
Sutton, R. S., and Barto, G. A. 1998. Reinforcement Learning: An Introduction.Cambridge, MA: MIT Press.Google Scholar
Suykens, J. A. K., Gestel, T. Van, Brabanter, J. De, Moor, B. De, and Vandewalle, J. 2002. Least Squares Support Vector Machines. Singapore: World Scientific Pub. Co.
Suzuki, T., and Sugiyama, M. 2010. Sufficient Dimension Reduction via Squared-loss Mutual Information Estimation. Pages 804–811 of: Teh, Y. W., and Titterington, M. (eds), Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS2010). JMLR Workshop and Conference Proceedings, vol. 9.
Suzuki, T., and Sugiyama, M. 2011. Least-Squares Independent Component Analysis. Neural Computation, 23(1), 284–301.
Suzuki, T., Sugiyama, M., Sese, J., and Kanamori, T. 2008. Approximating Mutual Information by Maximum Likelihood Density Ratio Estimation. Pages 5–20 of: Saeys, Y., Liu, H., Inza, I., Wehenkel, L., and Van de Peer, Y. (eds), Proceedings of the ECML-PKDD2008 Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery (FSDM2008). JMLR Workshop and Conference Proceedings, vol. 4.
Suzuki, T., Sugiyama, M., and Tanaka, T. 2009a. Mutual Information Approximation via Maximum Likelihood Estimation of Density Ratio. Pages 463–467 of: Proceedings of the 2009 IEEE International Symposium on Information Theory (ISIT2009).
Suzuki, T., Sugiyama, M., Kanamori, T., and Sese, J. 2009b. Mutual Information Estimation Reveals Global Associations between Stimuli and Biological Processes. BMC Bioinformatics, 10(1), S52.
Suzuki, T., Sugiyama, M., and Tanaka, T. 2011. Mutual Information Approximation via Maximum Likelihood Estimation of Density Ratio. In preparation.
Takeuchi, I., Le, Q. V., Sears, T. D., and Smola, A. J. 2006. Nonparametric Quantile Estimation. Journal of Machine Learning Research, 7, 1231–1264.
Takeuchi, I., Nomura, K., and Kanamori, T. 2009. Nonparametric Conditional Density Estimation Using Piecewise-linear Solution Path of Kernel Quantile Regression. Neural Computation, 21(2), 533–559.
Takeuchi, K. 1976. Distribution of Information Statistics and Validity Criteria of Models. Mathematical Science, 153, 12–18. In Japanese.
Takimoto, M., Matsugu, M., and Sugiyama, M. 2009. Visual Inspection of Precision Instruments by Least-Squares Outlier Detection. Pages 22–26 of: Proceedings of the Fourth International Workshop on Data-Mining and Statistical Science (DMSS2009).
Talagrand, M. 1996a. New Concentration Inequalities in Product Spaces. Inventiones Mathematicae, 126, 505–563.
Talagrand, M. 1996b. A New Look at Independence. The Annals of Statistics, 24, 1–34.
Tang, Y., and Zhang, H. H. 2006. Multiclass Proximal Support Vector Machines. Journal of Computational and Graphical Statistics, 15(2), 339–355.
Tao, T., and Vu, V. H. 2007. The Condition Number of a Randomly Perturbed Matrix. Pages 248–255 of: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing. New York: ACM.
Tax, D. M. J., and Duin, R. P. W. 2004. Support Vector Data Description. Machine Learning, 54(1), 45–66.
Tenenbaum, J. B., de Silva, V., and Langford, J. C. 2000. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290(5500), 2319–2323.
Teo, C. H., Le, Q., Smola, A., and Vishwanathan, S. V. N. 2007. A Scalable Modular Convex Solver for Regularized Risk Minimization. Pages 727–736 of: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2007).
Tibshirani, R. 1996. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58(1), 267–288.
Tipping, M. E., and Bishop, C. M. 1999. Mixtures of Probabilistic Principal Component Analyzers. Neural Computation, 11(2), 443–482.
Tresp, V. 2001. Mixtures of Gaussian Processes. Pages 654–660 of: Leen, T. K., Dietterich, T. G., and Tresp, V. (eds), Advances in Neural Information Processing Systems 13. Cambridge, MA: MIT Press.
Tsang, I., Kwok, J., and Cheung, P.-M. 2005. Core Vector Machines: Fast SVM Training on Very Large Data Sets. Journal of Machine Learning Research, 6, 363–392.
Tsuboi, Y., Kashima, H., Hido, S., Bickel, S., and Sugiyama, M. 2009. Direct Density Ratio Estimation for Large-scale Covariate Shift Adaptation. Journal of Information Processing, 17, 138–155.
Ueki, K., Sugiyama, M., and Ihara, Y. 2011. Lighting Condition Adaptation for Perceived Age Estimation. IEICE Transactions on Information and Systems, E94-D(2), 392–395.
van de Geer, S. 2000. Empirical Processes in M-Estimation. Cambridge, UK: Cambridge University Press.
van der Vaart, A. W. 1998. Asymptotic Statistics. Cambridge, UK: Cambridge University Press.
van der Vaart, A. W., and Wellner, J. A. 1996. Weak Convergence and Empirical Processes: With Applications to Statistics. New York: Springer-Verlag.
Vapnik, V. N. 1998. Statistical Learning Theory. New York: Wiley.
Wahba, G. 1990. Spline Models for Observational Data. Philadelphia, PA: Society for Industrial and Applied Mathematics.
Wang, Q., Kulkarni, S. R., and Verdú, S. 2005. Divergence Estimation of Continuous Distributions Based on Data-Dependent Partitions. IEEE Transactions on Information Theory, 51(9), 3064–3074.
Watanabe, S. 2009. Algebraic Geometry and Statistical Learning Theory. Cambridge, UK: Cambridge University Press.
Weinberger, K., Blitzer, J., and Saul, L. 2006. Distance Metric Learning for Large Margin Nearest Neighbor Classification. Pages 1473–1480 of: Weiss, Y., Schölkopf, B., and Platt, J. (eds), Advances in Neural Information Processing Systems 18. Cambridge, MA: MIT Press.
Weisberg, S. 1985. Applied Linear Regression. New York: John Wiley.
Wichern, G., Yamada, M., Thornburg, H., Sugiyama, M., and Spanias, A. 2010 (Mar. 14–19). Automatic Audio Tagging Using Covariate Shift Adaptation. Pages 253–256 of: Proceedings of the 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2010).
Wiens, D. P. 2000. Robust Weights and Designs for Biased Regression Models: Least Squares and Generalized M-Estimation. Journal of Statistical Planning and Inference, 83(2), 395–412.
Williams, P. M. 1995. Bayesian Regularization and Pruning Using a Laplace Prior. Neural Computation, 7(1), 117–143.
Wold, H. 1966. Estimation of Principal Components and Related Models by Iterative Least Squares. Pages 391–420 of: Krishnaiah, P. R. (ed), Multivariate Analysis. New York: Academic Press.
Wolff, R. C. L., Yao, Q., and Hall, P. 1999. Methods for Estimating a Conditional Distribution Function. Journal of the American Statistical Association, 94(445), 154–163.
Wu, T.-F., Lin, C.-J., and Weng, R. C. 2004. Probability Estimates for Multi-Class Classification by Pairwise Coupling. Journal of Machine Learning Research, 5, 975–1005.
Xu, L., Neufeld, J., Larson, B., and Schuurmans, D. 2005. Maximum Margin Clustering. Pages 1537–1544 of: Saul, L. K., Weiss, Y., and Bottou, L. (eds), Advances in Neural Information Processing Systems 17. Cambridge, MA: MIT Press.
Xue, Y., Liao, X., Carin, L., and Krishnapuram, B. 2007. Multi-Task Learning for Classification with Dirichlet Process Priors. Journal of Machine Learning Research, 8, 35–63.
Yamada, M., and Sugiyama, M. 2009. Direct Importance Estimation with Gaussian Mixture Models. IEICE Transactions on Information and Systems, E92-D(10), 2159–2162.
Yamada, M., and Sugiyama, M. 2010. Dependence Minimizing Regression with Model Selection for Non-Linear Causal Inference under Non-Gaussian Noise. Pages 643–648 of: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI2010). Atlanta, GA: AAAI Press.
Yamada, M., and Sugiyama, M. 2011a. Cross-Domain Object Matching with Model Selection. Pages 807–815 of: Gordon, G., Dunson, D., and Dudík, M. (eds), Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS2011). JMLR Workshop and Conference Proceedings, vol. 15.
Yamada, M., and Sugiyama, M. 2011b. Direct Density-Ratio Estimation with Dimensionality Reduction via Hetero-Distributional Subspace Analysis. Pages 549–554 of: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI2011). San Francisco: AAAI Press.
Yamada, M., Sugiyama, M., Wichern, G., and Simm, J. 2010a. Direct Importance Estimation with a Mixture of Probabilistic Principal Component Analyzers. IEICE Transactions on Information and Systems, E93-D(10), 2846–2849.
Yamada, M., Sugiyama, M., and Matsui, T. 2010b. Semi-supervised Speaker Identification under Covariate Shift. Signal Processing, 90(8), 2353–2361.
Yamada, M., Sugiyama, M., Wichern, G., and Simm, J. 2011a. Improving the Accuracy of Least-Squares Probabilistic Classifiers. IEICE Transactions on Information and Systems, E94-D(6), 1337–1340.
Yamada, M., Suzuki, T., Kanamori, T., Hachiya, H., and Sugiyama, M. 2011b. Relative Density-Ratio Estimation for Robust Distribution Comparison. To appear in Advances in Neural Information Processing Systems, vol. 24.
Yamada, M., Niu, G., Takagi, J., and Sugiyama, M. 2011c. Computationally Efficient Sufficient Dimension Reduction via Squared-Loss Mutual Information. Pages 247–262 of: Hsu, C.-N., and Lee, W. S. (eds), Proceedings of the Third Asian Conference on Machine Learning (ACML2011). JMLR Workshop and Conference Proceedings, vol. 20.
Yamanishi, K., and Takeuchi, J. 2002. A Unifying Framework for Detecting Outliers and Change Points from Non-Stationary Time Series Data. Pages 676–681 of: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2002).
Yamanishi, K., Takeuchi, J., Williams, G., and Milne, P. 2004. On-Line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms. Data Mining and Knowledge Discovery, 8(3), 275–300.
Yamazaki, K., Kawanabe, M., Watanabe, S., Sugiyama, M., and Müller, K.-R. 2007. Asymptotic Bayesian Generalization Error When Training and Test Distributions Are Different. Pages 1079–1086 of: Ghahramani, Z. (ed), Proceedings of the 24th International Conference on Machine Learning (ICML2007).
Yankov, D., Keogh, E., and Rebbapragada, U. 2008. Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized Datasets. Knowledge and Information Systems, 17(2), 241–262.
Yokota, T., Sugiyama, M., Ogawa, H., Kitagawa, K., and Suzuki, K. 2009. The Interpolated Local Model Fitting Method for Accurate and Fast Single-shot Surface Profiling. Applied Optics, 48(18), 3497–3508.
Yu, K., Tresp, V., and Schwaighofer, A. 2005. Learning Gaussian Processes from Multiple Tasks. Pages 1012–1019 of: Proceedings of the 22nd International Conference on Machine Learning (ICML2005). New York: ACM.
Zadrozny, B. 2004. Learning and Evaluating Classifiers under Sample Selection Bias. Pages 903–910 of: Proceedings of the Twenty-First International Conference on Machine Learning (ICML2004). New York: ACM.
Zeidler, E. 1986. Nonlinear Functional Analysis and Its Applications, I: Fixed-Point Theorems. New York: Springer-Verlag.
Zelnik-Manor, L., and Perona, P. 2005. Self-Tuning Spectral Clustering. Pages 1601–1608 of: Saul, L. K., Weiss, Y., and Bottou, L. (eds), Advances in Neural Information Processing Systems 17. Cambridge, MA: MIT Press.
Zhu, L., Miao, B., and Peng, H. 2006. On Sliced Inverse Regression with High-Dimensional Covariates. Journal of the American Statistical Association, 101(474), 630–643.
