
Bibliography

Published online by Cambridge University Press:  05 March 2012

Masashi Sugiyama, Tokyo Institute of Technology
Taiji Suzuki, University of Tokyo
Takafumi Kanamori, Nagoya University, Japan

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2012


References

Agakov, F., and Barber, D. 2006. Kernelized Infomax Clustering. Pages 17—24 of: Weiss, Y., Schölkopf, B., and Platt, J. (eds), Advances in Neural Information Processing Systems 18.Cambridge, MA: MIT Press.Google Scholar
Aggarwal, C. C., and Yu, P. S. (eds). 2008. Privacy-Preserving Data Mining: Models and Algorithms.New York: Springer.CrossRefGoogle Scholar
Akaike, H. 1970. Statistical Predictor Identification. Annals of the Institute of Statistical Mathematics, 22, 203–217.CrossRef
Akaike, H. 1974. A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, AC-19(6), 716–723.CrossRef
Akaike, H. 1980. Likelihood and the Bayes Procedure. Pages 141—166 of: Bernardo, J. M., DeGroot, M. H., Lindley, D. V., and Smith, A. F. M. (eds), Bayesian Statistics.Valencia, Spain: Valencia University Press.Google Scholar
Akiyama, T., Hachiya, H., and Sugiyama, M. 2010. Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning. Neural Networks, 23(5), 639—648.CrossRef
Ali, S. M., and Silvey, S. D. 1966. A General Class of Coefficients of Divergence of One Distribution from Another. Journal of the Royal Statistical Society, Series B, 28(1), 131—142.
Amari, S. 1967. Theory of Adaptive Pattern Classifiers. IEEE Transactions on Electronic Computers, EC-16(3), 299–307.
Amari, S. 1998. Natural Gradient Works Efficiently in Learning. Neural Computation, 10(2), 251–276.
Amari, S. 2000. Estimating Functions of Independent Component Analysis for Temporally Correlated Signals. Neural Computation, 12(9), 2083–2107.
Amari, S., and Nagaoka, H. 2000. Methods of Information Geometry.Providence, RI: Oxford University Press.Google Scholar
Amari, S., Fujita, N., and Shinomoto, S. 1992. Four Types of Learning Curves. Neural Computation, 4(4), 605–618.CrossRef
Amari, S., Cichocki, A., and Yang, H. H. 1996. A New Learning Algorithm for Blind Signal Separation. Pages 757–763 of: Touretzky, D. S., Mozer, M. C., and Hasselmo, M. E. (eds), Advances in Neural Information Processing Systems 8. Cambridge, MA: MIT Press.
Anderson, N., Hall, P., and Titterington, D. 1994. Two-Sample Test Statistics for Measuring Discrepancies between Two Multivariate Probability Density Functions Using Kernel-based Density Estimates. Journal of Multivariate Analysis, 50, 41–54.CrossRef
Ando, R. K., and Zhang, T. 2005. A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data. Journal of Machine Learning Research, 6, 1817—1853.
Antoniak, C. 1974. Mixtures of Dirichlet Processes with Applications to Bayesian Nonparametric Problems. The Annals of Statistics, 2(6), 1152–1174.CrossRef
Aronszajn, N. 1950. Theory of Reproducing Kernels. Transactions of the American Mathematical Society, 68, 337–404.CrossRef
Bach, F., and Harchaoui, Z. 2008. DIFFRAC: A Discriminative and Flexible Framework for Clustering. Pages 49—56 of: Platt, J. C., Koller, D., Singer, Y., and Roweis, S. (eds), Advances in Neural Information Processing Systems 20.Cambridge, MA: MIT Press.Google Scholar
Bach, F., and Jordan, M. I. 2002. Kernel Independent Component Analysis. Journal of Machine Learning Research, 3, 1–48.
Bach, F., and Jordan, M. I. 2006. Learning Spectral Clustering, with Application to Speech Separation. Journal of Machine Learning Research, 7, 1963—2001.
Bachman, G., and Narici, L. 2000. Functional Analysis.Mineola, NY: Dover Publications.Google Scholar
Bakker, B., and Heskes, T. 2003. Task Clustering and Gating for Bayesian Multitask Learning. Journal of Machine Learning Research, 4, 83—99.
Bartlett, P., Bousquet, O., and Mendelson, S. 2005. Local Rademacher Complexities. The Annals of Statistics, 33, 1487–1537.CrossRef
Basseville, M., and Nikiforov, V. 1993. Detection of Abrupt Changes: Theory and Application. Englewood Cliffs, NJ: Prentice-Hall, Inc.Google Scholar
Basu, A., Harris, I. R., Hjort, N. L., and Jones, M. C. 1998. Robust and Efficient Estimation by Minimising a Density Power Divergence. Biometrika, 85(3), 549–559.CrossRef
Baxter, J. 1997. A Bayesian/Information Theoretic Model of Learning to Learn via Multiple Task Sampling. Machine Learning, 28, 7–39.
Baxter, J. 2000. A Model of Inductive Bias Learning. Journal of Artificial Intelligence Research, 12, 149–198.
Belkin, M., and Niyogi, P. 2003. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation, 15(6), 1373—1396.CrossRef
Bellman, R. 1961. Adaptive Control Processes: A Guided Tour.Princeton, NJ: Princeton University Press.CrossRefGoogle Scholar
Ben-David, S., and Schuller, R. 2003. Exploiting Task Relatedness for Multiple Task Learning. Pages 567-580 of: Proceedings of the Sixteenth Annual Conference on Learning Theory (COLT2003).
Ben-David, S., Gehrke, J., and Schuller, R. 2002. A Theoretical Framework for Learning from a Pool of Disparate Data Sources. Pages 443–49 of: Proceedings of The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2002).
Bensaid, N., and Fabre, J. P. 2007. Optimal Asymptotic Quadratic Error of Kernel Estimators of Radon-Nikodym Derivatives for Strong Mixing Data. Journal of Nonparametric Statistics, 19(2), 77–88.CrossRef
Bertsekas, D., Nedic, A., and Ozdaglar, A. 2003. Convex Analysis and Optimization.Belmont, MA: Athena Scientific.Google Scholar
Best, M. J. 1982. An Algorithm for the Solution of the Parametric Quadratic Programming Problem. Tech. rept. 82–24. Faculty of Mathematics, University of Waterloo.Google Scholar
Biau, G., and Györfi, L. 2005. On the Asymptotic Properties of a Nonparametric l1-test Statistic of Homogeneity. IEEE Transactions on Information Theory, 51(11), 3965—3973.CrossRef
Bickel, P. 1969. A Distribution Free Version of the Smirnov Two Sample Test in the p-variate Case. The Annals of Mathematical Statistics, 40(1), 1–23.CrossRef
Bickel, S., Brückner, M., and Scheffer, T. 2007. Discriminative Learning for Differing Training and Test Distributions. Pages 81—88 of: Proceedings of the 24th International Conference on Machine Learning (ICML2007).
Bickel, S., Bogojeska, J., Lengauer, T., and Scheffer, T. 2008. Multi-Task Learning for HIV Therapy Screening. Pages 56-63 of: McCallum, A., and Roweis, S. (eds), Proceedings of 25th Annual International Conference on Machine Learning (ICML2008).
Bishop, C. M. 1995. Neural Networks for Pattern Recognition.Oxford, UK: Clarendon Press.Google Scholar
Bishop, C. M. 2006. Pattern Recognition and Machine Learning.New York: Springer.Google Scholar
Blanchard, G., Kawanabe, M., Sugiyama, M., Spokoiny, V., and Müller, K.-R. 2006. In Search of Non-Gaussian Components of a High-dimensional Distribution. Journal of Machine Learning Research, 7(Feb.), 247–282.
Blei, D.M., and Jordan, M.I. 2006. Variational Inference for Dirichlet Process Mixtures. Bayesian Analysis, 1(1), 121–144.CrossRef
Bolton, R. J., and Hand, D. J. 2002. Statistical Fraud Detection: A Review. Statistical Science, 17(3), 235–255.
Bonilla, E., Chai, K. M., and Williams, C. 2008. Multi-Task Gaussian Process Prediction. Pages 153–160 of: Platt, J. C., Koller, D., Singer, Y., and Roweis, S. (eds), Advances in Neural Information Processing Systems 20. Cambridge, MA: MIT Press.
Borgwardt, K. M., Gretton, A., Rasch, M. J., Kriegel, H.-P., Schölkopf, B., and Smola, A. J. 2006. Integrating Structured Biological Data by Kernel Maximum Mean Discrepancy. Bioinformatics, 22(14), e49-e57.CrossRef
Bousquet, O. 2002. A Bennett Concentration Inequality and its Application to Suprema of Empirical Process. Note aux Compte Rendus de l'Académie des Sciences de Paris, 334, 495–500.
Boyd, S., and Vandenberghe, L. 2004. Convex Optimization.Cambridge, UK: Cambridge University Press.CrossRefGoogle Scholar
Bradley, A. P. 1997. The Use of the Area under the ROC Curve in the Evaluation of Machine Learning Algorithms. Pattern Recognition, 30(7), 1145—1159.CrossRef
Bregman, L. M. 1967. The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming. USSR Computational Mathematics and Mathematical Physics, 7, 200—217.CrossRef
Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. 2000. LOF: Identifying Density-Based Local Outliers. Pages 93-104 of: Chen, W., Naughton, J. F., and Bernstein, P. A. (eds), Proceedings of the ACM SIGMOD International Conference on Management of Data.
Brodsky, B., and Darkhovsky, B. 1993. Nonparametric Methods in Change-Point Problems. Dordrecht, the Netherlands: Kluwer Academic Publishers.
Broniatowski, M., and Keziou, A. 2009. Parametric Estimation and Tests through Divergences and the Duality Technique. Journal of Multivariate Analysis, 100, 16–26.
Buhmann, J. M. 1995. Data Clustering and Learning. Pages 278-281 of: Arbib, M. A. (ed), The Handbook of Brain Theory and Neural Networks.Cambridge, MA: MIT Press.Google Scholar
Bura, E., and Cook, R. D. 2001. Extending Sliced Inverse Regression. Journal of the American Statistical Association, 96(455), 996–1003.CrossRef
Caponnetto, A., and de Vito, E. 2007. Optimal Rates for Regularized Least-Squares Algorithm. Foundations of Computational Mathematics, 7(3), 331—368.CrossRef
Cardoso, J.-F. 1999. High-Order Contrasts for Independent Component Analysis. Neural Computation, 11(1), 157–192.CrossRef
Cardoso, J.-F., and Souloumiac, A. 1993. Blind Beamforming for Non-Gaussian Signals. Radar and Signal Processing, IEE Proceedings-F, 140(6), 362–370.
Caruana, R., Pratt, L., and Thrun, S. 1997. Multitask Learning. Machine Learning, 28, 41–75.CrossRef
Cesa-Bianchi, N., and Lugosi, G. 2006. Prediction, Learning, and Games.Cambridge, UK: Cambridge University Press.CrossRefGoogle Scholar
Chan, J., Bailey, J., and Leckie, C. 2008. Discovering Correlated Spatio-Temporal Changes in Evolving Graphs. Knowledge and Information Systems, 16(1), 53–96.CrossRef
Chang, C. C., and Lin, C. J. 2001. LIBSVM: A Library for Support Vector Machines. Tech. rept. Department of Computer Science, National Taiwan University. http://www.csie.ntu.edu.tw/∼cjlin/libsvm/.
Chapelle, O., Schölkopf, B., and Zien, A. (eds). 2006. Semi-Supervised Learning.Cambridge, MA: MIT Press.CrossRefGoogle Scholar
Chawla, N. V., Japkowicz, N., and Kotcz, A. 2004. Editorial: Special Issue on Learning from Imbalanced Data Sets. ACM SIGKDD Explorations Newsletter, 6(1), 1–6.
Chen, S.-M., Hsu, Y.-S., and Liaw, J.-T. 2009. On Kernel Estimators of Density Ratio. Statistics, 43(5), 463–79.
Chen, S. S., Donoho, D. L., and Saunders, M. A. 1998. Atomic Decomposition by Basis Pursuit. SIAM Journal on Scientific Computing, 20(1), 33—61.CrossRef
Cheng, K. F., and Chu, C. K. 2004. Semiparametric Density Estimation under a Two-sample Density Ratio Model. Bernoulli, 10(4), 583–604.CrossRef
Chiaromonte, F., and Cook, R. D. 2002. Sufficient Dimension Reduction and Graphics in Regression. Annals of the Institute of Statistical Mathematics, 54(4), 768—795.CrossRef
Cichocki, A., and Amari, S. 2003. Adaptive Blind Signal and Image Processing: Learning Algorithms and Applications. New York: Wiley.
Cohn, D. A., Ghahramani, Z., and Jordan, M. I. 1996. Active Learning with Statistical Models. Journal of Artificial Intelligence Research, 4, 129—145.
Collobert, R., and Bengio, S. 2001. SVMTorch: Support Vector Machines for Large-Scale Regression Problems. Journal of Machine Learning Research, 1, 143–160.
Comon, P. 1994. Independent Component Analysis, A New Concept? Signal Processing, 36(3), 287–314.CrossRef
Cook, R. D. 1998a. Principal Hessian Directions Revisited. Journal of the American Statistical Association, 93(441), 84–100.CrossRef
Cook, R. D. 1998b. Regression Graphics: Ideas for Studying Regressions through Graphics.New York: Wiley.CrossRefGoogle Scholar
Cook, R. D. 2000. SAVE: A Method for Dimension Reduction and Graphics in Regression. Communications in Statistics-Theory and Methods, 29(9), 2109–2121.CrossRef
Cook, R. D., and Forzani, L. 2009. Likelihood-Based Sufficient Dimension Reduction. Journal of the American Statistical Association, 104(485), 197–208.CrossRef
Cook, R. D., and Ni, L. 2005. Sufficient Dimension Reduction via Inverse Regression. Journal of the American Statistical Association, 100(470), 410–428.CrossRef
Cortes, C., and Vapnik, V. 1995. Support-Vector Networks. Machine Learning, 20, 273–297.
Cover, T. M., and Thomas, J. A. 2006. Elements of Information Theory. 2nd edn. Hoboken, NJ: Wiley.Google Scholar
Cramér, H. 1946. Mathematical Methods of Statistics.Princeton, NJ: Princeton University Press.Google Scholar
Craven, P., and Wahba, G. 1979. Smoothing Noisy Data with Spline Functions: Estimating the Correct Degree of Smoothing by the Method of Generalized Cross-Validation. Numerische Mathematik, 31, 377–403.CrossRef
Csiszár, I. 1967. Information-Type Measures of Difference of Probability Distributions and Indirect Observation. Studia Scientiarum Mathematicarum Hungarica, 2, 229–318.
Cwik, J., and Mielniczuk, J. 1989. Estimating Density Ratio with Application to Discriminant Analysis. Communications in Statistics: Theory and Methods, 18(8), 3057—3069.CrossRef
Darbellay, G. A., and Vajda, I. 1999. Estimation of the Information by an Adaptive Partitioning of the Observation Space. IEEE Transactions on Information Theory, 45(4), 1315–1321.CrossRef
Davis, J., Kulis, B., Jain, P., Sra, S., and Dhillon, I. 2007. Information-Theoretic Metric Learning. Pages 209—216 of: Ghahramani, Z. (ed), Proceedings of the 24th Annual International Conference on Machine Learning (ICML2007).
Demmel, J. W.1997. Applied Numerical Linear Algebra.Philadelphia, PA: Society for Industrial and Applied Mathematics.
Dempster, A. P., Laird, N. M., and Rubin, D. B. 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, series B, 39(1), 1—38.
Dhillon, I. S., Guan, Y., and Kulis, B. 2004. Kernel K-Means, Spectral Clustering and Normalized Cuts. Pages 551-556 of: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York: ACM Press.Google Scholar
Donoho, D. L., and Grimes, C. E. 2003. Hessian Eigenmaps: Locally Linear Embedding Techniques for High-Dimensional Data. Pages 5591–5596 of: Proceedings of the National Academy of Sciences.
Duda, R. O., Hart, P. E., and Stork, D. G. 2001. Pattern Classification. 2nd edn. New York: Wiley.Google Scholar
Duffy, N., and Collins, M. 2002. Convolution Kernels for Natural Language. Pages 625-632 of: Dietterich, T. G., Becker, S., and Ghahramani, Z. (eds), Advances in Neural Information Processing Systems 14.Cambridge, MA: MIT Press.Google Scholar
Durand, J., and Sabatier, R. 1997. Additive Splines for Partial Least Squares Regression. Journal of the American Statistical Association, 92(440), 1546–1554.CrossRef
Edelman, A. 1988. Eigenvalues and Condition Numbers of Random Matrices. SIAM Journal on Matrix Analysis and Applications, 9(4), 543—560.CrossRef
Edelman, A., and Sutton, B. D. 2005. Tails of Condition Number Distributions. SIAM Journal on Matrix Analysis and Applications, 27(2), 547–560.CrossRef
Edelman, A., Arias, T. A., and Smith, S. T. 1998. The Geometry of Algorithms with Orthogonality Constraints. SIAM Journal on Matrix Analysis and Applications, 20(2), 303—353.CrossRef
Efron, B. 1975. The Efficiency of Logistic Regression Compared to Normal Discriminant Analysis. Journal of the American Statistical Association, 70(352), 892–898.CrossRef
Efron, B., and Tibshirani, R. J. 1993. An Introduction to the Bootstrap.New York: Chapman & Hall/CRC.CrossRefGoogle Scholar
Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. 2004. Least Angle Regression. The Annals of Statistics, 32(2), 407–499.
Elkan, C. 2011. Privacy-Preserving Data Mining via Importance Weighting. Pages 15–21 of: Dimitrakakis, C., Gkoulalas-Divanis, A., Mitrokotsa, A., Verykios, V. S., and Saygin, Y. (eds), Privacy and Security Issues in Data Mining and Machine Learning. Berlin: Springer.
Evgeniou, T., and Pontil, M. 2004. Regularized Multi-Task Learning. Pages 109–117 of: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2004).
Faivishevsky, L., and Goldberger, J. 2009. ICA based on a Smooth Estimation of the Differential Entropy. Pages 433-440 of: Koller, D., Schuurmans, D., Bengio, Y., and Bottou, L. (eds), Advances in Neural Information Processing Systems 21.Cambridge, MA: MIT Press.Google Scholar
Faivishevsky, L., and Goldberger, J. 2010 (Jun. 21–25). A Nonparametric Information Theoretic Clustering Algorithm. Pages 351–358 of: Joachims, T., and Fürnkranz, J. (eds), Proceedings of 27th International Conference on Machine Learning (ICML2010).
Fan, H., Zaïane, O. R., Foss, A., and Wu, J. 2009. Resolution-Based Outlier Factor: Detecting the Top-n Most Outlying Data Points in Engineering Data. Knowledge and Information Systems, 19(1), 31–51.
Fan, J., Yao, Q., and Tong, H. 1996. Estimation of Conditional Densities and Sensitivity Measures in Nonlinear Dynamical Systems. Biometrika, 83(1), 189–206.CrossRef
Fan, R. -E., Chen, P.-H., and Lin, C.-J. 2005. Working Set Selection Using Second Order Information for Training SVM. Journal of Machine Learning Research, 6, 1889—1918.
Fan, R.-E., Chang, K.-W., Hsieh, C.-J., Wang, X.-R., and Lin, C.-J. 2008. LIBLINEAR: A Library for Large Linear Classification. Journal of Machine Learning Research, 9, 1871–1874.
Fedorov, V. V. 1972. Theory of Optimal Experiments. New York: Academic Press.
Fernandez, E. A. 2005. The dprep Package. Tech. rept. University of Puerto Rico.
Feuerverger, A. 1993. A Consistent Test for Bivariate Dependence. International Statistical Review, 61(3), 419–433.CrossRef
Fisher, R. A. 1936. The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7(2), 179–188.CrossRef
Fishman, G. S. 1996. Monte Carlo: Concepts, Algorithms, and Applications.Berlin, Germany: Springer-Verlag.CrossRefGoogle Scholar
Fokianos, K., Kedem, B., Qin, J., and Short, D. A. 2001. A Semiparametric Approach to the One-Way Layout. Technometrics, 43, 56—64.CrossRef
Franc, V., and Sonnenburg, S. 2009. Optimized Cutting Plane Algorithm for Large-Scale Risk Minimization. Journal of Machine Learning Research, 10, 2157–2192.
Fraser, A. M., and Swinney, H. L. 1986. Independent Coordinates for Strange Attractors from Mutual Information. Physical Review A, 33(2), 1134–1140.CrossRef
Friedman, J., and Rafsky, L. 1979. Multivariate Generalizations of the Wald-Wolfowitz and Smirnov Two-Sample Tests. The Annals of Statistics, 7(4), 697–717.CrossRef
Friedman, J. H. 1987. Exploratory Projection Pursuit. Journal of the American Statistical Association, 82(397), 249–266.CrossRef
Friedman, J. H., and Tukey, J. W. 1974. A Projection Pursuit Algorithm for Exploratory Data Analysis. IEEE Transactions on Computers, C-23(9), 881–890.CrossRef
Fujimaki, R., Yairi, T., and Machida, K. 2005. An Approach to Spacecraft Anomaly Detection Problem Using Kernel Feature Space. Pages 401–410 of: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2005).
Fujisawa, H., and Eguchi, S. 2008. Robust Parameter Estimation with a Small Bias against Heavy Contamination. Journal of Multivariate Analysis, 99(9), 2053–2081.CrossRef
Fukumizu, K. 2000. Statistical Active Learning in Multilayer Perceptrons. IEEE Transactions on Neural Networks, 11(1), 17–26.CrossRef
Fukumizu, K., Bach, F. R., and Jordan, M. I. 2004. Dimensionality Reduction for Supervised Learning with Reproducing Kernel Hilbert Spaces. Journal of Machine Learning Research, 5(Jan), 73–99.
Fukumizu, K., Bach, F. R., and Jordan, M. I. 2009. Kernel Dimension Reduction in Regression. The Annals of Statistics, 37(4), 1871–1905.CrossRef
Fukunaga, K. 1990. Introduction to Statistical Pattern Recognition. 2nd edn. Boston, MA: Academic Press, Inc.Google Scholar
Fung, G. M., and Mangasarian, O. L. 2005. Multicategory Proximal Support Vector Machine Classifiers. Machine Learning, 59(1-2), 77–97.CrossRef
Gao, J., Cheng, H., and Tan, P.-N. 2006a. A Novel Framework for Incorporating Labeled Examples into Anomaly Detection. Pages 593–597 of: Proceedings of the 2006 SIAM International Conference on Data Mining.
Gao, J., Cheng, H., and Tan, P.-N. 2006b. Semi-Supervised Outlier Detection. Pages 635–636 of: Proceedings of the 2006 ACM Symposium on Applied Computing.
Gärtner, T. 2003. A Survey of Kernels for Structured Data. SIGKDD Explorations, 5(1), S268–S275.
Gärtner, T., Flach, P., and Wrobel, S. 2003. On Graph Kernels: Hardness Results and Efficient Alternatives. Pages 129-143 of: Schölkopf, B., and Warmuth, M. (eds), Proceedings of the Sixteenth Annual Conference on Computational Learning Theory.
Ghosal, S., and van der Vaart, A. W. 2001. Entropies and Rates of Convergence for Maximum Likelihood and Bayes Estimation for Mixtures of Normal Densities. Annals of Statistics, 29, 1233–1263.CrossRef
Globerson, A., and Roweis, S. 2006. Metric Learning by Collapsing Classes. Pages 451-58 of: Weiss, Y., Schölkopf, B., and Platt, J. (eds), Advances in Neural Information Processing Systems 18.Cambridge, MA: MIT Press.Google Scholar
Godambe, V. P. 1960. An Optimum Property of Regular Maximum Likelihood Estimation. Annals of Mathematical Statistics, 31, 1208–1211.CrossRef
Goldberger, J., Roweis, S., Hinton, G., and Salakhutdinov, R. 2005. Neighbourhood Components Analysis. Pages 513—520 of: Saul, L. K., Weiss, Y., and Bottou, L. (eds), Advances in Neural Information Processing Systems 17.Cambridge, MA: MIT Press.Google Scholar
Golub, G. H., and Loan, C. F. Van. 1996. Matrix Computations.Baltimore, MD: Johns Hopkins University Press.Google Scholar
Gomes, R., Krause, A., and Perona, P. 2010. Discriminative Clustering by Regularized Information Maximization. Pages 766–774 of: Lafferty, J., Williams, C. K. I., Zemel, R., Shawe-Taylor, J., and Culotta, A. (eds), Advances in Neural Information Processing Systems 23. Cambridge, MA: MIT Press.
Goutis, C., and Fearn, T. 1996. Partial Least Squares Regression on Smooth Factors. Journal of the American Statistical Association, 91(434), 627–632.CrossRef
Graham, D. B., and Allinson, N. M. 1998. Characterizing Virtual Eigensignatures for General Purpose Face Recognition. Pages 446-56 of: Computer and Systems Sciences. NATO ASI Series F, vol. 163. Berlin, Germany: Springer.Google Scholar
Gretton, A., Bousquet, O., Smola, A., and Schölkopf, B. 2005. Measuring Statistical Dependence with Hilbert-Schmidt Norms. Pages 63-77 of: Jain, S., Simon, H. U., and Tomita, E. (eds), Algorithmic Learning Theory. Lecture Notes in Artificial Intelligence. Berlin, Germany: Springer-Verlag.Google Scholar
Gretton, A., Borgwardt, K. M., Rasch, M., Schölkopf, B., and Smola, A. J. 2007. A Kernel Method for the Two-Sample-Problem. Pages 513–520 of: Schölkopf, B., Platt, J., and Hoffman, T. (eds), Advances in Neural Information Processing Systems 19. Cambridge, MA: MIT Press.
Gretton, A., Fukumizu, K., Teo, C. H., Song, L., Schölkopf, B., and Smola, A. 2008. A Kernel Statistical Test of Independence. Pages 585–592 of: Platt, J. C., Koller, D., Singer, Y., and Roweis, S. (eds), Advances in Neural Information Processing Systems 20. Cambridge, MA: MIT Press.
Gretton, A., Smola, A., Huang, J., Schmittfull, M., Borgwardt, K., and Schölkopf, B. 2009. Covariate Shift by Kernel Mean Matching. Chap. 8, pages 131—160 of: Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., and Lawrence, N. (eds), Dataset Shift in Machine Learning.Cambridge, MA: MIT Press.Google Scholar
Guralnik, V., and Srivastava, J. 1999. Event Detection from Time Series Data. Pages 33–42 of: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD1999).
Gustafsson, F. 2000. Adaptive Filtering and Change Detection.Chichester, UK: Wiley.Google Scholar
Guyon, I., and Elisseeff, A. 2003. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 3, 1157–1182.
Hachiya, H., Akiyama, T., Sugiyama, M., and Peters, J. 2009. Adaptive Importance Sampling for Value Function Approximation in Off-policy Reinforcement Learning. Neural Networks, 22(10), 1399–1410.CrossRef
Hachiya, H., Sugiyama, M., and Ueda, N. 2011a. Importance-Weighted Least-Squares Probabilistic Classifier for Covariate Shift Adaptation with Application to Human Activity Recognition. Neurocomputing. To appear.
Hachiya, H., Peters, J., and Sugiyama, M. 2011b. Reward Weighted Regression with Sample Reuse. Neural Computation, 23(11), 2798–2832.
Hall, P., and Tajvidi, N. 2002. Permutation Tests for Equality of Distributions in High-dimensional Settings. Biometrika, 89(2), 359–374.
Härdle, W., Müller, M., Sperlich, S., and Werwatz, A. 2004. Nonparametric and Semiparametric Models.Berlin, Germany: Springer.CrossRefGoogle Scholar
Hartigan, J. A. 1975. Clustering Algorithms.New York: Wiley.Google Scholar
Hastie, T., and Tibshirani, R. 1996a. Discriminant Adaptive Nearest Neighbor Classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(6), 607–615.CrossRef
Hastie, T., and Tibshirani, R. 1996b. Discriminant Analysis by Gaussian mixtures. Journal of the Royal Statistical Society, Series B, 58(1), 155–176.
Hastie, T., Tibshirani, R., and Friedman, J. 2001. The Elements of Statistical Learning: Data Mining, Inference, and Prediction.New York: Springer.CrossRefGoogle Scholar
Hastie, T., Rosset, S., Tibshirani, R., and Zhu, J. 2004. The Entire Regularization Path for the Support Vector Machine. Journal of Machine Learning Research, 5, 1391—1415.
He, X., and Niyogi, P. 2004. Locality Preserving Projections. Pages 153-160 of: Thrun, S., Saul, L., and Schölkopf, B. (eds), Advances in Neural Information Processing Systems 16.Cambridge, MA: MIT Press.Google Scholar
Heckman, J. J. 1979. Sample Selection Bias as a Specification Error. Econometrica, 47(1), 153–161.CrossRef
Henkel, R. E. 1976. Tests of Significance.Beverly Hills, CA: Sage.CrossRefGoogle Scholar
Hido, S., Tsuboi, Y., Kashima, H., Sugiyama, M., and Kanamori, T. 2011. Statistical Outlier Detection Using Direct Density Ratio Estimation. Knowledge and Information Systems, 26(2), 309–336.CrossRef
Hinton, G. E., and Salakhutdinov, R. R. 2006. Reducing the Dimensionality of Data with Neural Networks. Science, 313(5786), 504–507.CrossRef
Hodge, V., and Austin, J. 2004. A Survey of Outlier Detection Methodologies. Artificial Intelligence Review, 22(2), 85–126.CrossRef
Hoerl, A. E., and Kennard, R. W. 1970. Ridge Regression: Biased Estimation for Nonorthogonal Problems. Technometrics, 12(3), 55—67.CrossRef
Horn, R., and Johnson, C. 1985. Matrix Analysis. Cambridge, UK: Cambridge University Press.
Hotelling, H. 1936. Relations between Two Sets of Variates. Biometrika, 28(3-4), 321–377.
Hotelling, H. 1951. A Generalized T Test and Measure of Multivariate Dispersion. Pages 23-41 of: Proceedings of the 2nd Berkeley Symposium on Mathematical Statistics and Probability.Berkeley: University of California Press.Google Scholar
Hoyer, P. O., Janzing, D., Mooij, J. M., Peters, J., and Schölkopf, B. 2009. Nonlinear Causal Discovery with Additive Noise Models. Pages 689-696 of: Koller, D., Schuurmans, D., Bengio, Y., and Bottou, L. (eds), Advances in Neural Information Processing Systems 21.Cambridge, MA: MIT Press.Google Scholar
Huang, J., Smola, A., Gretton, A., Borgwardt, K. M., and Schölkopf, B. 2007. Correcting Sample Selection Bias by Unlabeled Data. Pages 601–608 of: Schölkopf, B., Platt, J., and Hoffman, T. (eds), Advances in Neural Information Processing Systems 19. Cambridge, MA: MIT Press.
Huber, P. J. 1985. Projection Pursuit. The Annals of Statistics, 13(2), 435–75.
Hulle, M. M. Van. 2005. Edgeworth Approximation of Multivariate Differential Entropy. Neural Computation, 17(9), 1903–1910.CrossRef
Hulle, M. M. Van. 2008. Sequential Fixed-Point ICA Based on Mutual Information Minimization. Neural Computation, 20(5), 1344–1365.
Hyvärinen, A. 1999. Fast and Robust Fixed-Point Algorithms for Independent Component Analysis. IEEE Transactions on Neural Networks, 10(3), 626.
Hyvärinen, A., Karhunen, J., and Oja, E. 2001. Independent Component Analysis. New York: Wiley.
Idé, T., and Kashima, H. 2004. Eigenspace-Based Anomaly Detection in Computer Systems. Pages 440–449 of: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2004).
Ishiguro, M., Sakamoto, Y., and Kitagawa, G. 1997. Bootstrapping Log Likelihood and EIC, an Extension of AIC. Annals of the Institute of Statistical Mathematics, 49, 411—434.CrossRef
Jacob, P., and Oliveira, P. E. 1997. Kernel Estimators of General Radon-Nikodym Derivatives. Statistics, 30, 25–46.
Jain, A. K., and Dubes, R. C. 1988. Algorithms for Clustering Data.Englewood Cliffs, NJ: Prentice Hall.Google Scholar
Jaynes, E. T. 1957. Information Theory and Statistical Mechanics. Physical Review, 106(4), 620–630.CrossRef
Jebara, T. 2004. Kernelized Sorting, Permutation and Alignment for Minimum Volume PCA. Pages 609—623 of: 17th Annual Conference on Learning Theory (COLT2004).
Jiang, X., and Zhu, X. 2009. vEye: Behavioral Footprinting for Self-Propagating Worm Detection and Profiling. Knowledge and Information Systems, 18(2), 231–262.
Joachims, T. 1999. Making Large-Scale SVM Learning Practical. Pages 169-184 of: Schölkopf, B., Burges, C. J. C., and Smola, A. J. (eds), Advances in Kernel Methods—Support Vector Learning.Cambridge, MA: MIT Press.Google Scholar
Joachims, T. 2006. Training Linear SVMs in Linear Time. Pages 217-226 of: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2006).
Jolliffe, I. T. 1986. Principal Component Analysis.New York: Springer-Verlag.CrossRefGoogle Scholar
Jones, M. C., Hjort, N.L., Harris, I. R., and Basu, A. 2001. A Comparison of Related Density-based Minimum Divergence Estimators. Biometrika, 88, 865–873.CrossRef
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., and Saul, L. K. 1999. An Introduction to Variational Methods for Graphical Models. Machine Learning, 37(2), 183.CrossRef
Jutten, C., and Herault, J. 1991. Blind Separation of Sources, Part I: An Adaptive algorithm Based on Neuromimetic Architecture. Signal Processing, 24(1), 1–10.CrossRef
Kanamori, T. 2007. Pool-Based Active Learning with Optimal Sampling Distribution and its Information Geometrical Interpretation. Neurocomputing, 71(1—3), 353—362.CrossRef
Kanamori, T., and Shimodaira, H. 2003. Active Learning Algorithm Using the Maximum Weighted Log-Likelihood Estimator. Journal of Statistical Planning and Inference, 116(1), 149–162.CrossRef
Kanamori, T., Hido, S., and Sugiyama, M. 2009. A Least-squares Approach to Direct Importance Estimation. Journal of Machine Learning Research, 10(Jul.), 1391—1445.
Kanamori, T., Suzuki, T., and Sugiyama, M. 2010. Theoretical Analysis of Density Ratio Estimation. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E93-A(4), 787–798.CrossRef
Kanamori, T., Suzuki, T., and Sugiyama, M. 2011a. f-Divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models. IEEE Transactions on Information Theory. To appear.
Kanamori, T., Suzuki, T., and Sugiyama, M. 2011b. Statistical Analysis of Kernel-Based Least-Squares Density-Ratio Estimation. Machine Learning. To appear.
Kanamori, T., Suzuki, T., and Sugiyama, M. 2011c. Kernel-Based Least-Squares Density-Ratio Estimation II. Condition Number Analysis. Machine Learning. submitted.
Kankainen, A. 1995. Consistent Testing of Total Independence Based on the Empirical Characteristic Function. Ph.D. thesis, University of Jyväskylä, Jyväskylä, Finland.
Karatzoglou, A., Smola, A., Hornik, K., and Zeileis, A. 2004. kernlab - An S4 Package for Kernel Methods in R. Journal of Statistical Software, 11(9), 1–20.
Kashima, H., and Koyanagi, T. 2002. Kernels for Semi-Structured Data. Pages 291–298 of: Proceedings of the Nineteenth International Conference on Machine Learning.
Kashima, H., Tsuda, K., and Inokuchi, A. 2003. Marginalized Kernels between Labeled Graphs. Pages 321–328 of: Proceedings of the Twentieth International Conference on Machine Learning.
Kato, T., Kashima, H., Sugiyama, M., and Asai, K. 2010. Conic Programming for Multi-Task Learning. IEEE Transactions on Knowledge and Data Engineering, 22(7), 957—968.CrossRef
Kawahara, Y., and Sugiyama, M. 2011. Sequential Change-Point Detection Based on Direct Density-Ratio Estimation. Statistical Analysis and Data Mining. To appear.
Kawanabe, M., Sugiyama, M., Blanchard, G., and Müller, K.-R. 2007. A New Algorithm of Non-Gaussian Component Analysis with Radial Kernel Functions. Annals of the Institute of Statistical Mathematics, 59(1), 57–75.
Ke, Y., Sukthankar, R., and Hebert, M. 2007. Event Detection in Crowded Videos. Pages 1-8 of: Proceedings of the 11th IEEE International Conference on Computer Vision (ICCV2007).
Keziou, A. 2003a. Dual Representation of ϕ-Divergences and Applications. Comptes Rendus Mathématique, 336(10), 857–862.CrossRef
Keziou, A. 2003b. Utilisation Des Divergences Entre Mesures en Statistique Inferentielle. Ph.D. thesis, UPMC University. in French.
Keziou, A., and Leoni-Aubin, S. 2005. Test of Homogeneity in Semiparametric Two-sample Density Ratio Models. Comptes Rendus Mathématique, 340(12), 905—910.CrossRef
Keziou, A., and Leoni-Aubin, S. 2008. On Empirical Likelihood for Semiparametric Two-Sample Density Ratio Models. Journal of Statistical Planning and Inference, 138(4), 915–928.CrossRef
Khan, S., Bandyopadhyay, S., Ganguly, A., and Saigal, S. 2007. Relative Performance of Mutual Information Estimation Methods for Quantifying the Dependence among Short and Noisy Data. Physical Review E, 76, 026209.CrossRef
Kifer, D., Ben-David, S., and Gehrke, J. 2004. Detecting Change in Data Streams. Pages 180–191 of: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB2004).
Kimeldorf, G. S., and Wahba, G. 1971. Some Results on Tchebycheffian Spline Functions. Journal of Mathematical Analysis and Applications, 33(1), 82–95.CrossRef
Kimura, M., and Sugiyama, M. 2011. Dependence-Maximization Clustering with Least-Squares Mutual Information. Journal of Advanced Computational Intelligence and Intelligent Informatics, 15(7), 800–805.CrossRef
Koh, K., Kim, S.-J., and Boyd, S. P. 2007. An Interior-point Method for Large-Scale l1-Regularized Logistic Regression. Journal of Machine Learning Research, 8, 1519–1555.
Kohonen, T. 1988. Learning Vector Quantization. Neural Networks, 1 (Supplementary 1), 303.
Kohonen, T. 1995. Self-Organizing Maps.Berlin, Germany: Springer.CrossRefGoogle Scholar
Koltchinskii, V. 2006. Local Rademacher Complexities and Oracle Inequalities in Risk Minimization. The Annals of Statistics, 34, 2593–2656.
Kondor, R. I., and Lafferty, J. 2002. Diffusion Kernels on Graphs and Other Discrete Input Spaces. Pages 315–322 of: Proceedings of the Nineteenth International Conference on Machine Learning.
Konishi, S., and Kitagawa, G. 1996. Generalized Information Criteria in Model Selection. Biometrika, 83(4), 875–890.
Korostelëv, A. P., and Tsybakov, A. B. 1993. Minimax Theory of Image Reconstruction.New York: Springer.CrossRefGoogle Scholar
Kraskov, A., Stögbauer, H., and Grassberger, P. 2004. Estimating Mutual Information. Physical Review E, 69(6), 066138.CrossRef
Kullback, S. 1959. Information Theory and Statistics.New York: Wiley.Google Scholar
Kullback, S., and Leibler, R. A. 1951. On Information and Sufficiency. Annals of Mathematical Statistics, 22, 79–86.CrossRef
Kurihara, N., Sugiyama, M., Ogawa, H., Kitagawa, K., and Suzuki, K. 2010. Iteratively-Reweighted Local Model Fitting Method for Adaptive and Accurate Single-Shot Surface Profiling. Applied Optics, 49(22), 4270–4277.CrossRef
Lafferty, J., McCallum, A., and Pereira, F. 2001. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. Pages 282—289 of: Proceedings of the 18th International Conference on Machine Learning.
Lagoudakis, M. G., and Parr, R. 2003. Least-Squares Policy Iteration. Journal of Machine Learning Research, 4, 1107—1149.Google Scholar
Lapedriza, À., Masip, D., and Vitrià, J. 2007. A Hierarchical Approach for Multi-task Logistic Regression. Pages 258–265 of: Martí, J., Benedí, J. M., Mendonça, A. M., and Serrat, J. (eds), Proceedings of the 3rd Iberian Conference on Pattern Recognition and Image Analysis, Part II. Lecture Notes in Computer Science, vol. 4478. Berlin, Germany: Springer-Verlag.
Larsen, J., and Hansen, L. K. 1996. Linear Unlearning for Cross-Validation. Advances in Computational Mathematics, 5, 269–280.
Latecki, L. J., Lazarevic, A., and Pokrajac, D. 2007. Outlier Detection with Kernel Density Functions. Pages 61–75 of: Proceedings of the 5th International Conference on Machine Learning and Data Mining in Pattern Recognition.
Lee, T.-W., Girolami, M., and Sejnowski, T. J. 1999. Independent Component Analysis Using an Extended Infomax Algorithm for Mixed Subgaussian and Supergaussian Sources. Neural Computation, 11(2), 417–441.
Lehmann, E. L. 1986. Testing Statistical Hypotheses. 2nd edn. New York: Wiley.CrossRefGoogle Scholar
Lehmann, E. L., and Casella, G. 1998. Theory of Point Estimation. 2nd edn. New York: Springer.Google Scholar
Li, K. 1991. Sliced Inverse Regression for Dimension Reduction. Journal of the American Statistical Association, 86(414), 316–342.
Li, K. 1992. On Principal Hessian Directions for Data Visualization and Dimension Reduction: Another Application of Stein's Lemma. Journal of the American Statistical Association, 87(420), 1025–1039.CrossRef
Li, K. C., Lue, H. H., and Chen, C. H. 2000. Interactive Tree-structured Regression via Principal Hessian Directions. Journal of the American Statistical Association, 95(450), 547—560.CrossRef
Li, L., and Lu, W. 2008. Sufficient Dimension Reduction with Missing Predictors. Journal of the American Statistical Association, 103(482), 822–831.CrossRef
Li, Q. 1996. Nonparametric Testing of Closeness between Two Unknown Distribution Functions. Econometric Reviews, 15(3), 261—274.CrossRef
Li, Y., Liu, Y., and Zhu, J. 2007. Quantile Regression in Reproducing Kernel Hilbert Spaces. Journal of the American Statistical Association, 102(477), 255–268.CrossRef
Li, Y., Kambara, H., Koike, Y., and Sugiyama, M. 2010. Application of Covariate Shift Adaptation Techniques in Brain Computer Interfaces. IEEE Transactions on Biomedical Engineering, 57(6), 1318–1324.CrossRef
Lin, Y. 2002. Support Vector Machines and the Bayes Rule in Classification. Data Mining and Knowledge Discovery, 6(3), 259–275.
Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., and Watkins, C. 2002. Text Classification Using String Kernels. Journal of Machine Learning Research, 2, 419–44.
Luenberger, D., and Ye, Y. 2008. Linear and Nonlinear Programming.Reading, MA: Springer.Google Scholar
Luntz, A., and Brailovsky, V. 1969. On Estimation of Characters Obtained in Statistical Procedure of Recognition. Technicheskaya Kibernetica, 3. in Russian.
MacKay, D. J. C. 2003. Information Theory, Inference, and Learning Algorithms.Cambridge, UK: Cambridge University Press.Google Scholar
MacQueen, J. B. 1967. Some Methods for Classification and Analysis of Multivariate Observations. Pages 281-297 of: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. Berkeley: University of California Press.Google Scholar
Mallows, C. L. 1973. Some Comments on CP. Technometrics, 15(4), 661—675.
Manevitz, L. M., and Yousef, M. 2002. One-Class SVMs for Document Classification. Journal of Machine Learning Research, 2, 139–154.
Meila, M., and Heckerman, D. 2001. An Experimental Comparison of Model-Based Clustering Methods. Machine Learning, 42(1/2), 9.CrossRef
Mendelson, S. 2002. Improving the Sample Complexity Using Global Data. IEEE Transactions on Information Theory, 48(7), 1977–1991.CrossRef
Mercer, J. 1909. Functions of Positive and Negative Type and Their Connection with the Theory of Integral Equations. Philosophical Transactions of the Royal Society of London, A-209, 415–46.CrossRef
Micchelli, C. A., and Pontil, M. 2005. Kernels for Multi-Task Learning. Pages 921-928 of: Saul, L. K., Weiss, Y., and Bottou, L. (eds), Advances in Neural Information Processing Systems 17.Cambridge, MA: MIT Press.Google Scholar
Minka, T. P. 2007. A Comparison of Numerical Optimizers for Logistic Regression. Tech. rept. Microsoft Research.
Moré, J. J., and Sorensen, D. C. 1984. Newton's Method. In: Golub, G. H. (ed), Studies in Numerical Analysis.Washington, DC: Mathematical Association of America.Google Scholar
Mori, S., Sugiyama, M., Ogawa, H., Kitagawa, K., and Irie, K. 2011. Automatic Parameter Optimization of the Local Model Fitting Method for Single-shot Surface Profiling. Applied Optics, 50(21), 3773–3780.CrossRef
Müller, A. 1997. Integral Probability Metrics and Their Generating Classes of Functions. Advances in Applied Probability, 29, 429—443.CrossRef
Murad, U., and Pinkas, G. 1999. Unsupervised Profiling for Identifying Superimposed Fraud. Pages 251-261 of: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD1999).
Murata, N., Yoshizawa, S., and Amari, S. 1994. Network Information Criterion — Determining the Number of Hidden Units for an Artificial Neural Network Model. IEEE Transactions on Neural Networks, 5(6), 865–872.CrossRef
Ng, A. Y., Jordan, M. I., and Weiss, Y. 2002. On Spectral Clustering: Analysis and an Algorithm. Pages 849–856 of: Dietterich, T. G., Becker, S., and Ghahramani, Z. (eds), Advances in Neural Information Processing Systems 14. Cambridge, MA: MIT Press.
Nguyen, X., Wainwright, M. J., and Jordan, M. I. 2010. Estimating Divergence Functionals and the Likelihood Ratio by Convex Risk Minimization. IEEE Transactions on Information Theory, 56(11), 5847–5861.CrossRef
Nishimori, Y., and Akaho, S. 2005. Learning Algorithms Utilizing Quasi-geodesic Flows on the Stiefel Manifold. Neurocomputing, 67, 106–135.CrossRef
Oja, E. 1982. A Simplified Neuron Model as a Principal Component Analyzer. Journal of Mathematical Biology, 15(3), 267–273.CrossRef
Oja, E. 1989. Neural Networks, Principal Components and Subspaces. International Journal of Neural Systems, 1, 61–68.CrossRef
Patriksson, M. 1999. Nonlinear Programming and Variational Inequality Problems. Dordrecht, the Netherlands: Kluwer Academic.
Pearl, J. 2000. Causality: Models, Reasoning and Inference. New York: Cambridge University Press.
Pearson, K. 1900. On the Criterion That a Given System of Deviations from the Probable in the Case of a Correlated System of Variables Is Such That It Can Be Reasonably Supposed to Have Arisen from Random Sampling. Philosophical Magazine Series 5, 50(302), 157–175.CrossRef
Pérez-Cruz, F. 2008. Kullback-Leibler Divergence Estimation of Continuous Distributions. Pages 1666—1670 of: Proceedings of IEEE International Symposium on Information Theory.
Platt, J. 1999. Fast Training of Support Vector Machines Using Sequential Minimal Optimization. Pages 169-184 of: Schölkopf, B., Burges, C. J. C., and Smola, A. J. (eds), Advances in Kernel Methods—Support Vector Learning.Cambridge, MA: MIT Press.Google Scholar
Platt, J. 2000. Probabilities for SV Machines. In: Smola, A. J., Bartlett, P. L., Schölkopf, B., and Schuurmans, D. (eds), Advances in Large Margin Classifiers.Cambridge, MA: MIT Press.Google Scholar
Plumbley, M. D. 2005. Geometrical Methods for Non-Negative ICA: Manifolds, Lie Groups and Toral Subalgebras. Neurocomputing, 67(Aug.), 161–197.
Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. 1992. Numerical Recipes in C. 2nd edn. Cambridge, UK: Cambridge University Press.
Pukelsheim, F. 1993. Optimal Design of Experiments.New York: Wiley.Google Scholar
Qin, J. 1998. Inferences for Case-control and Semiparametric Two-sample Density Ratio Models. Biometrika, 85(3), 619–630.CrossRef
Qing, W., Kulkarni, S. R., and Verdu, S. 2006. A Nearest-Neighbor Approach to Estimating Divergence between Continuous Random Vectors. Pages 242-246 of: Proceedings of IEEE International Symposium on Information Theory.
Quadrianto, N., Smola, A. J., Song, L., and Tuytelaars, T. 2010. Kernelized Sorting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 1809—1821.CrossRef
Quiñonero-Candela, J., Sugiyama, M., Schwaighofer, A., and Lawrence, N. (eds). 2009. Dataset Shift in Machine Learning.Cambridge, MA: MIT Press.Google Scholar
R Development Core Team. 2009. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.r-project.org.
Rao, C. 1945. Information and the Accuracy Attainable in the Estimation of Statistical Parameters. Bulletin of the Calcutta Mathematics Society, 37, 81–89.
Rasmussen, C. E., and Williams, C. K. I. 2006. Gaussian Processes for Machine Learning.Cambridge, MA: MIT Press.Google Scholar
Rätsch, G., Onoda, T., and Müller, K.-R. 2001. Soft Margins for AdaBoost. Machine Learning, 42(3), 287–320.
Reiss, P. T., and Ogden, R. T. 2007. Functional Principal Component Regression and Functional Partial Least Squares. Journal of the American Statistical Association, 102(479), 984—996.CrossRef
Rifkin, R., Yeo, G., and Poggio, T. 2003. Regularized Least-Squares Classification. Pages 131—154 of: Suykens, J. A. K., Horvath, G., Basu, S., Micchelli, C., and Vandewalle, J. (eds), Advances in Learning Theory: Methods, Models and Applications. NATO Science Series III: Computer & Systems Sciences, vol. 190. Amsterdam, the Netherlands: IOS Press.Google Scholar
Rissanen, J. 1978. Modeling by Shortest Data Description. Automatica, 14(5), 465–471.
Rissanen, J. 1987. Stochastic Complexity. Journal of the Royal Statistical Society, Series B, 49(3), 223–239.
Rockafellar, R. T. 1970. Convex Analysis.Princeton, NJ: Princeton University Press.CrossRefGoogle Scholar
Rosenblatt, M. 1956. Remarks on some nonparametric estimates of a density function. Annals of Mathematical Statistics, 27, 832–837.CrossRef
Roweis, S., and Saul, L. 2000. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, 290(5500), 2323–2326.CrossRef
Sankar, A., Spielman, D. A., and Teng, S.-H. 2006. Smoothed Analysis of the Condition Numbers and Growth Factors of Matrices. SIAM Journal on Matrix Analysis and Applications, 28(2), 446–476.
Saul, L. K., and Roweis, S. T. 2003. Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds. Journal of Machine Learning Research, 4(Jun), 119—155.
Schapire, R., Freund, Y., Bartlett, P., and Lee, W. Sun. 1998. Boosting the Margin: A New Explanation for the Effectiveness of Voting Methods. Annals of Statistics, 26, 1651–1686.
Scheinberg, K. 2006. An Efficient Implementation of an Active Set Method for SVMs. Journal of Machine Learning Research, 7, 2237–2257.
Schmidt, M. 2005. minFunc. http://people.cs.ubc.ca/∼schmidtm/Software/minFunc.html.
Schölkopf, B., and Smola, A. J. 2002. Learning with Kernels.Cambridge, MA: MIT Press.Google Scholar
Schölkopf, B., Smola, A., and Müller, K.-R. 1998. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Computation, 10(5), 1299—1319.CrossRef
Schölkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J., and Williamson, R. C. 2001. Estimating the Support of a High-Dimensional Distribution. Neural Computation, 13(7), 1443–1471.CrossRef
Schwarz, G. 1978. Estimating the Dimension of a Model. The Annals of Statistics, 6, 461–464.
Shi, J., and Malik, J. 2000. Normalized Cuts and Image Segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.
Shibata, R. 1981. An Optimal Selection of Regression Variables. Biometrika, 68(1), 45–54.CrossRef
Shibata, R. 1989. Statistical Aspects of Model Selection. Pages 215-240 of: Willems, J. C. (ed), From Data to Model.New York: Springer-Verlag.Google Scholar
Shimodaira, H. 2000. Improving Predictive Inference under Covariate Shift by Weighting the Log-Likelihood Function. Journal of Statistical Planning and Inference, 90(2), 227–244.CrossRef
Silva, J., and Narayanan, S. 2007. Universal Consistency of Data-Driven Partitions for Divergence Estimation. Pages 2021–2025 of: Proceedings of IEEE International Symposium on Information Theory.
Simm, J., Sugiyama, M., and Kato, T. 2011. Computationally Efficient Multi-task Learning with Least-Squares Probabilistic Classifiers. IPSJ Transactions on Computer Vision and Applications, 3, 1–8.
Smola, A., Song, L., and Teo, C. H. 2009. Relative Novelty Detection. Pages 536—543 of: van Dyk, D., and Welling, M. (eds), Proceedings of Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS2009). JMLR Workshop and Conference Proceedings, vol. 5.
Song, L., Smola, A., Gretton, A., and Borgwardt, K. 2007a. A Dependence Maximization View of Clustering. Pages 815-822 of: Ghahramani, Z. (ed), Proceedings of the 24th Annual International Conference on Machine Learning (ICML2007).
Song, L., Smola, A., Gretton, A., Borgwardt, K. M., and Bedo, J. 2007b. Supervised Feature Selection via Dependence Estimation. Pages 823—830 of: Ghahramani, Z. (ed), Proceedings of the 24th Annual International Conference on Machine Learning (ICML2007).
Spielman, D. A., and Teng, S.-H. 2004. Smoothed Analysis of Algorithms: Why the Simplex Algorithm Usually Takes Polynomial Time. Journal of the ACM, 51(3), 385–463.CrossRef
Sriperumbudur, B., Fukumizu, K., Gretton, A., Lanckriet, G., and Schölkopf, B. 2009. Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions. Pages 1750—1758 of: Bengio, Y., Schuurmans, D., Lafferty, J., Williams, C. K. I., and Culotta, A. (eds), Advances in Neural Information Processing Systems 22.Cambridge, MA: MIT Press.Google Scholar
Steinwart, I.2001. On the Influence of the Kernel on the Consistency of Support Vector Machines. Journal of Machine Learning Research, 2, 67—93.
Steinwart, I., Hush, D., and Scovel, C. 2009. Optimal Rates for Regularized Least Squares Regression. Pages 79-93 of: Proceedings of the Annual Conference on Learning Theory.
Stone, M. 1974. Cross-validatory Choice and Assessment of Statistical Predictions. Journal of the Royal Statistical Society, Series B, 36, 111–147.
Storkey, A., and Sugiyama, M. 2007. Mixture Regression for Covariate Shift. Pages 1337-1344 of: Schölkopf, B., Platt, J. C., and Hoffmann, T. (eds), Advances in Neural Information Processing Systems 19.Cambridge, MA: MIT Press.Google Scholar
Student. 1908. The Probable Error of a Mean. Biometrika, 6, 1–25.
Sugiyama, M. 2006. Active Learning in Approximately Linear Regression Based on Conditional Expectation of Generalization Error. Journal of Machine Learning Research, 7(Jan.), 141–166.
Sugiyama, M. 2007. Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis. Journal of Machine Learning Research, 8(May), 1027—1061.
Sugiyama, M. 2009. On Computational Issues of Semi-supervised Local Fisher Discriminant Analysis. IEICE Transactions on Information and Systems, E92-D(5), 1204–1208.CrossRef
Sugiyama, M. 2010. Superfast-Trainable Multi-Class Probabilistic Classifier by Least-Squares Posterior Fitting. IEICE Transactions on Information and Systems, E93-D(10), 2690—2701.CrossRef
Sugiyama, M., and Kawanabe, M. 2012. Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation. Cambridge, MA: MIT Press. To appear.
Sugiyama, M., and Müller, K.-R. 2002. The Subspace Information Criterion for Infinite Dimensional Hypothesis Spaces. Journal of Machine Learning Research, 3(Nov.), 323–359.
Sugiyama, M., and Müller, K.-R. 2005. Input-Dependent Estimation of Generalization Error under Covariate Shift. Statistics & Decisions, 23(4), 249–279.CrossRef
Sugiyama, M., and Nakajima, S. 2009. Pool-based Active Learning in Approximate Linear Regression. Machine Learning, 75(3), 249—274.CrossRef
Sugiyama, M., and Ogawa, H. 2000. Incremental Active Learning for Optimal Generalization. Neural Computation, 12(12), 2909–2940.CrossRef
Sugiyama, M., and Ogawa, H. 2001a. Active Learning for Optimal Generalization in Trigonometric Polynomial Models. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E84-A(9), 2319–2329.
Sugiyama, M., and Ogawa, H. 2001b. Subspace Information Criterion for Model Selection. Neural Computation, 13(8), 1863–1889.CrossRef
Sugiyama, M., and Ogawa, H. 2003. Active Learning with Model Selection—Simultaneous Optimization of Sample Points and Models for Trigonometric Polynomial Models. IEICE Transactions on Information and Systems, E86-D(12), 2753–2763.
Sugiyama, M., and Rubens, N. 2008. A Batch Ensemble Approach to Active Learning with Model Selection. Neural Networks, 21(9), 1278–1286.CrossRef
Sugiyama, M., and Suzuki, T. 2011. Least-Squares Independence Test. IEICE Transactions on Information and Systems, E94-D(6), 1333–1336.CrossRef
Sugiyama, M., Kawanabe, M., and Müller, K.-R. 2004. Trading Variance Reduction with Unbiasedness: The Regularized Subspace Information Criterion for Robust Model Selection in Kernel Regression. Neural Computation, 16(5), 1077–1104.CrossRef
Sugiyama, M., Ogawa, H., Kitagawa, K., and Suzuki, K. 2006. Single-shot Surface Profiling by Local Model Fitting. Applied Optics, 45(31), 7999–8005.CrossRef
Sugiyama, M., Krauledat, M., and Müller, K.-R. 2007. Covariate Shift Adaptation by Importance Weighted Cross Validation. Journal of Machine Learning Research, 8(May), 985–1005.
Sugiyama, M., Suzuki, T., Nakajima, S., Kashima, H., von Bünau, P., and Kawanabe, M. 2008. Direct Importance Estimation for Covariate Shift Adaptation. Annals of the Institute of Statistical Mathematics, 60(4), 699–746.CrossRef
Sugiyama, M., Kanamori, T., Suzuki, T., Hido, S., Sese, J., Takeuchi, I., and Wang, L. 2009. A Density-ratio Framework for Statistical Data Processing. IPSJ Transactions on Computer Vision and Applications, 1, 183–208.
Sugiyama, M., Kawanabe, M., and Chui, P. L. 2010a. Dimensionality Reduction for Density Ratio Estimation in High-dimensional Spaces. Neural Networks, 23(1), 44–59.
Sugiyama, M., Takeuchi, I., Suzuki, T., Kanamori, T., Hachiya, H., and Okanohara, D. 2010b. Least-squares Conditional Density Estimation. IEICE Transactions on Information and Systems, E93-D(3), 583–594.CrossRef
Sugiyama, M., Idé, T., Nakajima, S., and Sese, J. 2010c. Semi-supervised Local Fisher Discriminant Analysis for Dimensionality Reduction. Machine Learning, 78(1-2), 35–61.CrossRef
Sugiyama, M., Suzuki, T., and Kanamori, T. 2011a. Density Ratio Matching under the Bregman Divergence: A Unified Framework of Density Ratio Estimation. Annals of the Institute of Statistical Mathematics. To appear.
Sugiyama, M., Yamada, M., von Bünau, P., Suzuki, T., Kanamori, T., and Kawanabe, M. 2011b. Direct Density-ratio Estimation with Dimensionality Reduction via Least-squares Hetero-distributional Subspace Search. Neural Networks, 24(2), 183–198.CrossRef
Sugiyama, M., Suzuki, T., Itoh, Y., Kanamori, T., and Kimura, M. 2011c. Least-Squares Two-Sample Test. Neural Networks, 24(7), 735–751.CrossRef
Sugiyama, M., Yamada, M., Kimura, M., and Hachiya, H. 2011d. On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution. In: Proceedings of 28th International Conference on Machine Learning (ICML2011),65–72.
Sutton, R. S., and Barto, G. A. 1998. Reinforcement Learning: An Introduction.Cambridge, MA: MIT Press.Google Scholar
Suykens, J. A. K., Gestel, T. Van, Brabanter, J. De, Moor, B. De, and Vandewalle, J. 2002. Least Squares Support Vector Machines. Singapore: World Scientific Pub. Co.
Suzuki, T., and Sugiyama, M. 2010. Sufficient Dimension Reduction via Squared-loss Mutual Information Estimation. Pages 804–811 of: Teh, Y. W., and Titterington, M. (eds), Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS2010). JMLR Workshop and Conference Proceedings, vol. 9.
Suzuki, T., and Sugiyama, M. 2011. Least-Squares Independent Component Analysis. Neural Computation, 23(1), 284–301.
Suzuki, T., Sugiyama, M., Sese, J., and Kanamori, T. 2008. Approximating Mutual Information by Maximum Likelihood Density Ratio Estimation. Pages 5–20 of: Saeys, Y., Liu, H., Inza, I., Wehenkel, L., and Van de Peer, Y. (eds), Proceedings of the ECML-PKDD2008 Workshop on New Challenges for Feature Selection in Data Mining and Knowledge Discovery (FSDM2008). JMLR Workshop and Conference Proceedings, vol. 4.
Suzuki, T., Sugiyama, M., and Tanaka, T. 2009a. Mutual Information Approximation via Maximum Likelihood Estimation of Density Ratio. Pages 463–467 of: Proceedings of the 2009 IEEE International Symposium on Information Theory (ISIT2009).
Suzuki, T., Sugiyama, M., Kanamori, T., and Sese, J. 2009b. Mutual Information Estimation Reveals Global Associations between Stimuli and Biological Processes. BMC Bioinformatics, 10(1), S52.
Suzuki, T., Sugiyama, M., and Tanaka, T. 2011. Mutual Information Approximation via Maximum Likelihood Estimation of Density Ratio. In preparation.
Takeuchi, I., Le, Q. V., Sears, T. D., and Smola, A. J. 2006. Nonparametric Quantile Estimation. Journal of Machine Learning Research, 7, 1231–1264.
Takeuchi, I., Nomura, K., and Kanamori, T. 2009. Nonparametric Conditional Density Estimation Using Piecewise-linear Solution Path of Kernel Quantile Regression. Neural Computation, 21(2), 533–559.
Takeuchi, K. 1976. Distribution of Information Statistics and Validity Criteria of Models. Mathematical Science, 153, 12–18. In Japanese.
Takimoto, M., Matsugu, M., and Sugiyama, M. 2009. Visual Inspection of Precision Instruments by Least-Squares Outlier Detection. Pages 22–26 of: Proceedings of the Fourth International Workshop on Data-Mining and Statistical Science (DMSS2009).
Talagrand, M. 1996a. New Concentration Inequalities in Product Spaces. Inventiones Mathematicae, 126, 505–563.
Talagrand, M. 1996b. A New Look at Independence. The Annals of Statistics, 24, 1–34.
Tang, Y., and Zhang, H. H. 2006. Multiclass Proximal Support Vector Machines. Journal of Computational and Graphical Statistics, 15(2), 339–355.
Tao, T., and Vu, V. H. 2007. The Condition Number of a Randomly Perturbed Matrix. Pages 248–255 of: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing. New York: ACM.
Tax, D. M. J., and Duin, R. P. W. 2004. Support Vector Data Description. Machine Learning, 54(1), 45–66.
Tenenbaum, J. B., de Silva, V., and Langford, J. C. 2000. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 290(5500), 2319–2323.
Teo, C. H., Le, Q., Smola, A., and Vishwanathan, S. V. N. 2007. A Scalable Modular Convex Solver for Regularized Risk Minimization. Pages 727–736 of: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2007).
Tibshirani, R. 1996. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58(1), 267–288.
Tipping, M. E., and Bishop, C. M. 1999. Mixtures of Probabilistic Principal Component Analyzers. Neural Computation, 11(2), 443–482.
Tresp, V. 2001. Mixtures of Gaussian Processes. Pages 654–660 of: Leen, T. K., Dietterich, T. G., and Tresp, V. (eds), Advances in Neural Information Processing Systems 13. Cambridge, MA: MIT Press.
Tsang, I., Kwok, J., and Cheung, P.-M. 2005. Core Vector Machines: Fast SVM Training on Very Large Data Sets. Journal of Machine Learning Research, 6, 363–392.
Tsuboi, Y., Kashima, H., Hido, S., Bickel, S., and Sugiyama, M. 2009. Direct Density Ratio Estimation for Large-scale Covariate Shift Adaptation. Journal of Information Processing, 17, 138–155.
Ueki, K., Sugiyama, M., and Ihara, Y. 2011. Lighting Condition Adaptation for Perceived Age Estimation. IEICE Transactions on Information and Systems, E94-D(2), 392–395.
van de Geer, S. 2000. Empirical Processes in M-Estimation. Cambridge, UK: Cambridge University Press.
van der Vaart, A. W. 1998. Asymptotic Statistics. Cambridge, UK: Cambridge University Press.
van der Vaart, A. W., and Wellner, J. A. 1996. Weak Convergence and Empirical Processes: With Applications to Statistics. New York: Springer-Verlag.
Vapnik, V. N. 1998. Statistical Learning Theory. New York: Wiley.
Wahba, G. 1990. Spline Models for Observational Data. Philadelphia, PA: Society for Industrial and Applied Mathematics.
Wang, Q., Kulkarni, S. R., and Verdú, S. 2005. Divergence Estimation of Continuous Distributions Based on Data-Dependent Partitions. IEEE Transactions on Information Theory, 51(9), 3064–3074.
Watanabe, S. 2009. Algebraic Geometry and Statistical Learning Theory. Cambridge, UK: Cambridge University Press.
Weinberger, K., Blitzer, J., and Saul, L. 2006. Distance Metric Learning for Large Margin Nearest Neighbor Classification. Pages 1473–1480 of: Weiss, Y., Schölkopf, B., and Platt, J. (eds), Advances in Neural Information Processing Systems 18. Cambridge, MA: MIT Press.
Weisberg, S. 1985. Applied Linear Regression. New York: John Wiley.
Wichern, G., Yamada, M., Thornburg, H., Sugiyama, M., and Spanias, A. 2010 (Mar. 14–19). Automatic Audio Tagging Using Covariate Shift Adaptation. Pages 253–256 of: Proceedings of the 2010 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP2010).
Wiens, D. P. 2000. Robust Weights and Designs for Biased Regression Models: Least Squares and Generalized M-Estimation. Journal of Statistical Planning and Inference, 83(2), 395–412.
Williams, P. M. 1995. Bayesian Regularization and Pruning Using a Laplace Prior. Neural Computation, 7(1), 117–143.
Wold, H. 1966. Estimation of Principal Components and Related Models by Iterative Least Squares. Pages 391–420 of: Krishnaiah, P. R. (ed), Multivariate Analysis. New York: Academic Press.
Wolff, R. C. L., Yao, Q., and Hall, P. 1999. Methods for Estimating a Conditional Distribution Function. Journal of the American Statistical Association, 94(445), 154–163.
Wu, T.-F., Lin, C.-J., and Weng, R. C. 2004. Probability Estimates for Multi-Class Classification by Pairwise Coupling. Journal of Machine Learning Research, 5, 975–1005.
Xu, L., Neufeld, J., Larson, B., and Schuurmans, D. 2005. Maximum Margin Clustering. Pages 1537–1544 of: Saul, L. K., Weiss, Y., and Bottou, L. (eds), Advances in Neural Information Processing Systems 17. Cambridge, MA: MIT Press.
Xue, Y., Liao, X., Carin, L., and Krishnapuram, B. 2007. Multi-Task Learning for Classification with Dirichlet Process Priors. Journal of Machine Learning Research, 8, 35–63.
Yamada, M., and Sugiyama, M. 2009. Direct Importance Estimation with Gaussian Mixture Models. IEICE Transactions on Information and Systems, E92-D(10), 2159–2162.
Yamada, M., and Sugiyama, M. 2010. Dependence Minimizing Regression with Model Selection for Non-Linear Causal Inference under Non-Gaussian Noise. Pages 643–648 of: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI2010). Atlanta, GA: AAAI Press.
Yamada, M., and Sugiyama, M. 2011a. Cross-Domain Object Matching with Model Selection. Pages 807–815 of: Gordon, G., Dunson, D., and Dudík, M. (eds), Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS2011). JMLR Workshop and Conference Proceedings, vol. 15.
Yamada, M., and Sugiyama, M. 2011b. Direct Density-Ratio Estimation with Dimensionality Reduction via Hetero-Distributional Subspace Analysis. Pages 549–554 of: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI2011). San Francisco: AAAI Press.
Yamada, M., Sugiyama, M., Wichern, G., and Simm, J. 2010a. Direct Importance Estimation with a Mixture of Probabilistic Principal Component Analyzers. IEICE Transactions on Information and Systems, E93-D(10), 2846–2849.
Yamada, M., Sugiyama, M., and Matsui, T. 2010b. Semi-supervised Speaker Identification under Covariate Shift. Signal Processing, 90(8), 2353–2361.
Yamada, M., Sugiyama, M., Wichern, G., and Simm, J. 2011a. Improving the Accuracy of Least-Squares Probabilistic Classifiers. IEICE Transactions on Information and Systems, E94-D(6), 1337–1340.
Yamada, M., Suzuki, T., Kanamori, T., Hachiya, H., and Sugiyama, M. 2011b. Relative Density-Ratio Estimation for Robust Distribution Comparison. To appear in Advances in Neural Information Processing Systems, vol. 24.
Yamada, M., Niu, G., Takagi, J., and Sugiyama, M. 2011c. Computationally Efficient Sufficient Dimension Reduction via Squared-Loss Mutual Information. Pages 247–262 of: Hsu, C.-N., and Lee, W. S. (eds), Proceedings of the Third Asian Conference on Machine Learning (ACML2011). JMLR Workshop and Conference Proceedings, vol. 20.
Yamanishi, K., and Takeuchi, J. 2002. A Unifying Framework for Detecting Outliers and Change Points from Non-Stationary Time Series Data. Pages 676–681 of: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2002).
Yamanishi, K., Takeuchi, J., Williams, G., and Milne, P. 2004. On-Line Unsupervised Outlier Detection Using Finite Mixtures with Discounting Learning Algorithms. Data Mining and Knowledge Discovery, 8(3), 275–300.
Yamazaki, K., Kawanabe, M., Watanabe, S., Sugiyama, M., and Müller, K.-R. 2007. Asymptotic Bayesian Generalization Error When Training and Test Distributions Are Different. Pages 1079–1086 of: Ghahramani, Z. (ed), Proceedings of the 24th International Conference on Machine Learning (ICML2007).
Yankov, D., Keogh, E., and Rebbapragada, U. 2008. Disk Aware Discord Discovery: Finding Unusual Time Series in Terabyte Sized Datasets. Knowledge and Information Systems, 17(2), 241–262.
Yokota, T., Sugiyama, M., Ogawa, H., Kitagawa, K., and Suzuki, K. 2009. The Interpolated Local Model Fitting Method for Accurate and Fast Single-shot Surface Profiling. Applied Optics, 48(18), 3497–3508.
Yu, K., Tresp, V., and Schwaighofer, A. 2005. Learning Gaussian Processes from Multiple Tasks. Pages 1012–1019 of: Proceedings of the 22nd International Conference on Machine Learning (ICML2005). New York: ACM.
Zadrozny, B. 2004. Learning and Evaluating Classifiers under Sample Selection Bias. Pages 903–910 of: Proceedings of the Twenty-First International Conference on Machine Learning (ICML2004). New York: ACM.
Zeidler, E. 1986. Nonlinear Functional Analysis and Its Applications, I: Fixed-Point Theorems. New York: Springer-Verlag.
Zelnik-Manor, L., and Perona, P. 2005. Self-Tuning Spectral Clustering. Pages 1601–1608 of: Saul, L. K., Weiss, Y., and Bottou, L. (eds), Advances in Neural Information Processing Systems 17. Cambridge, MA: MIT Press.
Zhu, L., Miao, B., and Peng, H. 2006. On Sliced Inverse Regression with High-Dimensional Covariates. Journal of the American Statistical Association, 101(474), 630–643.
