Massive SVM Parallelization Using Hardware Accelerators

doi:10.1017/CBO9781139042918.008

Chapter 7, from Part Two - Supervised and Unsupervised Learning Algorithms

Published online by Cambridge University Press: 05 February 2012

Authors:
Igor Durdanovic, NEC Labs America, Princeton, NJ, USA
Eric Cosatto, NEC Labs America, Princeton, NJ, USA
Hans Peter Graf, NEC Labs America, Princeton, NJ, USA
Srihari Cadambi, NEC Labs America, Princeton, NJ, USA
Venkata Jakkula, NEC Labs America, Princeton, NJ, USA
Srimat Chakradhar, NEC Labs America, Princeton, NJ, USA
Abhinandan Majumdar, NEC Labs America, Princeton, NJ, USA

Edited by:
Ron Bekkerman, LinkedIn Corporation, Mountain View, California
Mikhail Bilenko, Microsoft Research, Redmond, Washington
John Langford, Yahoo! Research, New York


Summary

Support Vector Machines (SVMs) are among the most widely used classification and regression algorithms for data analysis, pattern recognition, and cognitive tasks. Yet the size of learning problems that SVMs can solve is limited by their high computational cost and excessive storage requirements. Many variations of the original SVM algorithm have been introduced that scale better to large problems. They change the SVM framework quite drastically, for example by optimizing criteria other than the maximum margin or by introducing different error metrics in the cost function. Such algorithms may work for some applications, but they lack the robustness and universality that make SVMs so popular.
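For reference, the standard soft-margin SVM dual that kernel SVM solvers optimize is written out below. This is the textbook formulation, not reproduced from the chapter; the labels y_i in {-1, +1}, the kernel K, and the box parameter C are the usual notation and are assumed here. The kernel values K(x_i, x_j) form an n x n matrix for n training examples, which is where the storage cost comes from: as an illustrative count, n = 10^6 examples give 10^12 kernel entries, roughly 4 TB at 4 bytes per single-precision value.

```latex
\min_{\alpha \in \mathbb{R}^n} \;
  \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
  \;-\; \sum_{i=1}^{n} \alpha_i
\quad \text{subject to} \quad
  0 \le \alpha_i \le C \;\; (i = 1, \dots, n),
  \qquad \sum_{i=1}^{n} y_i \alpha_i = 0
```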

The approach taken here is to keep the SVM algorithm in its original form and scale it to large problems through parallelization. Computer performance can no longer be improved at the pace of the last few decades by raising clock frequencies. Today, significant speedups come mostly from parallel architectures, and multicore processors are commonplace. Mapping the SVM algorithm onto multicore processors with shared-memory architectures is straightforward, yet this approach does not scale to a large number of processors. Here we investigate parallelization concepts that scale to hundreds or thousands of cores, where, for example, cache coherence can no longer be maintained.
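Why distributed-memory parallelization can work without shared caches can be illustrated with the gradient update of SMO-type solvers (Platt, 1999). In the minimal sketch below, which rests on our own assumptions and is not the chapter's implementation, each hypothetical worker owns a row slice of the training set together with its slice of the dual gradient. After an SMO step changes alpha_i and alpha_j, every worker refreshes its own gradient entries using only the two broadcast vectors x_i and x_j. The RBF kernel, the dictionary layout, and the names `make_shards`, `update_shard`, and `broadcast_update` are illustrative assumptions, and the workers are simulated sequentially.

```python
import numpy as np


def rbf_column(X, x, gamma):
    """Kernel values K(x_k, x) = exp(-gamma * ||x_k - x||^2) for every row x_k of X."""
    return np.exp(-gamma * np.sum((X - x) ** 2, axis=1))


def make_shards(X, y, n_workers):
    """Split the training set row-wise; each shard models one worker's private memory."""
    parts = np.array_split(np.arange(len(y)), n_workers)
    # The gradient of 0.5 * a'Qa - sum(a) at a = 0 is -1 for every example.
    return [{"X": X[p], "y": y[p], "grad": -np.ones(len(p))} for p in parts]


def update_shard(shard, x_i, y_i, d_i, x_j, y_j, d_j, gamma):
    """Apply one SMO step's gradient change to the rows this worker owns.

    If alpha_i and alpha_j change by d_i and d_j, each local gradient entry changes by
    y_k * (y_i * d_i * K(x_k, x_i) + y_j * d_j * K(x_k, x_j)).
    Only x_i and x_j are needed, never another worker's rows.
    """
    k_i = rbf_column(shard["X"], x_i, gamma)
    k_j = rbf_column(shard["X"], x_j, gamma)
    shard["grad"] += shard["y"] * (y_i * d_i * k_i + y_j * d_j * k_j)


def broadcast_update(shards, x_i, y_i, d_i, x_j, y_j, d_j, gamma):
    """Send the selected pair to every worker; a real system would run these in parallel."""
    for shard in shards:
        update_shard(shard, x_i, y_i, d_i, x_j, y_j, d_j, gamma)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(12, 3))
    y = np.where(rng.random(12) < 0.5, -1.0, 1.0)
    shards = make_shards(X, y, n_workers=3)
    d_i = 0.3
    d_j = -y[2] * y[7] * d_i  # keeps sum_k y_k * alpha_k unchanged
    broadcast_update(shards, X[2], y[2], d_i, X[7], y[7], d_j, gamma=0.5)
    print(np.concatenate([s["grad"] for s in shards]))
```

Per iteration, each worker receives only two feature vectors and two scalars, so communication does not grow with the number of training examples, while the kernel evaluations are split evenly across workers.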

A number of SVM implementations on clusters and graphics processors (GPUs) have been proposed recently. A parallel optimization algorithm based on gradient projections has been demonstrated (see Zanghirati and Zanni, 2003; Zanni, Serafini, and Zanghirati, 2006) that uses a spectral gradient method for fast convergence while maintaining the Karush-Kuhn-Tucker (KKT) constraints.
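The two ingredients named above, projection onto the feasible set and spectral (Barzilai-Borwein) step lengths, can be sketched on the full SVM dual as follows. This is a minimal single-process illustration under our own assumptions (a dense matrix Q with Q[i, j] = y_i y_j K(x_i, x_j), a monotone exact line search, and the hypothetical names `project` and `spectral_projected_gradient`); it is not the published parallel solver of Zanghirati and Zanni. Every iterate stays inside {0 <= alpha_i <= C, sum_i y_i alpha_i = 0}.

```python
import numpy as np


def project(z, y, C, iters=100):
    """Euclidean projection of z onto {a : 0 <= a <= C, y @ a = 0}, labels y in {-1, +1}.

    The projection equals clip(z + lam * y, 0, C) for a scalar multiplier lam.
    y @ clip(z + lam * y, 0, C) is nondecreasing in lam, so lam is found by bisection.
    """
    hi = np.abs(z).max() + C  # at lam = +/-hi every coordinate is saturated
    lo = -hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if y @ np.clip(z + mid * y, 0.0, C) < 0.0:
            lo = mid
        else:
            hi = mid
    return np.clip(z + 0.5 * (lo + hi) * y, 0.0, C)


def spectral_projected_gradient(Q, y, C, max_iter=1000, tol=1e-8):
    """Minimize 0.5 * a @ Q @ a - sum(a) subject to 0 <= a <= C and y @ a = 0.

    Trial steps use the Barzilai-Borwein (spectral) step length; an exact line
    search along the projected direction keeps the objective nonincreasing.
    """
    a = np.zeros(len(y))  # a = 0 satisfies every constraint
    g = Q @ a - 1.0       # gradient of the dual objective
    step = 1.0
    for _ in range(max_iter):
        # Stationarity test: a is optimal iff projecting a - g returns a.
        if np.linalg.norm(project(a - g, y, C) - a) < tol:
            break
        d = project(a - step * g, y, C) - a  # feasible descent direction
        Qd = Q @ d
        curv = d @ Qd
        # Minimize the quadratic along a + t * d for t in [0, 1]; t = 1 is still feasible.
        t = 1.0 if curv <= 0.0 else min(1.0, -(g @ d) / curv)
        a = a + t * d
        g = g + t * Qd
        # Barzilai-Borwein step for displacement s = t * d: (s @ s) / (s @ Q @ s).
        step = float(np.clip((d @ d) / curv, 1e-10, 1e10)) if curv > 0.0 else 1e10
    return a
```

For two points x = +1 (label +1) and x = -1 (label -1) with a linear kernel, Q is the all-ones 2 x 2 matrix and the sketch returns alpha close to [0.5, 0.5], the exact solution of that toy dual.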

Information

Type: Chapter
Book: Scaling up Machine Learning: Parallel and Distributed Approaches, pp. 127–147

DOI: https://doi.org/10.1017/CBO9781139042918.008

Publisher: Cambridge University Press

Print publication year: 2011


References

Cadambi, S., Durdanovic, I., Jakkula, V., Sankaradass, M., Cosatto, E., Chakradhar, S., and Graf, H. P. 2009. A Massively Parallel FPGA-Based Coprocessor for Support Vector Machines. Pages 115–122 of: Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

Cadambi, S., Majumdar, A., Becchi, M., Chakradhar, S. T., and Graf, H. P. 2010. A Programmable Parallel Accelerator for Learning and Classification.

Catanzaro, B., Sundaram, N., and Keutzer, K. 2008. Fast Support Vector Machine Training and Classification on Graphics Processors. Pages 104–111 of: Proceedings of the 25th International Conference on Machine Learning (ICML 2008).

Chatterji, S., Narayanan, M., Duell, J., and Oliker, L. 2003. Performance Evaluation of Two Emerging Media Processors: VIRAM and Imagine. Page 229 of: IPDPS.

D'Apuzzo, M., and Marino, M. 2003. Parallel computational issues of an interior point method for solving large bound-constrained quadratic programming problems. Parallel Computing, 29(4), 467–483.

Diamond, J. R., Robatmili, B., Keckler, S. W., van de Geijn, R. A., Goto, K., and Burger, D. 2008. High Performance Dense Linear Algebra on a Spatially Distributed Processor. Pages 63–72 of: PPOPP.

Durdanovic, I., Cosatto, E., and Graf, H. P. 2007. Large Scale Parallel SVM Implementation. In: Bottou, L., Chapelle, O., DeCoste, D., and Weston, J. (eds), Large Scale Kernel Machines. Cambridge, MA: MIT Press.

Fan, R.-E., Chen, P.-H., and Lin, C.-J. 2005. Working Set Selection Using Second Order Information for Training Support Vector Machines. Journal of Machine Learning Research, 6, 1889–1918.

Graf, H. P., Cosatto, E., Bottou, L., Durdanovic, I., and Vapnik, V. 2005. Parallel Support Vector Machines: The Cascade SVM. Pages 521–528 of: Saul, L. K., Weiss, Y., and Bottou, L. (eds), Advances in Neural Information Processing Systems 17. Cambridge, MA: MIT Press.

Kapasi, U. J., Rixner, S., Dally, W. J., Khailany, B., Ahn, J. H., Mattson, P. R., and Owens, J. D. 2003. Programmable Stream Processors. IEEE Computer, 36(8), 54–62.

Kelm, J. H., Johnson, D. R., Johnson, M. R., Crago, N. C., Tuohy, W., Mahesri, A., Lumetta, S. S., Frank, M. I., and Patel, S. J. 2009. Rigel: An Architecture and Scalable Programming Interface for a 1000-Core Accelerator. Pages 140–151 of: ISCA.

Owens, J. D., Luebke, D., Govindaraju, N., Harris, M., Krueger, J., Lefohn, A. E., and Purcell, T. J. 2007. A Survey of General-Purpose Computation on Graphics Hardware. Computer Graphics Forum, 26(1), 80–113.

Platt, J. 1999. Fast Training of Support Vector Machines Using Sequential Minimal Optimization. Pages 185–208 of: Schölkopf, B., Burges, C. J. C., and Smola, A. J. (eds), Advances in Kernel Methods – Support Vector Learning. Cambridge, MA: MIT Press.

Rousseaux, S., Hubaux, D., Guisset, P., and Legat, J. 2007. A High Performance FPGA-Based Accelerator for BLAS Library Implementation. In: Proceedings of the Third Annual Reconfigurable Systems Summer Institute (RSSI'07).

Seiler, L., Carmean, D., Sprangle, E., Forsyth, T., Abrash, M., Dubey, P., Junkins, S., Lake, A., Sugerman, J., Cavin, R., Espasa, R., Grochowski, E., Juan, T., and Hanrahan, P. 2008. Larrabee: A Many-Core x86 Architecture for Visual Computing. ACM Transactions on Graphics, 27(3).

Taylor, M. B., Kim, J. S., Miller, J. E., Wentzlaff, D., Ghodrat, F., Greenwald, B., Hoffmann, H., Johnson, P., Lee, J.-W., Lee, W., Ma, A., Saraf, A., Seneski, M., Shnidman, N., Strumpen, V., Frank, M., Amarasinghe, S. P., and Agarwal, A. 2002. The Raw Microprocessor: A Computational Fabric for Software Circuits and General-Purpose Programs. IEEE Micro, 22(2), 25–35.

Zanghirati, G., and Zanni, L. 2003. A Parallel Solver for Large Quadratic Programs in Training Support Vector Machines. Parallel Computing, 29(4), 535–551.

Zanni, L., Serafini, T., and Zanghirati, G. 2006. Parallel Software for Training Large Scale Support Vector Machines on Multiprocessor Systems. Journal of Machine Learning Research, 1467–1492.

Zhuo, L., and Prasanna, V. K. 2005. High Performance Linear Algebra Operations on Reconfigurable Systems. Page 2 of: SC.
