
Bibliography

Published online by Cambridge University Press:  14 January 2022

Song Guo
Affiliation:
The Hong Kong Polytechnic University
Zhihao Qu
Affiliation:
The Hong Kong Polytechnic University

Type: Chapter
Information: Edge Learning for Distributed Big Data Analytics: Theory, Algorithms, and System Design, pp. 190–214
Publisher: Cambridge University Press
Print publication year: 2022


References

(2017). Baidu ring-allreduce: Bringing hpc techniques to deep learning. https://github.com/baidu-research/baidu-allreduce.Google Scholar
(2020). Keras: the python deep learning api. https://keras.io/.Google Scholar
(2020). Mpi forum: Message passing interface (mpi) forum home page. https://www.mpi-forum.org/.Google Scholar
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D. G., Steiner, B., Tucker, P. A., Vasudevan, V., Warden, P., Wicke, M., Yu, Y., and Zheng, X. (2016). Tensorflow: A system for large-scale machine learning. In Proc. OSDI.Google Scholar
Ablin, P., Moreau, T., Massias, M., and Gramfort, A. (2019). Learning step sizes for unfolded sparse coding. In Advances in Neural Information Processing Systems (NeurIPS).Google Scholar
Addanki, R., Venkatakrishnan, S. B., Gupta, S., Mao, H., and Alizadeh, M. (2019). Placeto: Learning generalizable device placement algorithms for distributed machine learning. CoRR.Google Scholar
Agarwal, N., Bullins, B., Chen, X., Hazan, E., Singh, K., Zhang, C., and Zhang, Y. (2019). Efficient full-matrix adaptive regularization. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 139–147.Google Scholar
Agarwal, N. and Singh, K. (2017). The price of differential privacy for online learning. In Proceedings of the 34th International Conference on Machine Learning - Volume 70.Google Scholar
Aji, A. F. and Heafield, K. (2017). Sparse communication for distributed gradient descent. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, EMNLP 2017, Copenhagen, Denmark, September 9-11, 2017, pages 440– 445.Google Scholar
Al-Fares, M., Loukissas, A., and Vahdat, A. (2008). A scalable, commodity data center network architecture. In Proc. SIGCOMM.CrossRefGoogle Scholar
Alistarh, D., Allen-Zhu, Z., and Li, J. (2018a). Byzantine stochastic gradient descent. In Advances in Neural Information Processing Systems 31.Google Scholar
Alistarh, D., Grubic, D., Li, J., Tomioka, R., and Vojnovic, M. (2017a). QSGD: communication-efficient SGD via gradient quantization and encoding. In Guyon, I., von Luxburg, U., Bengio, S., Wallach, H. M., Fergus, R., Vishwanathan, S. V. N., and Garnett, R., editors, Proceedings of Annual Conference on Neural Information Processing Systems, NeurIPS.Google Scholar
Alistarh, D., Grubic, D., Li, J., Tomioka, R., and Vojnovic, M. (2017b). QSGD: Communication-efficient SGD via gradient quantization and encoding. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), pages 1709–1720.Google Scholar
Alistarh, D., Grubic, D., Li, J., Tomioka, R., and Vojnovic, M. (2017c). QSGD: communication-efficient SGD via gradient quantization and encoding. In Proceedings of Conference on Neural Information Processing Systems (NeurIPS).Google Scholar
Alistarh, D., Hoefler, T., Johansson, M., Konstantinov, N., Khirirat, S., and Renggli, C. (2018b). The convergence of sparsified gradient methods. In Proceedings of Annual Conference on Neural Information Processing Systems, NeurIPS.Google Scholar
Alistarh, D., Hoefler, T., Johansson, M., Konstantinov, N., Khirirat, S., and Renggli, C. (2018c). The convergence of sparsified gradient methods. In Proceedings of Neural Information Processing Systems (NeurIPS).Google Scholar
Amiri, M. M. and Gunduz, D. (2019). Machine learning at the wireless edge: Distributed stochastic gradient descent over-the-air.CrossRefGoogle Scholar
Amiri, M. M., Gunduz, D., Kulkarni, S. R., and Poor, H. V. (2020a). Convergence of update aware device scheduling for federated learning at the wireless edge.Google Scholar
Amiri, M. M., Gunduz, D., Kulkarni, S. R., and Poor, H. V. (2020b). Update aware device scheduling for federated learning at the wireless edge. arXiv preprint arXiv:2001.10402.Google Scholar
Aydöre, S., Thirion, B., and Varoquaux, G. (2019). Feature grouping as a stochastic regularizer for high-dimensional structured data. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 592–603.Google Scholar
Banner, R., Hubara, I., Hoffer, E., and Soudry, D. (2018). Scalable methods for 8-bit training of neural networks. In Advances in Neural Information Processing Systems (NeurIPS), pages 5145–5153.Google Scholar
Basu, D. D. (2019). Qsparse-local-SGD: Communication Efficient Distributed SGD with Quantization, Sparsification, and Local Computations. PhD thesis, University of California, Los Angeles, USA.CrossRefGoogle Scholar
Bernacchia, A., Lengyel, M., and Hennequin, G. (2018). Exact natural gradient in deep linear networks and application to the nonlinear case. In Advances in Neural Information Processing Systems (NeurIPS), pages 5941–5950.Google Scholar
Bhagoji, A. N., Chakraborty, S., Mittal, P., and Calo, S. B. (2018). Analyzing federated learning through an adversarial lens. CoRR, abs/1811.12470.Google Scholar
Ying, B., Yuan, K., and Sayed, A. H. (2017). Variance-reduced stochastic learning under random reshuffling. In Advances in Neural Information Processing Systems, pages 1624–1634.Google Scholar
Blanchard, P., El Mhamdi, E. M., Guerraoui, R., and Stainer, J. (2017). Machine learning with adversaries: Byzantine tolerant gradient descent. In Advances in Neural Information Processing Systems 30.Google Scholar
Bonawitz, K., Eichner, H., Grieskamp, W., Huba, D., Ingerman, A., Ivanov, V., Kiddon, C., Konecný, J., Mazzocchi, S., McMahan, H. B., Overveldt, T. V., Petrou, D., Ramage, D., and Roselander, J. (2019). Towards federated learning at scale: System design. ArXiv, abs/1902.01046.Google Scholar
Bonawitz, K., Ivanov, V., Kreuter, B., Marcedone, A., McMahan, H. B., Patel, S., Ramage, D., Segal, A., and Seth, K. (2017). Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.Google Scholar
Brendel, W., Rauber, J., and Bethge, M. (2017). Decision-based adversarial attacks: Reliable attacks against black-box machine learning models.Google Scholar
Brutzkus, A., Elisha, O., and Gilad-Bachrach, R. (2018). Low latency privacy preserving inference. CoRR, abs/1812.10659.Google Scholar
Buckman, J., Roy, A., Raffel, C., and Goodfellow, I. (2018). Thermometer encoding: One hot way to resist adversarial examples. In International Conference on Learning Representations.Google Scholar
Canel, C., Kim, T., Zhou, G., Li, C., Lim, H., Andersen, D. G., Kaminsky, M., and Dulloor, S. R. (2019). Scaling video analytics on constrained edge nodes.Google Scholar
Charles, Z., Papailiopoulos, D., and Ellenberg, J. (2017). Approximate gradient coding via sparse random graphs. arXiv.Google Scholar
Chen, C., Choi, J., Brand, D., Agrawal, A., Zhang, W., and Gopalakrishnan, K. (2018a). Adacomp : Adaptive residual gradient compression for data-parallel distributed training. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, AAAI.CrossRefGoogle Scholar
Chen, D., Leon, A. S., Engle, S. P., Fuentes, C., and Chen, Q. (2017a). Offline training for improving online performance of a genetic algorithm based optimization model for hourly multi-reservoir operation. Environ. Model. Softw., 96:46–57.Google Scholar
Chen, H., Wu, H. C., Chan, S. C., and Lam, W. H. (2019a). A stochastic quasi-newton method for large-scale nonconvex optimization with applications. IEEE transactions on neural networks and learning systems.Google Scholar
Chen, J., Monga, R., Bengio, S., and Józefowicz, R. (2016). Revisiting distributed synchronous SGD. arXiv, abs/1604.00981.Google Scholar
Chen, M., Yang, Z., Saad, W., Yin, C., Poor, H. V., and Cui, S. (2019b). A joint learning and communications framework for federated learning over wireless networks.Google Scholar
Chen, T., Giannakis, G., Sun, T., and Yin, W. (2018b). LAG: Lazily aggregated gradient for communication-efficient distributed learning. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), pages 5050–5060.Google Scholar
Chen, X., Liu, S., Xu, K., Li, X., Lin, X., Hong, M., and Cox, D. (2019c). ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization. (NeurIPS): 1–12.Google Scholar
Chen, Y., Su, L., and Xu, J. (2017). Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proceedings of the ACM on Measurement and Analysis of Computing Systems (SIGMETRICS), 1(2):1–25.Google Scholar
Chen, Y.-K., Wu, A.-Y., Bayoumi, M. A., and Koushanfar, F. (2013). Editorial low-power, intelligent, and secure solutions for realization of internet of things. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 3(1):1–4.Google Scholar
Chin, T.-W., Ding, R., and Marculescu, D. (2019). Adascale: Towards real-time video object detection using adaptive scaling. arXiv preprint arXiv:1902.02910.Google Scholar
Chou, Y.-M., Chan, Y.-M., Lee, J.-H., Chiu, C.-Y., and Chen, C.-S. (2018). Unifying and merging well-trained deep neural networks for inference stage. arXiv preprint arXiv:1805.04980.Google Scholar
Cipar, J., Ho, Q., Kim, J. K., Lee, S., Ganger, G. R., Gibson, G., Keeton, K., and Xing, E. (2013). Solving the straggler problem with bounded staleness. In Presented as part of the 14th Workshop on Hot Topics in Operating Systems.Google Scholar
Cortes, C. and Vapnik, V. (2004). Support-vector networks. Machine Learning, 20:273–297.Google Scholar
Daga, H., Nicholson, P. K., Gavrilovska, A., and Lugones, D. (2019). Cartel: A system for collaborative transfer learning at the edge. Proceedings of the ACM Symposium on Cloud Computing.CrossRefGoogle Scholar
Datta, S., Bhaduri, K., Giannella, C., Wolff, R., and Kargupta, H. (2006). Distributed data mining in peer-to-peer networks. IEEE Internet Computing, 10(4):18–26.CrossRefGoogle Scholar
Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., Ranzato, M., Senior, A., Tucker, P., Yang, K., et al. (2012). Large scale distributed deep networks. In Advances in neural information processing systems, pages 1223–1231.Google Scholar
Defazio, A., Bach, F. R., and Lacoste-Julien, S. (2014). SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Proceedings of Annual Conference on Neural Information Processing Systems.Google Scholar
Dekel, O., Gilad-Bachrach, R., Shamir, O., and Xiao, L. (2012). Optimal distributed online prediction using mini-batches. J. Mach. Learn. Res., 13:165–202.Google Scholar
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In 2009 IEEE conference on computer vision and pattern recognition, pages 248–255. IEEE.Google Scholar
Dennis, D. K., Pabbaraju, C., Simhadri, H. V., and Jain, P. (2018). Multiple instance learning for efficient sequential data classification on resource-constrained devices. In Advances in Neural Information Processing Systems (NeurIPS), pages 10953–10964.Google Scholar
Denton, E. L., Zaremba, W., Bruna, J., LeCun, Y., and Fergus, R. (2014). Exploiting linear structure within convolutional networks for efficient evaluation. In Advances in neural information processing systems, pages 1269–1277.Google Scholar
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.Google Scholar
Dhillon, G. S., Azizzadenesheli, K., Lipton, Z. C., Bernstein, J., Kossaifi, J., Khanna, A., and Anandkumar, A. (2018). Stochastic activation pruning for robust adversarial defense. CoRR, abs/1803.01442.Google Scholar
Dieuleveut, A. and Patel, K. K. (2019). Communication trade-offs for local-sgd with large step size. In Proceedings of Advances in Neural Information Processing Systems, NeurIPS.Google Scholar
Dong, X., Liu, L., Li, G., Li, J., Zhao, P., Wang, X., and Feng, X. (2019). Exploiting the input sparsity to accelerate deep neural networks: poster. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, pages 401–402.Google Scholar
Dozat, T. (2016). Incorporating Nesterov momentum into Adam. In Proceedings of the ICLR Workshop.Google Scholar
Du, S. S. and Hu, W. (2019). Width provably matters in optimization for deep linear neural networks. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 2956–2970.Google Scholar
Du, Y. and Huang, K. (2018). Fast analog transmission for high-mobility wireless data acquisition in edge learning. Arxiv Online Available: https://arxiv.org/abs/1807.11250.Google Scholar
Egan, K. J., Pinto-Bruno, Á. C., Bighelli, I., Berg-Weger, M., van Straten, A., Albanese, E., and Pot, A. (2018). Online training and support programs designed to improve mental health and reduce burden among caregivers of people with dementia: A systematic review. Journal of the American Medical Directors Association, 19(3):200–206.e1.Google Scholar
Elgabli, A., Park, J., Issaid, C. B., and Bennis, M. (2020). Harnessing wireless channels for scalable and privacy-preserving federated learning. ArXiv, abs/2007.01790.Google Scholar
Epasto, A., Esfandiari, H., and Mirrokni, V. (2019). On-device algorithms for public-private data with absolute privacy. In The World Wide Web Conference.Google Scholar
Fang, B., Zeng, X., and Zhang, M. (2018). Nestdnn: Resource-aware multi-tenant on-device deep learning for continuous mobile vision. Proceedings of the 24th Annual International Conference on Mobile Computing and Networking.Google Scholar
Faraji, I., Mirsadeghi, S. H., and Afsahi, A. (2016). Topology-aware GPU selection on multi-gpu nodes. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPS Workshops.Google Scholar
Gao, Y., Chen, L., and Li, B. (2018a). Spotlight: Optimizing device placement for training deep neural networks. In International Conference on Machine Learning.Google Scholar
Gao, Y., Chen, L., and Li, B. (2018b). Spotlight: Optimizing device placement for training deep neural networks. In Proceedings of the 35th International Conference on Machine Learning, ICML.Google Scholar
Gaunt, A., Johnson, M., Riechert, M., Tarlow, D., Tomioka, R., Vytiniotis, D., and Webster, S. (2017). Ampnet: Asynchronous model-parallel training for dynamic neural networks. arXiv, abs/1705.09786.Google Scholar
Gazagnadou, N., Gower, R. M., and Salmon, J. (2019). Optimal mini-batch and step sizes for SAGA. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 3734–3742.Google Scholar
Ge, J., Wang, Z., Wang, M., and Liu, H. (2018). Minimax-optimal privacy-preserving sparse pca in distributed systems. In Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics.Google Scholar
Geng, Y., Yang, Y., and Cao, G. (2018). Energy-efficient computation offloading for multicore-based mobile devices. IEEE INFOCOM 2018 - IEEE Conference on Computer Communications, pages 46–54.Google Scholar
Ghadimi, S. and Lan, G. (2013). Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368.Google Scholar
Ghadimi, S. and Lan, G. (2016). Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Mathematical Programming, 156:59–99.Google Scholar
Giacomelli, I., Jha, S., Joye, M., Page, C. D., and Yoon, K. (2018). Privacy-preserving ridge regression with only linearly-homomorphic encryption. In Applied Cryptography and Network Security.Google Scholar
Gibbons, R. (1992). A Primer in Game Theory. Harvester Wheatsheaf.Google Scholar
Gope, D., Dasika, G., and Mattina, M. (2019). Ternary hybrid neural-tree networks for highly constrained iot applications.Google Scholar
Goyal, P., Dollár, P., Girshick, R., Noordhuis, P., Wesolowski, L., Kyrola, A., Tulloch, A., Jia, Y., and He, K. (2017). Accurate, large minibatch sgd: Training imagenet in 1 hour.Google Scholar
Gu, J., Chowdhury, M., Shin, K. G., Zhu, Y., Jeon, M., Qian, J., Liu, H., and Guo, C. (2019a). Tiresias: A GPU cluster manager for distributed deep learning. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), pages 485–500.Google Scholar
Gu, L., Zeng, D., Guo, S., Barnawi, A., and Xiang, Y. (2017). Cost efficient resource management in fog computing supported medical cyber-physical system. IEEE Transactions on Emerging Topics in Computing, 5(1):108–119.Google Scholar
Gu, R., Yang, S., and Wu, F. (2019b). Distributed machine learning on mobile devices: A survey. arXiv preprint arXiv:1909.08329.Google Scholar
Gunasekar, S., Lee, J. D., Soudry, D., and Srebro, N. (2018). Characterizing implicit bias in terms of optimization geometry. In Proceedings of the 35th International Conference on Machine Learning, ICML.Google Scholar
Gündüz, D., de Kerret, P., Sidiropoulos, N. D., Gesbert, D., Murthy, C. R., and van der Schaar, M. (2019). Machine learning in the air. IEEE J. Sel. Areas Commun., 37(10):2184–2199.CrossRefGoogle Scholar
Guo, C., Lu, G., Li, D., Wu, H., Zhang, X., Shi, Y., Tian, C., Zhang, Y., and Lu, S. (2009). Bcube: A high performance, server-centric network architecture for modular data centers. In Proc. SIGCOMM.Google Scholar
Guo, P., Hu, B., Li, R., and Hu, W. (2018a). Foggycache: Cross-device approximate computation reuse. In Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, pages 19–34. ACM.Google Scholar
Guo, Y., Yao, A., and Chen, Y. (2016). Dynamic network surgery for efficient dnns. In Annual Conference on Neural Information Processing Systems, NeurIPS, December 5-10, 2016, Barcelona, Spain, pages 1379–1387.Google Scholar
Guo, Y., Zhang, C., Zhang, C., and Chen, Y. (2018b). Sparse dnns with improved adversarial robustness. In Advances in Neural Information Processing Systems 31.Google Scholar
Gupta, H., Srikant, R., and Ying, L. (2019). Finite-time performance bounds and adaptive learning rate selection for two time-scale reinforcement learning. In Advances in Neural Information Processing Systems (NeurIPS).Google Scholar
Haddadpour, F., Kamani, M. M., Mahdavi, M., and Cadambe, V. R. (2019a). Local SGD with periodic averaging: Tighter analysis and adaptive synchronization. In Proceedings of Advances in Neural Information Processing Systems, NeurIPS.Google Scholar
Haddadpour, F., Kamani, M. M., Mahdavi, M., and Cadambe, V. R. (2019b). Trading redundancy for communication: Speeding up distributed SGD for non-convex optimization. In Proceedings of the 36th International Conference on Machine Learning, ICML.Google Scholar
Hadjis, S., Zhang, C., Mitliagkas, I., Iter, D., and Ré, C. (2016). Omnivore: An optimizer for multi-device deep learning on cpus and gpus. arXiv.Google Scholar
Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., and Xu, C. (2019). Ghostnet: More features from cheap operations. arXiv preprint arXiv:1911.11907.Google Scholar
Han, P., Wang, S., and Leung, K. K. (2020). Adaptive gradient sparsification for efficient federated learning: An online learning approach. CoRR, abs/2001.04756.Google Scholar
Han, S., Mao, H., and Dally, W. J. (2015a). Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149.Google Scholar
Han, S., Mao, H., and Dally, W. J. (2016). Deep compression: Compressing deep neural network with pruning, trained quantization and huffman coding. arXiv.Google Scholar
Han, S., Pool, J., Tran, J., and Dally, W. (2015b). Learning both weights and connections for efficient neural network. In Advances in neural information processing systems, pages 1135–1143.Google Scholar
Han, S., Pool, J., Tran, J., and Dally, W. J. (2015c). Learning both weights and connections for efficient neural network. In Annual Conference on Neural Information Processing Systems, NeurIPS, December 7-12, 2015, Montreal, Quebec, Canada, pages 1135–1143.Google Scholar
Harlap, A., Narayanan, D., Phanishayee, A., Seshadri, V., Devanur, N. R., Ganger, G. R., and Gibbons, P. B. (2018). Pipedream: Fast and efficient pipeline parallel DNN training. arXiv, abs/1806.03377.Google Scholar
He, F., Liu, T., and Tao, D. (2019). Control batch size and learning rate to generalize well: Theoretical and empirical evidence. In Advances in Neural Information Processing Systems (NeurIPS).Google Scholar
He, J., Chen, Y., Fu, T. Z. J., Long, X., Winslett, M., You, L., and Zhang, Z. (2018a). Haas: Cloud-based real-time data analytics with heterogeneity-aware scheduling. In 38th IEEE International Conference on Distributed Computing Systems, ICDCS.Google Scholar
He, K., Zhang, X., Ren, S., and Sun, J. (2016). Deep residual learning for image recognition. In Proc. CVPR, pages 770–778.Google Scholar
He, W., Li, B., and Song, D. (2018b). Decision boundary analysis of adversarial examples. In International Conference on Learning Representations.Google Scholar
Hearst, M. A., Dumais, S. T., Osuna, E., Platt, J., and Scholkopf, B. (1998). Support vector machines. IEEE Intelligent Systems and their applications, 13(4):18–28.Google Scholar
Heikkilä, M., Lagerspetz, E., Kaski, S., Shimizu, K., Tarkoma, S., and Honkela, A. (2017). Differentially private bayesian learning on distributed data. In Advances in Neural Information Processing Systems 30.Google Scholar
Hesamifard, E., Takabi, H., and Ghasemi, M. (2017). Cryptodl: Deep neural networks over encrypted data. CoRR, abs/1711.05189.Google Scholar
Hinton, G., Vinyals, O., and Dean, J. (2015). Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.Google Scholar
Hitaj, B., Ateniese, G., and Perez-Cruz, F. (2017). Deep models under the gan: Information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.Google Scholar
Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8):1735–1780.Google Scholar
Holmes, C., Mawhirter, D., He, Y., Yan, F., and Wu, B. (2019). Grnn: Low-latency and scalable rnn inference on gpus. Proceedings of the 14th EuroSys Conference 2019.Google Scholar
Hosmer, D. W. and Lemeshow, S. (1989). Applied logistic regression.Google Scholar
Hsieh, K., Harlap, A., Vijaykumar, N., Konomis, D., Ganger, G. R., Gibbons, P. B., and Mutlu, O. (2017a). Gaia: Geo-distributed machine learning approaching LAN speeds. In Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation, NSDI.Google Scholar
Hsieh, K., Harlap, A., Vijaykumar, N., Konomis, D., Ganger, G. R., Gibbons, P. B., and Mutlu, O. (2017b). Gaia: Geo-distributed machine learning approaching LAN speeds. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pages 629–647.Google Scholar
Hsieh, K., Harlap, A., Vijaykumar, N., Konomis, D., Ganger, G. R., Gibbons, P. B., and Mutlu, O. (2017c). Gaia: Geo-distributed machine learning approaching lan speeds. In Proc. NSDI.Google Scholar
Hu, J., Shen, L., and Sun, G. (2018). Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7132–7141.Google Scholar
Huang, C., Zhai, S., Talbott, W., Bautista, M. A., Sun, S. Y., Guestrin, C., and Susskind, J. (2019a). Addressing the loss-metric mismatch with adaptive loss alignment. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 5145–5154.Google Scholar
Huang, H., Wang, C., and Dong, B. (2019b). Nostalgic ADAM: Weighting more of the past gradients when designing the adaptive learning rate. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 2556–2562.Google Scholar
Huang, J., Qian, F., Guo, Y., Zhou, Y., Xu, Q., Mao, Z. M., Sen, S., and Spatscheck, O. (2013). An in-depth study of lte: effect of network protocol and application behavior on performance. ACM SIGCOMM Computer Communication Review, 43(4):363–374.Google Scholar
Huang, L., Yin, Y., Fu, Z., Zhang, S., Deng, H., and Liu, D. (2018). Loadaboost: Loss-based adaboost federated machine learning on medical data. arXiv preprint arXiv:1811.12629.Google Scholar
Huang, T., Ye, B., Qu, Z., Tang, B., Xie, L., and Lu, S. (2020). Physical-layer arithmetic for federated learning in uplink mu-mimo enabled wireless networks. In Proceedings of IEEE Conference on Computer Communications, INFOCOM.Google Scholar
Huang, Y., Cheng, Y., Chen, D., Lee, H., Ngiam, J., Le, Q. V., and Chen, Z. (2019c). Gpipe: Efficient training of giant neural networks using pipeline parallelism. In Proc. NeurIPS.Google Scholar
Hui, L., Li, X., Gong, C., Fang, M., Zhou, J. T., and Yang, J. (2019). Inter-class angular loss for convolutional neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, 33:3894–3901.Google Scholar
Iandola, F. N., Han, S., Moskewicz, M. W., Ashraf, K., Dally, W. J., and Keutzer, K. (2016). Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360.Google Scholar
Jackson, M. O. (2014). Mechanism theory. Available at SSRN 2542983.Google Scholar
Jaggi, M., Smith, V., Takác, M., Terhorst, J., Krishnan, S., Hofmann, T., and Jordan, M. I. (2014). Communication-efficient distributed dual coordinate ascent. In Proc. NIPS.Google Scholar
Jain, P., Thakkar, O., and Thakurta, A. (2017). Differentially private matrix completion, revisited. CoRR, abs/1712.09765.Google Scholar
Jayaraman, B., Wang, L., Evans, D., and Gu, Q. (2018). Distributed learning without distress: Privacy-preserving empirical risk minimization. In Advances in Neural Information Processing Systems 31.Google Scholar
Jeon, Y.-S., Amiri, M. M., Li, J., and Poor, H. V. (2020). Gradient estimation for federated learning over massive mimo communication systems.Google Scholar
Jeong, E., Oh, S., Kim, H., Park, J., Bennis, M., and Kim, S.-L. (2018a). Communication-efficient on-device machine learning: Federated distillation and augmentation under non-iid private data. arXiv preprint arXiv:1811.11479.Google Scholar
Jeong, H.-J., Lee, H.-J., Shin, C. H., and Moon, S.-M. (2018b). Ionn: Incremental offloading of neural network computations from mobile devices to edge servers. In Proceedings of the ACM Symposium on Cloud Computing, pages 401–411.Google Scholar
Jia, Q., Guo, L., Jin, Z., and Fang, Y. (2018). Preserving model privacy for machine learning in distributed systems. IEEE Transactions on Parallel and Distributed Systems, 29(8):1808–1822.Google Scholar
Jiang, J., Cui, B., Zhang, C., and Yu, L. (2017a). Heterogeneity-aware distributed parameter servers. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 463–478.Google Scholar
Jiang, J., Cui, B., Zhang, C., and Yu, L. (2017b). Heterogeneity-aware distributed parameter servers. In Proc. SIGMOD.Google Scholar
Jiang, R. and Zhou, S. (2020). Cluster-based cooperative digital over-the-air aggregation for wireless federated edge learning. ArXiv, abs/2008.00994.Google Scholar
Jin, S., Di, S., Liang, X., Tian, J., Tao, D., and Cappello, F. (2019). Deepsz: A novel framework to compress deep neural networks by using error-bounded lossy compression. In Proceedings of the 28th International Symposium on High-Performance Parallel and Distributed Computing, pages 159–170.Google Scholar
Johnson, R. and Zhang, T. (2013). Accelerating stochastic gradient descent using predictive variance reduction. In Proceedings of 27th Annual Conference on Neural Information Processing Systems, NeurIPS.Google Scholar
Jouppi, N. P., Young, C., Patil, N., Patterson, D. A., Agrawal, G., Bajwa, R., Bates, S., Bhatia, S., Boden, N., Borchers, A., Boyle, R., Cantin, P., Chao, C., Clark, C., Coriell, J., Daley, M., Dau, M., Dean, J., Gelb, B., Ghaemmaghami, T. V., Gottipati, R., Gulland, W., Hagmann, R., Ho, C. R., Hogberg, D., Hu, J., Hundt, R., Hurt, D., Ibarz, J., Jaffey, A., Jaworski, A., Kaplan, A., Khaitan, H., Killebrew, D., Koch, A., Kumar, N., Lacy, S., Laudon, J., Law, J., Le, D., Leary, C., Liu, Z., Lucke, K., Lundin, A., MacKean, G., Maggiore, A., Mahony, M., Miller, K., Nagarajan, R., Narayanaswami, R., Ni, R., Nix, K., Norrie, T., Omernick, M., Penukonda, N., Phelps, A., Ross, J., Ross, M., Salek, A., Samadiani, E., Severn, C., Sizikov, G., Snelham, M., Souter, J., Steinberg, D., Swing, A., Tan, M., Thorson, G., Tian, B., Toma, H., Tuttle, E., Vasudevan, V., Walter, R., Wang, W., Wilcox, E., and Yoon, D. H. (2017). In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA, Toronto, ON, Canada, June 24-28, 2017, pages 1–12.Google Scholar
Kalchbrenner, N., Danihelka, I., and Graves, A. (2015). Grid long short-term memory. arXiv.Google Scholar
Karimireddy, S. P., Rebjock, Q., Stich, S. U., and Jaggi, M. (2019). Error feedback fixes signsgd and other gradient compression schemes. arXiv preprint arXiv:1901.09847.Google Scholar
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y. (2017). Lightgbm: A highly efficient gradient boosting decision tree. In Proc. NeurIPS.Google Scholar
Kim, Y., Kim, J., Chae, D., Kim, D., and Kim, J. (2019). μlayer: Low latency on-device inference using cooperative single-layer acceleration and processor-friendly quantization. In Proceedings of the Fourteenth EuroSys Conference 2019, pages 1–15.Google Scholar
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.Google Scholar
Koda, Y., Yamamoto, K., Nishio, T., and Morikura, M. (2020). Differentially private aircomp federated learning with power adaptation harnessing receiver noise. ArXiv, abs/2004.06337.Google Scholar
Koloskova, A., Lin, T., Stich, S. U., and Jaggi, M. (2019a). Decentralized deep learning with arbitrary communication compression. arXiv preprint arXiv:1907.09356.Google Scholar
Koloskova, A., Stich, S. U., and Jaggi, M. (2019b). Decentralized stochastic optimization and gossip algorithms with compressed communication. arXiv preprint arXiv:1902.00340.Google Scholar
Konecný, J., McMahan, H. B., and Ramage, D. (2015). Federated optimization: Distributed optimization beyond the datacenter. ArXiv, abs/1511.03575.Google Scholar
Koutnik, J., Greff, K., Gomez, F., and Schmidhuber, J. (2014). A clockwork rnn. arXiv.Google Scholar
Krizhevsky, A., Sutskever, I., and Hinton, G. E. (2012). Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105.Google Scholar
Kumar, A., Fu, J., Tucker, G., and Levine, S. (2019). Stabilizing off-policy q-learning via bootstrapping error reduction. In NeurIPS.Google Scholar
Kusupati, A., Singh, M., Bhatia, K., Kumar, A., Jain, P., and Varma, M. (2018). Fastgrnn: A fast, accurate, stable and tiny kilobyte sized gated recurrent neural network. In Advances in Neural Information Processing Systems, pages 9017–9028.Google Scholar
Lathauwer, L. D., Moor, B. D., and Vandewalle, J. (2000). A multilinear singular value decomposition. SIAM J. Matrix Analysis Applications, 21:1253–1278.Google Scholar
LeCun, Y. (1998). The mnist database of handwritten digits. http://yann.lecun.com/exdb/mnist/.Google Scholar
Lee, K., Lam, M., Pedarsani, R., Papailiopoulos, D., and Ramchandran, K. (2018). Speeding up distributed machine learning using codes. IEEE Transactions on Information Theory, 64(3):1514–1529.Google Scholar
Lee, S., Kim, J. K., Zheng, X., Ho, Q., Gibson, G. A., and Xing, E. P. (2014). On model parallelization and scheduling strategies for distributed machine learning. In Proc. NeurIPS.Google Scholar
Lei, L., Tan, Y., Liu, S., Zheng, K., et al. (2019). Deep reinforcement learning for autonomous internet of things: Model, applications and challenges. arXiv preprint arXiv:1907.09059.Google Scholar
Li, M., Andersen, D. G., Park, J. W., Smola, A. J., Ahmed, A., Josifovski, V., Long, J., Shekita, E. J., and Su, B. (2014a). Scaling distributed machine learning with the parameter server. In Proceedings of 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI.Google Scholar
Li, M., Andersen, D. G., Park, J. W., Smola, A. J., Ahmed, A., Josifovski, V., Long, J., Shekita, E. J., and Su, B. Y. (2014b). Scaling distributed machine learning with the parameter server. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2014, pages 583–598.Google Scholar
Li, M., Andersen, D. G., Park, J. W., Smola, A. J., Ahmed, A., Josifovski, V., Long, J., Shekita, E. J., and Su, B.-Y. (2014c). Scaling distributed machine learning with the parameter server. In Proc. OSDI, pages 583–598.Google Scholar
Li, M., Zhang, T., Chen, Y., and Smola, A. J. (2014d). Efficient mini-batch training for stochastic optimization. In Proc. SIGKDD.Google Scholar
Li, P. and Guo, S. (2015). Incentive mechanisms for device-to-device communications. IEEE Network, 29(4):75–79.Google Scholar
Li, P., Wu, X., Shen, W., Tong, W., and Guo, S. (2019a). Collaboration of heterogeneous unmanned vehicles for smart cities. IEEE Network, 33(4):133–137.Google Scholar
Li, S., Kalan, S. M. M., Yu, Q., Soltanolkotabi, M., and Avestimehr, A. S. (2018a). Polynomially coded regression: Optimal straggler mitigation via data encoding. arXiv.Google Scholar
Li, T., Sahu, A. K., Zaheer, M., Sanjabi, M., Talwalkar, A., and Smith, V. (2018b). Federated optimization in heterogeneous networks. arXiv preprint arXiv:1812.06127.Google Scholar
Li, T., Sanjabi, M., Beirami, A., and Smith, V. (2019b). Fair resource allocation in federated learning.Google Scholar
Li, X. and Orabona, F. (2018). On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes. 89.Google Scholar
Li, X., Wang, W., Hu, X., and Yang, J. (2019c). Selective kernel networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 510–519.Google Scholar
Li, Y., Ma, T., and Zhang, H. (2018c). Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations. In Bubeck, S., Perchet, V., and Rigollet, P., editors, Proceedings of Conference On Learning Theory, COLT.Google Scholar
Li, Y., Wei, C., and Ma, T. (2019d). Towards explaining the regularization effect of initial large learning rate in training neural networks. In Advances in Neural Information Processing Systems (NeurIPS).Google Scholar
Li, Z., Brendel, W., Walker, E. Y., Cobos, E., Muhammad, T., Reimer, J., Bethge, M., Sinz, F. H., Pitkow, X., and Tolias, A. S. (2019e). Learning from brains how to regularize machines. In Advances in Neural Information Processing Systems (NeurIPS).Google Scholar
Li, Z., Xu, C., and Leng, B. (2018d). Rethinking Loss Design for Large-scale 3D Shape Retrieval. pages 840–846.Google Scholar
Li, Z., Xu, C., and Leng, B. (2019f). Angular triplet-center loss for multi-view 3D shape retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, AAAI.Google Scholar
Lian, X., Zhang, C., Zhang, H., Hsieh, C.-J., Zhang, W., and Liu, J. (2017). Can decentralized algorithms outperform centralized algorithms? a case study for decentralized parallel stochastic gradient descent. In Advances in Neural Information Processing Systems, pages 5330–5340.Google Scholar
Lian, X., Zhang, W., Zhang, C., and Liu, J. (2018). Asynchronous decentralized parallel stochastic gradient descent. In Dy, J. G. and Krause, A., editors, Proceedings of the 35th International Conference on Machine Learning, ICML.Google Scholar
Ligett, K., Neel, S., Roth, A., Waggoner, B., and Wu, S. Z. (2017). Accuracy first: Selecting a differential privacy level for accuracy constrained erm. In Advances in Neural Information Processing Systems 30.Google Scholar
Lim, W. Y. B., Luong, N. C., Hoang, D. T., Jiao, Y., Liang, Y.-C., Yang, Q., Niyato, D., and Miao, C. (2019). Federated learning in mobile edge networks: A comprehensive survey. arXiv preprint arXiv:1909.11875.Google Scholar
Lin, T., Stich, S. U., Patel, K. K., and Jaggi, M. (2020). Don’t use large mini-batches, use local SGD. In Proceedings of 8th International Conference on Learning Representations, ICLR.Google Scholar
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft coco: Common objects in context. In European conference on computer vision, pages 740–755. Springer.Google Scholar
Lin, Y., Han, S., Mao, H., Wang, Y., and Dally, B. (2018). Deep gradient compression: Reducing the communication bandwidth for distributed training. In Proceedings of 6th International Conference on Learning Representations, ICLR.Google Scholar
Liu, F. and Shroff, N. B. (2019). Data poisoning attacks on stochastic bandits. In Proceedings of the 36th International Conference on Machine Learning.Google Scholar
Liu, W., Zang, X., Li, Y., and Vucetic, B. (2020). Over-the-air computation systems: Optimization, analysis and scaling laws. IEEE Transactions on Wireless Communications, 19(8):5488–5502.Google Scholar
Liu, Y., Shang, F., and Jiao, L. (2019). Accelerated incremental gradient descent using momentum acceleration with scaling factor. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 3045–3051.Google Scholar
Liu, Y., Xu, C., Zhan, Y., Liu, Z., Guan, J., and Zhang, H. (2017). Incentive mechanism for computation offloading using edge computing: A stackelberg game approach. Computer Networks, 129:399–409.Google Scholar
Loshchilov, I. and Hutter, F. (2017). Fixing weight decay regularization in Adam.Google Scholar
Louizos, C., Reisser, M., Blankevoort, T., Gavves, E., and Welling, M. (2019). Relaxed quantization for discretized neural networks. In 7th International Conference on Learning Representations, ICLR, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.Google Scholar
Lu, Y. and Sa, C. D. (2020). Moniqua: Modulo quantized communication in decentralized SGD.Google Scholar
Luo, L., Xiong, Y., Liu, Y., and Sun, X. (2019). Adaptive gradient methods with dynamic bound of learning rate. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019.Google Scholar
Luping, W., Wei, W., and Bo, L. (2019). Cmfl: Mitigating communication overhead for federated learning. In Proceedings of IEEE 39th International Conference on Distributed Computing Systems, ICDCS.Google Scholar
Jagielski, M., Oprea, A., Biggio, B., Liu, C., Nita-Rotaru, C., and Li, B. (2018). Manipulating machine learning: Poisoning attacks and countermeasures for regression learning. In 2018 IEEE Symposium on Security and Privacy (SP).Google Scholar
Ma, X., Sun, H., and Hu, R. Q. (2020). Scheduling policy and power allocation for federated learning in NOMA based MEC. arXiv preprint arXiv:2006.13044.Google Scholar
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks.Google Scholar
Mahloujifar, S., Mahmoody, M., and Mohammed, A. (2019). Data poisoning attacks in multi-party learning. In Proceedings of the 36th International Conference on Machine Learning.Google Scholar
Maity, R. K., Rawa, A. S., and Mazumdar, A. (2019). Robust gradient descent via moment encoding and ldpc codes. In Proceedings of IEEE International Symposium on Information Theory (ISIT).Google Scholar
Malewicz, G., Austern, M. H., Bik, A. J., Dehnert, J. C., Horn, I., Leiser, N., and Czajkowski, G. (2010). Pregel: A system for large-scale graph processing. In Proc. SIGMOD.Google Scholar
Manessi, F., Rozza, A., Bianco, S., Napoletano, P., and Schettini, R. (2018). Automated pruning for deep neural network compression. In 24th International Conference on Pattern Recognition, ICPR.Google Scholar
Martinez, I., Francis, S., and Hafid, A. S. (2019). Record and reward federated learning contributions with blockchain. In Proc of IEEE CyberC, pages 50–57.Google Scholar
Mathur, A., Lane, N. D., Bhattacharya, S., Boran, A., Forlivesi, C., and Kawsar, F. (2017). Deepeye: Resource efficient local execution of multiple deep vision models using wearable commodity hardware. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys).Google Scholar
McMahan, B., Moore, E., Ramage, D., Hampson, S., and Arcas, B. A. (2017a). Communication-efficient learning of deep networks from decentralized data. In Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS).Google Scholar
McMahan, B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. (2017b). Communication-efficient learning of deep networks from decentralized data. In Proceedings of Artificial Intelligence and Statistics, AISTATS.Google Scholar
McMahan, H. B., Moore, E., Ramage, D., Hampson, S., et al. (2017c). Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS).Google Scholar
McMahan, H. B., Moore, E., Ramage, D., Hampson, S., and y Arcas, B. A. (2017d). Communication-efficient learning of deep networks from decentralized data. In Proc. AISTATS.Google Scholar
McMahan, H. B., Ramage, D., Talwar, K., and Zhang, L. (2017e). Learning differentially private language models without losing accuracy. CoRR, abs/1710.06963.Google Scholar
Melis, L., Song, C., De Cristofaro, E., and Shmatikov, V. (2019). Exploiting unintended feature leakage in collaborative learning. In 2019 IEEE Symposium on Security and Privacy (SP).CrossRefGoogle Scholar
Meng, Q., Chen, W., Wang, Y., Ma, Z.-M., and Liu, T.-Y. (2019). Convergence analysis of distributed stochastic gradient descent with shuffling. ArXiv, abs/1709.10432.Google Scholar
Mirhoseini, A., Goldie, A., Pham, H., Steiner, B., Le, Q. V., and Dean, J. (2018). A hierarchical model for device placement.Google Scholar
Mirza, M. and Osindero, S. (2014). Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784.Google Scholar
Mohammadi, M., Al-Fuqaha, A., Sorour, S., and Guizani, M. (2018). Deep learning for iot big data and streaming analytics: A survey. IEEE Communications Surveys & Tutorials, 20(4):2923–2960.Google Scholar
Mohassel, P. and Rindal, P. (2018). Aby3: A mixed protocol framework for machine learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security.Google Scholar
Mohri, M., Sivek, G., and Suresh, A. T. (2019). Agnostic federated learning. arXiv preprint arXiv:1902.00146.Google Scholar
Mokhtari, A. and Ribeiro, A. (2016). DSA: decentralized double stochastic averaging gradient algorithm. J. Mach. Learn. Res., 17:61:1–61:35.Google Scholar
Murshed, M. G. S., Murphy, C., Hou, D., Khan, N., Ananthanarayanan, G., and Hussain, F. (2019). Machine learning at the network edge: A survey. pages 1–28.Google Scholar
Nakamoto, S. (2009). Bitcoin: A peer-to-peer electronic cash system. [Online] Available: https://bitcoin.org/bitcoin.pdf.Google Scholar
Nakandala, S., Kumar, A., and Papakonstantinou, Y. (2019). Incremental and approximate inference for faster occlusion-based deep cnn explanations. In Proceedings of the 2019 International Conference on Management of Data, pages 1589–1606.Google Scholar
Nar, K. and Shankar Sastry, S. (2018). Step size matters in deep learning. In Advances in Neural Information Processing Systems (NeurIPS), pages 3436–3444.Google Scholar
Narayanan, D., Harlap, A., Phanishayee, A., Seshadri, V., Devanur, N. R., Ganger, G. R., Gibbons, P. B., and Zaharia, M. (2019). Pipedream: generalized pipeline parallelism for DNN training. In Proceedings of the 27th ACM Symposium on Operating Systems Principles, SOSP, Huntsville, ON, Canada, October 27-30, 2019, pages 1–15.Google Scholar
Nasr, M., Shokri, R., and Houmansadr, A. (2018). Machine learning with membership privacy using adversarial regularization. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security.Google Scholar
Nasr, M., Shokri, R., and Houmansadr, A. (2019). Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In 2019 IEEE Symposium on Security and Privacy (SP).Google Scholar
Neel, S. and Roth, A. (2018). Mitigating bias in adaptive data gathering via differential privacy. CoRR, abs/1806.02329.Google Scholar
Nesterov, Y. (2012a). Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optimization, 22:341–362.Google Scholar
Nesterov, Y. E. (2012b). Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optimization, 22(2):341–362.Google Scholar
Neter, J., Wasserman, W. J., and Kutner, M. H. (1974). Applied linear statistical models: regression, analysis of variance, and experimental designs.Google Scholar
Nguyen, L. M., van Dijk, M., Phan, D. T., Nguyen, P. H., Weng, T.-w., and Kalagnanam, J. R. (2019). Finite-Sum Smooth Optimization with SARAH. pages 1–26.Google Scholar
Niknam, S., Dhillon, H. S., and Reed, J. H. (2019). Federated learning for wireless communications: Motivation, opportunities and challenges. arXiv preprint arXiv: 1908.06847.Google Scholar
Nishio, T. and Yonetani, R. (2019a). Client selection for federated learning with heterogeneous resources in mobile edge. ICC 2019 – 2019 IEEE International Conference on Communications (ICC).Google Scholar
Nishio, T. and Yonetani, R. (2019b). Client selection for federated learning with heterogeneous resources in mobile edge. In ICC 2019-2019 IEEE International Conference on Communications (ICC), pages 1–7. IEEE.Google Scholar
Ozfatura, E., Gündüz, D., and Ulukus, S. (2019). Speeding up distributed gradient descent by utilizing non-persistent stragglers. In Proceedings of IEEE International Symposium on Information Theory, ISIT.Google Scholar
Panageas, I., Piliouras, G., and Wang, X. (2019). First-order methods almost always avoid saddle points: the case of vanishing step-sizes. In Advances in Neural Information Processing Systems (NeurIPS).Google Scholar
Pandey, S. R., Tran, N. H., Bennis, M., Tun, Y. K., Manzoor, A., and Hong, C. S. (2020). A crowdsourcing framework for on-device federated learning. IEEE Transactions on Wireless Communications, 19(5):3241–3256.Google Scholar
Pang, T., Du, C., Dong, Y., and Zhu, J. (2018). Towards robust detection of adversarial examples. In Advances in Neural Information Processing Systems 31.Google Scholar
Papernot, N., Abadi, M., Erlingsson, Ú., Goodfellow, I., and Talwar, K. (2016). Semi-supervised knowledge transfer for deep learning from private training data.Google Scholar
Parashar, A., Rhu, M., Mukkara, A., Puglielli, A., Venkatesan, R., Khailany, B., Emer, J. S., Keckler, S. W., and Dally, W. J. (2017). SCNN: an accelerator for compressed-sparse convolutional neural networks. In Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA.Google Scholar
Park, H., Zhai, S., Lu, L., and Lin, F. X. (2019). Streambox-tz: Secure stream analytics at the edge with trustzone. In 2019 USENIX Annual Technical Conference (USENIX ATC 19).Google Scholar
Park, J. H., Yun, G., Chang, M. Y., Nguyen, N. T., Lee, S., Choi, J., Noh, S. H., and Choi, Y.-r. (2020). Hetpipe: Enabling large DNN training on (whimpy) heterogeneous GPU clusters through integration of pipelined model parallelism and data parallelism. In 2020 USENIX Annual Technical Conference (USENIX ATC 20), pages 307–321.Google Scholar
Park, N., Mohammadi, M., Gorde, K., Jajodia, S., Park, H., and Kim, Y. (2018). Data synthesis based on generative adversarial networks. Proc. VLDB Endow., 11(10):1071–1083.Google Scholar
Patel, K. K. and Dieuleveut, A. (2019). Communication trade-offs for synchronized distributed SGD with large step size. In Advances in Neural Information Processing Systems (NeurIPS).Google Scholar
Mohassel, P. and Zhang, Y. (2017). Secureml: A system for scalable privacy-preserving machine learning. In 2017 IEEE Symposium on Security and Privacy (SP).Google Scholar
Peng, Y., Bao, Y., Chen, Y., Wu, C., and Guo, C. (2018). Optimus: an efficient dynamic resource scheduler for deep learning clusters. In Proceedings of the Thirteenth EuroSys Conference, EuroSys.Google Scholar
Peteiro-Barral, D. and Guijarro-Berdiñas, B. (2013). A survey of methods for distributed machine learning. Progress in Artificial Intelligence, 2(1):1–11.Google Scholar
Pilla, L. (2020). Optimal task assignment to heterogeneous federated learning devices. ArXiv, abs/2010.00239.Google Scholar
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V., and Gulin, A. (2018). Catboost: unbiased boosting with categorical features. In Advances in neural information processing systems, pages 6638–6648.Google Scholar
Qiao, A., Aragam, B., Zhang, B., and Xing, E. P. (2019). Fault tolerance in iterative-convergent machine learning. In Proceedings of the 36th International Conference on Machine Learning ICML, pages 5220–5230.Google Scholar
Raviv, N., Tandon, R., Dimakis, A., and Tamo, I. (2018). Gradient coding from cyclic MDS codes and expander graphs. In Proceedings of the 35th International Conference on Machine Learning, ICML, pages 4302–4310.Google Scholar
Reddi, S. J., Kale, S., and Kumar, S. (2018). On the convergence of Adam and beyond. In Proceedings of the 6th International Conference on Learning Representations, ICLR, pages 1–23.Google Scholar
Ren, J., Yu, G., and Ding, G. (2019a). Accelerating dnn training in wireless federated edge learning system.Google Scholar
Ren, S., Zhang, Z., Liu, S., Zhou, M., and Ma, S. (2019b). Unsupervised neural machine translation with SMT as posterior regularization. In Proceedings of the AAAI Conference on Artificial Intelligence, 33:241–248.Google Scholar
Hall, R., Fienberg, S. E., and Nardi, Y. (2011). Secure multiple linear regression based on homomorphic encryption. Journal of Official Statistics, 27(4):669–691.Google Scholar
Robbins, H. E. (2007). A stochastic approximation method. Annals of Mathematical Statistics, 22:400–407.Google Scholar
Sakr, C., Wang, N., Chen, C., Choi, J., Agrawal, A., Shanbhag, N. R., and Gopalakrishnan, K. (2019). Accumulation bit-width scaling for ultra-low precision training of deep networks. In 7th International Conference on Learning Representations, ICLR, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.Google Scholar
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., and Chen, L.-C. (2018). Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4510–4520.Google Scholar
Sarikaya, Y. and Ercetin, O. (2019). Motivating workers in federated learning: A stackelberg game perspective. IEEE Networking Letters, 2(1):23–27.Google Scholar
Sattler, F., Müller, K.-R., and Samek, W. (2019a). Clustered federated learning: Model-agnostic distributed multi-task optimization under privacy constraints. arXiv preprint arXiv:1910.01991.Google Scholar
Sattler, F., Wiedemann, S., Müller, K.-R., and Samek, W. (2019b). Robust and communication-efficient federated learning from non-iid data. IEEE transactions on neural networks and learning systems.Google Scholar
Schein, A., Wu, Z. S., Schofield, A., Zhou, M., and Wallach, H. (2018). Locally private bayesian inference for count models.Google Scholar
Schmidt, M. W., Roux, N. L., and Bach, F. R. (2017). Minimizing finite sums with the stochastic average gradient. Mathematical Programming, 162:83–112.Google Scholar
Seide, F., Fu, H., Droppo, J., Li, G., and Yu, D. (2014a). 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns. In Proceedings of 15th Annual Conference of the International Speech Communication Association, INTERSPEECH.Google Scholar
Seide, F., Fu, H., Droppo, J., Li, G., and Yu, D. (2014b). 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech dnns. In Fifteenth Annual Conference of the International Speech Communication Association.Google Scholar
Sergeev, A. and Balso, M. D. (2018). Horovod: fast and easy distributed deep learning in tensorflow. ArXiv, abs/1802.05799.Google Scholar
Sery, T., Shlezinger, N., Cohen, K., and Eldar, Y. C. (2020). Over-the-air federated learning from heterogeneous data. ArXiv, abs/2009.12787.Google Scholar
Sharif-Nassab, A., Salehkaleybar, S., and Golestani, S. J. (2019). Order optimal one-shot distributed learning. In Proceedings of Advances in Neural Information Processing Systems, NeurIPS.Google Scholar
Shen, Y. and Sanghavi, S. (2018). Iteratively learning from the best. CoRR, abs/1810.11874.Google Scholar
Shi, S., Zhao, K., Wang, Q., Tang, Z., and Chu, X. (2019a). A convergence analysis of distributed SGD with communication-efficient gradient sparsification. In Kraus, S., editor, Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI 2019, Macao, China, August 10-16, 2019, pages 3411–3417. ijcai.org.Google Scholar
Shi, S., Zhao, K., Wang, Q., Tang, Z., and Chu, X. (2019b). A convergence analysis of distributed SGD with communication-efficient gradient sparsification. In Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI).Google Scholar
Shi, W., Cao, J., Zhang, Q., Li, Y., and Xu, L. (2016). Edge computing: Vision and challenges. IEEE Internet of Things Journal, 3:637–646.Google Scholar
Shi, W., Ling, Q., Wu, G., and Yin, W. (2015). EXTRA: an exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim., 25(2):944–966.Google Scholar
Shi, W., Ling, Q., Yuan, K., Wu, G., and Yin, W. (2014). On the linear convergence of the ADMM in decentralized consensus optimization. IEEE Trans. Signal Process., 62(7):1750–1761.Google Scholar
Shi, W., Zhou, S., and Niu, Z. (2019c). Device scheduling with fast convergence for wireless federated learning.Google Scholar
Shoham, N., Avidor, T., Keren, A., Israel, N., Benditkis, D., Mor-Yosef, L., and Zeitak, I. (2019). Overcoming forgetting in federated learning on non-iid data. arXiv preprint arXiv:1910.07796.Google Scholar
Shokri, R., Stronati, M., Song, C., and Shmatikov, V. (2017). Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP).Google Scholar
Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.Google Scholar
Singh, N., Data, D., George, J., and Diggavi, S. (2020). Squarm-sgd: Communication-efficient momentum sgd for decentralized optimization. arXiv preprint arXiv: 2005.07041.Google Scholar
Smith, A., Thakurta, A., and Upadhyay, J. (2017). Is interaction necessary for distributed private learning? In 2017 IEEE Symposium on Security and Privacy (SP).Google Scholar
Song, C., Ristenpart, T., and Shmatikov, V. (2017). Machine learning models that remember too much. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.Google Scholar
Song, S., Lichtenberg, S. P., and Xiao, J. (2015). Sun rgb-d: A rgb-d scene understanding benchmark suite. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 567–576.Google Scholar
Song, T., Tong, Y., and Wei, S. (2019). Profit allocation for federated learning. In Proc. of IEEE Big Data, pages 2577–2586.Google Scholar
Staib, M., Reddi, S., Kale, S., Kumar, S., and Sra, S. (2019). Escaping saddle points with adaptive gradient methods. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 10420–10454.Google Scholar
Steinhardt, J., Koh, P. W. W., and Liang, P. S. (2017). Certified defenses for data poisoning attacks. In Advances in Neural Information Processing Systems 30.Google Scholar
Stich, S. U. (2019). Local SGD converges fast and communicates little. In Proceedings of 7th International Conference on Learning Representations, ICLR.Google Scholar
Stich, S. U., Cordonnier, J., and Jaggi, M. (2018a). Sparsified SGD with memory. In Proceedings of Annual Conference on Neural Information Processing Systems, NeurIPS.Google Scholar
Stich, S. U., Cordonnier, J.-B., and Jaggi, M. (2018b). Sparsified sgd with memory. In Advances in Neural Information Processing Systems, pages 4447–4458.Google Scholar
Streeter, M. (2019). Learning optimal linear regularizers. in Proceedings of 36th International Conference on Machine Learning, ICML 2019, 2019-June:10489–10498.Google Scholar
Sun, J., Chen, T., Giannakis, G., and Yang, Z. (2019). Communication-efficient distributed learning via lazily aggregated quantized gradients. In Proceedings of Advances in Neural Information Processing Systems, NeurIPS.Google Scholar
Sun, S., Chen, W., Bian, J., Liu, X., and Liu, T. (2018). Slim-dp: A multi-agent system for communication-efficient distributed deep learning. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems, AAMAS 2018, Stockholm, Sweden, July 10-15, 2018, pages 721–729.Google Scholar
Sun, Y., Zhou, S., and GÃijndÃijz, D. (2020). Energy-aware analog aggregation for federated learning with redundant data. In ICC 2020 – 2020 IEEE International Conference on Communications (ICC), pages 1–7.Google Scholar
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1–9.Google Scholar
Tan, T., Yin, S., Liu, K., and Wan, M. (2019). On the convergence speed of AMSGRAD and beyond. in Proceedings of International Conference on Tools with Artificial Intelligence, ICTAI, 2019-Novem:464–470.Google Scholar
Tandon, R., Lei, Q., Dimakis, A. G., and Karampatziakis, N. (2017). Gradient coding: Avoiding stragglers in distributed learning. In Proceedings of the 34th International Conference on Machine Learning, ICML.Google Scholar
Tang, H., Gan, S., Zhang, C., Zhang, T., and Liu, J. (2018a). Communication compression for decentralized training. In Bengio, S., Wallach, H. M., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, 3-8 December 2018, Montréal, Canada, pages 76637673.Google Scholar
Tang, H., Gan, S., Zhang, C., Zhang, T., and Liu, J. (2018b). Communication compression for decentralized training. In Advances in Neural Information Processing Systems, pages 7652–7662.Google Scholar
Tang, H., Lian, X., Qiu, S., Yuan, L., Zhang, C., Zhang, T., and Liu, J. (2019a). DeepSqueeze: Decentralization meets error-compensated compression. arXiv preprint.
Tang, H., Yu, C., Lian, X., Zhang, T., and Liu, J. (2019b). DoubleSqueeze: Parallel stochastic gradient descent with double-pass error-compensated compression. In Chaudhuri, K. and Salakhutdinov, R., editors, Proceedings of the 36th International Conference on Machine Learning, ICML.
Tang, H., Yu, C., Lian, X., Zhang, T., and Liu, J. (2019c). DoubleSqueeze: Parallel stochastic gradient descent with double-pass error-compensated compression. In International Conference on Machine Learning, pages 6155–6165. PMLR.
Tao, G., Ma, S., Liu, Y., and Zhang, X. (2018). Attacks meet interpretability: Attribute-steered detection of adversarial samples. In Advances in Neural Information Processing Systems 31.
Tian, L. and Gu, Q. (2017). Communication-efficient distributed sparse linear discriminant analysis. In Singh, A. and Zhu, X. J., editors, Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017, 20-22 April 2017, Fort Lauderdale, FL, USA, volume 54 of Proceedings of Machine Learning Research, pages 1178–1187. PMLR.
Tong, L., Yu, S., Alfeld, S., and Vorobeychik, Y. (2018). Adversarial regression with multiple learners. CoRR, abs/1806.02256.
Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., and McDaniel, P. (2017). Ensemble adversarial training: Attacks and defenses.
Tran, N. H., Bao, W., Zomaya, A., Nguyen, M. N. H., and Hong, C. S. (2019). Federated learning over wireless networks: Optimization model design and analysis. In IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, pages 1387–1395.
Tuncer, O., Leung, V. J., and Coskun, A. K. (2015). Pacmap: Topology mapping of unstructured communication patterns onto non-contiguous allocations. In Proceedings of the 29th ACM on International Conference on Supercomputing, ICS.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems, pages 5998–6008.
Venkataramani, S., Ranjan, A., Banerjee, S., Das, D., Avancha, S., Jagannathan, A., Durg, A., Nagaraj, D., Kaul, B., Dubey, P., and Raghunathan, A. (2017). Scaledeep: A scalable compute architecture for learning and evaluating deep networks. In Proc. ISCA.
Vepakomma, P., Swedish, T., Raskar, R., Gupta, O., and Dubey, A. (2018). No peek: A survey of private distributed deep learning.
Verbeke, J., Nadgir, N., Ruetsch, G., and Sharapov, I. (2002). Framework for peer-to-peer distributed computing in a heterogeneous, decentralized environment. In Parashar, M., editor, Grid Computing – GRID 2002, pages 1–12, Berlin, Heidelberg. Springer Berlin Heidelberg.
Viswanathan, R., Ananthanarayanan, G., and Akella, A. (2016). Clarinet: Wan-aware optimization for analytics queries. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 435–450.
Vogels, T., Karimireddy, S. P., and Jaggi, M. (2019). PowerSGD: Practical low-rank gradient compression for distributed optimization. In Proceedings of Annual Conference on Neural Information Processing Systems, NeurIPS.
Vu, T. T., Ngo, D. T., Tran, N. H., Ngo, H. Q., Dao, M. N., and Middleton, R. H. (2020). Cell-free massive MIMO for wireless federated learning. IEEE Transactions on Wireless Communications, Early Access.
Wadu, M. M., Samarakoon, S., and Bennis, M. (2020). Federated learning under channel uncertainty: Joint client scheduling and resource allocation. In 2020 IEEE Wireless Communications and Networking Conference (WCNC), pages 1–6.
Wang, D., Chen, C., and Xu, J. (2019a). Differentially private empirical risk minimization with non-convex loss functions. In Proceedings of the 36th International Conference on Machine Learning.
Wang, D., Gaboardi, M., and Xu, J. (2018a). Empirical risk minimization in non-interactive local differential privacy revisited. In Advances in Neural Information Processing Systems 31.
Wang, D., Ye, M., and Xu, J. (2017a). Differentially private empirical risk minimization revisited: Faster and more general. In Advances in Neural Information Processing Systems 30.
Wang, H., Guo, S., Tang, B., Li, R., and Li, C. (2019b). Heterogeneity-aware gradient coding for straggler tolerance. In 39th IEEE International Conference on Distributed Computing Systems, ICDCS.
Wang, H., Qu, Z., Guo, S., Gao, X., Li, R., and Ye, B. (2020). Intermittent pulling with local compensation for communication-efficient distributed learning. IEEE Transactions on Emerging Topics in Computing, DOI: 10.1109/TETC.2020.3043300.
Wang, H., Zhou, R., and Shen, Y.-D. (2019c). Bounding uncertainty for active batch selection. In Proceedings of the AAAI Conference on Artificial Intelligence, 33:5240–5247.
Wang, J., Tantia, V., Ballas, N., and Rabbat, M. (2019d). SlowMo: Improving communication-efficient distributed SGD with slow momentum.
Wang, J., Tantia, V., Ballas, N., and Rabbat, M. (2019e). SlowMo: Improving communication-efficient distributed SGD with slow momentum. arXiv preprint arXiv:1910.00643.
Wang, K., Li, H., Maharjan, S., Zhang, Y., and Guo, S. (2018b). Green energy scheduling for demand side management in the smart grid. IEEE Transactions on Green Communications and Networking, pages 596–611.
Wang, K., Xu, C., and Guo, S. (2017b). Big data analytics for price forecasting in smart grids. In Global Communications Conference.
Wang, K., Xu, C., Zhang, Y., Guo, S., and Zomaya, A. Y. (2019f). Robust big data analytics for electricity price forecasting in the smart grid. IEEE Transactions on Big Data, 5(1):34–45.
Wang, L., Yang, Y., Min, R., and Chakradhar, S. (2017c). Accelerating deep neural network training with inconsistent stochastic gradient descent. Neural Networks, 93:219–229.
Wang, M., Fang, E. X., and Liu, H. (2017d). Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions. Mathematical Programming, 161:419–449.
Wang, M., Liu, J., and Fang, E. X. (2017e). Accelerating stochastic composition optimization. J. Mach. Learn. Res., 18:105:1–105:23.
Wang, S., Chen, M., Yin, C., Saad, W., Hong, C. S., Cui, S., and Poor, H. V. (2020b). Federated learning for task and resource allocation in wireless high altitude balloon networks.
Wang, S., Li, D., Cheng, Y., Geng, J., Wang, Y., Wang, S., Xia, S.-T., and Wu, J. (2018c). BML: A high-performance, low-cost gradient synchronization algorithm for DML training. In Advances in Neural Information Processing Systems, pages 4238–4248.
Wang, S., Li, D., Cheng, Y., Geng, J., Wang, Y., Wang, S., Xia, S.-T., and Wu, J. (2018d). BML: A high-performance, low-cost gradient synchronization algorithm for DML training. In Proc. NeurIPS.
Wang, S., Pi, A., and Zhou, X. (2019g). Scalable distributed DL training: Batching communication and computation. In Proceedings of the AAAI Conference on Artificial Intelligence, 33:5289–5296.
Wang, S., Pi, A., and Zhou, X. (2019h). Scalable distributed DL training: Batching communication and computation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 5289–5296.
Wang, S., Sun, J., and Xu, Z. (2019i). HyperAdam: A learnable task-adaptive Adam for network training. In Proceedings of the AAAI Conference on Artificial Intelligence, 33:5297–5304.
Wang, Z. (2019). SpiderBoost and momentum: Faster stochastic variance reduction algorithms. In Proceedings of Annual Conference on Neural Information Processing Systems, NeurIPS.
Wangni, J., Wang, J., Liu, J., and Zhang, T. (2018a). Gradient sparsification for communication-efficient distributed optimization. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS).
Wangni, J., Wang, J., Liu, J., and Zhang, T. (2018b). Gradient sparsification for communication-efficient distributed optimization. In Advances in Neural Information Processing Systems, pages 1299–1309.
Ward, R., Wu, X., and Bottou, L. (2019). AdaGrad stepsizes: Sharp convergence over nonconvex landscapes. In Proceedings of the 36th International Conference on Machine Learning, ICML 2019, pages 11574–11583.
Wen, W., Xu, C., Yan, F., Wu, C., Wang, Y., Chen, Y., and Li, H. (2017a). TernGrad: Ternary gradients to reduce communication in distributed deep learning. In Guyon, I., von Luxburg, U., Bengio, S., Wallach, H. M., Fergus, R., Vishwanathan, S. V. N., and Garnett, R., editors, Proceedings of Annual Conference on Neural Information Processing Systems, NeurIPS.
Wen, W., Xu, C., Yan, F., Wu, C., Wang, Y., Chen, Y., and Li, H. (2017b). TernGrad: Ternary gradients to reduce communication in distributed deep learning. In Advances in neural information processing systems, pages 1509–1519.
Williams, R. J. and Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural computation, 1(2):270–280.
Woodworth, B. E., Patel, K. K., Stich, S. U., Dai, Z., Bullins, B., McMahan, H. B., Shamir, O., and Srebro, N. (2020). Is local SGD better than minibatch SGD? CoRR.
Wu, F., He, S., Yang, Y., Wang, H., Qu, Z., and Guo, S. (2020a). On the convergence of quantized parallel restarted SGD for serverless learning. CoRR.
Wu, J., Huang, W., Huang, J., and Zhang, T. (2018a). Error compensated quantized SGD and its applications to large-scale distributed optimization. arXiv preprint arXiv:1806.08054.
Wu, L., Li, S., Hsieh, C.-J., and Sharpnack, J. (2019). Stochastic shared embeddings: Data-driven regularization of embedding layers. In Advances in Neural Information Processing Systems (NeurIPS), pages 1–11.
Wu, T., Yuan, K., Ling, Q., Yin, W., and Sayed, A. H. (2018b). Decentralized consensus optimization with asynchrony and delays. IEEE Trans. Signal Inf. Process. over Networks, 4(2):293–307.
Wu, X., Ward, R., and Bottou, L. (2018c). WNGrad: Learn the learning rate in gradient descent, pages 1–16.
Xia, W., Quek, T. Q. S., Guo, K., Wen, W., Yang, H. H., and Zhu, H. (2020). Multi-armed bandit based client scheduling for federated learning. IEEE Transactions on Wireless Communications, Early Access.
Xiao, L. and Zhang, T. (2014). A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optimization, 24:2057–2075.
Xiao, W., Bhardwaj, R., Ramjee, R., Sivathanu, M., Kwatra, N., Han, Z., Patel, P., Peng, X., Zhao, H., Zhang, Q., Yang, F., and Zhou, L. (2018). Gandiva: Introspective cluster scheduling for deep learning. In Proc. OSDI.
Xie, C., Koyejo, O., and Gupta, I. (2018). Zeno: Byzantine-suspicious stochastic gradient descent. CoRR, abs/1805.10032.
Xie, P., Kim, J. K., Zhou, Y., Ho, Q., Kumar, A., Yu, Y., and Xing, E. (2016). Lighter-communication distributed machine learning via sufficient factor broadcasting. In Proc. UAI.
Xie, P., Kim, J. K., Zhou, Y., Ho, Q., Kumar, A., Yu, Y., and Xing, E. P. (2014). Distributed machine learning via sufficient factor broadcasting. arXiv, abs/1511.08486.
Xie, S., Girshick, R. B., Dollár, P., Tu, Z., and He, K. (2017). Aggregated residual transformations for deep neural networks. In Proc. CVPR, pages 5987–5995.
Xing, E. P., Ho, Q., Xie, P., and Dai, W. (2015). Strategies and principles of distributed machine learning on big data. CoRR.
Xing, E. P., Ho, Q., Xie, P., and Wei, D. (2016). Strategies and principles of distributed machine learning on big data. Engineering, 2(2):179–195.
Xing, H., Simeone, O., and Bi, S. (2020). Decentralized federated learning via SGD over wireless D2D networks. In 2020 IEEE 21st International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pages 1–5.
Xu, J. and Wang, H. (2020). Client selection and bandwidth allocation in wireless federated learning networks: A long-term perspective.
Xu, S., Zhang, H., Neubig, G., Dai, W., Kim, J. K., Deng, Z., Ho, Q., Yang, G., and Xing, E. P. (2018). Cavs: An efficient runtime system for dynamic neural networks. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 937–950.
Xu, Y., Dong, X., Li, Y., and Su, H. (2019). A main/subsidiary network framework for simplifying binary neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pages 7154–7162.
Xu, Z., Zhang, Y., Fu, C., Liu, L., and Guo, S. (2020). Back shape measurement and three-dimensional reconstruction of spinal shape using one Kinect sensor. In 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI).
Yan, Z., Guo, Y., and Zhang, C. (2018). Deep defense: Training DNNs with improved adversarial robustness. In Advances in Neural Information Processing Systems 31.
Yang, D., Xue, G., Fang, X., and Tang, J. (2016). Incentive mechanisms for crowdsensing: Crowdsourcing with smartphones. IEEE/ACM Transactions on Networking, 24(3):1732–1744.
Yang, H. H., Arafa, A., Quek, T. Q. S., and Vincent Poor, H. (2020). Age-based scheduling policy for federated learning in mobile edge networks. In ICASSP 2020 – 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8743–8747.
Yang, H. H., Liu, Z., Quek, T. Q. S., and Poor, H. V. (2019a). Scheduling policies for federated learning in wireless networks.
Yang, K., Jiang, T., Shi, Y., and Ding, Z. (2020). Federated learning via over-the-air computation. IEEE Transactions on Wireless Communications, 19(3):2022–2035.
Yang, K., Shi, Y., Zhou, Y., Yang, Z., Fu, L., and Chen, W. (2020). Federated machine learning for intelligent IoT via reconfigurable intelligent surface. IEEE Network, 34(5):16–22.
Yang, Q., Liu, Y., Chen, T., and Tong, Y. (2019b). Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2):1–19.
Yang, Y., Zhang, G., Katabi, D., and Xu, Z. (2019c). ME-Net: Towards effective adversarial robustness with matrix estimation. CoRR, abs/1905.11971.
Yang, Z., Chen, M., Saad, W., Hong, C. S., and Shikh-Bahaei, M. (2019d). Energy efficient federated learning over wireless communication networks.
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R. R., and Le, Q. V. (2019e). XLNet: Generalized autoregressive pretraining for language understanding. In Advances in neural information processing systems, pages 5754–5764.
Yeganeh, Y., Farshad, A., Navab, N., and Albarqouni, S. (2020). Inverse distance aggregation for federated learning with non-iid data. arXiv preprint arXiv:2008.07665.
Yeh, T. T., Sabne, A., Sakdhnagool, P., Eigenmann, R., and Rogers, T. G. (2019). Pagoda: A GPU runtime system for narrow tasks. ACM Transactions on Parallel Computing (TOPC), 6(4):1–23.
Yin, D., Chen, Y., Ramchandran, K., and Bartlett, P. L. (2018a). Byzantine-robust distributed learning: Towards optimal statistical rates. CoRR, abs/1803.01498.
Yin, D., Chen, Y., Ramchandran, K., and Bartlett, P. L. (2018b). Defending against saddle point attack in Byzantine-robust distributed learning. CoRR, abs/1806.05358.
Yin, D., Pananjady, A., Lam, M., Papailiopoulos, D. S., Ramchandran, K., and Bartlett, P. L. (2018c). Gradient diversity: a key ingredient for scalable distributed learning. In Proceedings of International Conference on Artificial Intelligence and Statistics, AISTATS.
Ying, B., Yuan, K., Vlaski, S., and Sayed, A. H. (2019). Stochastic learning under random reshuffling with constant step-sizes. IEEE Transactions on Signal Processing, 67:474–489.
Yong, H., Huang, J., Hua, X., and Zhang, L. (2020). Gradient centralization: A new optimization technique for deep neural networks.
You, K., Long, M., Wang, J., and Jordan, M. I. (2019). How does learning rate decay help modern neural networks?
You, Y., Li, J., Reddi, S. J., Hseu, J., Kumar, S., Bhojanapalli, S., Song, X., Demmel, J., Keutzer, K., and Hsieh, C. (2020a). Large batch optimization for deep learning: Training BERT in 76 minutes. In Proceedings of 8th International Conference on Learning Representations, ICLR.
You, Y., Wang, Y., Zhang, H., Zhang, Z., Demmel, J., and Hsieh, C. (2020b). The limit of the batch size. CoRR.
You, Y., Zhang, Z., Hsieh, C.-J., Demmel, J., and Keutzer, K. (2018). ImageNet training in minutes. In Proceedings of the 47th International Conference on Parallel Processing, ICPP.
Yu, L., Liu, L., Pu, C., Gursoy, M. E., and Truex, S. (2019). Differentially private model publishing for deep learning. In 2019 IEEE Symposium on Security and Privacy (SP).
Yu, P. and Chowdhury, M. (2019). Salus: Fine-grained GPU sharing primitives for deep learning applications. CoRR.
Yu, Q., Li, S., Raviv, N., Kalan, S. M. M., Soltanolkotabi, M., and Avestimehr, A. S. (2019). Lagrange coded computing: Optimal design for resiliency, security, and privacy. In Proceedings of The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS.
Yuan, J. and Yu, S. (2014). Privacy preserving back-propagation neural network learning made practical with cloud computing. IEEE Transactions on Parallel and Distributed Systems, 25(1):212–221.
Yuan, K., Ling, Q., and Yin, W. (2016). On the convergence of decentralized gradient descent. SIAM J. Optim., 26(3):1835–1854.
Yuan, X., Feng, Z., Norton, M., and Li, X. (2019). Generalized batch normalization: Towards accelerating deep neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, 33:1682–1689.
Yu, Y., Wu, J., and Huang, L. (2019). Double quantization for communication-efficient distributed optimization. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), pages 4440–4451.
Yurochkin, M., Agarwal, M., Ghosh, S., Greenewald, K. H., Hoang, T. N., and Khazaeni, Y. (2019). Bayesian nonparametric federated learning of neural networks. In Proceedings of the 36th International Conference on Machine Learning, ICML.
Zaheer, M., Reddi, S. J., Sachan, D., Kale, S., and Kumar, S. (2018). Adaptive methods for nonconvex optimization. In Advances in Neural Information Processing Systems (NeurIPS), pages 9793–9803.
Zeng, D., Gu, L., Lian, L., Guo, S., Yao, H., and Hu, J. (2016). On cost-efficient sensor placement for contaminant detection in water distribution systems. IEEE Transactions on Industrial Informatics, 12(6):2177–2185.
Zeng, Q., Du, Y., Leung, K. K., and Huang, K. (2019). Energy-efficient radio resource allocation for federated edge learning.
Zeng, R., Zhang, S., Wang, J., and Chu, X. (2020a). Fmore: An incentive scheme of multidimensional auction for federated learning in MEC. arXiv preprint arXiv:2002.09699.
Zeng, T., Semiari, O., Mozaffari, M., Chen, M., Saad, W., and Bennis, M. (2020b). Federated learning in the sky: Joint power allocation and scheduling with UAV swarms.
Zhan, Y. and Zhang, J. (2020). An incentive mechanism design for efficient edge learning by deep reinforcement learning approach. In Proc. of IEEE INFOCOM, pages 2489–2498.
Zhang, C., Öztireli, C., Mandt, S., and Salvi, G. (2019a). Active mini-batch sampling using repulsive point processes. In Proceedings of the AAAI Conference on Artificial Intelligence, 33:5741–5748.
Zhang, G., Li, L., Nado, Z., Martens, J., Sachdeva, S., Dahl, G. E., Shallue, C. J., and Grosse, R. (2019b). Which algorithmic choices matter at which batch sizes? Insights from a noisy quadratic model. In Advances in Neural Information Processing Systems (NeurIPS), pages 1–12.
Zhang, H., Li, J., Kara, K., Alistarh, D., Liu, J., and Zhang, C. (2017a). ZipML: Training linear models with end-to-end low precision, and a little bit of deep learning. In Precup, D. and Teh, Y. W., editors, Proceedings of the 34th International Conference on Machine Learning, ICML.
Zhang, H., Zheng, Z., Xu, S., Dai, W., Ho, Q., Liang, X., Hu, Z., Wei, J., Xie, P., and Xing, E. P. (2017b). Poseidon: An efficient communication architecture for distributed deep learning on GPU clusters. In Proc. ATC.
Zhang, J., Hong, Z., Qiu, X., Zhan, Y., Guo, S., and Chen, W. (2020). Skychain: A deep reinforcement learning-empowered dynamic blockchain sharding system. In International Conference on Parallel Processing (ICPP).
Zhang, M., Rajbhandari, S., Wang, W., and He, Y. (2018). DeepCPU: Serving RNN-based deep learning models 10x faster. In 2018 USENIX Annual Technical Conference (USENIX ATC 18), pages 951–965.
Zhang, N. and Tao, M. (2020). Gradient statistics aware power control for over-the-air federated learning.
Zhang, N. and Tao, M. (2020). Gradient statistics aware power control for over-the-air federated learning in fading channels. In 2020 IEEE International Conference on Communications Workshops (ICC Workshops), pages 1–6.
Zhang, Q., Yang, L. T., and Chen, Z. (2016). Privacy preserving deep computation model on cloud for big data feature learning. IEEE Transactions on Computers, 65(5):1351–1362.
Zhang, X., Zhao, R., Yan, J., Gao, M., Qiao, Y., Wang, X., and Li, H. (2019c). P2SGRAD: Refined gradients for optimizing deep face models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pages 9898–9906.
Zhang, Y., Qu, H., Chen, C., and Metaxas, D. (2019d). Taming the noisy gradient: Train deep neural networks with small batch sizes. In Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI, pages 4348–4354.
Zhao, J. (2018). Distributed deep learning under differential privacy with the teacher-student paradigm. In Workshops at the Thirty-Second AAAI Conference on Artificial Intelligence.
Zhao, S., Xie, Y., Gao, H., and Li, W. (2019a). Global momentum compression for sparse communication in distributed SGD. CoRR, abs/1905.12948.
Zhao, T., Zhang, Y., and Olukotun, K. (2019b). Serving recurrent neural networks efficiently with a spatial accelerator.
Zheng, K., Mou, W., and Wang, L. (2017a). Collect at once, use effectively: Making non-interactive locally private learning possible. In Proceedings of the 34th International Conference on Machine Learning - Volume 70.
Zheng, S., Huang, Z., and Kwok, J. (2019). Communication-efficient distributed blockwise momentum SGD with error-feedback. In Advances in Neural Information Processing Systems, pages 11450–11460.
Zheng, W., Popa, R. A., Gonzalez, J. E., and Stoica, I. (2019). Helen: Maliciously secure coopetitive learning for linear models. In 2019 IEEE Symposium on Security and Privacy (SP).
Zheng, Z. and Hong, P. (2018). Robust detection of adversarial attacks by modeling the intrinsic properties of deep neural networks. In Advances in Neural Information Processing Systems 31.
Zheng, Z., Xie, S., Dai, H., Chen, X., and Wang, H. (2017b). An overview of blockchain technology: Architecture, consensus, and future trends. In 2017 IEEE International Congress on Big Data (BigData Congress).
Zhou, F. and Cong, G. (2018). On the convergence properties of a k-step averaging stochastic gradient descent algorithm for nonconvex optimization. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI.
Zhou, Z., Chen, X., Li, E., Zeng, L., Luo, K., and Zhang, J. (2019). Edge intelligence: Paving the last mile of artificial intelligence with edge computing. Proceedings of the IEEE, 107(8):1738–1762.
Zhu, G., Du, Y., Gündüz, D., and Huang, K. (2020). One-bit over-the-air aggregation for communication-efficient federated edge learning: Design and convergence analysis. CoRR, abs/2001.05713.
Zhu, G., Liu, D., Du, Y., You, C., Zhang, J., and Huang, K. (2018). Towards an intelligent edge: Wireless communication meets machine learning. CoRR, abs/1809.00343.
Zinkevich, M., Weimer, M., Li, L., and Smola, A. J. (2010). Parallelized stochastic gradient descent. In Advances in neural information processing systems, pages 2595–2603.
Zou, F., Shen, L., Jie, Z., Zhang, W., and Liu, W. (2019). A sufficient condition for convergences of Adam and RMSProp. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pages 11119–11127.
