Abstract
Dropout and other feature noising schemes have shown promise in controlling overfitting by artificially corrupting the training data. Although extensive studies have been performed for generalized linear models, little work has been done for support vector machines (SVMs), one of the most successful approaches to supervised learning. This paper presents dropout training for both linear SVMs and their nonlinear extension with latent representation learning. For linear SVMs, to deal with the intractable expectation of the non-smooth hinge loss under corrupting distributions, we develop an iteratively re-weighted least squares (IRLS) algorithm by exploiting data augmentation techniques. Our algorithm iteratively minimizes the expectation of a re-weighted least squares problem, where the re-weights are updated analytically. For nonlinear latent SVMs, we consider learning one layer of latent representations within the SVM and extend the data augmentation technique, in conjunction with a first-order Taylor expansion, to deal with the intractable expected hinge loss and the nonlinearity of the latent representations. Finally, we apply similar data augmentation ideas to develop a new IRLS algorithm for the expected logistic loss under corrupting distributions, and we further develop a nonlinear extension of logistic regression by incorporating one layer of latent representations. Our algorithms offer insights into the connections and differences between the hinge loss and the logistic loss in dropout training. Empirical results on several real datasets demonstrate the effectiveness of dropout training in significantly boosting the classification accuracy of both linear and nonlinear SVMs.
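To make the linear case concrete, the sketch below is a minimal reconstruction of the IRLS idea, not the authors' implementation: it assumes the Polson-Scott data augmentation of the hinge loss (with the regularization constant folded to 1), unbiased dropout with rate p (so E[x̃] = x and Var[x̃_j] = p/(1−p)·x_j²), and a re-weight update γ_i = 1/√(E[(1 − y_i wᵀx̃_i)²]); the function name `dropout_svm_irls` and hyperparameters such as `sigma2` are our own.

```python
import numpy as np

def dropout_svm_irls(X, y, p=0.5, sigma2=1.0, n_iter=50, eps=1e-8):
    """IRLS sketch for a linear SVM under dropout-corrupted features.

    Each iteration only needs the first and second moments of the
    corrupted features, which are available in closed form for
    unbiased dropout with rate p, so the expected re-weighted
    least-squares problem stays tractable.
    """
    n, d = X.shape
    # Unbiased dropout: x~_j = x_j/(1-p) w.p. (1-p), else 0, so
    # E[x~] = x and Var[x~_j] = p/(1-p) * x_j^2.
    var_feat = (p / (1.0 - p)) * X ** 2
    w = np.zeros(d)
    for _ in range(n_iter):
        # Expected margin and expected squared margin under corruption.
        zeta = 1.0 - y * (X @ w)                   # E[1 - y w^T x~]
        zeta_sq = zeta ** 2 + var_feat @ (w ** 2)  # E[(1 - y w^T x~)^2]
        # Analytic update of the augmentation re-weights (E-step).
        gamma = 1.0 / np.sqrt(zeta_sq + eps)       # gamma_i = E[1/lambda_i]
        # Expected re-weighted least-squares problem (M-step):
        # (I/sigma2 + sum_i gamma_i E[x~_i x~_i^T]) w = sum_i (1 + gamma_i) y_i x_i
        A = np.eye(d) / sigma2
        A += (X * gamma[:, None]).T @ X            # gamma-weighted x x^T part
        A += np.diag(var_feat.T @ gamma)           # diagonal variance part
        b = X.T @ ((1.0 + gamma) * y)
        w = np.linalg.solve(A, b)
    return w

# Toy usage on a noisy linearly separable problem with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))
w = dropout_svm_irls(X, y, p=0.5)
print("train accuracy:", np.mean(np.sign(X @ w) == y))
```

The expected logistic loss admits an analogous treatment via Polya-Gamma augmentation, with the same two-moment structure under dropout; we omit that variant here.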
Author information
Ning Chen received her PhD degree from the Department of Computer Science and Technology at Tsinghua University, China, where she is currently an assistant researcher. She was a visiting researcher in the Machine Learning Department of Carnegie Mellon University, USA. Her research interests are primarily in machine learning, especially probabilistic graphical models with applications in data mining and bioinformatics.
Jun Zhu received his BS, MS, and PhD degrees, all from the Department of Computer Science and Technology at Tsinghua University, China, where he is currently an associate professor. He was a project scientist and postdoctoral fellow in the Machine Learning Department, Carnegie Mellon University, USA. His research interests focus on developing machine learning methods to understand scientific and engineering data arising from various fields. He is a member of the IEEE.
Jianfei Chen received his BS degree from the Department of Computer Science and Technology, Tsinghua University, China, where he is currently a PhD student. His research interests are primarily in machine learning, especially probabilistic graphical models, Bayesian nonparametrics, and data mining problems such as social network analysis.
Ting Chen received his BS degree in computer science from Tsinghua University, China in 1993 and his PhD degree from SUNY Stony Brook, USA in 1997. He is currently a professor in the Tsinghua National Lab for Information Science and Technology. He was a professor of biological sciences, computer science, and mathematics at the University of Southern California, USA. His research interests are in applying machine learning and computer algorithms to answer questions in biology and medicine.
Cite this article
Chen, N., Zhu, J., Chen, J. et al. Dropout training for SVMs with data augmentation. Front. Comput. Sci. 12, 694–713 (2018). https://doi.org/10.1007/s11704-018-7314-7