Abstract
Despite the widespread use and economic importance of configurable software systems, there is unsatisfactory support for utilizing the full potential of these systems with respect to finding performance-optimal configurations. Prior work on predicting the performance of software configurations suffered either from (a) requiring far too many sample configurations or (b) large variances in its predictions. Both of these problems can be avoided using the WHAT spectral learner. WHAT’s innovation is the use of the spectrum (eigenvalues) of the distance matrix between the configurations of a configurable software system to perform dimensionality reduction. Within that reduced configuration space, many closely associated configurations can be studied by executing only a few sample configurations. For the subject systems studied here, a few dozen samples yield accurate and stable predictors: less than 10% prediction error, with a standard deviation of less than 2%. Compared to the state of the art, WHAT (a) requires 2–10 times fewer samples to achieve similar prediction accuracies, and (b) yields more stable predictions (i.e., with lower standard deviation). Furthermore, we demonstrate that predictive models generated by WHAT can be used by optimizers to discover system configurations that closely approach the optimal performance.
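The spectral idea sketched in the abstract, reducing a set of configurations to a few coordinates derived from the spectrum of their pairwise distance matrix, can be illustrated as follows. This is a minimal classical-MDS-style sketch of the general idea only, not the paper's exact algorithm (WHAT approximates this spectrum cheaply via FastMap rather than a full eigen-decomposition); the example configurations and the function name are illustrative.

```python
import numpy as np

def spectral_projection(configs, k=1):
    """Project configurations onto the top-k eigenvectors of their
    double-centered pairwise squared-distance matrix (classical MDS).
    Sketch of the general idea only; WHAT itself uses a cheap
    FastMap-style approximation instead of a full eigen-decomposition."""
    X = np.asarray(configs, dtype=float)
    # Pairwise squared Euclidean distances between configurations.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    n = sq.shape[0]
    # Double-center the squared-distance matrix to obtain the Gram matrix B.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ sq @ J
    # Eigen-decomposition; keep the k largest eigenvalues/eigenvectors.
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

# Boolean configurations (rows) of a hypothetical 4-option system.
configs = [[0, 0, 1, 1], [0, 1, 1, 1], [1, 0, 0, 0], [1, 1, 0, 0]]
coords = spectral_projection(configs, k=1)  # one coordinate per configuration
```

In the reduced space, the first two configurations (which differ in a single option) land close together, while the dissimilar pairs land far apart, which is what lets closely associated configurations be studied via a few representatives.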
Notes
In this paper, we concentrate on Boolean options, as they make up the majority of all options; see Siegmund et al. (2015) for how to incorporate numeric options.
Though, in practice, this can be very difficult. For example, in models like the Linux Kernel such an enumeration is practically impossible (Sayyad et al. 2013).
The projection of \(N_i\) can be calculated in the following way: \(a = dist ( East , N_i)\); \(b = dist ( West , N_i)\); \(c = dist ( East , West )\); then, by the cosine rule, \(x_i = \frac{a^2 + c^2 - b^2}{2 c}\).
In our study, \( dist \) accepts a pair of configurations (\(\mathbf {C}\)) and returns the distance between them. If \(x_i, y_i \in \mathbb {R}^n\), the distance function is the standard Euclidean distance.
Also known as response surface methods, meta models, or emulators.
Just to clarify one frequently asked question about this work: our rig “studies” 40% of the data. We do not mean that our predictive models require access to the performance scores of 40% of the data. Rather, by “study” we mean that we reflect on a sample of configurations to determine which minimal subset of that sample deserves to be compiled and executed.
WHERE is an approximation of the first principal component.
Another challenge of a high-dimensional search space is the amount of noise induced by irrelevant dimensions.
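The projection described in the notes above can be sketched as follows, using the cosine-rule form \(x_i = (a^2 + c^2 - b^2)/(2c)\) with the standard Euclidean \(dist\). The function names are illustrative; East and West denote the two pivot configurations, as in the notes.

```python
import math

def euclidean(c1, c2):
    # dist(): standard Euclidean distance between two configurations in R^n.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def fastmap_x(Ni, East, West, dist=euclidean):
    """Projection x_i of configuration Ni onto the East-West axis,
    via the cosine rule: x_i = (a^2 + c^2 - b^2) / (2c)."""
    a = dist(East, Ni)   # distance from East pivot to Ni
    b = dist(West, Ni)   # distance from West pivot to Ni
    c = dist(East, West) # distance between the two pivots
    return (a ** 2 + c ** 2 - b ** 2) / (2 * c)

# A point midway between the pivots projects to the midpoint of the axis.
x = fastmap_x((0, 1), East=(0, 0), West=(0, 2))  # -> 1.0
```

Note that East projects to 0 and West projects to \(c\), so all configurations receive coordinates along the axis spanned by the two most distant pivots.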
References
Bettenburg, N., Nagappan, M., Hassan, A.E.: Think locally, act globally: improving defect and effort prediction models. In: Proceedings of IEEE Working Conference on Mining Software Repositories, pp. 60–69. IEEE (2012)
Bettenburg, N., Nagappan, M., Hassan, A.E.: Towards improving statistical modeling of software engineering data: think locally, act globally!. Empir. Softw. Eng. 20(2), 294–335 (2015)
Bingham, E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data. In: Proceedings of SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 245–250. ACM (2001)
Boley, D.: Principal direction divisive partitioning. Data Min. Knowl. Discov. 2(4), 325–344 (1998)
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. CRC, Boca Raton (1984)
Burges, C.J.C.: Dimension reduction: a guided tour. Found. Trends Mach. Learn. 2(4), 275–365 (2010)
Chen, J., Nair, V., Krishna, R., Menzies, T.: Is “sampling” better than “evolution” for search-based software engineering? arXiv:1608.07617 (2016)
Dasgupta, S.: Experiments with random projection. In: Proceedings of conference on Uncertainty in Artificial Intelligence, pp. 143–151. Morgan Kaufmann Publishers Inc. (2000)
Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.A.M.T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. Trans. Evol. Comput. 6(2), 182–197 (2002)
Deiters, C., Rausch, A., Schindler, M.: Using spectral clustering to automate identification and optimization of component structures. In: Proceedings of International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering, pp. 14–20. IEEE (2013)
Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10), 78–87 (2012)
Du, Q., Fowler, J.E.: Low-complexity principal component analysis for hyperspectral image compression. Int. J. High Perform. Comput. Appl. 22(4), 438–448 (2008)
Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. CRC, Boca Raton (1994)
Faloutsos, C., Lin, K.I.: FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In: Proceedings of International Conference on Management of Data, pp. 163–174. ACM (1995)
Fletcher, R.: Practical Methods of Optimization. Wiley, New York (2013)
Ghotra, B., McIntosh, S., Hassan, A.E.: Revisiting the impact of classification techniques on the performance of defect prediction models. In: Proceedings of International Conference on Software Engineering, pp. 789–800. IEEE (2015)
Grassberger, P., Procaccia, I.: Measuring the strangeness of strange attractors. Physica D Nonlinear Phenom. 9(1–2), 189–208 (1983)
Guo, J., Czarnecki, K., Apel, S., Siegmund, N., Wasowski, A.: Variability-aware performance prediction: a statistical learning approach. In: Proceedings of International Conference on Automated Software Engineering, pp. 301–311. IEEE (2013)
Hamerly, G.: Making k-means even faster. In: Proceedings of International Conference on Data Mining, pp. 130–140. SIAM (2010)
Harman, M., Mansouri, S.A., Zhang, Y.: Search-based software engineering: trends, techniques and applications. ACM Comput. Surv. 45(1), 11 (2012)
Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Houle, M.E., Kashima, H., Nett, M.: Generalized expansion dimension. In: Proceedings of International Conference on Data Mining Workshops (ICDMW), pp. 587–594. IEEE (2012)
Ilin, A., Raiko, T.: Practical approaches to principal component analysis in the presence of missing values. J. Mach. Learn. Res. 11(Jul), 1957–2000 (2010)
Jolliffe, I.T.: Principal Component Analysis. Wiley, New York (2002)
Kamvar, S., Klein, D., Manning, C.: Spectral learning. In: Proceedings of International Joint Conference on Artificial Intelligence. Stanford InfoLab (2003)
Krall, J., Menzies, T., Davies, M.: GALE: geometric active learning for search-based software engineering. Trans. Softw. Eng. 41(10), 1001–1018 (2015)
Kuhn, D.R., Kacker, R.N., Lei, Y.: Introduction to Combinatorial Testing. CRC, Boca Raton (2013)
Loshchilov, I.G.: Surrogate-assisted evolutionary algorithms. PhD thesis, Citeseer (2013)
Menzies, T., Butcher, A., Cok, D., Marcus, A., Layman, L., Shull, F., Turhan, B., Zimmermann, T.: Local versus global lessons for defect prediction and effort estimation. Trans. Softw. Eng. 39(6), 822–834 (2013)
Mittas, N., Angelis, L.: Ranking and clustering software cost estimation models through a multiple comparisons algorithm. Trans. Softw. Eng. 39(4), 537–551 (2013)
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12(Oct), 2825–2830 (2011)
Platt, J.: FastMap, MetricMap, and Landmark MDS are all Nyström algorithms. In: Proceedings of International Conference on Artificial Intelligence and Statistics (2005)
Pukelsheim, F.: Optimal Design of Experiments. SIAM, Philadelphia (2006)
Sarkar, A., Guo, J., Siegmund, N., Apel, S., Czarnecki, K.: Cost-efficient sampling for performance prediction of configurable systems. In: Proceedings of International Conference on Automated Software Engineering, pp. 342–352. IEEE (2015)
Sayyad, A.S., Ingram, J., Menzies, T., Ammar, H.: Scalable product line configuration: a straw to break the camel’s back. In: Proceedings of International Conference on Automated Software Engineering, pp. 465–474. IEEE (2013)
Shi, J., Malik, J.: Normalized cuts and image segmentation. Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
Siegmund, N., Kolesnikov, S.S., Kästner, C., Apel, S., Batory, D., Rosenmüller, M., Saake, G.: Predicting performance via automated feature-interaction detection. In: Proceedings of International Conference on Software Engineering, pp. 167–177. IEEE (2012)
Siegmund, J., Siegmund, N., Apel, S.: Views on internal and external validity in empirical software engineering. In: Proceedings of International Conference on Software Engineering, vol. 1, pp. 9–19. IEEE (2015)
Siegmund, N., Grebhahn, A., Apel, S., Kästner, C.: Performance-influence models for highly configurable systems. In: Proceedings of International Conference on Foundations of Software Engineering, pp. 284–294. ACM (2015)
Storn, R., Price, K.: Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11(4), 341–359 (1997)
Theisen, C., Herzig, K., Morrison, P., Murphy, B., Williams, L.: Approximating attack surfaces with stack traces. In: Proceedings of International Conference on Software Engineering, pp. 199–208. IEEE (2015)
Vargha, A., Delaney, H.D.: A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J. Educ. Behav. Stat. 25(2), 101–132 (2000)
Weiss, G.M., Tian, Y.: Maximizing classifier utility when there are data acquisition and modeling costs. Data Min. Knowl. Discov. 17(2), 253–282 (2008)
Xu, T., Jin, L., Fan, X., Zhou, Y., Pasupathy, S., Talwadker, R.: Hey, you have given me too many knobs!: Understanding and dealing with over-designed configuration in system software. In: Proceedings of International Conference on Foundations of Software Engineering, pp. 307–319. ACM (2015)
Zhang, F., Zheng, Q., Zou, Y., Hassan, A.E.: Cross-project defect prediction using a connectivity-based unsupervised classifier. In: Proceedings of International Conference on Software Engineering, pp. 309–320. ACM (2016)
Zhang, Y., Guo, J., Blais, E., Czarnecki, K.: Performance prediction of configurable software systems by fourier learning. In: Proceedings of International Conference on Automated Software Engineering, pp. 365–373. IEEE (2015)
Zuluaga, M., Sergent, G., Krause, A., Püschel, M.: Active learning for multi-objective optimization. In: Proceedings of International Conference in Machine Learning, vol. 28, pp. 462–470. ICML (2013)
Acknowledgements
This work is partially funded by NSF award #1506586. Sven Apel’s work has been supported by the German Research Foundation (AP 206/4 and AP 206/6). Norbert Siegmund’s work has been supported by the German Research Foundation (SI 2171/2).
Cite this article
Nair, V., Menzies, T., Siegmund, N. et al. Faster discovery of faster system configurations with spectral learning. Autom Softw Eng 25, 247–277 (2018). https://doi.org/10.1007/s10515-017-0225-2