Faster discovery of faster system configurations with spectral learning

Abstract

Despite the huge spread and economic importance of configurable software systems, there is unsatisfactory support for utilizing the full potential of these systems with respect to finding performance-optimal configurations. Prior work on predicting the performance of software configurations suffered from either (a) requiring far too many sample configurations or (b) producing predictions with large variance. Both of these problems can be avoided using the WHAT spectral learner. WHAT’s innovation is the use of the spectrum (eigenvalues) of the distance matrix between the configurations of a configurable software system to perform dimensionality reduction. Within that reduced configuration space, many closely associated configurations can be studied by executing only a few sample configurations. For the subject systems studied here, a few dozen samples yield accurate and stable predictors: less than 10% prediction error, with a standard deviation of less than 2%. When compared to the state of the art, WHAT (a) requires 2–10 times fewer samples to achieve similar prediction accuracies, and (b) its predictions are more stable (i.e., have lower standard deviation). Furthermore, we demonstrate that predictive models generated by WHAT can be used by optimizers to discover system configurations that closely approach the optimal performance.
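
As a rough illustration of the idea described in the abstract (and not the authors' WHAT implementation), the Python/NumPy sketch below embeds randomly generated Boolean configurations into a low-dimensional space via the spectrum of their pairwise-distance matrix, using classical multidimensional scaling as the spectral step, and then keeps one representative configuration per region of that space, so that only a handful of configurations would need to be compiled and measured. The synthetic data, the grid-based selection step, and all names are illustrative assumptions.

```python
# A minimal sketch (not the WHAT implementation): embed Boolean configurations
# into a low-dimensional space via the spectrum (eigenvalues/eigenvectors) of
# their pairwise-distance matrix, then pick one representative per region so
# that only a few configurations must actually be compiled and measured.
import numpy as np

rng = np.random.default_rng(0)
configs = rng.integers(0, 2, size=(200, 30))         # 200 configs, 30 Boolean options

# Pairwise Euclidean distances between configurations.
diff = configs[:, None, :] - configs[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=2))

# Classical MDS: double-center the squared distances and eigen-decompose.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)                  # eigenvalues in ascending order
top = eigvals.argsort()[::-1][:2]                     # keep the two largest eigenvalues
embedding = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0))

# Sample sparsely in the reduced space: grid the 2-D embedding and keep one
# configuration per occupied cell; only these would be executed and measured.
spread = np.ptp(embedding, axis=0) + 1e-12
cells = np.floor((embedding - embedding.min(axis=0)) / spread * 6)
_, representative_idx = np.unique(cells, axis=0, return_index=True)
print(f"{len(representative_idx)} of {n} configurations selected for measurement")
```

In the paper itself, the retained samples feed a regression-tree learner (CART) that predicts the performance of the remaining configurations; the sketch only conveys the sample-few, predict-many intuition.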

Notes

  1. In this paper, we concentrate on Boolean options, as they make up the majority of all options; see Siegmund et al. (2015) for how to incorporate numeric options.

  2. Though, in practice, this can be very difficult. For example, in models like the Linux Kernel such an enumeration is practically impossible (Sayyad et al. 2013).

  3. The projection of \(N_i\) can be calculated in the following way: \(a = \mathit{dist}(\mathit{East}, N_i)\); \(b = \mathit{dist}(\mathit{West}, N_i)\); \(x_i = \sqrt{\frac{a^2 - b^2 + c^2}{2c}}\) (see the code sketch after these notes).

  4. In our study, \(\mathit{dist}\) accepts a pair of configurations (\(\mathbf{C}\)) and returns the distance between them. If \(x_i, y_i \in \mathbb{R}^n\), the distance function is the standard Euclidean distance.

  5. Also known as response surface methods, meta models, or emulators.

  6. http://openscience.us/repo/performance-predict/cpm.html.

  7. Just to clarify one frequently asked question about this work, we note that our rig “studies” 40% of the data. We do not mean that our predictive models require access to the performance scores of that 40% of the data. Rather, by “study” we mean reflect on a sample of configurations to determine what minimal subset of that sample deserves to be compiled and executed.

  8. WHERE is an approximation of the first principal component.

  9. Another challenge of a high-dimensional search space is the amount of noise induced by irrelevant dimensions.

  10. http://openscience.us/repo/performance-predict/cpm.html.
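
To make notes 3, 4, and 8 concrete, here is a small sketch (not the paper's WHERE/WHAT code) that projects Boolean configurations onto the line between two far-apart poles (East and West) using the standard cosine-rule form of the FastMap projection, \(x_i = (a^2 + c^2 - b^2)/(2c)\), with the Euclidean distance of note 4, and then checks how strongly that axis correlates with the first principal component mentioned in note 8. The function names (e.g., `fastmap_axis`) and the synthetic data are our own assumptions.

```python
# A sketch of notes 3, 4 and 8 (not the paper's WHERE/WHAT code): project
# configurations onto the line between two far-apart "poles" using the
# cosine-rule form of the FastMap projection, then check how well that axis
# agrees with the first principal component.
import numpy as np

def dist(c1, c2):
    """Note 4: the standard Euclidean distance between two configurations."""
    return float(np.linalg.norm(np.asarray(c1, dtype=float) - np.asarray(c2, dtype=float)))

def fastmap_axis(configs, rng):
    """Pick two far-apart poles (East, West), then project every configuration
    onto the East-West line via the cosine rule: x_i = (a^2 + c^2 - b^2) / (2c)."""
    configs = np.asarray(configs, dtype=float)
    anchor = configs[rng.integers(len(configs))]          # a random starting point
    east = max(configs, key=lambda cfg: dist(anchor, cfg))  # farthest from the anchor
    west = max(configs, key=lambda cfg: dist(east, cfg))    # farthest from East
    c = dist(east, west)
    return np.array([(dist(east, cfg) ** 2 + c ** 2 - dist(west, cfg) ** 2) / (2 * c)
                     for cfg in configs])

rng = np.random.default_rng(1)
# Synthetic Boolean configurations whose option "density" varies, so that the
# data has one dominant direction of variation (an illustrative assumption).
density = rng.uniform(0.1, 0.9, size=(100, 1))
configs = (rng.random((100, 20)) < density).astype(int)

x = fastmap_axis(configs, rng)

# Note 8: compare the FastMap axis with the first principal component.
centered = configs - configs.mean(axis=0)
first_pc = np.linalg.svd(centered, full_matrices=False)[2][0]
pc_scores = centered @ first_pc
print("correlation with first principal component:",
      round(abs(np.corrcoef(x, pc_scores)[0, 1]), 3))
```

On data with one dominant direction of variation, the reported correlation is typically close to 1, which is the sense in which such a projection approximates the first principal component.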

References

  • Bettenburg, N., Nagappan, M., Hassan, A.E.: Think locally, act globally: improving defect and effort prediction models. In: Proceedings of IEEE Working Conference on Mining Software Repositories, pp. 60–69. IEEE (2012)

  • Bettenburg, N., Nagappan, M., Hassan, A.E.: Towards improving statistical modeling of software engineering data: think locally, act globally!. Empir. Softw. Eng. 20(2), 294–335 (2015)

  • Bingham, E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data. In: Proceedings of SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 245–250. ACM (2001)

  • Boley, D.: Principal direction divisive partitioning. Data Min. Knowl. Discov. 2(4), 325–344 (1998)

  • Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A.: Classification and Regression Trees. CRC, Boca Raton (1984)

  • Burges, C.J.C.: Dimension reduction: a guided tour. Found. Trends Mach. Learn. 2(4), 275–365 (2010)

  • Chen, J., Nair, V., Krishna, R., Menzies, T.: Is “sampling” better than “evolution” for search-based software engineering? arXiv:1608.07617 (2016)

  • Dasgupta, S.: Experiments with random projection. In: Proceedings of conference on Uncertainty in Artificial Intelligence, pp. 143–151. Morgan Kaufmann Publishers Inc. (2000)

  • Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.A.M.T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. Trans. Evol. Comput. 6(2), 182–197 (2002)

  • Deiters, C., Rausch, A., Schindler, M.: Using spectral clustering to automate identification and optimization of component structures. In: Proceedings of International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering, pp. 14–20. IEEE (2013)

  • Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10), 78–87 (2012)

  • Du, Q., Fowler, J.E.: Low-complexity principal component analysis for hyperspectral image compression. Int. J. High Perform. Comput. Appl. 22(4), 438–448 (2008)

  • Efron, B., Tibshirani, R.J.: An Introduction to the Bootstrap. CRC, Boca Raton (1994)

  • Faloutsos, C., Lin, K.I.: FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In: Proceedings of International Conference on Management of Data, pp. 163–174. ACM (1995)

  • Fletcher, R.: Practical Methods of Optimization. Wiley, New York (2013)

  • Ghotra, B., McIntosh, S., Hassan, A.E.: Revisiting the impact of classification techniques on the performance of defect prediction models. In: Proceedings of International Conference on Software Engineering, pp. 789–800. IEEE (2015)

  • Grassberger, P., Procaccia, I.: Measuring the strangeness of strange attractors. Physica D Nonlinear Phenom. 9(1–2), 189–208 (1983)

  • Guo, J., Czarnecki, K., Apel, S., Siegmund, N., Wasowski, A.: Variability-aware performance prediction: a statistical learning approach. In: Proceedings of International Conference on Automated Software Engineering, pp. 301–311. IEEE (2013)

  • Hamerly, G.: Making k-means even faster. In: Proceedings of International Conference on Data Mining, pp. 130–140. SIAM (2010)

  • Harman, M., Mansouri, S.A., Zhang, Y.: Search-based software engineering: trends, techniques and applications. ACM Comput. Surv. 45(1), 11 (2012)

  • Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)

  • Houle, M.E., Kashima, H., Nett, M.: Generalized expansion dimension. In: Proceedings of International Conference on Data Mining Workshops (ICDMW), pp. 587–594. IEEE (2012)

  • Ilin, A., Raiko, T.: Practical approaches to principal component analysis in the presence of missing values. J. Mach. Learn. Res. 11(Jul), 1957–2000 (2010)

  • Jolliffe, I.: Principal Component Analysis. Wiley, New York (2002)

  • Kamvar, S., Klein, D., Manning, C.: Spectral learning. In: Proceedings of International Joint Conference on Artificial Intelligence. Stanford InfoLab (2003)

  • Krall, J., Menzies, T., Davies, M.: GALE: geometric active learning for search-based software engineering. Trans. Softw. Eng. 41(10), 1001–1018 (2015)

  • Kuhn, D.R., Kacker, R.N., Lei, Y.: Introduction to Combinatorial Testing. CRC, Boca Raton (2013)

  • Loshchilov, I.G.: Surrogate-assisted evolutionary algorithms. PhD thesis, Citeseer (2013)

  • Menzies, T., Butcher, A., Cok, D., Marcus, A., Layman, L., Shull, F., Turhan, B., Zimmermann, T.: Local versus global lessons for defect prediction and effort estimation. Trans. Softw. Eng. 39(6), 822–834 (2013)

  • Mittas, N., Angelis, L.: Ranking and clustering software cost estimation models through a multiple comparisons algorithm. Trans. Softw. Eng. 39(4), 537–551 (2013)

  • Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12(Oct), 2825–2830 (2011)

  • Platt, J.: FastMap, MetricMap, and Landmark MDS are all Nyström algorithms. In: Proceedings of International Conference on Artificial Intelligence and Statistics (2005)

  • Pukelsheim, F.: Optimal Design of Experiments. SIAM, Philadelphia (2006)

  • Sarkar, A., Guo, J., Siegmund, N., Apel, S., Czarnecki, K.: Cost-efficient sampling for performance prediction of configurable systems. In: Proceedings of International Conference on Automated Software Engineering, pp. 342–352. IEEE (2015)

  • Sayyad, A.S., Ingram, J., Menzies, T., Ammar, H.: Scalable product line configuration: a straw to break the camel’s back. In: Proceedings of International Conference on Automated Software Engineering, pp. 465–474. IEEE (2013)

  • Shi, J., Malik, J.: Normalized cuts and image segmentation. Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)

  • Siegmund, N., Kolesnikov, S.S., Kästner, C., Apel, S., Batory, D., Rosenmüller, M., Saake, G.: Predicting performance via automated feature-interaction detection. In: Proceedings of International Conference on Software Engineering, pp. 167–177. IEEE (2012)

  • Siegmund, J., Siegmund, N., Apel, S.: Views on internal and external validity in empirical software engineering. In: Proceedings of International Conference on Software Engineering, vol. 1, pp. 9–19. IEEE (2015)

  • Siegmund, N., Grebhahn, A., Apel, S., Kästner, C.: Performance-influence models for highly configurable systems. In: Proceedings of International Conference on Foundations of Software Engineering, pp. 284–294. ACM (2015)

  • Storn, R., Price, K.: Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11(4), 341–359 (1997)

  • Theisen, C., Herzig, K., Morrison, P., Murphy, B., Williams, L.: Approximating attack surfaces with stack traces. In: Proceedings of International Conference on Software Engineering, pp. 199–208. IEEE (2015)

  • Vargha, A., Delaney, H.D.: A critique and improvement of the CL common language effect size statistics of McGraw and Wong. J. Educ. Behav. Stat. 25(2), 101–132 (2000)

  • Weiss, G.M., Tian, Y.: Maximizing classifier utility when there are data acquisition and modeling costs. Data Min. Knowl. Discov. 17(2), 253–282 (2008)

  • Xu, T., Jin, L., Fan, X., Zhou, Y., Pasupathy, S., Talwadker, R.: Hey, you have given me too many knobs!: Understanding and dealing with over-designed configuration in system software. In: Proceedings of International Conference on Foundations of Software Engineering, pp. 307–319. ACM (2015)

  • Zhang, F., Zheng, Q., Zou, Y., Hassan, A.E.: Cross-project defect prediction using a connectivity-based unsupervised classifier. In: Proceedings of International Conference on Software Engineering, pp. 309–320. ACM (2016)

  • Zhang, Y., Guo, J., Blais, E., Czarnecki, K.: Performance prediction of configurable software systems by Fourier learning. In: Proceedings of International Conference on Automated Software Engineering, pp. 365–373. IEEE (2015)

  • Zuluaga, M., Sergent, G., Krause, A., Püschel, M.: Active learning for multi-objective optimization. In: Proceedings of International Conference on Machine Learning, vol. 28, pp. 462–470. ICML (2013)

Acknowledgements

The work is partially funded by NSF award #1506586. Sven Apel’s work has been supported by the German Research Foundation (AP 206/4 and AP 206/6). Norbert Siegmund’s work has been supported by the German Research Foundation (SI 2171/2).

Author information

Correspondence to Vivek Nair.

About this article

Cite this article

Nair, V., Menzies, T., Siegmund, N. et al. Faster discovery of faster system configurations with spectral learning. Autom Softw Eng 25, 247–277 (2018). https://doi.org/10.1007/s10515-017-0225-2
