gms | German Medical Science

63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

September 2-6, 2018, Osnabrück

Beyond empirical – advanced estimation methods for optimal cutpoints

Meeting Abstract

  • Christian Thiele - Hochschule Osnabrück, Osnabrück, Germany
  • Gerrit Hirschfeld - Hochschule Osnabrück, Osnabrück, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Osnabrück, 02.-06.09.2018. Düsseldorf: German Medical Science GMS Publishing House; 2018. DocAbstr. 268

doi: 10.3205/18gmds108, urn:nbn:de:0183-18gmds1083

Published: August 27, 2018

© 2018 Thiele et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Introduction: In many clinical settings, continuous biomarkers (or, more recently, predicted probabilities from multivariate models) have to be converted into a binary classification to aid clinical decision making. Determining the best or “optimal” cutoff value to separate the two classes based on the data in a specific sample is not trivial. Although several sophisticated methods for estimating such cutpoints have been proposed [1], [2], most studies use an empirical method that deems “optimal” the cutpoint with the best performance in the sample. In this simulation study, we introduce two new estimation methods for optimal cutpoints, spline smoothing and smoothing via Generalized Additive Models (GAM), and compare them with the methods already available.
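The empirical method described above can be illustrated with a minimal Python sketch. It is not the authors' code and not the cutpointr implementation; the function names, the toy data and the convention that values at or above the cutpoint are classified as positive are assumptions made for illustration only.

```python
def youden_index(controls, cases, cutpoint):
    """Sensitivity + specificity - 1, classifying values >= cutpoint as positive."""
    sensitivity = sum(x >= cutpoint for x in cases) / len(cases)
    specificity = sum(x < cutpoint for x in controls) / len(controls)
    return sensitivity + specificity - 1.0


def empirical_cutpoint(controls, cases):
    """Return the observed value that maximizes the in-sample Youden index."""
    candidates = sorted(set(controls) | set(cases))
    return max(candidates, key=lambda c: youden_index(controls, cases, c))


# Toy data (hypothetical, for illustration only)
controls = [1.0, 2.0, 3.0, 4.0, 4.2]
cases = [3.5, 5.0, 6.0, 7.0, 8.0]

best = empirical_cutpoint(controls, cases)
print(best, youden_index(controls, cases, best))
```

Because the search runs only over observed values and nothing is smoothed, the selected cutpoint can jump between neighbouring observations from one sample to the next.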

Methods: Seven methods for estimating the cutpoint that maximizes the Youden index (parametric normal, empirical, kernel, spline smoothing, GAM smoothing, bootstrapping and LOESS) were compared on normally and gamma-distributed data at sample sizes between 30 and 1000 and at four effect sizes, represented by Youden index values of 0.2, 0.4, 0.6 and 0.8. Each simulation was repeated 10,000 times and the estimation errors were summarized. The simulation was performed in R using the cutpointr package.
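One scenario of such a simulation might be set up as in the following Python sketch. The study itself was run in R with cutpointr, so this is only an illustration under stated assumptions: both classes are normal with unit variance, so that a target Youden index J corresponds to a mean difference of 2·Φ⁻¹((J + 1)/2) and the true optimal cutpoint lies midway between the means; the parametric normal estimator is the closed-form solution obtained by equating the two fitted normal densities; and the sample size, seed and number of repetitions are arbitrary choices kept small for speed.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def delta_for_youden(j):
    """Mean difference (in SD units) that yields Youden index j
    for two normal distributions with equal unit variance."""
    return 2.0 * NormalDist().inv_cdf((j + 1.0) / 2.0)


def normal_cutpoint(controls, cases):
    """Cutpoint maximizing the Youden index under fitted normal distributions
    (assumes the cases have the higher mean)."""
    mu_x, sd_x = mean(controls), stdev(controls)
    mu_y, sd_y = mean(cases), stdev(cases)
    a = mu_y - mu_x
    b = sd_y / sd_x
    if abs(b - 1.0) < 1e-8:
        # Equal variances: the optimum is the midpoint of the two means.
        return (mu_x + mu_y) / 2.0
    root = math.sqrt(a * a + (b * b - 1.0) * sd_x ** 2 * math.log(b * b))
    return (mu_x * (b * b - 1.0) - a + b * root) / (b * b - 1.0)


rng = random.Random(1)          # arbitrary seed
delta = delta_for_youden(0.6)   # one of the four effect sizes
true_cutpoint = delta / 2.0

estimates = []
for _ in range(200):            # the study used 10,000 repetitions
    controls = [rng.gauss(0.0, 1.0) for _ in range(100)]
    cases = [rng.gauss(delta, 1.0) for _ in range(100)]
    estimates.append(normal_cutpoint(controls, cases))

bias = mean(estimates) - true_cutpoint
spread = stdev(estimates)
print(bias, spread)
```

Summarizing the estimation error as bias and spread around the known true cutpoint is one plausible reading of "the estimation errors were summarized"; the abstract does not specify the exact error measures.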

Results: The simple empirical method, which maximizes a metric in-sample without smoothing, generally suffers from high variance regardless of the simulated scenario. If its distributional assumptions are met, the parametric normal method is superior to all other data-driven methods. For nonnormal data, bootstrapping and GAM smoothing outperform the established methods. Bootstrapping is superior to GAM smoothing if the sample size and/or the effect size is small, but with increasing sample size and effect size GAM smoothing delivers the lowest variance. If midpoints between observations, rather than the exact observed values, are used as optimal cutpoints, all methods are unbiased.
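The bootstrapping and midpoint ideas can be sketched in Python as follows. This is an illustrative reading, not the cutpointr implementation: resampling within each class, averaging the per-resample optimal cutpoints, the number of resamples, and taking the midpoint with the next lower observation (which leaves the in-sample classification unchanged when values at or above the cutpoint count as positive) are all assumptions.

```python
import random
from statistics import mean


def youden_index(controls, cases, cutpoint):
    """Sensitivity + specificity - 1, classifying values >= cutpoint as positive."""
    sensitivity = sum(x >= cutpoint for x in cases) / len(cases)
    specificity = sum(x < cutpoint for x in controls) / len(controls)
    return sensitivity + specificity - 1.0


def empirical_cutpoint(controls, cases):
    """Observed value with the highest in-sample Youden index."""
    candidates = sorted(set(controls) | set(cases))
    return max(candidates, key=lambda c: youden_index(controls, cases, c))


def midpoint_cutpoint(controls, cases):
    """Midpoint between the empirical optimum and the next lower observation."""
    values = sorted(set(controls) | set(cases))
    best = max(values, key=lambda c: youden_index(controls, cases, c))
    i = values.index(best)
    if i == 0:
        return best  # no lower observation available
    return (values[i - 1] + best) / 2.0


def bootstrap_cutpoint(controls, cases, n_boot=200, seed=0):
    """Average of the empirical optimal cutpoints over bootstrap resamples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        boot_controls = rng.choices(controls, k=len(controls))
        boot_cases = rng.choices(cases, k=len(cases))
        estimates.append(empirical_cutpoint(boot_controls, boot_cases))
    return mean(estimates)


# Toy data (hypothetical, for illustration only)
controls = [1.0, 2.0, 3.0, 4.0, 4.2]
cases = [3.5, 5.0, 6.0, 7.0, 8.0]

mid = midpoint_cutpoint(controls, cases)
boot = bootstrap_cutpoint(controls, cases)
print(mid, boot)
```

Averaging over resamples smooths out the jumps between neighbouring observations that the purely empirical choice makes, which is one intuition for the lower variance reported for bootstrapping.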

Discussion: The recent publication of the cutpointr package makes several advanced methods for estimating optimal cutpoints accessible. Although all of these easily outperform the empirical method, our simulation study supports some recommendations on which specific methods to use. Bootstrapping is generally superior to the empirical method and is easy to implement and understand. GAM smoothing offers easy interpretability by generating a smoothed version of the empirical method, does not necessarily need tuning of estimation parameters, and is the best method with larger sample sizes and effect sizes. If normally distributed data can be assumed, the parametric normal method performs best.

Acknowledgements: The study was supported by the German Federal Ministry for Education and Research to GH (BMBF #01EK1501).

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Fluss R, Faraggi D, Reiser B. Estimation of the Youden Index and its associated cutoff point. Biometrical Journal. 2005;47(4):458–472.
2. Leeflang MM, Moons KG, Reitsma JB, Zwinderman AH. Bias in sensitivity and specificity caused by data-driven selection of optimal cutoff values: mechanisms, magnitude, and solutions. Clinical Chemistry. 2008;(4):729–738.