gms | German Medical Science

63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

September 2-6, 2018, Osnabrück

Performing multiple comparisons on clustered and overdispersed count data

Meeting Abstract

  • Jochen Kruppa - Charité - Universitätsmedizin Berlin, Berlin, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Osnabrück, 02.-06.09.2018. Düsseldorf: German Medical Science GMS Publishing House; 2018. DocAbstr. 143

doi: 10.3205/18gmds109, urn:nbn:de:0183-18gmds1098

Published: August 27, 2018

© 2018 Kruppa.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Count data occur frequently in many scientific fields, such as ecology, pharmacology, toxicology, and genetics. Multiple comparisons between the levels of a factor are one way to analyze such data sets. Usually the factor is a treatment with several levels, including a control. Further, the biological samples are often assigned to different clusters, such as blocks, litters, or plants. The resulting repeated-measures design must therefore be taken into account when estimating the model parameters. Count data in particular often show overdispersion, which should be modeled to avoid an inflated type I error rate.
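As a rough illustration of what overdispersion means, the sketch below computes the dispersion index (sample variance divided by sample mean), which is close to 1 for Poisson-like counts and clearly above 1 for overdispersed counts. The function name `dispersion_index` and both data vectors are made up for this example and are not taken from the study.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson-like data."""
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m


# Hypothetical counts, invented for illustration only.
near_poisson = [1, 7, 4, 6, 2, 4, 5, 3]     # variance roughly equal to the mean
overdispersed = [0, 12, 1, 9, 0, 15, 2, 1]  # variance far larger than the mean

print(dispersion_index(near_poisson))
print(dispersion_index(overdispersed))
```

An index well above 1 suggests that a plain Poisson model would understate the variance, which is the situation in which the type I error rate can be inflated.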

Methods: We ran simulation studies across several data settings, applying different multiple contrast tests to parameter estimates from generalized estimating equations (GEE) and generalized linear mixed models, and recorded the coverage and rejection probabilities. We generated overdispersed, clustered count data with small sample sizes, as commonly encountered in biological experiments.
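One possible way to generate overdispersed, clustered counts is sketched below. It is not the simulation design used in the study: it assumes a normally distributed random intercept per cluster on the log scale, gamma-Poisson (negative binomial) overdispersion, and a blocked layout in which every cluster contains every treatment group. All names and parameter values (`simulate_counts`, `cluster_sd`, `size`, the group means) are hypothetical.

```python
import math
import random


def rpoisson(rng, lam):
    """Draw one Poisson variate with Knuth's multiplication method.

    Adequate for the small to moderate means used in this sketch.
    """
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(group_means, n_clusters, n_per_cluster, cluster_sd, size, seed=1):
    """Return (cluster, group, count) rows of clustered, overdispersed counts.

    Assumptions of this sketch (not taken from the abstract):
    - each cluster gets a random intercept b ~ N(0, cluster_sd^2) on the log scale
    - a gamma multiplier with mean 1 and variance 1/size adds overdispersion
    - every cluster contains n_per_cluster observations of every group
    """
    rng = random.Random(seed)
    rows = []
    for cluster in range(n_clusters):
        b = rng.gauss(0.0, cluster_sd)
        for group, mu in enumerate(group_means):
            for _ in range(n_per_cluster):
                g = rng.gammavariate(size, 1.0 / size)  # shape=size, scale=1/size
                lam = mu * math.exp(b) * g
                rows.append((cluster, group, rpoisson(rng, lam)))
    return rows


# Small-sample setting: control plus two treatment groups, four clusters.
data = simulate_counts([5.0, 5.0, 10.0], n_clusters=4, n_per_cluster=3,
                       cluster_sd=0.3, size=2.0)
print(len(data))
```

Smaller values of `size` give stronger overdispersion, and larger values of `cluster_sd` give stronger within-cluster correlation. A nested layout (for example, litters within treatments) would need a different loop structure.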

Results: We found that generalized estimating equations outperform generalized linear mixed models, provided that the sandwich variance estimator is correctly specified. Further, generalized linear mixed models show convergence problems under specific data settings, although model implementations that are less affected exist. Finally, we demonstrate the application of the multiple contrast test, and the problems caused by ignoring severe overdispersion, on a genetic data example.

Discussion: In this work we extend the approach of Orelien et al. [1] to a broad range of contrast tests: Dunnett, Tukey, Williams, and Changepoint. Moreover, we have shown that clustered, overdispersed count data are easy to analyze with different model approaches. Multiple contrast tests based on the model estimates of GEE models and generalized linear mixed models control the family-wise error rate adequately in a broad range of settings. While the GEE model requires more effort to choose the right sandwich variance estimator for the given data problem, the generalized linear mixed models fail to converge in some very small sample size settings.
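To make the contrast types concrete, the following sketch builds the contrast matrices for two of the tests named above, Dunnett (each treatment versus the control) and Tukey (all pairwise comparisons), and applies them to a vector of group estimates. The helper names and the example estimates are hypothetical, group 0 is assumed to be the control, and the sketch does not compute the joint critical values that a full multiple contrast test requires.

```python
from itertools import combinations


def dunnett_contrasts(k):
    """One row per treatment group j = 1..k-1, comparing it with control group 0."""
    rows = []
    for j in range(1, k):
        row = [0] * k
        row[0], row[j] = -1, 1
        rows.append(row)
    return rows


def tukey_contrasts(k):
    """One row per pair (i, j) with i < j, giving the difference group j minus group i."""
    rows = []
    for i, j in combinations(range(k), 2):
        row = [0] * k
        row[i], row[j] = -1, 1
        rows.append(row)
    return rows


def apply_contrasts(contrasts, estimates):
    """Multiply the contrast matrix by the vector of group estimates."""
    return [sum(c * e for c, e in zip(row, estimates)) for row in contrasts]


# Hypothetical group estimates on the log scale (control first).
estimates = [1.0, 1.5, 2.5]
print(dunnett_contrasts(3))
print(tukey_contrasts(3))
print(apply_contrasts(dunnett_contrasts(3), estimates))
```

In practice the contrast estimates would be combined with the model-based covariance matrix (for a GEE, the sandwich estimate) to obtain test statistics and multiplicity-adjusted p-values.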

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Orelien JG, Zhai J, Morris R, Cohn R. An Approach to Performing Multiple Comparisons with a Control in GEE Models. Comput Stat Data Anal. 2002;31(1):87–105.