A Mixture of Coalesced Generalized Hyperbolic Distributions

Tortora, Cristina; Franczak, Brian C.; Browne, Ryan P.; McNicholas, Paul D.

doi:10.1007/s00357-019-09319-3


Published: 22 April 2019

Volume 36, pages 26–57, (2019)

Journal of Classification

Cristina Tortora¹, Brian C. Franczak², Ryan P. Browne³ & Paul D. McNicholas⁴ (ORCID: orcid.org/0000-0002-2482-523X)


Abstract

A mixture of multiple scaled generalized hyperbolic distributions (MMSGHDs) is introduced. Then, a coalesced generalized hyperbolic distribution (CGHD) is developed by joining a generalized hyperbolic distribution with a multiple scaled generalized hyperbolic distribution. After detailing the development of the MMSGHDs, which arise via implementation of a multi-dimensional weight function, the density of the mixture of CGHDs is developed. A parameter estimation scheme is developed using the ever-expanding class of MM algorithms, and the Bayesian information criterion is used for model selection. The issue of cluster convexity is examined, and a special case of the MMSGHDs is developed that is guaranteed to have convex clusters. These approaches are illustrated and compared using simulated and real data. The identifiability of the MMSGHDs and of the mixture of CGHDs is discussed in an appendix.

References

Aitken, A.C. (1926). A series formula for the roots of algebraic and transcendental equations. Proceedings of the Royal Society of Edinburgh, 45, 14–22.
Altman, E. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance, 23(4), 589–609.
Andrews, J.L., & McNicholas, P.D. (2011a). Extending mixtures of multivariate t-factor analyzers. Statistics and Computing, 21(3), 361–373.
Andrews, J.L., & McNicholas, P.D. (2011b). Mixtures of modified t-factor analyzers for model-based clustering, classification, and discriminant analysis. Journal of Statistical Planning and Inference, 141(4), 1479–1486.
Andrews, J.L., & McNicholas, P. (2012). Model-based clustering, classification, and discriminant analysis via mixtures of multivariate t-distributions. Statistics and Computing, 22(5), 1021–1029.
Azzalini, A., Browne, R.P., Genton, M.G., McNicholas, P.D. (2016). On nomenclature for, and the relative merits of, two formulations of skew distributions. Statistics and Probability Letters, 110, 201–206.
Baek, J., & McLachlan, G.J. (2011). Mixtures of common t-factor analyzers for clustering high-dimensional microarray data. Bioinformatics, 27, 1269–1276.
Banfield, J.D., & Raftery, A.E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49(3), 803–821.
Barndorff-Nielsen, O. (1978). Hyperbolic distributions and distributions on hyperbolae. Scandinavian Journal of Statistics, 5(3), 151–157.
Barndorff-Nielsen, O., Kent, J., Sørensen, M. (1982). Normal variance-mean mixtures and z distributions. International Statistical Review / Revue Internationale de Statistique, 50(2), 145–159.
Böhning, D., Dietz, E., Schaub, R., Schlattmann, P., Lindsay, B. (1994). The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Annals of the Institute of Statistical Mathematics, 46, 373–388.
Browne, R.P., & McNicholas, P.D. (2014). Estimating common principal components in high dimensions. Advances in Data Analysis and Classification, 8(2), 217–226.
Browne, R.P., & McNicholas, P.D. (2015). A mixture of generalized hyperbolic distributions. Canadian Journal of Statistics, 43(2), 176–198.
Celeux, G., & Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition, 28(5), 781–793.
Charytanowicz, M., Niewczas, J., Kulczycki, P., Kowalski, P.A., Łukasik, S., Żak, S. (2010). A complete gradient clustering algorithm for features analysis of x-ray images. In Piętka, E., & Kawa, J. (Eds.) Information Technologies in Biomedicine (Vol. 2, pp. 15–24). Berlin: Springer.
Cook, R.D., & Weisberg, S. (1994). An Introduction to Regression Graphics. New York: Wiley.
Cormack, R.M. (1971). A review of classification (with discussion). Journal of the Royal Statistical Society: Series A, 134, 321–367.
Debreu, G., & Koopmans, T.C. (1982). Additively decomposed quasiconvex functions. Mathematical Programming, 24(1), 1–38.
Demarta, S., & McNeil, A.J. (2005). The t copula and related copulas. International Statistical Review, 73(1), 111–129.
Dempster, A.P., Laird, N.M., Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39(1), 1–38.
Flury, B., & Riedwyl, H. (1988). Multivariate Statistics: A Practical Approach. London: Chapman & Hall.
Forbes, F., & Wraith, D. (2014). A new family of multivariate heavy-tailed distributions with variable marginal amounts of tailweights: Application to robust clustering. Statistics and Computing, 24(6), 971–984.
Fraley, C., & Raftery, A.E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458), 611–631.
Franczak, B.C., Browne, R.P., McNicholas, P.D. (2014). Mixtures of shifted asymmetric Laplace distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(6), 1149–1157.
Franczak, B.C., Tortora, C., Browne, R.P., McNicholas, P.D. (2015). Unsupervised learning via mixtures of skewed distributions with hypercube contours. Pattern Recognition Letters, 58(1), 69–76.
Gallaugher, M.P.B., & McNicholas, P.D. (2018). Finite mixtures of skewed matrix variate distributions. Pattern Recognition, 80, 83–93.
Gallaugher, M.P.B., & McNicholas, P.D. (2019a). On fractionally-supervised classification: weight selection and extension to the multivariate t-distribution. Journal of Classification 36. In press.
Gallaugher, M.P.B., & McNicholas, P.D. (2019b). Three skewed matrix variate distributions. Statistics and Probability Letters, 145, 103–109.
Ghahramani, Z., & Hinton, G.E. (1997). The EM algorithm for factor analyzers. Technical Report CRG-TR-96-1. Toronto: University of Toronto.
Gneiting, T. (1997). Normal scale mixtures and dual probability densities. Journal of Statistical Computation and Simulation, 59(4), 375–384.
Hennig, C. (2015). What are the true clusters? Pattern Recognition Letters, 63, 53–62.
Holzmann, H., Munk, A., Gneiting, T. (2006). Identifiability of finite mixtures of elliptical distributions. Scandinavian Journal of Statistics, 33, 753–763.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.
Hunter, D.R., & Lange, K. (2000). Quantile regression via an MM algorithm. Journal of Computational and Graphical Statistics, 9(1), 60–77.
Karlis, D., & Santourian, A. (2009). Model-based clustering with non-elliptically contoured distributions. Statistics and Computing, 19(1), 73–83.
Kent, J.T. (1983). Identifiability of finite mixtures for directional data. The Annals of Statistics, 11, 984–988.
Kiers, H.A. (2002). Setting up alternating least squares and iterative majorization algorithms for solving various matrix optimization problems. Computational Statistics and Data Analysis, 41(1), 157–170.
Kotz, S., Kozubowski, T.J., Podgorski, K. (2001). The Laplace Distribution and Generalizations: A Revisit with Applications to Communications, Economics, Engineering, and Finance, 1st edn. Boston: Birkhäuser.
Kotz, S., & Nadarajah, S. (2004). Multivariate t-distributions and their applications. Cambridge: Cambridge University Press.
Lee, S.X., & McLachlan, G.J. (2013a). EMMIXuskew: fitting unrestricted multivariate skew t Mixture Models. R package version 0.11–5.
Lee, S.X., & McLachlan, G.J. (2013b). On mixtures of skew normal and skew t-distributions. Advances in Data Analysis and Classification, 7(3), 241–266.
Lee, S.X., & McLachlan, G.J. (2014). Finite mixtures of multivariate skew t-distributions: some recent and new results. Statistics and Computing, 24(2), 181–202.
Lin, T.I. (2009). Maximum likelihood estimation for multivariate skew normal mixture models. Journal of Multivariate Analysis, 100(2), 257–265.
Lin, T.I. (2010). Robust mixture modeling using multivariate skew t distributions. Statistics and Computing, 20(3), 343–356.
Lin, T.-I., McNicholas, P.D., Hsiu, J.H. (2014). Capturing patterns via parsimonious t mixture models. Statistics and Probability Letters, 88, 80–87.
Lindsay, B. (1995). Mixture models: Theory, geometry and applications. In NSF-CBMS Regional Conference Series in Probability and Statistics, Vol. 5. Hayward, California: Institute of Mathematical Statistics.
McLachlan, G.J., & Peel, D. (2000). Mixtures of factor analyzers. In Proceedings of the Seventh International Conference on Machine Learning (pp. 599–606). San Francisco: Morgan Kaufmann.
McLachlan, G.J., Bean, R.W., Jones, L. B. -T. (2007). Extension of the mixture of factor analyzers model to incorporate the multivariate t-distribution. Computational Statistics and Data Analysis, 51(11), 5327–5338.
McLachlan, G.J., & Krishnan, T. (2008). The EM Algorithm and Extensions. New York: Wiley.
McNeil, A.J., Frey, R., Embrechts, P. (2005). Quantitative risk management: concepts, techniques and tools. Princeton: Princeton University Press.
McNicholas, P.D. (2016a). Mixture Model-Based Classification. Boca Raton: Chapman & Hall/CRC Press.
McNicholas, P.D. (2016b). Model-based clustering. Journal of Classification, 33(3), 331–373.
McNicholas, P.D., Murphy, T.B., McDaid, A.F., Frost, D. (2010). Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Computational Statistics and Data Analysis, 54(3), 711–723.
McNicholas, S.M., McNicholas, P.D., Browne, R.P. (2017). A mixture of variance-gamma factor analyzers. In Ahmed, S. E. (Ed.) Big and Complex Data Analysis: Methodologies and Applications (pp. 369–385). Cham: Springer International Publishing.
Murray, P.M., Browne, R.P., McNicholas, P.D. (2014). Mixtures of skew-t factor analyzers. Computational Statistics and Data Analysis, 77, 326–335.
Murray, P.M., Browne, R.P., McNicholas, P.D. (2017). Hidden truncation hyperbolic distributions, finite mixtures thereof, and their application for clustering. Journal of Multivariate Analysis, 161, 141–156.
Niculescu, C., & Persson, L. (2006). Convex Functions and Their Applications. New York: Springer.
Ortega, J.M., & Rheinboldt, W.C. (1970). Iterative Solutions of Nonlinear Equations in Several Variables. New York: Academic Press.
Peel, D., & McLachlan, G.J. (2000). Robust mixture modelling using the t distribution. Statistics and Computing, 10(4), 339–348.
Pesevski, A., Franczak, B.C., McNicholas, P.D. (2018). Subspace clustering with the multivariate-t distribution. Pattern Recognition Letters, 112(1), 297–302.
R Core Team. (2017). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
Rand, W.M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336), 846–850.
Rockafellar, R.T., & Wets, R.J.B. (2009). Variational Analysis. New York: Springer.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6(2), 461–464.
Steane, M.A., McNicholas, P.D., Yada, R. (2012). Model-based classification via mixtures of multivariate t-factor analyzers. Communications in Statistics – Simulation and Computation, 41(4), 510–523.
Steinley, D. (2004). Properties of the Hubert-Arabie adjusted Rand index. Psychological Methods, 9(3), 386.
Tang, Y., Browne, R.P., McNicholas, P.D. (2018). Flexible clustering of high-dimensional data via mixtures of joint generalized hyperbolic distributions. Stat, 7(1), e177.
Tipping, M.E., & Bishop, C.M. (1999). Mixtures of probabilistic principal component analysers. Neural Computation, 11(2), 443–482.
Tortora, C., Franczak, B.C., Browne, R.P., McNicholas, P.D. (2014). Mixtures of multiple scaled generalized hyperbolic distributions. arXiv:1403.2332v1.
Tortora, C., Browne, R.P., Franczak, B.C., McNicholas, P.D. (2017). MixGHD: model based clustering, classification and discriminant analysis using the mixture of generalized hyperbolic distributions. R package version 2.1.
Vrbik, I., & McNicholas, P.D. (2012). Analytic calculations for the EM algorithm for multivariate skew-mixture models. Statistics and Probability Letters, 82(6), 1169–1174.
Vrbik, I., & McNicholas, P.D. (2014). Parsimonious skew mixture models for model-based clustering and classification. Computational Statistics and Data Analysis, 71, 196–210.
Vrbik, I., & McNicholas, P.D. (2015). Fractionally-supervised classification. Journal of Classification, 32(3), 359–381.
Wei, Y., Tang, Y., McNicholas, P.D. (2019). Mixtures of generalized hyperbolic distributions and mixtures of skew-t distributions for model-based clustering with incomplete data. Computational Statistics and Data Analysis, 130, 18–41.
Wraith, D., & Forbes, F. (2015). Clustering using skewed multivariate heavy tailed distributions with flexible tail behaviour. arXiv:1408.0711.
Yakowitz, S.J., & Spragins, J. (1968). On the identifiability of finite mixtures. Annals of Mathematical Statistics, 39, 209–214.


Acknowledgements

The authors are grateful to two anonymous reviewers and the Editor for helpful comments that have improved this manuscript. This work was supported by a grant-in-aid from Compusense Inc., a Collaborative Research and Development Grant from the Natural Sciences and Engineering Research Council of Canada, and the Canada Research Chairs program. The work on the introduction of the MMSGHD presented herein was first made publicly available as an arXiv preprint (Tortora et al. 2014).

Author information

Authors and Affiliations

Department of Mathematics & Statistics, San José State University, San José, CA, USA
Cristina Tortora
Department of Mathematics & Statistics, MacEwan University, Edmonton, AB, Canada
Brian C. Franczak
Department of Statistics & Actuarial Sciences, University of Waterloo, Waterloo, ON, Canada
Ryan P. Browne
Department of Mathematics & Statistics, McMaster University, Hamilton, ON, Canada
Paul D. McNicholas


Corresponding author

Correspondence to Paul D. McNicholas.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Parameter Estimation

We use the EM algorithm to estimate the parameters of the MCGHDs. The EM algorithm belongs to a larger class of algorithms known as MM algorithms (Ortega and Rheinboldt 1970; Hunter and Lange 2000) and is well-suited for problems involving missing data. “MM” stands for “minorize-maximize” or “majorize-minimize,” depending on the purpose of the algorithm; in the EM context, the minorizing function is the expected value of the complete-data log-likelihood. The EM algorithm iterates between two steps, an E-step and an M-step, and has been used in many experiments to estimate the parameters of mixture models (McLachlan and Krishnan 2008). On each E-step, the expected value of the complete-data log-likelihood, $\mathcal {Q}$, is calculated, and on each M-step $\mathcal {Q}$ is maximized with respect to π_g, μ_g, Φ_g, α_g, ω_g, λ_g, ω_0g, λ_0g, and ϖ_g. With respect to Γ_g, however, each M-step increases $\mathcal {Q}$ rather than maximizing it; accordingly, the algorithm is formally a generalized EM (GEM) algorithm. For our MCGHDs, there are four sources of missing data: the latent variable W_0ig, the multi-dimensional weight variable Δ_wig, the group component indicator labels z_ig, and the inner component labels u_ig, for i = 1, … , n and g = 1, … , G. For each observation i, z_ig = 1 if observation i is in component g and z_ig = 0 otherwise. Similarly, for each observation i, u_ig = 1 if observation i, in component g, is distributed generalized hyperbolic and u_ig = 0 if observation i, in component g, is distributed multiple scaled generalized hyperbolic. It follows that the complete-data log-likelihood for the MCGHDs is given by

$$ \begin{array}{@{}rcl@{}} l_{c}& =& \sum\limits_{i = 1}^{n} \sum\limits_{g = 1}^{G} \left\{\vphantom{\sum\limits_{j = 1}^{p}} z_{ig} \log \pi_{g} + z_{ig}{u_{ig}} \log \varpi_{g} + z_{ig}(1 - u_{ig}) \log (1-\varpi_{g})\right.\\ & &+ z_{ig}{u_{ig}} \log h \left( w_{0ig} | \omega_{0g}, 1, \lambda_{0g} \right)+ z_{ig}(1 - u_{ig}) \sum\limits_{j = 1}^{p}\log h \left( w_{jig} | \omega_{jg}, 1, \lambda_{jg} \right)\\&&+ z_{ig}{u_{ig}} \log \phi_{p} \left( \boldsymbol{\Gamma}_{g}^{\prime}\mathbf{x}_{i} | \boldsymbol{\mu}_{g} + w_{0ig} \boldsymbol{\alpha}_{g} , w_{0ig} \boldsymbol{\Phi}_{g} \right)\\ && \left.+ z_{ig}(1 - u_{ig}) \sum\limits_{j = 1}^{p} \log \phi_{1} \left( [\boldsymbol{\Gamma}_{g}^{\prime}\mathbf{x}_{i} ]_{j} | \mu_{jg} + w_{jig} \alpha_{jg} , w_{jig} \phi_{jg} \right)\right\}, \end{array} $$

where ϕ_p(⋅) represents a p-dimensional Gaussian density function, ϕ₁(⋅) is a unidimensional Gaussian density function, and h(⋅) is the density of a GIG distribution given in Eq. 15.

We are now prepared to outline the calculations for our GEM algorithm for the MCGHDs. On the E-step, the expected value of the complete-data log-likelihood, $\mathcal {Q}$, is computed by replacing the sufficient statistics of the missing data by their expected values. For each component indicator label z_ig and inner component label u_ig, for i = 1, … , n and g = 1, … , G, we require the expectations

$$ \mathbb{E}\left[ Z_{ig}\mid \mathbf{x}_{i} \right] = \frac{\pi_{g}f_{\text{CGHD}}\left( \mathbf{x}_{i}\mid\boldsymbol{\mu}_{g},\boldsymbol{\Gamma}_{g},\boldsymbol{\Phi}_{g},\boldsymbol{\alpha}_{g},\boldsymbol{\omega}_{g},\boldsymbol{\lambda}_{g},\omega_{0g},\lambda_{0g},\varpi_{g}\right)}{{\sum}_{h = 1}^{G}\pi_{h}f_{\text{CGHD}}\left( \mathbf{x}_{i}\mid\boldsymbol{\mu}_{h},\boldsymbol{\Gamma}_{h},\boldsymbol{\Phi}_{h},\boldsymbol{\alpha}_{h},\boldsymbol{\omega}_{h},\boldsymbol{\lambda}_{h},\omega_{0h},\lambda_{0h},\varpi_{h}\right)} =:\hat{z}_{ig} $$

(21)

and

$$ \begin{array}{l} \mathbb{E}\left[ U_{ig}\mid \mathbf{x}_{i}, z_{ig}= 1 \right]=\\ \qquad \frac{\varpi_{g}f_{\text{GHD}}(\mathbf{x}_{i}\mid\boldsymbol{\mu}_{g},\boldsymbol{\Gamma}_{g}\boldsymbol{\Phi}_{g}\boldsymbol{\Gamma}_{g}^{\prime}, \boldsymbol{\alpha}_{g},\omega_{0g},\lambda_{0g})}{\varpi_{g}f_{\text{GHD}}(\mathbf{x}_{i}\mid\boldsymbol{\mu}_{g},\boldsymbol{\Gamma}_{g}\boldsymbol{\Phi}_{g}\boldsymbol{\Gamma}_{g}^{\prime}, \boldsymbol{\alpha}_{g},\omega_{0g},\lambda_{0g}) + (1 - \varpi_{g})f_{\text{MSGHD}}\left( \mathbf{x}_{i}\mid\boldsymbol{\mu}_{g},\boldsymbol{\Gamma}_{g},\boldsymbol{\Phi}_{g},\boldsymbol{\alpha}_{g},\boldsymbol{\omega}_{g},\boldsymbol{\lambda}_{g}\right)}\\\qquad=:\hat{u}_{ig}, \end{array} $$

(22)

where f_CGHD is given in Eq. 20, f_GHD is given in Eq. 10 and f_MSGHD is given in Eq. 17. For the latent variable W_0ig, we use the expected value given in Browne and McNicholas (2015). The authors show that, given the density in Eq. 15, the following is true

$$ W_{0ig}\mid\mathbf{x}_{i}, z_{ig} = 1, u_{ig} = 1 \backsim\text{GIG}\left( \omega_{0g}+\boldsymbol{\alpha}_{g}^{\prime}(\boldsymbol{\Gamma}_{g} \boldsymbol{\Phi}_{g}\boldsymbol{\Gamma}_{g}^{\prime})^{-1}\boldsymbol{\alpha}_{g},\ \omega_{0g} +\delta(\mathbf{x}_{i},\boldsymbol{\mu}_{g}\mid\boldsymbol{\Gamma}_{g} \boldsymbol{\Phi}_{g}\boldsymbol{\Gamma}_{g}^{\prime}),\ \lambda_{0g}-p/2\right). $$

For the MCGHDs, the maximization of $\mathcal {Q}$ requires the expected values of W_0ig, $W^{-1}_{0ig}$, and log W_0ig, i.e.,

$$ \begin{array}{@{}rcl@{}} &&{\mathbb E}[W_{0ig} | \mathbf{x}_{i},z_{ig}= 1,u_{ig}= 1]= \sqrt{\frac{e_{ig}}{d_{g}}}\frac{K_{\lambda_{0g}-p/2 + 1}\left( \sqrt{d_{g}e_{ig}}\right)}{K_{\lambda_{0g}-p/2}\left( \sqrt{d_{g}e_{ig}}\right)} =: a_{ig},\\ &&{\mathbb E}[W_{0ig}^{-1} | \mathbf{x}_{i},z_{ig}= 1,u_{ig}= 1]= \sqrt{\frac{d_{g}}{e_{ig}}}\frac{K_{\lambda_{0g}-p/2 + 1}\left( \sqrt{d_{g}e_{ig}}\right)}{K_{\lambda_{0g}-p/2}\left( \sqrt{d_{g}e_{ig}}\right)}-\frac{2\lambda_{0g}-p}{ e_{ig}} =: b_{ig},\\ &&{\mathbb E}[\log W_{0ig} | \mathbf{x}_{i},z_{ig} = 1,u_{ig} = 1] = \log\sqrt{\frac{e_{ig}}{d_{g}}}+\left.\frac{\partial}{\partial v} \log \left\{K_{v}\left( \sqrt{d_{g}e_{ig}}\right)\right\}\right\vert_{v=\lambda_{0g}-p/2} =: c_{ig}, \end{array} $$

where $d_{g}= \omega _{0g}+\boldsymbol {\alpha }_{g}^{\prime } (\boldsymbol {\Gamma }_{g}\boldsymbol {\Phi }_{g}\boldsymbol {\Gamma }^{\prime }_{g})^{-1} \boldsymbol {\alpha }_{g}$ and $e_{ig}= \omega _{0g}+\delta (\mathbf {x}_{i}, \boldsymbol {\mu }_{g} \mid \boldsymbol {\Gamma }_{g}\boldsymbol {\Phi }_{g}\boldsymbol {\Gamma }^{\prime }_{g})$.
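The three expectations above can be sketched numerically as follows. This is a minimal illustration rather than the authors' implementation: it assumes the GIG density is proportional to w^(λ−1) exp{−(d w + e/w)/2}, which is the form consistent with the moment formulas above, and the helper names `bessel_k` and `gig_moments` are hypothetical. The Bessel function is evaluated with a plain trapezoidal rule on its integral representation, which is adequate only for moderate arguments.

```python
import math

def bessel_k(v, x, upper=20.0, steps=4000):
    """Approximate K_v(x) for x > 0 via the integral
    K_v(x) = int_0^inf exp(-x cosh t) cosh(v t) dt (trapezoidal rule).
    Sketch only: intended for moderate v and x."""
    h = upper / steps
    total = 0.5 * math.exp(-x)  # half-weighted endpoint at t = 0
    for k in range(1, steps + 1):
        t = k * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(v * t)
    return total * h

def gig_moments(d, e, lam, eps=1e-5):
    """Return (E[W], E[1/W], E[log W]) for W ~ GIG(d, e, lam).
    For W_0ig one would pass d = d_g, e = e_ig, lam = lambda_0g - p/2."""
    s = math.sqrt(d * e)
    ratio = bessel_k(lam + 1.0, s) / bessel_k(lam, s)
    a = math.sqrt(e / d) * ratio
    b = math.sqrt(d / e) * ratio - 2.0 * lam / e
    # Derivative of log K_v(s) with respect to the order v, by central
    # differences (the step eps is an assumption of this sketch).
    dlogk = (math.log(bessel_k(lam + eps, s))
             - math.log(bessel_k(lam - eps, s))) / (2.0 * eps)
    c = math.log(math.sqrt(e / d)) + dlogk
    return a, b, c
```

Calling `gig_moments(d_g, e_ig, lam_0g - p / 2)` would return the triple (a_ig, b_ig, c_ig) in the notation above; a production implementation would more likely rely on a library Bessel routine.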

The maximization of $\mathcal {Q}$ also requires the expected values of the multidimensional weight variables Δ_wig, $\boldsymbol {\Delta }_{\mathbf {w} ig}^{-1}$, and log Δ_wig. Given the density in Eq. 17, it follows that

$$W_{ijg} \mid \mathbf{x}_{i}, z_{ig} = 1, u_{ig} = 0 \backsim \text{GIG}\left( \omega_{jg} + \alpha_{jg}^{2}\phi_{jg}^{-1},\ \omega_{jg} + \left( \left[\boldsymbol{\Gamma}_{g}^{\prime}\mathbf{x}_{i}\right]_{j} - \mu_{jg} \right)^{2}/\phi_{jg},\ \lambda_{jg} - 1/2\right).$$

Each multidimensional weight variable is replaced by its expected value and so we need to compute E_1ig = diag{E_1i1g, … , E_1ipg}, E_2ig = diag{E_2i1g, … , E_2ipg}, and E_3ig = diag{E_3i1g, … , E_3ipg}, where

$$ \begin{array}{@{}rcl@{}} &&\mathbb{E}[W_{ijg}\mid\mathbf{x}_{i},z_{ig}= 1,u_{ig}= 0] = \sqrt{\frac{\bar{e}_{ijg}}{\bar{d}_{jg}}}\frac{K_{\lambda_{jg}+ 1/2}\left( \sqrt{\bar{d}_{jg} \bar{e}_{ijg}}\right)}{K_{\lambda_{jg}-1/2}\left( \sqrt{\bar{d}_{jg} \bar{e}_{ijg}}\right)}=: E_{1ijg},\\ &&\mathbb{E}[W_{ijg}^{-1}\mid\mathbf{x}_{i},z_{ig}= 1,u_{ig}= 0] = \sqrt{\frac{\bar{d}_{jg}}{\bar{e}_{ijg}}}\frac{K_{\lambda_{jg}+ 1/2}\left( \sqrt{\bar{d}_{jg} \bar{e}_{ijg}}\right)}{K_{\lambda_{jg}-1/2}\left( \sqrt{\bar{d}_{jg} \bar{e}_{ijg}}\right)} -\frac{2\lambda_{jg}-1}{\bar{e}_{ijg}}=: E_{2ijg},\\ &&\mathbb{E}[\log W_{ijg}\mid\mathbf{x}_{i},z_{ig}= 1,u_{ig}= 0]\\ &&= \log \sqrt{\frac{\bar{e}_{ijg}}{\bar{d}_{jg}}}+ \frac{\partial}{\partial v} \left. \log \left\{ K_{v}\left( \sqrt{\bar{d}_{jg} \bar{e}_{ijg}}\right)\right\} \right\vert_{v=\lambda_{jg}-1/2} =: E_{3ijg}, \end{array} $$

(23)

where $\bar {d}_{jg} = \omega _{jg} + \alpha _{jg}^{2}\phi _{jg}^{-1}$ and $\bar {e}_{ijg} =\omega _{jg} + ([\boldsymbol {\Gamma }_{g}^{\prime }\mathbf {x}_{i}]_{j} -\mu _{jg} )^{2}/\phi _{jg}$. Let $n_{g}={\sum }_{i = 1}^{n} \hat z_{ig}$, $A_{g}=(1/n_{g}){\sum }_{i = 1}^{n} \hat z_{ig}a_{ig}$, $B_{g}=(1/n_{g}){\sum }_{i = 1}^{n} \hat z_{ig}b_{ig}$, $C_{g}=(1/n_{g}){\sum }_{i = 1}^{n} \hat z_{ig}c_{ig}$, ${\bar {E}}_{1jg}=(1/n_{g}){\sum }_{i = 1}^{n} \hat z_{ig}{ E}_{1ijg}$, ${\bar {E}}_{2jg}=(1/n_{g}){\sum }_{i = 1}^{n} \hat z_{ig}{ E}_{2ijg}$, and ${\bar {E}}_{3jg}=(1/n_{g}){\sum }_{i = 1}^{n} \hat z_{ig} { E}_{3ijg}$.
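To make the bookkeeping of the E-step concrete, the following sketch computes the posterior weights of Eqs. 21 and 22 and a weighted average such as A_g for already-evaluated densities. It is an illustration under stated assumptions: the density values are taken as given inputs, the function names are hypothetical, and working on the log scale with a max-shift is a numerical choice of this sketch, not something prescribed by the text.

```python
import math

def responsibilities(log_weighted):
    """z_hat for one observation (Eq. 21).
    log_weighted[g] = log(pi_g) + log f_CGHD(x_i | theta_g).
    Subtracting the maximum avoids underflow before normalising."""
    m = max(log_weighted)
    exps = [math.exp(v - m) for v in log_weighted]
    total = sum(exps)
    return [v / total for v in exps]

def inner_responsibility(varpi, f_ghd, f_msghd):
    """u_hat for one observation and component (Eq. 22), given the
    GHD and MSGHD density values at x_i."""
    num = varpi * f_ghd
    return num / (num + (1.0 - varpi) * f_msghd)

def weighted_mean(z_col, values):
    """(1/n_g) * sum_i z_hat_ig * values_i, e.g. A_g when values_i = a_ig."""
    n_g = sum(z_col)
    return sum(z * v for z, v in zip(z_col, values)) / n_g
```

The same `weighted_mean` helper covers B_g, C_g, and the averaged E-terms by passing the corresponding per-observation expectations.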

In the M-step, we maximize the expected value of the complete-data log-likelihood with respect to the model parameters. The mixing proportions and inner mixing proportions are updated via $\hat {\pi }_{g}=n_{g}/n$ and $\hat {\varpi }_{g}={{\sum }_{i = 1}^{n} \hat {u}_{ig} \hat {z}_{ig}}/{n_{g}}$, respectively. The elements of the location parameter μ_g and skewness parameter α_g are replaced with

$$ \begin{array}{@{}rcl@{}} \hat{\mu}_{jg} = \frac{ {\sum}_{i = 1}^{n} \hat{z}_{ig}[\boldsymbol{\Gamma}_{g}^{\prime} \mathbf{x}_{i}]_{j}(\bar{s}_{1jg} s_{2ijg}-1)}{{\sum}_{i = 1}^{n} \hat{z}_{ig} (\bar{s}_{1jg}s_{2ijg}-1)} \quad\text{and}\quad \hat{\alpha}_{jg} = \frac{ {\sum}_{i = 1}^{n} \hat{z}_{ig}[\boldsymbol{\Gamma}_{g}^{\prime} \mathbf{x}_{i}]_{j}(\bar{s}_{2jg}- s_{2ijg})}{{\sum}_{i = 1}^{n} \hat{z}_{ig} (\bar{s}_{1jg}s_{2ijg}-1)}, \end{array} $$

respectively, where $[\boldsymbol {\Gamma }_{g}^{\prime } \mathbf {x}_{i}]_{j}$ is the jth element of the vector $\boldsymbol {\Gamma }_{g}^{\prime } \mathbf {x}_{i}$, $s_{1ijg}= \hat {u}_{ig}a_{ig}+\left (1- \hat {u}_{ig} \right ) { E}_{1ijg}$, $s_{2ijg} =\hat {u}_{ig}b_{ig}+\left (1- \hat {u}_{ig} \right ) { E}_{2ijg}$, $\bar {s}_{1jg}= (1/n_{g}){\sum }_{i = 1}^{n} \hat {z}_{ig}s_{1ijg}$, and $\bar {s}_{2jg}= (1/n_{g}){\sum }_{i = 1}^{n} \hat {z}_{ig}s_{2ijg}$. The diagonal elements of the matrix Φ_g are updated using

$$ \begin{array}{@{}rcl@{}} \hat{\phi}_{jg} &=& \frac{1}{n_{g}} \sum\limits_{i = 1}^{n} \left\{ \hat{z}_{ig} \hat{u}_{ig} \left[ b_{ig} \left( [\boldsymbol{\Gamma}_{g}^{\prime} \mathbf{x}_{i}]_{j} - \hat{\mu}_{jg} \right)^{2} -2 \left( [\boldsymbol{\Gamma}_{g}^{\prime} \mathbf{x}_{i}]_{j} - \hat{\mu}_{jg} \right) \hat{\alpha}_{jg} + a_{ig} \hat{ \alpha}_{jg}^{2} \right] \right. \\ & & \left. + \hat{z}_{ig}(1- \hat{u}_{ig} ) \left[ E_{2ijg} \left( [\boldsymbol{\Gamma}_{g}^{\prime} \mathbf{x}_{i}]_{j} - \hat{\mu}_{jg} \right)^{2}-2 \left( [\boldsymbol{\Gamma}_{g}^{\prime} \mathbf{x}_{i}]_{j} - \hat{\mu}_{jg} \right) \hat{\alpha}_{jg} + E_{1ijg} \hat{\alpha}_{jg}^{2} \right] \right\}. \end{array} $$
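The location and skewness updates given earlier in this M-step can be sketched for a single coordinate j as below. This is a hedged transcription of the printed formulas, not the authors' code: the function name and argument layout are hypothetical, and the inputs (rotated data, posterior weights, and the blended expectations s_1ijg and s_2ijg) are assumed to have been computed already.

```python
def update_location_skewness(y, z, s1, s2):
    """One-coordinate update of (mu_jg, alpha_jg).
    y[i]  = [Gamma_g' x_i]_j   (rotated observation, coordinate j)
    z[i]  = z_hat_ig           (posterior component weight)
    s1[i] = s_1ijg, s2[i] = s_2ijg (blended GIG expectations)"""
    n_g = sum(z)
    s1_bar = sum(zi * a for zi, a in zip(z, s1)) / n_g
    s2_bar = sum(zi * b for zi, b in zip(z, s2)) / n_g
    # Common denominator of both updates.
    denom = sum(zi * (s1_bar * b - 1.0) for zi, b in zip(z, s2))
    mu = sum(zi * yi * (s1_bar * b - 1.0)
             for zi, yi, b in zip(z, y, s2)) / denom
    alpha = sum(zi * yi * (s2_bar - b)
                for zi, yi, b in zip(z, y, s2)) / denom
    return mu, alpha
```

One quick sanity check: if every rotated observation equals the same constant, the update returns that constant as the location and a skewness of zero.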

To update the component eigenvector matrices Γ_g, we wish to minimize the objective function

$$ \begin{array}{@{}rcl@{}} f(\boldsymbol{\Gamma}_{g}) &=& -\frac{1}{2} \sum\limits_{i = 1}^{n} \text{tr} \left\{ \hat{z}_{ig} \hat{\boldsymbol{\Phi}}_{g}^{-1}\mathbf V_{ig} \boldsymbol{\Gamma}_{g} \mathbf{x}_{i} \mathbf{x}_{i}^{\prime} \boldsymbol{\Gamma}_{g}^{\prime} \right\} + \sum\limits_{i = 1}^{n} \text{tr} \left\{ \hat{z}_{ig} \mathbf{x}_{i} \left( \mathbf V_{ig} \hat{\boldsymbol{\mu}}_{g} + \hat{\boldsymbol{\alpha}}_{g} \right)' \hat{\boldsymbol{\Phi}}_{g}^{-1} \boldsymbol{\Gamma}_{g} \right\} + C \end{array} $$

(24)

with respect to Γ_g, where $\mathbf V_{ig} = \hat {u}_{ig} b_{ig}\mathbf {I}_{p} + (1-\hat {u}_{ig})\textbf {E}_{2ig}$. We employ an optimization routine that uses two simpler majorization-minimization algorithms. Our optimization routine exploits the convexity of the objective function in Eq. 24, providing a computationally stable algorithm for estimating Γ_g. Specifically, we follow Kiers (2002) and Browne and McNicholas (2014) and use the surrogate function

$$ \begin{array}{@{}rcl@{}} f(\boldsymbol{\Gamma}_{g})\leq C+\sum\limits_{i = 1}^{n}{ \text{tr}{\left\{\mathbf{F}_{rg}\boldsymbol{\Gamma}_{g}\right\}}}, \end{array} $$

(25)

where C is a constant that does not depend on Γ_g, r ∈ {1, 2} is an index, and the matrices F_rg are defined in Eqs. 26 and 27.

Therefore, on each M-step, we calculate either

$$ \mathbf{F}_{1g} = \sum\limits_{i = 1}^{n} \hat{z}_{ig}\left[ - \mathbf{x}_{i} \left( \mathbf V_{ig} \hat{\boldsymbol{\mu}}_{g} + \hat{\boldsymbol{\alpha}}_{g} \right)' \hat{\boldsymbol{\Phi}}_{g}^{-1} + \mathbf{x}_{i} \mathbf{x}_{i}^{\prime} \boldsymbol{\Gamma}_{g}^{\prime} \hat{\boldsymbol{\Phi}}_{g}^{-1}\mathbf V_{ig} - \alpha_{1ig} \mathbf{x}_{i} \mathbf{x}_{i}^{\prime} \boldsymbol{\Gamma}_{g}^{\prime} \right] $$

(26)

or

$$ \mathbf{F}_{2g} = \sum\limits_{i = 1}^{n} \hat{z}_{ig}\left[ -\mathbf{x}_{i} \left( \mathbf V_{ig} \hat{\boldsymbol{\mu}}_{g} + \hat{\boldsymbol{\alpha}}_{g} \right)' \hat{\boldsymbol{\Phi}}_{g}^{-1} + \mathbf{x}_{i} \mathbf{x}_{i}^{\prime} \boldsymbol{\Gamma}_{g}^{\prime} \hat{\boldsymbol{\Phi}}_{g}^{-1}\mathbf V_{ig} - \alpha_{2ig} \mathbf V_{ig} \hat{\boldsymbol{\Phi}}_{g}^{-1} \boldsymbol{\Gamma}_{g}^{\prime} \right], $$

(27)

where α_1ig is the largest eigenvalue of the diagonal matrix $ \boldsymbol {\Phi }_{g}^{-1}\mathbf V_{ig}$, and α_2ig is equal to $\hat {z}_{ig}\mathbf {x}_{i}^{\prime } \mathbf {x}_{i}$, which is the largest eigenvalue of the rank-1 matrix $\hat {z}_{ig}\mathbf {x}_{i} \mathbf {x}_{i}^{\prime }$. Following this, we compute the singular value decomposition of F_rg given by

$$\mathbf{F}_{rg} = \mathbf{P}\mathbf{B}\mathbf{R}^{\prime}.$$

It follows that our update for Γ_g is given by

$$\hat{\boldsymbol{\Gamma}}_{g} = \mathbf{R}\mathbf{P}^{\prime}.$$
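A short worked step shows why the product $\mathbf{R}\mathbf{P}^{\prime}$ is a natural candidate. Assume, for illustration, that $\mathbf{F}_{rg}$ is a square p × p matrix, that Γ_g ranges over orthogonal matrices, and write b_1, … , b_p ≥ 0 for the diagonal entries of B (this notation for the singular values is introduced here and is not the article's). Then

$$ \text{tr}\left\{\mathbf{F}_{rg}\boldsymbol{\Gamma}_{g}\right\} = \text{tr}\left\{\mathbf{P}\mathbf{B}\mathbf{R}^{\prime}\boldsymbol{\Gamma}_{g}\right\} = \text{tr}\left\{\mathbf{B}\mathbf{R}^{\prime}\boldsymbol{\Gamma}_{g}\mathbf{P}\right\} = \sum\limits_{j = 1}^{p} b_{j}\left[\mathbf{R}^{\prime}\boldsymbol{\Gamma}_{g}\mathbf{P}\right]_{jj}. $$

Because $\mathbf{R}^{\prime}\boldsymbol{\Gamma}_{g}\mathbf{P}$ is itself orthogonal, each of its diagonal entries lies in [−1, 1], so $\left|\text{tr}\left\{\mathbf{F}_{rg}\boldsymbol{\Gamma}_{g}\right\}\right| \leq {\sum}_{j = 1}^{p} b_{j}$. The choice $\boldsymbol{\Gamma}_{g} = \mathbf{R}\mathbf{P}^{\prime}$ gives $\mathbf{R}^{\prime}\boldsymbol{\Gamma}_{g}\mathbf{P} = \mathbf{I}_{p}$ and therefore attains this bound, while remaining orthogonal as a product of orthogonal matrices.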

The p-dimensional concentration and index parameters, i.e., ω_g and λ_g, are estimated by maximizing the function

$$ q_{jg}(\omega_{jg}, \lambda_{jg})=-\log K_{\lambda_{jg}}(\omega_{jg})+(\lambda_{jg} -1){\bar{E}}_{3jg}- \frac{\omega_{jg}}{2}({\bar{E}}_{1jg}+{\bar{E}}_{2jg}). $$

(28)

This leads to

$$ \hat{\lambda}_{jg}= {\bar{E}}_{3jg}\lambda_{jg}^{\text{prev}}\left[\left.\frac{\partial} {\partial v}\log K_{v}(\omega_{jg}^{\text{prev}})\right\vert_{v=\lambda_{jg}^{\text{prev}}}\right]^{-1} $$

and

$$ \hat{\omega}_{jg}= \omega_{jg}^{\text{prev}}-\left[\left.\frac{\partial} {\partial v}q_{jg}(v, {\hat\lambda_{jg}})\right\vert_{v=\omega_{jg}^{\text{prev}}}\right]\left[\left.\frac{\partial^{2}} {\partial v^{2}}q_{jg}(v, {\hat\lambda_{jg}})\right\vert_{v=\omega_{jg}^{\text{prev}}}\right]^{-1}, $$

where the superscript “prev” denotes that the estimate from the previous iteration is used. The univariate parameters ω_0g and λ_0g are estimated by maximizing the function

$$ \begin{array}{@{}rcl@{}} q_{0g}(\omega_{0g}, \lambda_{0g})=-\log(K_{\lambda_{0g}}(\omega_{0g}))+(\lambda_{0g} -1)C_{g}- \frac{\omega_{0g}}{2}(A_{g}+B_{g}), \end{array} $$

(29)

giving

$$ \hat\lambda_{0g}= C_{g}\lambda_{0g}^{\text{prev}}\left[\left.\frac{\partial} {\partial v}\log K_{v}(\omega_{0g}^{\text{prev}})\right\vert_{v=\lambda_{0g}^{\text{prev}}}\right]^{-1}\qquad $$

and

$$ \hat \omega_{0g}= \omega_{0g}^{\text{prev}}-\left[\left.\frac{\partial}{\partial v}q_{0g}(v, {\hat\lambda_{0g}})\right\vert_{v=\omega_{0g}^{\text{prev}}}\right]\left[\left.\frac{\partial^{2}} {\partial v^{2}}q_{0g}(v, {\hat\lambda_{0g}})\right\vert_{v=\omega_{0g}^{\text{prev}}}\right]^{-1}. $$
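The Newton-type update for the concentration parameter can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used for the results in this paper: the index λ is held fixed, the value 2.5 for the sum of the two expected-value terms is hypothetical, the Bessel function is evaluated by simple quadrature of the integral representation K_ν(z) = ∫₀^∞ exp(−z cosh t) cosh(νt) dt (a library routine would normally be used), and derivatives in ω are obtained from the standard recurrences for K_ν.

```python
import math

def bessel_k(nu, z, upper=12.0, steps=4000):
    """K_nu(z), z > 0, by the trapezoid rule on the integral representation."""
    h = upper / steps
    total = 0.5 * math.exp(-z)  # integrand at t = 0 (half weight)
    for i in range(1, steps + 1):
        t = i * h
        total += math.exp(-z * math.cosh(t)) * math.cosh(nu * t)
    return total * h

def dq_domega(omega, lam, e_sum):
    """First derivative of q in omega; e_sum plays the role of E1bar + E2bar."""
    k = bessel_k(lam, omega)
    dk = -0.5 * (bessel_k(lam - 1, omega) + bessel_k(lam + 1, omega))
    return -dk / k - 0.5 * e_sum

def d2q_domega2(omega, lam):
    """Second derivative of q in omega."""
    k = bessel_k(lam, omega)
    dk = -0.5 * (bessel_k(lam - 1, omega) + bessel_k(lam + 1, omega))
    d2k = 0.25 * (bessel_k(lam - 2, omega) + 2.0 * k + bessel_k(lam + 2, omega))
    return -(d2k / k - (dk / k) ** 2)

def newton_omega(omega_prev, lam, e_sum):
    """One update: omega_prev - q'(omega_prev) / q''(omega_prev)."""
    return omega_prev - dq_domega(omega_prev, lam, e_sum) / d2q_domega2(omega_prev, lam)

# Hypothetical inputs: lambda = 1/2 and E1bar + E2bar = 2.5
omega = 1.0
for _ in range(10):
    omega = newton_omega(omega, lam=0.5, e_sum=2.5)
```

With λ = 1/2 the Bessel function has the closed form K_{1/2}(z) = √(π/(2z)) e^{−z}, so the stationary point of q in ω can be found by hand (ω = 2 for the inputs above), which makes this case a convenient check.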

Our GEM algorithm is iterated until convergence, which is determined using the Aitken acceleration (Aitken 1926). Formally, the Aitken acceleration is given by

$$ \begin{array}{@{}rcl@{}} a^{(k)}=\frac{l^{(k + 1)}-l^{(k)}}{l^{(k)}-l^{(k-1)}}, \end{array} $$

where l^(k) is the value of the log-likelihood at the iteration k and

$$ \begin{array}{@{}rcl@{}} l^{(k + 1)}_{\infty}=l^{(k)}+\frac{1}{1-a^{(k)}}\left( l^{(k + 1)}-l^{(k)}\right), \end{array} $$

is an asymptotic estimate of the log-likelihood on iteration k + 1. The algorithm can be considered to have converged when $l^{(k + 1)}_{\infty }-l^{(k)}< \epsilon $, provided this difference is positive (Böhning et al. 1994; Lindsay 1995; McNicholas et al. 2010). Herein, we set 𝜖 = 0.01. When the algorithm converges, we compute the maximum a posteriori (MAP) classification values using the posterior $\hat {z}_{ig}$, where $\text {MAP}\left \{\hat {z}_{ig}\right \}= 1$ if $g=\arg \max _{h}\left \{\hat {z}_{ih}\right \}$, and $\text {MAP}\left \{\hat {z}_{ig}\right \}= 0$ otherwise.
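The stopping rule and the MAP step above can be sketched in a few lines. The function names and the log-likelihood trace are hypothetical, and the handling of a zero denominator is our assumption, since the text does not cover that case.

```python
def aitken_check(l_prev, l_curr, l_next, eps=0.01):
    """Return (converged, l_inf) given l^(k-1), l^(k) and l^(k+1)."""
    denom = l_curr - l_prev
    if denom == 0.0:
        # Assumption: with no change in the log-likelihood, make no decision
        return False, l_next
    a = (l_next - l_curr) / denom
    if a == 1.0:
        return False, float("inf")
    l_inf = l_curr + (l_next - l_curr) / (1.0 - a)
    diff = l_inf - l_curr
    # Converged when the difference is positive and smaller than eps
    return (0.0 < diff < eps), l_inf

def map_labels(z_hat):
    """Index of the largest posterior probability in each row of z_hat."""
    return [max(range(len(row)), key=row.__getitem__) for row in z_hat]

# Hypothetical log-likelihood trace approaching -100 geometrically
trace = [-100.0 - 5.0 * 0.5 ** k for k in range(1, 14)]
early = aitken_check(trace[0], trace[1], trace[2])
late = aitken_check(trace[9], trace[10], trace[11])
labels = map_labels([[0.2, 0.8], [0.9, 0.1]])
```

For a geometrically converging trace such as this one, the asymptotic estimate recovers the limit from any three consecutive values, so the criterion is driven by how far the current log-likelihood is from that limit.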

Appendix 2: Quasi-Concavity of the cMSGHD

In essence, we might want to consider only densities whose contours enclose convex sets of points. Formally, such densities are quasi-concave. Extensive details on quasi-concavity, quasi-convexity, and related notions are given by Niculescu and Persson (2006) and Rockafellar and Wets (2009).

Definition 1

A function f(x) is quasi-concave if each upper-level set U_α(f) = {x|f(x) ≥ α} is convex, for $\alpha \in \mathbb {R}$.

Definition 2

A function f(x) is quasi-convex if each sub-level set S_α(f) = {x|f(x) ≤ α} is convex, for $\alpha \in \mathbb {R}$.
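To make Definition 1 concrete, the following sketch checks quasi-concavity numerically for univariate functions. In one dimension, every upper-level set is convex (an interval) exactly when f(y) ≥ min{f(x), f(z)} for all x ≤ y ≤ z, and the code tests this inequality on a finite grid only. The two densities and the grid are hypothetical choices for illustration.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x):
    # Equal-weight mixture of two well-separated normal densities
    return 0.5 * normal_pdf(x, -3.0) + 0.5 * normal_pdf(x, 3.0)

def is_quasi_concave_on_grid(f, grid):
    """Check f(y) >= min(f(x), f(z)) for all grid points x < y < z."""
    values = [f(x) for x in grid]
    n = len(values)
    left_max = [-math.inf] * n   # largest value strictly to the left
    right_max = [-math.inf] * n  # largest value strictly to the right
    for j in range(1, n):
        left_max[j] = max(left_max[j - 1], values[j - 1])
    for j in range(n - 2, -1, -1):
        right_max[j] = max(right_max[j + 1], values[j + 1])
    return all(values[j] >= min(left_max[j], right_max[j]) for j in range(n))

grid = [-8.0 + 0.05 * i for i in range(321)]
single_ok = is_quasi_concave_on_grid(normal_pdf, grid)
mixture_ok = is_quasi_concave_on_grid(mixture_pdf, grid)
```

The single normal density passes the check, whereas the bimodal mixture fails it because its upper-level sets split into two disjoint intervals around the modes.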

Lemma 1

The class of elliptical distributions, whose density functions have the form

$$f(\mathbf{x}) = \frac{1}{\sqrt{\lvert\boldsymbol{\Sigma}\rvert}}g\left( \delta\left( \mathbf{x}, \boldsymbol{\mu} | \boldsymbol{\Sigma}\right) \right)$$

are quasi-concave if the generator function, g, is monotonic non-increasing.

Proof

The result follows from the fact that the function δ (x, μ|Σ) is convex, since Σ is positive definite, and the function g is monotonic non-increasing; hence, the function f(x) is quasi-concave. □

Theorem 1

The generalized hyperbolic distribution (GHD) is quasi-concave.

Proof

It is straightforward to show that the function

$$h(\mathbf{x}) = \sqrt{ a + b \delta\left( \mathbf{x}, \boldsymbol{\mu} | \boldsymbol{\Sigma}\right)}$$

is convex, where a and b are positive constants, and δ (x, μ|Σ) is the Mahalanobis distance between x and μ. Let τ = λ − p/2. Then, the function

$$k(z) = \tau \log z + \log K_{\tau}(z),$$

where $z\in \mathbb {R}^{+}$, and K_τ is the modified Bessel function of the third kind with index τ, is monotonic decreasing (or non-increasing) because the first derivative

$$k^{\prime}(z) = \frac{\tau}{z} + \frac{ (\tau/z) K_{\tau}(z) - K_{\tau+ 1}(z)} {K_{\tau}(z)} = \frac{2\tau}{z} - \frac{ K_{\tau+ 1}(z)} {K_{\tau}(z)} = -\frac{K_{\tau-1}(z)}{K_{\tau}(z)}$$

is negative for all $\tau \in \mathbb {R}$ and z > 0. In addition to being monotonic decreasing, k(z) is convex for τ < 1/2, concave and convex (linear) for τ = 1/2, and concave for τ > 1/2. Because k(z) is a monotonic function, it satisfies the criteria for quasi-convexity and quasi-concavity, so it is simultaneously quasi-convex and quasi-concave. In this context, monotone functions are also known as quasi-linear or quasi-monotone.
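The derivative identity k′(z) = −K_{τ−1}(z)/K_τ(z) can be checked numerically. The sketch below is illustrative only: it evaluates the Bessel function by quadrature of the integral representation K_ν(z) = ∫₀^∞ exp(−z cosh t) cosh(νt) dt, which is our choice for keeping the example self-contained, and compares a central finite difference of k(z) with the closed form at a few hypothetical values of τ and z.

```python
import math

def bessel_k(nu, z, upper=12.0, steps=4000):
    """K_nu(z), z > 0, by the trapezoid rule on the integral representation."""
    h = upper / steps
    total = 0.5 * math.exp(-z)  # integrand at t = 0 (half weight)
    for i in range(1, steps + 1):
        t = i * h
        total += math.exp(-z * math.cosh(t)) * math.cosh(nu * t)
    return total * h

def k_fun(z, tau):
    """k(z) = tau * log(z) + log K_tau(z)."""
    return tau * math.log(z) + math.log(bessel_k(tau, z))

def k_prime_closed_form(z, tau):
    """Closed form -K_{tau-1}(z) / K_tau(z)."""
    return -bessel_k(tau - 1.0, z) / bessel_k(tau, z)

def k_prime_numeric(z, tau, h=1e-4):
    """Central finite-difference approximation to k'(z)."""
    return (k_fun(z + h, tau) - k_fun(z - h, tau)) / (2.0 * h)

# Largest discrepancy over a small set of hypothetical test points
max_gap = max(
    abs(k_prime_numeric(z, tau) - k_prime_closed_form(z, tau))
    for tau in (-0.7, 0.5, 1.3)
    for z in (0.5, 2.0)
)
```

At τ = 1/2 the closed form equals −1 for every z, in line with k(z) being linear in that case.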

Recall that if the function U is quasi-convex and the function g is decreasing, then the function f(x) = g(U(x)) is quasi-concave. It follows that the composition k(h(x)) is quasi-concave. Consider the skewness part of the GHD density function, i.e., $a(\mathbf{x}) = -(\mathbf{x}-\boldsymbol{\mu})^{\prime}\boldsymbol{\Sigma}^{-1}\boldsymbol{\alpha}$, which is a linear function. It follows that the function

$$ \exp\left\{k(h(\mathbf{x}))+a(\mathbf{x})\right\} $$

(30)

is also quasi-concave, and the result follows from the fact that Eq. 30 is proportional to the density of the GHD. □

Theorem 2

The convex multiple scaled generalized hyperbolic distribution (cMSGHD) is quasi-concave. In other words, the multiple scaled generalized hyperbolic distribution (MSGHD) is quasi-concave provided that λ_j > 1 for all j = 1, … , p.

Proof

A p-dimensional multiple scaled distribution is a product of p independent univariate densities. The density of the MSGHD has form

$$g_{p}(x_{1},x_{2},\ldots,x_{p}) = g_{1}(x_{1} | \boldsymbol{\theta}_{1}) g_{1}(x_{2} | \boldsymbol{\theta}_{2})\times\cdots\times g_{1}(x_{p} | \boldsymbol{\theta}_{p}),$$

where g₁(x_j|θ_j) is the density of the univariate hyperbolic distribution with parameters θ_j, j = 1, … , p. From Theorem 1, log g₁(x_j|θ_j) is a concave function for τ_j > 1/2, i.e., for λ_j > 1 (because p = 1). Therefore, the function

$$\log g_{p}(x_{1},x_{2},\ldots,x_{p}) = \log g_{1}(x_{1} | \boldsymbol{\theta}_{1}) + \log g_{1}(x_{2} | \boldsymbol{\theta}_{2})+\cdots+ \log g_{1}(x_{p} | \boldsymbol{\theta}_{p})$$

is concave provided that λ_j > 1 for all j = 1, … , p. Therefore, the function

$$g_{p}(x_{1},x_{2},\ldots,x_{p}) = g_{1}(x_{1} | \boldsymbol{\theta}_{1}) g_{1}(x_{2} | \boldsymbol{\theta}_{2})\times\cdots\times g_{1}(x_{p} | \boldsymbol{\theta}_{p})$$

is quasi-concave provided that λ_j > 1 for all j = 1, … , p. □

Note that addition does not preserve quasi-convexity or quasi-concavity. The sum of two quasi-convex functions defined on different domains will be quasi-concave if they are additively decomposed (see Debreu and Koopmans 1982). Debreu and Koopmans (1982) give necessary and sufficient conditions for the sum f of a set of functions f₁, … , f_m to be additively decomposed. These conditions depend on the convexity index c(f), in which f is quasi-convex if and only if either of the following holds: (i) c(f_i) ≥ 0 for every i, or (ii) c(f_j) < 0 for some j, c(f_i) > 0 for every i ≠ j, and ${\sum }_{i = 1}^{m} \frac {1}{c(f_{i})} \le 0$. For differentiable functions, the convexity index satisfies the inequality f^″(x)/[f′(x)]² ≥ c(f).

We have that a sufficient condition for the MSGHD to be quasi-concave is that all λ_j > 1. Furthermore, a sufficient condition for the MSGHD not to be quasi-concave is that all λ_j < 1 and finite. Interestingly, this means the multiple scaled t-distribution cannot provide convex level sets for any finite degrees of freedom. For large degrees of freedom, the multiple scaled t-distribution will behave similarly to a normal distribution near the mode; however, as one moves away from the mode, non-convex contours will be encountered. Finally, note that Debreu and Koopmans (1982) give necessary and sufficient conditions that suggest a quasi-concave MSGHD with some λ_j positive and others negative is possible, but going this route would greatly complicate the estimation procedure.

Appendix 3: Finite Mixture Identifiability

In this section, we consider the notion of identifiability for finite mixtures of MSGHDs and coalesced generalized hyperbolic distributions (CGHDs). Herein, we take the term identifiability to mean finite mixture identifiability.

3.1 Background

Holzmann et al. (2006) prove identifiability of finite mixtures of elliptical distributions. They state that “finite mixtures are said to be identifiable if distinct mixing distributions with finite support correspond to distinct mixtures.” A finite mixture of the densities f_p(x|Ψ₁), … , f_p(x|Ψ_G) is identifiable if the family $\left \{ f_{p}(\mathbf {x}|\boldsymbol {\Psi }) : \boldsymbol {\Psi } \in \mathcal {A}^{p} \right \}$ is linearly independent. The founding work on finite mixture identifiability is by Yakowitz and Spragins (1968), who state that this linear independence is a necessary and sufficient condition for identifiability.

The GHD can be expressed as a normal variance-mean mixture. The stochastic relationship of the normal variance-mean mixture is given by

$$ \mathbf{X} =\boldsymbol{\mu} + W\boldsymbol{\alpha}+ \sqrt{W} \mathbf{U}, $$

(31)

where $\mathbf {U} \sim \mathcal {N}_{p}(\mathbf {0}, \boldsymbol {\Sigma })$ and W, independent of U, is a positive univariate random variable with density h(w|θ). Browne and McNicholas (2015) proved identifiability for finite mixtures of GHDs through additivity of disjoint sets of identifiable distributions.
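The stochastic representation in Eq. 31 can be simulated directly. The sketch below is a univariate illustration under a loudly stated assumption: W is drawn from a gamma distribution purely as a convenient stand-in, whereas the GHD requires W to follow a generalized inverse Gaussian distribution. All parameter values are hypothetical.

```python
import math
import random

def sample_nvmm(n, mu, alpha, sigma, shape, scale, seed=0):
    """Draw n values of X = mu + W * alpha + sqrt(W) * U in one dimension.

    W ~ Gamma(shape, scale) is an assumption made for illustration only;
    U ~ N(0, sigma^2) is drawn independently of W.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        w = rng.gammavariate(shape, scale)  # positive mixing variable
        u = rng.gauss(0.0, sigma)           # normal component
        draws.append(mu + w * alpha + math.sqrt(w) * u)
    return draws

# Hypothetical parameter values; here E[W] = shape * scale = 1
x = sample_nvmm(50000, mu=1.0, alpha=0.5, sigma=1.0, shape=2.0, scale=0.5)
sample_mean = sum(x) / len(x)
```

Because E[X] = μ + E[W]α under this representation, the sample mean should be close to 1.5 for these values, and the skewness of the draws is governed by the sign of α.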

Definition 3

In the present context, a finite mixture of the multiple scale distributions f(x|θ₁), … , f(x|θ_G) is identifiable if

$$ \sum\limits_{g = 1}^{G} \pi_{g} f\left( {\mathbf{x}}|\boldsymbol{\theta}_{g} \right) = \sum\limits_{g = 1}^{G} \pi_{g}^{\star} f\left( {\mathbf{x}}|\boldsymbol{\theta}_{g}^{\star} \right) $$

(32)

for $\mathbf {x} \in \mathbb {R}^{p}$, where G is a positive integer, ${\sum }_{g = 1}^{G} \pi _{g} = {\sum }_{g = 1}^{G} \pi _{g}^{\star } = 1$ and $\pi _{g}, \pi _{g}^{\star } > 0$ for g = 1, … , G, implies that there exists a permutation σ such that $(\pi_{g}, \boldsymbol{\theta}_{g}) = (\pi_{\sigma(g)}^{\star}, \boldsymbol{\theta}_{\sigma(g)}^{\star})$ for all g.

Browne and McNicholas (2015) prove identifiability for normal variance-mean mixtures, which includes the generalized hyperbolic. Here, we view the results from a different vantage point to illustrate the concepts required for the identifiability of the multiple scaled distributions. We begin by noting the characteristic function for the generalized hyperbolic arises from the characteristic function of the normal variance-mean mixture,

$$ \varphi_{\mathbf{X}}(\mathbf{v}) = \exp \left\{ i \mathbf{v}^{\prime}\boldsymbol{\mu}_{g} \right\} M_{W} \left( \boldsymbol{\beta}_{g}^{\prime} \mathbf{v} i -\frac{1}{2} \mathbf{v}^{\prime} \boldsymbol{\Sigma}_{g} \mathbf{v} \left| \boldsymbol{\Gamma}_{g} \right.\right), $$

(33)

where

$$ M_{W} \left( u \right) = \left[ \frac{\omega}{\omega -2u} \right]^{\frac{\lambda}{2}} \frac{ K_{\lambda} \left( \sqrt{ \omega(\omega-2u)} \right)} { K_{\lambda} \left( \omega \right)} = \left[ 1 -2 \frac{u}{ \omega} \right]^{- \frac{\lambda}{2}} \frac{ K_{\lambda} \left( \sqrt{ \omega(\omega-2u)} \right)} { K_{\lambda} \left( \omega \right)} . $$

The characteristic function for the generalized hyperbolic is

$$ \varphi_{\mathbf{X}}(\mathbf{v} ) = \exp\{ i \mathbf{v}^{\prime}\boldsymbol{\mu}\} \left[ 1 + \frac{ \mathbf{v}^{\prime} \boldsymbol{\Sigma} \mathbf{v} -2 i \boldsymbol{\beta}^{\prime} \mathbf{v} } {\omega} \right]^{-\frac{\lambda}{2}} \frac{ K_{\lambda} \left( \sqrt{ \omega \left[\omega + (\mathbf{v}^{\prime} \boldsymbol{\Sigma} \mathbf{v} - 2 i \boldsymbol{\beta}^{\prime} \mathbf{v} ) \right]} \right)} { K_{\lambda} \left( \omega \right)} . $$

In the context of a coalesced distribution, with an eigen-decomposed scale matrix, the characteristic function is

$$ \varphi_{\mathbf{X}}(\mathbf{v} ) = \exp\{ i \mathbf{v}^{\prime}\boldsymbol{\mu}\} \left[ 1 + \frac{ \mathbf{v}^{\prime} \boldsymbol{\Gamma} \boldsymbol{\Phi} \boldsymbol{\Gamma}^{\prime} \mathbf{v} -2 i \boldsymbol{\beta}^{\prime} \mathbf{v} } {\omega} \right]^{ -\frac{\lambda}{2}} \frac{ K_{\lambda} \left( \sqrt{ \omega \left[\omega + (\mathbf{v}^{\prime} \boldsymbol{\Gamma} \boldsymbol{\Phi} \boldsymbol{\Gamma}^{\prime} \mathbf{v} - 2 i \boldsymbol{\beta}^{\prime} \mathbf{v} ) \right]} \right)} { K_{\lambda} \left( \omega \right)} . $$

Now, we let v = tz and obtain

$$ \begin{array}{@{}rcl@{}} \varphi_{\mathbf{X}}(\mathbf{v} = t \mathbf{z}) &=& \exp\{ i t \mathbf{z}^{\prime}\boldsymbol{\mu}\} \left[ 1 + \frac{ t^{2} (\mathbf{z}^{\prime} \boldsymbol{\Gamma} \boldsymbol{\Phi} \boldsymbol{\Gamma}^{\prime} \mathbf{z}) -2 i t (\boldsymbol{\beta}^{\prime} \mathbf{z}) } {\omega} \right]^{-\frac{\lambda}{2}}\\&&\times \frac{ K_{\lambda} \left( \sqrt{ \omega \left[\omega + t^{2} (\mathbf{z}^{\prime} \boldsymbol{\Gamma} \boldsymbol{\Phi} \boldsymbol{\Gamma}^{\prime} \mathbf{z}) - 2 i t (\boldsymbol{\beta}^{\prime} \mathbf{z} ) \right]} \right)} { K_{\lambda} \left( \omega \right)} . \end{array} $$

To prove identifiability of the generalized hyperbolic, we could now use the results from Browne and McNicholas (2015) and Yakowitz and Spragins (1968, p. 211), which imply that there exists z such that the tuple $(\mathbf {z}^{\prime } \boldsymbol {\Sigma }_{g} \mathbf {z}, \boldsymbol {\beta }_{g}^{\prime } \mathbf {z}, \mathbf {z}^{\prime }\boldsymbol {\mu }_{g})$, where $\boldsymbol {\Sigma }_{g}= \boldsymbol {\Gamma }_{g} \boldsymbol {\Phi }_{g} \boldsymbol {\Gamma }_{g}^{\prime }$, is unique for all g = 1, … , G, allowing a reduction to the univariate case. Now, we rewrite the term z′Σ_gz as

$$ \mathbf{z}^{\prime} \boldsymbol{\Sigma}_{g} \mathbf{z} = \mathbf{z}^{\prime} \boldsymbol{\Gamma}_{g} \boldsymbol{\Phi}_{g} \boldsymbol{\Gamma}_{g}^{\prime} \mathbf{z} = \text{tr}\left[ \mathbf{z}^{\prime} \boldsymbol{\Gamma}_{g} \boldsymbol{\Phi}_{g} \boldsymbol{\Gamma}_{g}^{\prime} \mathbf{z} \right] = \text{tr}\left[ \boldsymbol{\Gamma}_{g}^{\prime} \mathbf{z} \mathbf{z}^{\prime} \boldsymbol{\Gamma}_{g} \boldsymbol{\Phi}_{g} \right] = \sum\limits_{j = 1}^{p} {\Phi}_{jg} [\boldsymbol{\Gamma}_{g}^{\prime} \mathbf{z} ]_{j}^{2}, $$

which implies the tuple

$$ \left( \mathbf{z}^{\prime} \boldsymbol{\Gamma}_{g} \boldsymbol{\Phi}_{g} \boldsymbol{\Gamma}_{g}^{\prime} \mathbf{z}, \boldsymbol{\beta}_{g}^{\prime} \mathbf{z}, \mathbf{z}^{\prime}\boldsymbol{\mu}_{g}\right) \equiv \left( \sum\limits_{j = 1}^{p} {\Phi}_{jg} [\boldsymbol{\Gamma}_{g}^{\prime} \mathbf{z} ]_{j}^{2}, \boldsymbol{\beta}_{g}^{\prime} \mathbf{z}, \mathbf{z}^{\prime}\boldsymbol{\mu}_{g}\right) $$

is unique for all g = 1, … , G. A similar argument indicates there exists a z such that the tuple

$$ \left( \sum\limits_{j = 1}^{p} {\Phi}_{jg} | {[\boldsymbol{\Gamma}_{g}^{\prime} \mathbf{z} ]_{j}} | , \boldsymbol{\beta}_{g}^{\prime} \mathbf{z}, \mathbf{z}^{\prime}\boldsymbol{\mu}_{g}\right) $$

(34)

is unique. In fact, a more general statement indicates that there exists a z such that the tuple

$$ \left( \sum\limits_{j = 1}^{p} {\Phi}_{jg} \varphi({[\boldsymbol{\Gamma}_{g}^{\prime} \mathbf{z} ]_{j}^{2}} ) , \boldsymbol{\beta}_{g}^{\prime} \mathbf{z}, \mathbf{z}^{\prime}\boldsymbol{\mu}_{g}\right) $$

is unique for monotonic $\varphi : \mathbb {R}^{+} \mapsto \mathbb {R}^{+} $. Deriving this unique set of tuples facilitates the reduction to the univariate case. This is useful because the univariate generalized hyperbolic density is identifiable (see Browne and McNicholas 2015).
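The trace identity used in this subsection, z′ΓΦΓ′z = Σ_j Φ_j [Γ′z]_j², can be verified numerically. The sketch below uses a hypothetical 2 × 2 rotation matrix for Γ and hypothetical values for Φ and z; the helper names are ours.

```python
import math

def rotation(theta):
    """A 2 x 2 orthonormal matrix, used here as a stand-in for Gamma."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def gamma_t_z(gamma, z):
    """The vector Gamma' z."""
    p = len(z)
    return [sum(gamma[i][j] * z[i] for i in range(p)) for j in range(p)]

def quad_form_sum(z, gamma, phi):
    """Right-hand side: sum_j phi_j * ([Gamma' z]_j)^2."""
    return sum(phi_j * y_j ** 2 for phi_j, y_j in zip(phi, gamma_t_z(gamma, z)))

def quad_form_direct(z, gamma, phi):
    """Left-hand side: z' Gamma diag(phi) Gamma' z via the full matrix."""
    p = len(z)
    sigma = [[sum(gamma[i][j] * phi[j] * gamma[k][j] for j in range(p))
              for k in range(p)] for i in range(p)]
    return sum(z[i] * sigma[i][k] * z[k] for i in range(p) for k in range(p))

# Hypothetical values for illustration
gamma = rotation(0.7)
phi = [2.0, 0.5]
z = [1.2, -0.4]
lhs = quad_form_direct(z, gamma, phi)
rhs = quad_form_sum(z, gamma, phi)
```

The two quantities agree up to floating-point error, which is why the tuple can be written in terms of the projected coordinates [Γ′z]_j.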

3.2 Identifiability of a Finite Mixture of Multiple Scaled Distributions

For a multiple scaled distribution, we only need to find a single direction where the distribution is finite mixture identifiable because, as noted in Remark 2 of Kent (1983), a distribution might be non-identifiable on a subset of $\mathbb {R}^{p}$ but identifiability can endure over $\mathbb {R}^{p}$. In other words, for a distribution to be non-identifiable, a linear combination has to be equal to zero for all $x \in \mathbb {R}^{p}$. This is illustrated by the example given in Kent (1983):

“the polynomials P(x₁, x₂) = 1 and $P(x_{1}, x_{2}) = ({x_{1}^{2}}+ x_{2})^{3}$, $x\in \mathbb {R}^{2}$, are equal on the unit circle, but are not the same on all of $\mathbb {R}^{2}$.”

As a consequence, if a multivariate distribution is identifiable in some direction then it is identifiable over $\mathbb {R}^{p}$.

To begin, consider that if there is at least one direction or column of Γ_g that is equal across g = 1, … , G, then the identifiability of a multiple scaled distribution follows from the identifiability of the univariate distribution. Whereas if one column of Γ_g is unequal, that implies, by the nature of orthonormal matrices, that two columns of Γ_g are unequal. We will now illustrate how the bivariate multiple scaled distribution is identifiable, which implies identifiability for finite p.

When Γ_g differ, the identifiability of the multiple scaled distribution depends on the behavior of the multiple scaled distribution’s density and moment generating functions when we consider moving along directions other than the columns of Γ_g. For example, a bivariate multiple scaled t-distribution behaves (by definition) like a t-distribution with ν₁ and ν₂ degrees of freedom along each of its principal axes, but along any other direction, a bivariate multiple scaled t-distribution behaves asymptotically like a t-distribution with ν₁ + ν₂ degrees of freedom.

Consider the following three orthonormal matrices in the context of an eigen-decomposition of a matrix:

$$ \boldsymbol{\Gamma}_{1} = \left[\begin{array}{cc} 1 & 0 \\0 & 1 \end{array}\right], \quad\quad \ \quad\quad \boldsymbol{\Gamma}_{2} = \left[\begin{array}{cc} 0 & 1 \\1 & 0 \end{array}\right] \quad\quad \text{and} \quad\quad \boldsymbol{\Gamma}_{3} = \left[\begin{array}{cc} -1 & 0 \\ 0 & 1 \end{array}\right] . $$

If we have equal eigenvalues then we cannot distinguish between Γ₁ and Γ₂. In the same way, if we have the same distribution along the first and second axis, we cannot distinguish between them. However, if we have eigenvalue ordering we can distinguish between Γ₁ and Γ₂, but eigenvalue ordering will not allow us to distinguish between Γ₁ and Γ₃, since they yield the same basis or set of directions. Therefore, in general, Γ is unique up to multiplication by

$$ \left[\begin{array}{cc} \pm 1 & 0 \\0 & \pm 1 \end{array}\right] . $$

One way to establish uniqueness is to require the largest value of each column of Γ to be positive. An equivalent requirement is for Γ₁ ≠ Γ₂ which requires that

$$ \boldsymbol{\Gamma}_{1}^{\prime} \boldsymbol{\Gamma}_{2} \neq \mathbf{R} \quad\quad \text{or} \quad\quad [ \boldsymbol{\Gamma}_{1}^{\prime}\mathbf{z} ]_{j} \neq - [ \boldsymbol{\Gamma}_{2}^{\prime} \mathbf{z} ]_{j} $$

(35)

for j = 1, … , p, $\mathbf {z} \in \mathbb {R}^{P}$, z ≠ 0_p and R is a set of diagonal matrices such that diag(R) = (± 1, … , ± 1) excluding the identity matrix. Note that [a]_j denotes the j th element of the vector a. However, if we had two orthonormal matrices such that $\boldsymbol {\Gamma }_{1}^{\prime } \boldsymbol {\Gamma }_{2} = \mathbf {I}$, then Γ₁ = Γ₂. If $\boldsymbol {\Gamma }_{1}^{\prime } \boldsymbol {\Gamma }_{2} = \mathbf {R}$, then our orthonormal condition amounts to $ \boldsymbol {\Gamma }_{1}^{\prime } = \boldsymbol {\Gamma }_{2}$ or equivalently, for all directions $\mathbf {z} \in \mathbb {R}^{P}$ and z ≠ 0_p

$$ | [ \boldsymbol{\Gamma}_{1}^{\prime}\mathbf{z} ]_{j} | = | [ \boldsymbol{\Gamma}_{2}^{\prime} \mathbf{z} ]_{j} | \quad \text{for all} \quad j = 1,\ldots,p \quad \text{then} \quad \boldsymbol{\Gamma}_{1} = \boldsymbol{\Gamma}_{2}. $$

(36)

This prevents the j th column of Γ₂ from being in the opposite direction of the j th column of Γ₁. This form of the condition is easier to incorporate into the identifiability illustration.
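The sign ambiguity discussed above can be illustrated numerically. In the sketch below, flipping the sign of a column of Γ leaves ΓΦΓ′ unchanged, and a simple convention removes the ambiguity. We read "the largest value of each column" as the entry of largest magnitude, which is our interpretation; the matrices and helper names are hypothetical.

```python
import math

def rotation(theta):
    """A 2 x 2 orthonormal matrix, used here as a stand-in for Gamma."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def scale_matrix(gamma, phi):
    """Sigma = Gamma diag(phi) Gamma'."""
    p = len(phi)
    return [[sum(gamma[i][j] * phi[j] * gamma[k][j] for j in range(p))
             for k in range(p)] for i in range(p)]

def flip_columns(gamma, signs):
    """Multiply column j of gamma by signs[j]."""
    return [[row[j] * signs[j] for j in range(len(signs))] for row in gamma]

def fix_signs(gamma):
    """Flip columns so the largest-magnitude entry of each column is positive."""
    p = len(gamma)
    signs = []
    for j in range(p):
        column = [gamma[i][j] for i in range(p)]
        largest = max(column, key=abs)
        signs.append(1.0 if largest >= 0 else -1.0)
    return flip_columns(gamma, signs)

# Hypothetical values for illustration
gamma = rotation(0.4)
phi = [2.0, 0.5]
flipped = flip_columns(gamma, [-1.0, 1.0])
sigma_a = scale_matrix(gamma, phi)
sigma_b = scale_matrix(flipped, phi)
restored = fix_signs(flipped)
```

Both versions of Γ give the same scale matrix, and applying the convention to the flipped matrix returns the original one.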

In the MSGHD, if we consider moving the amount t in a direction z, which entails setting x = tz, we can write the density as

$$ \begin{array}{@{}rcl@{}} && f_{\text{MSGHD}}\left( \mathbf{x} = t \mathbf{z} \mid\boldsymbol{\mu},\boldsymbol{\Gamma},\boldsymbol{\Phi},\boldsymbol{\alpha},\boldsymbol{\omega},\boldsymbol{\lambda}\right) \\ &&=\prod\limits_{j = 1}^{p}\left[\frac{\omega_{j}+ {\Phi}_{j}^{-1}\left( t \left[\boldsymbol{\Gamma}^{\prime} \mathbf{z}\right]_{j}-\mu_{j}\right)^{2}}{\omega_{j}+ {\alpha_{j}^{2}} {{\Phi}_{j}}^{-1}} \right]^{\frac{\lambda_{j}-\frac{1}{2}}{2}}\\&&\times \frac{K_{\lambda_{j}-\frac{1}{2}}\left( \sqrt {[\omega_{j}+{\alpha_{j}^{2}} {{\Phi}_{j}}^{-1}]\left[\omega_{j}+ {\Phi}_{j}^{-1}\left( t \left[\boldsymbol{\Gamma}^{\prime} \mathbf{z}\right]_{j}-\mu_{j}\right)^{2}\right]}\right)} {(2\pi)^{\frac{1}{2}}{{\Phi}_{j}}^{\frac{1}{2}}K_{\lambda_{j}}(\omega_{j})\exp{\{-(t \left[\boldsymbol{\Gamma}^{\prime} \mathbf{z}\right]_{j}-\mu_{j}){\alpha_{j}} {\Phi}_{j}^{-1}\}}}, \end{array} $$

(37)

Note, if z is equal to the k th eigenvector, which is the k th column of Γ, then the density reduces to

$$ s_{k} \left[\frac{\omega_{k}+ {\Phi}_{k}^{-1}\left( t -\mu_{k}\right)^{2}}{\omega_{k}+ {\alpha_{k}^{2}} {{\Phi}_{k}}^{-1}} \right]^{\frac{\lambda_{k}-\frac{1}{2}}{2}} \frac{K_{\lambda_{k}-\frac{1}{2}}\left( \sqrt {[\omega_{k}+{\alpha_{k}^{2}} {{\Phi}_{k}}^{-1}]\left[\omega_{k}+ {\Phi}_{k}^{-1}\left( t -\mu_{k}\right)^{2}\right]}\right)} {(2\pi)^{\frac{1}{2}}{{\Phi}_{k}}^{\frac{1}{2}}K_{\lambda_{k}}(\omega_{k})\exp{\left\{-\left( t-\mu_{k}\right){\alpha_{k}} {\Phi}_{k}^{-1} \right\}}}, $$

where

$$ s_{k} = \prod\limits_{j = 1, j\neq k}^{p}\left[\frac{\omega_{j}+ {\Phi}_{j}^{-1} {\mu_{j}^{2}}}{\omega_{j}+ {\alpha_{j}^{2}} {{\Phi}_{j}}^{-1}} \right]^{\frac{\lambda_{j}-\frac{1}{2}}{2}} \frac{K_{\lambda_{j}-\frac{1}{2}}\left( \sqrt {[\omega_{j}+{\alpha_{j}^{2}} {{\Phi}_{j}}^{-1}]\left[\omega_{j}+ {\Phi}_{j}^{-1}{\mu_{j}^{2}}\right]}\right)} {(2\pi)^{\frac{1}{2}}{{\Phi}_{j}}^{\frac{1}{2}}K_{\lambda_{j}}(\omega_{j})\exp{\left\{ \mu_{j} \alpha_{j}{\Phi}_{j}^{-1} \right\}}}. $$

Therefore, the density is simply proportional to

$$ \left[\frac{\omega_{k}+ {\Phi}_{k}^{-1}\left( t -\mu_{k}\right)^{2}}{\omega_{k}+ {\alpha_{k}^{2}} {{\Phi}_{k}}^{-1}} \right]^{\frac{\lambda_{k}-\frac{1}{2}}{2}} \frac{K_{\lambda_{k}-\frac{1}{2}}\left( \sqrt {[\omega_{k}+{\alpha_{k}^{2}} {{\Phi}_{k}}^{-1}]\left[\omega_{k}+ {\Phi}_{k}^{-1}\left( t -\mu_{k}\right)^{2}\right]}\right)} {(2\pi)^{\frac{1}{2}}{{\Phi}_{k}}^{\frac{1}{2}}K_{\lambda_{k}}(\omega_{k})\exp{\left\{-\left( t-\mu_{k}\right){\alpha_{k}} {\Phi}_{k}^{-1} \right\}}} . $$

First, note that if the parameterizations are one-to-one, then if one parameterization is shown to be identifiable, the others are identifiable as well. Similar to Browne and McNicholas (2015), we let δ_j = β_j/Φ_j, $\alpha _{j} = \sqrt { \omega _{j}/{\Phi }_{j} + {\beta _{j}^{2}}/{{\Phi }_{j}^{2}}} $ and $\kappa _{j} = \sqrt {{\Phi }_{j} \omega _{j}} $, where α_j ≥|δ_j|. Under this reparameterization, we now have

$$ {\Phi}_{j} = \frac{\kappa_{j}}{\sqrt{{\alpha_{j}^{2}}-{\delta_{j}^{2}}}}, \quad \omega_{j} = \kappa_{j}\sqrt{{\alpha_{j}^{2}}-{\delta_{j}^{2}}} \quad \text{ and} \quad\beta_{j} = \frac{\delta_{j} \kappa_{j}}{\sqrt{{\alpha_{j}^{2}}-{\delta_{j}^{2}}}}. $$

(38)
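A quick round-trip check confirms that the reparameterization is one-to-one: mapping (Φ_j, ω_j, β_j) to (α_j, δ_j, κ_j) and back through Eq. 38 recovers the starting values. The function names and numerical values below are hypothetical.

```python
import math

def to_alternative(phi, omega, beta):
    """Map (phi, omega, beta) to (alpha, delta, kappa)."""
    delta = beta / phi
    alpha = math.sqrt(omega / phi + beta ** 2 / phi ** 2)
    kappa = math.sqrt(phi * omega)
    return alpha, delta, kappa

def from_alternative(alpha, delta, kappa):
    """Invert the map using the expressions in Eq. 38."""
    root = math.sqrt(alpha ** 2 - delta ** 2)
    phi = kappa / root
    omega = kappa * root
    beta = delta * kappa / root
    return phi, omega, beta

# Hypothetical parameter values for illustration
original = (2.0, 1.5, -0.7)
alpha, delta, kappa = to_alternative(*original)
recovered = from_alternative(alpha, delta, kappa)
```

Note that α_j² − δ_j² = ω_j/Φ_j, which is strictly positive whenever ω_j and Φ_j are, so the square root in the inverse map is well defined.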

For large z, the Bessel function can be approximated by

$$ K_{\lambda} (z) = \sqrt{ \frac{ \pi} {2 z}} e^{-z} \left[ 1+ O\left( \frac{1}{z}\right)\right], $$

which yields, using the alternative parameterization,

$$ \begin{array}{@{}rcl@{}} f(t \mid\boldsymbol{\theta} ) &\propto \left[ 1 + \frac{(t-\mu_{j})^{2}}{{\kappa_{j}^{2}}} \right]^{\lambda_{j}/2}\exp\left\{ - \alpha_{j} |t-\mu_{j}|+\delta_{j} \left( t-\mu_{j}\right) \right\} . \end{array} $$

(39)
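The large-argument approximation of the Bessel function used to obtain Eq. 39 can be examined numerically. The sketch below evaluates the exponentially scaled function e^z K_λ(z) by quadrature of the integral representation K_ν(z) = ∫₀^∞ exp(−z cosh t) cosh(νt) dt, which is our choice for a self-contained example, and compares it with the leading term √(π/(2z)) for a hypothetical index λ = 1.

```python
import math

def scaled_bessel_k(nu, z, upper=12.0, steps=4000):
    """exp(z) * K_nu(z), z > 0, by the trapezoid rule."""
    h = upper / steps
    total = 0.5  # scaled integrand at t = 0 (half weight)
    for i in range(1, steps + 1):
        t = i * h
        total += math.exp(-z * (math.cosh(t) - 1.0)) * math.cosh(nu * t)
    return total * h

def leading_term_scaled(z):
    """exp(z) times the leading term sqrt(pi / (2 z)) * exp(-z)."""
    return math.sqrt(math.pi / (2.0 * z))

# Ratio of the Bessel function to its leading term for increasing z
z_values = (5.0, 50.0, 500.0)
ratios = [scaled_bessel_k(1.0, z) / leading_term_scaled(z) for z in z_values]
```

The ratios approach one as z grows, and z times the deviation from one stays bounded, consistent with the O(1/z) remainder.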

If z is not equal to the k th eigenvector, then, using the reparameterization given in Eq. 38, we have

$$ \begin{array}{@{}rcl@{}} f(t \mid\boldsymbol{\theta} ) \!&\propto&\! \exp\left\{ - \sum\limits_{j = 1}^{p} \alpha_{j} \left| t \left[\boldsymbol{\Gamma}^{\prime} \mathbf{z}\right]_{j} - \mu_{j} \right| + \sum\limits_{j = 1}^{p} \delta_{j} \left( t \left[\boldsymbol{\Gamma}^{\prime} \mathbf{z}\right]_{j} - \mu_{j} \right) \right\} \prod\limits_{j = 1}^{p}\\&&\!\times\left[ 1 + \frac{\left( t \left[\boldsymbol{\Gamma}^{\prime} \mathbf{z}\right]_{j}-\mu_{j}\right)^{2}}{{\kappa_{j}^{2}}} \right]^{\frac{\lambda_{j}-\frac{1}{2}}{2}} \\ \!&\propto&\! \exp\left\{ - \sum\limits_{j = 1}^{p} \alpha_{j} \left| t \left[\boldsymbol{\Gamma}^{\prime} \mathbf{z}\right]_{j} - \mu_{j} \right| + \sum\limits_{j = 1}^{p} \delta_{j} \left( t \left[\boldsymbol{\Gamma}^{\prime} \mathbf{z}\right]_{j} - \mu_{j} \right) \right\} t^{2 \sum\limits_{j = 1}^{p} I\left( \left[\boldsymbol{\Gamma}^{\prime} \mathbf{z}\right]_{j} \neq 0 \right) \frac{\lambda_{j}-\frac{1}{2}}{2}} \\ \!&\propto&\! \exp\left\{ \sum\limits_{j = 1}^{p} \left[ \!- \alpha_{j} \left| t \left[\boldsymbol{\Gamma}^{\prime} \mathbf{z}\right]_{j} - \mu_{j} \right| + \delta_{j} \left( t \left[\boldsymbol{\Gamma}^{\prime} \mathbf{z}\right]_{j} - \mu_{j} \right) + 2 I\left( \left[\boldsymbol{\Gamma}^{\prime} \mathbf{z}\right]_{j} \!\neq\! 0 \right) \frac{\lambda_{j} - \frac{1}{2}}{2} \log (t) \right] \right\} \\ \!&\propto&\! \prod\limits_{j = 1}^{p} \exp\left\{\! - \alpha_{j} \left| t \left[\boldsymbol{\Gamma}^{\prime} \mathbf{z}\right]_{j} - \mu_{j} \right| + \delta_{j} \left( t \left[\boldsymbol{\Gamma}^{\prime} \mathbf{z}\right]_{j} - \mu_{j} \right) + 2 I\left( \left[\boldsymbol{\Gamma}^{\prime} \mathbf{z}\right]_{j} \!\neq\! 0 \right) \frac{\lambda_{j} - \frac{1}{2}}{2} \log (t) \right\}. \end{array} $$

(40)

The characteristic function for a multiple scaled distribution can be written as

$$ \begin{array}{@{}rcl@{}} \varphi_{\mathbf{X}}(\mathbf{v} ) &=& \prod\limits_{j = 1}^{P} \exp\{ i | {[\boldsymbol{\Gamma}^{\prime} \mathbf{v} ]_{j}}| \mu_{j} \} \left[ 1 + \frac{ {\Phi}_{j} | {[\boldsymbol{\Gamma}^{\prime} \mathbf{v} ]_{j}}|^{2} -2 \beta_{j} | {[\boldsymbol{\Gamma}^{\prime} \mathbf{v} ]_{j}}| i} {\omega_{j}} \right]^{ -\frac{\lambda_{j}} {2}}\\&&\times \frac{ K_{\lambda_{j}} \left( \sqrt{ \omega_{j} \left[\omega_{j} + ({\Phi}_{j} | {[\boldsymbol{\Gamma}^{\prime} \mathbf{v} ]_{j}}|^{2} - 2\beta_{j} | {[\boldsymbol{\Gamma}^{\prime} \mathbf{v} ]_{j}}| i ) \right]} \right)} { K_{\lambda_{j}} \left( \omega_{j} \right)} , \end{array} $$

which, under the alternative parameterization from Eq. 38, becomes

$$ \begin{array}{@{}rcl@{}} \varphi_{\mathbf{X}}(\mathbf{v} ) &=& \prod\limits_{j = 1}^{P} \exp\{ i | {[\boldsymbol{\Gamma}^{\prime} \mathbf{v} ]_{j}}| \mu_{j} \} \left[ 1 + \frac{ | {[\boldsymbol{\Gamma}^{\prime} \mathbf{v} ]_{j}}|^{2} - 2 \delta_{j} | {[\boldsymbol{\Gamma}^{\prime} \mathbf{v} ]_{j}}| i} {{\alpha_{j}^{2}} - {\delta_{j}^{2}}} \right]^{-\frac{\lambda_{j}}{2}}\\&&\times \frac{ K_{\lambda_{j}} \left( \sqrt{ {\kappa_{j}^{2}} \left[ | {[\boldsymbol{\Gamma}^{\prime} \mathbf{v} ]_{j}}|^{2} - 2 \delta_{j} | {[\boldsymbol{\Gamma}^{\prime} \mathbf{v} ]_{j}}| i + {\alpha_{j}^{2}} - {\delta_{j}^{2}} \right]} \right)} { K_{\lambda_{j}} \left( \kappa_{j} \sqrt{{\alpha_{j}^{2}} - {\delta_{j}^{2}}} \right)}. \end{array} $$

(41)

Now if we consider moving t in the direction z

$$ \begin{array}{@{}rcl@{}} \varphi_{\mathbf{X}}(\mathbf{v} = t \mathbf{z} ) &=& \prod\limits_{j = 1}^{P} \exp\{ i t | {[\boldsymbol{\Gamma}^{\prime} \mathbf{z} ]_{j}}| \mu_{j} \} \left[ 1 + \frac{ t^{2} | {[\boldsymbol{\Gamma}^{\prime} \mathbf{z} ]_{j}}|^{2} - 2 \delta_{j} t | {[\boldsymbol{\Gamma}^{\prime} \mathbf{z} ]_{j}}| i} {{\alpha_{j}^{2}} - {\delta_{j}^{2}}} \right]^{-\frac{\lambda_{j}}{2}}\\&&\times \frac{K_{\lambda_{j}} \left( \sqrt{ {\kappa_{j}^{2}} \left[ t^{2} | {[\boldsymbol{\Gamma}^{\prime} \mathbf{z} ]_{j}}|^{2} - 2 \delta_{j} t | {[\boldsymbol{\Gamma}^{\prime} \mathbf{z} ]_{j}}| i + {\alpha_{j}^{2}} - {\delta_{j}^{2}} \right]} \right)} { K_{\lambda_{j}} \left( \kappa_{j} \sqrt{{\alpha_{j}^{2}} - {\delta_{j}^{2}}} \right)}, \end{array} $$

and, for large t, the characteristic function is

$$ \begin{array}{@{}rcl@{}} \varphi_{\mathbf{X}}(\mathbf{v} = t \mathbf{z} ) \!&\propto&\! \exp\left\{ i t \sum\limits_{j = 1}^{P} | {[\boldsymbol{\Gamma}^{\prime} \mathbf{z}]_{j}}| \mu_{j} - t \sum\limits_{j = 1}^{P} \kappa_{j} | {[\boldsymbol{\Gamma}^{\prime} \mathbf{z} ]_{j}} | - \log(t) \sum\limits_{j = 1}^{P} \lambda_{j} I\left( |{[\boldsymbol{\Gamma}^{\prime} \mathbf{z} ]_{j}}| \!\neq\! 0 \right) + O(1) \right\} \\ \!& \propto&\! \exp\left\{ i t \mathbf{z}^{\prime} \boldsymbol{\Gamma} \boldsymbol{\mu} - t \sum\limits_{j = 1}^{P} \kappa_{j} | {[ \boldsymbol{\Gamma}^{\prime} \mathbf{z} ]_{j}} | - \log(t) \sum\limits_{j = 1}^{P} \lambda_{j} I\left( | {[\boldsymbol{\Gamma}^{\prime} \mathbf{z} ]_{j}}| \neq 0 \right) + O(1) \right\} . \end{array} $$

Therefore, from the condition given in Eq. 34, there exists z such that the tuple $({\sum }_{j = 1}^{P} \kappa _{j} \left | [ \boldsymbol {\Gamma }^{\prime } \mathbf {z} ]_{j} \right |, \mathbf {z}^{\prime } \boldsymbol {\Gamma } \boldsymbol {\mu } )$ is unique for all g = 1, … , G and reduces to the univariate hyperbolic distribution, which is identifiable.

3.3 Identifiability of the Coalesced Generalized Hyperbolic Distribution

To prove the identifiability of the CGHD, we only need to show that two sets of distributions, the multiple scaled and the generalized hyperbolic distributions, are disjoint. Consider moving along the k th eigenvector such that (λ_k, κ_k) is distinct from (λ₀, κ₀) and the proof easily follows from the identifiability of the univariate generalized hyperbolic distribution.

About this article

Cite this article

Tortora, C., Franczak, B.C., Browne, R.P. et al. A Mixture of Coalesced Generalized Hyperbolic Distributions. J Classif 36, 26–57 (2019). https://doi.org/10.1007/s00357-019-09319-3

Published: 22 April 2019
Issue Date: 15 April 2019
DOI: https://doi.org/10.1007/s00357-019-09319-3
