Generalized Statistical Methods for Mixed Exponential Families, Part I: Theoretical Foundations

Cécile Levasseur, Kenneth Kreutz-Delgado and Uwe F. Mayer

Abstract: This work considers the problem of learning the underlying statistical structure of multidimensional data of mixed probability distribution types (continuous and discrete) in order to fit a generative model and make decisions in a data-driven manner. Using properties of exponential family distributions and generalizing classical linear statistical techniques, we establish a unified theoretical model called Generalized Linear Statistics (GLS). The methodology exploits the distinction between the data space and the natural parameter space of exponential family distributions, and solves a nonlinear problem by applying classical linear statistical tools to data that have been mapped into the parameter space. The framework is equivalent to a computationally tractable, mixed data-type hierarchical Bayes graphical model assumption with latent variables constrained to a low-dimensional parameter subspace. We demonstrate that exponential family Principal Component Analysis, Semi-Parametric exponential family Principal Component Analysis, and Bregman soft clustering are not separate, unrelated algorithms, but different manifestations of model assumptions and parameter choices within this common GLS framework. We readily extend these algorithms to the important mixed data-type case. We study in detail the extreme case corresponding to exponential family Principal Component Analysis and solve problems related to fitting the generative model.
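To make the data-space / parameter-space split concrete, here is a minimal sketch of a mixed data-type exponential family PCA, assuming each column j has its own exponential family with log-partition function G_j and mean map g_j = G_j'. The linear (low-rank) structure is imposed on the natural parameters, Theta = A V^T with rank q, while g_j maps each parameter back to the data space; the fit minimizes the negative log-likelihood sum_ij [G_j(theta_ij) - x_ij theta_ij] by plain gradient descent. The family choices (unit-variance Gaussian, logit-parameterized Bernoulli, log-rate Poisson), the hypothetical helper names `mixed_epca`, `mean_map` and `neg_log_likelihood`, the learning rate and the optimizer are all illustrative assumptions, not the authors' fitting procedure.

```python
import numpy as np

# (G, g) pairs: log-partition function G(theta) and mean map g(theta) = G'(theta).
# Assumption: unit-variance Gaussian, Bernoulli with logit parameter, Poisson with log-rate.
FAMILIES = {
    "gaussian": (lambda t: 0.5 * t**2, lambda t: t),
    "bernoulli": (lambda t: np.logaddexp(0.0, t), lambda t: 1.0 / (1.0 + np.exp(-t))),
    "poisson": (lambda t: np.exp(t), lambda t: np.exp(t)),
}

def neg_log_likelihood(X, Theta, kinds):
    """NLL up to Theta-independent terms: sum_ij [G_j(theta_ij) - x_ij * theta_ij]."""
    total = 0.0
    for j, kind in enumerate(kinds):
        G, _ = FAMILIES[kind]
        total += np.sum(G(Theta[:, j]) - X[:, j] * Theta[:, j])
    return total

def mean_map(Theta, kinds):
    """Map natural parameters back to the data space, column by column."""
    M = np.empty_like(Theta)
    for j, kind in enumerate(kinds):
        M[:, j] = FAMILIES[kind][1](Theta[:, j])
    return M

def mixed_epca(X, kinds, q, lr=0.002, iters=1500, seed=0):
    """Fit a rank-q natural parameter matrix Theta = A @ V.T by gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    A = 0.1 * rng.standard_normal((n, q))   # latent coordinates (one row per sample)
    V = 0.1 * rng.standard_normal((d, q))   # basis of the parameter subspace
    history = []
    for _ in range(iters):
        Theta = A @ V.T
        history.append(neg_log_likelihood(X, Theta, kinds))
        R = mean_map(Theta, kinds) - X      # dNLL/dTheta = g(Theta) - X
        grad_A = R @ V
        grad_V = R.T @ A                    # computed before A is updated
        A -= lr * grad_A
        V -= lr * grad_V
    return A, V, history

# Usage on synthetic mixed data drawn from a rank-2 natural parameter matrix.
rng = np.random.default_rng(1)
n, q = 60, 2
kinds = ["gaussian", "gaussian", "bernoulli", "bernoulli", "poisson"]
true_Theta = 0.5 * (rng.standard_normal((n, q)) @ rng.standard_normal((len(kinds), q)).T)
M_true = mean_map(true_Theta, kinds)
X = np.column_stack([
    M_true[:, 0] + rng.standard_normal(n),
    M_true[:, 1] + rng.standard_normal(n),
    rng.binomial(1, M_true[:, 2]),
    rng.binomial(1, M_true[:, 3]),
    rng.poisson(M_true[:, 4]),
]).astype(float)

A, V, history = mixed_epca(X, kinds, q=2)
print("NLL before:", history[0], "after:", history[-1])
```

Minimizing this NLL is, up to constants, the same as minimizing the Bregman divergence between each x_ij and its fitted mean g_j(theta_ij), which is how exponential family PCA is usually stated; only the per-column choice of G_j changes for mixed data. A production fit would more likely use alternating convex updates or second-order steps rather than a fixed learning rate.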

Key words: Generalized Linear Models, latent variables, exponential families, graphical models, dimensionality reduction.


You can download a copy of this paper (about 16 pages).

Mayer23.pdf: This file is in Portable Document Format (374 Kbytes).




mayer@math.utah.edu
Last updated: Wed Sep 9 22:51:25 PDT 2009