Mathematical Biology Seminar

Anna Little, UU Math
Wednesday, October 18, 2023
1:45pm in LCB 323
Clustering and Visualization of High-dimensional Data using Path Metrics

Abstract: This talk will explore the utility of data-driven path metrics for the clustering and visualization of high-dimensional data. These metrics are defined by solving an optimal path problem in a proximity graph, and are characterized by a parameter that balances density-based and geometric features. First, we will discuss theoretical properties of these metrics and their implications for clustering: in particular, spectral clustering with path metrics leads to strong theoretical guarantees for estimating the number of clusters and for clustering accuracy. Furthermore, as the sample size tends to infinity, the eigenvalues and eigenvectors of the discrete path metric graph Laplacian converge to those of a continuum operator; this operator generates a diffusion which is accelerated in regions of high data density, allowing for the rapid exploration of elongated data structures. Second, we will discuss dimension reduction with path metrics. Despite the allure of visually striking results from dimension reduction algorithms, can we be sure the perceived patterns are genuinely intrinsic to the data? We will see how designing algorithms which incorporate data-driven path metrics can lead to desirable and understandable properties for data visualization, focusing specifically on their application to the analysis of single-cell RNA sequencing data. Finally, we will discuss generalizations of these metrics to simplex paths, and their utility for multi-manifold clustering.
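
The abstract does not state the metric explicitly. As a rough illustration only, the sketch below assumes one common form of path metric, a power-weighted shortest-path distance: each edge of a k-nearest-neighbor graph costs its Euclidean length raised to a power p, the cheapest path is found with Dijkstra's algorithm, and the 1/p root of the total cost is returned. The function name path_metric and the parameters p and k are hypothetical choices for this sketch, not taken from the talk.

```python
import heapq
import math


def path_metric(points, p=2.0, k=3):
    """Power-weighted shortest-path distances on a symmetric kNN graph.

    Illustrative assumption, not the speaker's definition: the distance
    between two points is (min over graph paths of sum of edge_length**p)
    raised to the power 1/p. Returns an n x n list of lists.
    """
    n = len(points)

    def dist(a, b):
        return math.dist(points[a], points[b])

    # Build a symmetric k-nearest-neighbor proximity graph.
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: dist(i, j))[:k]
        for j in nearest:
            neighbors[i].add(j)
            neighbors[j].add(i)

    # Run Dijkstra from every source, using edge cost = length ** p.
    result = []
    for source in range(n):
        cost = [math.inf] * n
        cost[source] = 0.0
        heap = [(0.0, source)]
        while heap:
            c, u = heapq.heappop(heap)
            if c > cost[u]:
                continue  # stale heap entry
            for v in neighbors[u]:
                new_cost = c + dist(u, v) ** p
                if new_cost < cost[v]:
                    cost[v] = new_cost
                    heapq.heappush(heap, (new_cost, v))
        # Unreachable points keep an infinite distance.
        result.append([c ** (1.0 / p) for c in cost])
    return result


# Four equally spaced points on a line; with k=3 the graph is complete.
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
D1 = path_metric(line, p=1.0, k=3)
D2 = path_metric(line, p=2.0, k=3)
```

Under this assumed definition, p = 1 on a complete graph recovers the ordinary Euclidean distance, while larger p makes long edges expensive, so optimal paths prefer many short hops through densely sampled regions. That is one way a single parameter can trade off geometric against density-based information.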