Rebecka Jornsten
Department of Statistics, Rutgers University

TITLE: Cluster Validation using the Relative Data Depth.

ABSTRACT: Gene and sample clustering are important tasks in the analysis of gene expression data. Gene clustering can suggest candidate gene pathways and provides a method for dimension reduction. Sample clustering can be used for class discovery or validation. We present a fast and robust k-median clustering method based on a modified Weiszfeld algorithm for the multivariate median. The multivariate medians are used to represent the clusters, while the associated data depths are used to identify the nuclei of clusters and the outliers. We introduce a new cluster validation and visualization tool, the Relative Data Depth, based on the within-cluster data depths and the data depths with respect to competing clusters. We demonstrate on several gene expression data sets and simulated data sets that the Relative Data Depth outperforms the silhouette width and the gap statistic for estimating the number of clusters and identifying outliers.

References:
[1] S. Dudoit, J. Fridlyand. Application of resampling methods to estimate the number of clusters and to improve the accuracy of a clustering method. Technical report 600, Department of Statistics, UC Berkeley (2001).
[2] R. Tibshirani, G. Walther, T. Hastie. Estimating the number of clusters in a data set via the gap statistic. Technical report, Department of Statistics, Stanford University (2000).
[3] Y. Vardi, C. Zhang. The multivariate L1 median and associated data depth. PNAS 97: 1423-1426 (2000).
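
Illustrative note: the abstract refers to a modified Weiszfeld algorithm for the multivariate median. The sketch below shows one common form of that idea, following the modification described in reference [3], in which the update is damped when the current iterate coincides with a data point. It is a minimal illustration under stated assumptions, not the speaker's implementation: the function name `l1_median`, the unit weights, the starting value (the coordinate-wise mean), and the tolerances are all hypothetical choices made for this example.

```python
import numpy as np

def l1_median(points, max_iter=200, tol=1e-8):
    """Approximate the multivariate L1 median with a Weiszfeld-type iteration.

    Assumes unit weights on all observations (an assumption of this sketch).
    """
    x = np.asarray(points, dtype=float)
    y = x.mean(axis=0)  # starting value: an arbitrary choice for this sketch
    for _ in range(max_iter):
        diff = x - y
        dist = np.linalg.norm(diff, axis=1)
        away = dist > 1e-12              # observations that do not coincide with y
        eta = np.count_nonzero(~away)    # multiplicity of y among the observations
        if not away.any():
            return y                     # every observation equals y
        inv = 1.0 / dist[away]
        # Plain Weiszfeld step: inverse-distance weighted average of the other points.
        t = (x[away] * inv[:, None]).sum(axis=0) / inv.sum()
        # Norm of the summed unit vectors pointing from y to the other points.
        r = np.linalg.norm((diff[away] * inv[:, None]).sum(axis=0))
        # Damping keeps the iterate at a data point when that point is the median.
        gamma = 1.0 if r == 0.0 else min(1.0, eta / r)
        y_new = (1.0 - gamma) * t + gamma * y
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

# Three collinear observations, one of them far from the other two.
pts = [[0.0, 0.0], [1.0, 0.0], [100.0, 0.0]]
median = l1_median(pts)
mean = np.mean(pts, axis=0)
```

For these three points the estimate settles near the middle observation (1, 0), while the mean is pulled to roughly (33.7, 0); this robustness to a distant observation is what makes the median attractive as a cluster representative.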