R-tip: Simple way to get top varying genes from an expression matrix

June 28th, 2010

A simple bit of R code to identify the top N most varying genes in a multi-condition numeric matrix, with one gene per row and one condition per column:
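#calculate the variance of each row (gene) across the conditions
v <- apply(data, 1, var)
#now get a logical index of rows whose variance is in the top n (you could do this with a sort on the variance)
#n is the number of genes to keep; note the plain minus sign, a typographic dash here is a syntax error
sub <- v > quantile(v, (nrow(data) - n)/nrow(data))
#create the sub-matrix
subset <- data[sub, ]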
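
The comment above mentions that the same selection could be done with a sort on the variance. Here is a minimal sketch of that alternative, using order() instead of a quantile cut-off. The matrix expr, its gene names, and the value of n are made up purely for illustration and are not from any real data set.

```r
# Toy expression matrix (hypothetical): 6 genes in rows, 4 conditions in columns
expr <- rbind(
  geneA = c(1, 1, 1, 1),
  geneB = c(1, 2, 3, 4),
  geneC = c(0, 10, 0, 10),
  geneD = c(5, 5, 6, 5),
  geneE = c(2, 8, 2, 8),
  geneF = c(3, 3, 3, 4)
)
n <- 2  # number of top-varying genes to keep (assumed value)

# variance of each row (gene) across the conditions
v <- apply(expr, 1, var)

# row indices of the n largest variances, highest first
top_idx <- order(v, decreasing = TRUE)[seq_len(n)]

# drop = FALSE keeps the result a matrix even when n is 1
top_genes <- expr[top_idx, , drop = FALSE]

rownames(top_genes)  # "geneC" "geneE"
```

One difference worth knowing: order() always returns exactly n rows, sorted from most to least variable, whereas the strict ">" comparison against a quantile can return fewer than n rows when several genes tie at the cut-off.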