Statistical concepts in metabolomics cover a very wide range of applications. I will focus here on metabolic profiling experiments; other concepts can be found via the left navigation bar. Metabolomic profiling datasets are high-dimensional (though not as high-dimensional as gene array experiments). A typical experiment can include a hundred samples and usually 150 known and 150 unknown metabolites, which gives a 100x300 matrix. To understand the experiment we need to reduce it to two, or at most three, dimensions so that we can look at it; the human brain can barely handle data in more than four dimensions. The following five steps always recur and must be carried out after each analysis, so they can be automated in a workflow tool such as Statistica Dataminer.


1) Basic Statistics for Metabolome Data
Calculate the mean, maximum, standard deviation, kurtosis and frequency tables. Prepare the data: detect outliers, normalize, and replace empty values if needed. This step also covers the different normalization procedures.
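
As a minimal sketch of this step for a single metabolite, the following Python code uses only the standard library. The example values, the median replacement of empty values, the Tukey fence for outliers and the autoscaling are assumptions chosen for illustration; other conventions are equally common.

```python
import statistics

def describe(values):
    """Basic descriptive statistics for one metabolite (a list of floats)."""
    n = len(values)
    mean = statistics.fmean(values)
    # Excess kurtosis from population moments; close to 0 for normal data.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "max": max(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }

def replace_missing(values):
    """Replace None by the median of the observed values (an assumed convention)."""
    fill = statistics.median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def flag_outliers(values, k=1.5):
    """Flag values outside the Tukey fences (quartiles -/+ k * IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v < q1 - k * iqr or v > q3 + k * iqr for v in values]

def autoscale(values):
    """Mean-center and scale to unit variance (one of several normalizations)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical intensities of one metabolite across eight samples.
raw = [10.2, 11.0, None, 9.8, 10.5, 10.9, 35.0, 10.1]
filled = replace_missing(raw)
stats = describe(filled)
outliers = flag_outliers(filled)
scaled = autoscale(filled)
```

In a real workflow the same functions would be applied column by column to the whole sample-by-metabolite matrix, and flagged outliers would be inspected before any normalization is applied.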


2) Analysis of variance for classes (class-ANOVA)
Find interaction effects between variables, test for significant differences between mean values, and examine correlations within groups (statistical sheets and graphical output). This includes a t-test (two classes) or ANOVA (more than two classes). ANOVA can also be used for time-course analysis.
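
The two tests named above can be sketched for a single metabolite as follows. This is a hand-written illustration under assumptions: the class names and values are invented, the t-test is Welch's variant, and only the test statistics are computed. For p-values one would normally rely on a statistics library, for example `scipy.stats.ttest_ind` and `scipy.stats.f_oneway`.

```python
import statistics

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two classes."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA for two or more classes."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(all_values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical levels of one metabolite in three classes.
control = [5.1, 4.9, 5.3, 5.0, 5.2]
treated = [6.0, 6.3, 5.9, 6.1, 6.2]
late = [7.1, 6.8, 7.0, 7.2, 6.9]

t_stat, df = welch_t(control, treated)               # two classes
f_stat = one_way_anova_f([control, treated, late])   # more than two classes
```

With 300 metabolites the test is repeated for every column, so some correction for multiple testing is usually needed before calling a difference significant.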


3) Unsupervised multivariate analysis (PCA)

This step uses principal component analysis (PCA) as a dimension reduction method, optionally followed by cluster analysis. Cluster analysis has to be performed separately, either on the PCA output (the scores) or on the dataset itself. PCA is not a cluster analysis!
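
To make the distinction concrete, here is a small sketch assuming NumPy is available. The 100x300 matrix is synthetic (two invented groups of 50 samples), PCA is computed by singular value decomposition of the autoscaled data, and the clustering is a bare-bones k-means written out by hand. It is an illustration, not a recommended implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for a 100 x 300 profiling matrix (samples x metabolites):
# two groups of 50 samples that differ in the first 60 metabolites.
X = rng.normal(size=(100, 300))
X[50:, :60] += 3.0

# Autoscale each metabolite, then obtain the PCA via singular value decomposition.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
scores = U[:, :2] * s[:2]              # sample coordinates on PC1 and PC2
explained = s ** 2 / np.sum(s ** 2)    # fraction of variance per component

# PCA only projects the data; grouping needs a separate clustering step.
# Here: k-means with k = 2 on the two-dimensional scores, started from the
# samples with the smallest and largest PC1 score.
centers = scores[[np.argmin(scores[:, 0]), np.argmax(scores[:, 0])]]
for _ in range(20):
    dist = np.linalg.norm(scores[:, None, :] - centers[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    centers = np.array([scores[labels == k].mean(axis=0) for k in range(2)])
```

The `scores` array is what would be drawn in a two-dimensional scores plot, while `labels` is the result of the separate clustering step.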


4) Supervised multivariate analysis (PLS with feature selection and model building)
Find important variables (feature selection, biomarker detection) using PLS or other supervised classification methods such as LDA, CART (tree models), k-NN or SVM (machine learning), if class information is provided. Class information is typically sick/healthy or wild type vs. mutant genotype, possibly with time courses.
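
The idea of ranking variables with PLS can be sketched as follows, assuming NumPy is available. The data are synthetic, only the first PLS component is computed by hand, and the accuracy is measured on the training data, so this is not a validated model. A real analysis would use cross-validation and a library implementation such as scikit-learn's `PLSRegression`.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data: 40 samples x 30 metabolites; class 1 is elevated in
# metabolites 0 to 2 (the "biomarkers" to be recovered).
y = np.repeat([0.0, 1.0], 20)          # class labels, e.g. healthy vs. sick
X = rng.normal(size=(40, 30))
X[y == 1, :3] += 3.0

# Autoscale X and center y.
Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
yc = y - y.mean()

# First PLS component: the weight vector is the normalized covariance of X with y.
w = Xc.T @ yc
w /= np.linalg.norm(w)
t = Xc @ w                             # sample scores on the component
b = (t @ yc) / (t @ t)                 # regression of y on the scores

# Rank metabolites by absolute weight as a crude feature-selection criterion.
ranking = np.argsort(-np.abs(w))
top3 = sorted(ranking[:3].tolist())

# Classify by thresholding the fitted response at 0.5 (training accuracy only).
predicted = (t * b + y.mean() > 0.5).astype(float)
accuracy = float((predicted == y).mean())
```

Ranking by absolute weight is only one possible criterion; measures such as VIP scores or regression coefficients from a multi-component model are also used for the same purpose.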


5) Export data and submit to other tools
Make the results (calculations, graphics) available as XML, Excel, HTML or PDF and submit them to further analysis, such as network analysis, time-course analysis or interpretation.
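
As a rough sketch of the export step, the following code writes a small result table as CSV (which Excel can open), XML and an HTML table, using only the Python standard library. The metabolite names, values, field names and XML layout are hypothetical; the format expected by a downstream tool will differ.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical per-metabolite results from the earlier steps.
results = [
    {"metabolite": "alanine", "mean": 5.1, "p_value": 0.003},
    {"metabolite": "unknown_042", "mean": 2.7, "p_value": 0.21},
]

def to_csv(rows):
    """Serialize rows as CSV text, which Excel and most tools can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def to_xml(rows):
    """Serialize rows as a simple XML document (layout is an assumption)."""
    root = ET.Element("results")
    for row in rows:
        item = ET.SubElement(root, "metabolite", name=row["metabolite"])
        ET.SubElement(item, "mean").text = str(row["mean"])
        ET.SubElement(item, "p_value").text = str(row["p_value"])
    return ET.tostring(root, encoding="unicode")

def to_html(rows):
    """Serialize rows as an HTML table with one header row."""
    table = ET.Element("table")
    header = ET.SubElement(table, "tr")
    for key in rows[0]:
        ET.SubElement(header, "th").text = key
    for row in rows:
        tr = ET.SubElement(table, "tr")
        for value in row.values():
            ET.SubElement(tr, "td").text = str(value)
    return ET.tostring(table, encoding="unicode", method="html")

csv_text = to_csv(results)
xml_text = to_xml(results)
html_text = to_html(results)
```

The strings can then be written to files or passed on to the next tool in the workflow.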


Software

  • MetaboAnalyst - an online service for metabolomics experiments with normalization modules and multivariate statistics
  • MetAtt - time-course analysis of metabolomics experiments - 3D PCA, heatmap, two-way ANOVA, ASCA (ANOVA-simultaneous component analysis) and MEBA (multivariate empirical Bayes time-series analysis)
  • See also the Tools Section on the left