Fiehn Lab - Tools

There is a plethora of statistics packages available, and the most important criteria for choosing the right one are listed below. A good starting point for selecting a package is KDnuggets and the annual data mining review.

A) Coverage of common statistical methods, classification, regression, and machine learning approaches

B) Coverage of "modern" methods (not older than the F-test) such as MARS, CART, ROC curves, feature selection, meta-learning methods (bagging), genetic methods, and others

C) A GUI with import/export functionality for multiple formats, or a command line for large batch projects

D) Active support with a working CRM system, or a large and active user group for discussions

We rely on Statistica Dataminer, which has around 10,000 statistical functions and covers most of the methods we use, or are able to use, in metabolomics. The full list of packages we use:

A) Statistica Dataminer (academic price)
B) WEKA (free)
C) RapidMiner (free)
D) MEV - TIGR Multiexperiment Viewer (free)
E) Genedata Expressionist Analyst
F) R-Project (R-Project for Statistical Computing) download via CRAN
G) TANAGRA - a free data mining project with a GUI and comprehensive features and tutorials
H) R-Commander and RExcel (integration of R into Excel); see "R Through Excel - A Spreadsheet Interface for Statistics, Data Analysis, and Graphics" [PDF]

There are many other (great) packages out there, such as Derive, Mathematica, MATLAB, Origin, SAS, and Stata; however, they cannot be covered here. Some other packages of interest include Java TreeView, the XTAL Regression Package (with the original MARS code from Friedman), and HDBStat for gene expression analysis. A list of genetic algorithms is also available.

Tools for Statistics