There is a plethora of statistics packages out there. The most important criteria for choosing
the right package are listed below. A good starting point for selecting a package is KDNuggets
and the annual data mining review.
A) Coverage of common statistical methods, classification and regression, and machine learning approaches
B) Coverage of "modern" methods (those newer than the F-test) such as MARS, CART, ROC curves, feature selection, meta-learning methods (bagging), genetic methods and others
C) A GUI with import/export functionality for multiple formats, or a command line for large batch projects
D) Active support with a working CRM system, or a large and active user group for discussions
We rely mainly on Statistica Dataminer, which has around 10,000 statistical functions and covers most of the
methods we use, or are able to use, in metabolomics. The packages we use:
A) Statistica Dataminer (academic price)
B) WEKA (free)
C) RapidMiner (free)
D) MEV - TIGR Multiexperiment Viewer (free)
E) Genedata Expressionist Analyst
F) R-Project (R-Project for Statistical Computing), download via CRAN
G) TANAGRA - a free data mining project with a GUI and comprehensive features and tutorials
H) R-Commander and RExcel (integration of R into EXCEL)
R Through Excel - A Spreadsheet Interface for Statistics, Data Analysis, and Graphics [PDF]
There are many other (great) packages out there, such as Derive, Mathematica, MATLAB, Origin, SAS and STATA;
however, they cannot be covered here. Some packages of interest include JAVA TreeView, the XTAL
Regression Package (with the original MARS code from Friedman), and HDBSTat for
gene expression analysis. A list of genetic algorithms.