Fiehn Lab - RosettaStone

Database aggregators for metabolomics provide sources, tools, and technologies for common and batch-wise metabolite annotation tasks.

Structure Database Technologies and Tools

MayaChemTools - http://www.mayachemtools.org/index.html (Manish Sud)
* Manipulation of SD, CSV/TSV, Sequence/Alignments, and PDB files
* Analysis of data in SD, CSV/TSV, and Sequence/Alignments files
* Information about data in SD, CSV/TSV, Sequence/Alignments, PDB, and fingerprints files
* Exporting data from Oracle and MySQL tables into text files
* Properties of periodic table elements, amino acids, and nucleic acids
* Elemental analysis
* Generation of fingerprints corresponding to path lengths, MACCS keys, extended connectivity, atom neighborhoods, topological atom pairs, topological atom torsions, topological pharmacophore atom pairs, and topological pharmacophore atom triplets
* Calculation of similarity matrices using a variety of similarity and distance coefficients
JCHEM Cartridge - http://www.chemaxon.com/jchem/intro/ - JAVA chemical interface to relational database engines
CDK JAVA API - Chemistry Development Kit - JAVA library
MDL Symyx Cartridge DIRECT - http://www.symyx.com/products/software/cheminformatics
OrChem - http://orchem.sourceforge.net/ - an Oracle chemistry plug-in using the Chemistry Development Kit (CDK)
OSCAR3 and OPSIN - http://sourceforge.net/projects/oscar3-chem/ - OSCAR3 (Open Source Chemistry Analysis Routines) is software for the semantic annotation of chemistry papers; its OPSIN module is a name-to-structure converter
TRIPOD - http://tripod.nih.gov/ - I-Tunes for drug discovery

Database Aggregators, Gaggles, Meta Approaches, Mashes
Mashups are hybrid web applications that combine different databases under one common interface. Gaggles are software frameworks (programs) that combine different data sources and let users explore data from different viewpoints. The reason for these approaches is that there is not, and never will be, a totally comprehensive (real-time) collection of data.

CTS - Chemical Translation Service - http://cts.fiehnlab.ucdavis.edu/
Discovers chemical names in text; batch-converts in either direction between CAS, ChEBI, PubChem CID, Formula, HMDB, InChI Code, InChI Key, IUPAC Name, KEGG, LIPID MAPS, Exact Mass, Synonyms, PubChem SID, and SMILES
CACTVS Chemical Resolver (NCI/CADD) - http://cactus.nci.nih.gov/chemical/structure
Chemical Identifier Resolver beta 2 (Documentation)
This resolver, with an 80 million compound database in the back end, can resolve, convert, and show:
Cactvs HASHISY, CAS Registry Number, Chemical Formula, FICTS Identifier, FICuS Identifier, GIF Image, IUPAC Name, Molecular Weight, Names, SD File, SMILES, Standard InChI, Standard InChIKey, TwirlyMol (3D), uuuuu Identifiers
MetMask - http://sourceforge.net/projects/metmask/files/
Tool for managing chemical identifiers for metabolomics experiments. It can incorporate identifiers from local textfiles, several online databases, query PubChem and record all found associations in a local sqlite database.
Chemspider - http://www.chemspider.com/
ChemSpider hosts a multitude of structures, spectra, and literature references, and includes resolver services and web front ends for many possible conversions. From new kid on the block to chemical database juggernaut, nothing is impossible for ChemSpider(Man).
Chemicalize - http://www.chemicalize.org/
Annotation of text with chemical structures and structure property predictions [Citric Acid]
Metabolite Set Enrichment Analysis (MSEA) - http://www.msea.ca/
Identifies and interprets patterns of metabolite concentration changes in a biologically meaningful context.
Compound ID conversion for HMDB, PubChem, ChEBI, METLIN, BiGG, Reactome, BioCyc, and KEGG with error correction.
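
As a rough illustration of how a resolver of this kind can be scripted for batch work, the sketch below builds request URLs for the CACTVS Chemical Identifier Resolver listed above. The base URL is the one given in the list. The `/<identifier>/<representation>` path layout and the representation names (`smiles`, `stdinchikey`) are assumptions taken from the resolver's public documentation, and `resolver_url` and `batch_urls` are hypothetical helper names. No request is sent here; the resulting URLs can be handed to any HTTP client.

```python
from urllib.parse import quote

# Base URL as listed above for the CACTVS Chemical Identifier Resolver.
BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def resolver_url(identifier: str, representation: str) -> str:
    """Build a URL of the assumed form BASE_URL/<identifier>/<representation>."""
    # Percent-encode both parts so SMILES characters such as '(', '=' or '#'
    # cannot be misread as URL syntax.
    return f"{BASE_URL}/{quote(identifier, safe='')}/{quote(representation, safe='')}"


def batch_urls(identifiers, representation):
    """Return one resolver URL per identifier (hypothetical batch helper)."""
    return [resolver_url(identifier, representation) for identifier in identifiers]


if __name__ == "__main__":
    # Names, CAS numbers and SMILES strings are mixed on purpose.
    for url in batch_urls(["aspirin", "50-78-2", "CC(=O)O"], "stdinchikey"):
        print(url)
```

Keeping URL construction separate from the network call makes it easy to check the requests before sending a large batch to a shared public service.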

Metabolite and Small Molecule Database BATCH Downloads

Database downloads for in-house databases can be obtained from the resources below. APIs and web interfaces can avoid the need to download such large, static data sets. Furthermore, outdated database IDs and error corrections must be kept up to date in an in-house DB, which requires costly manual curation. Large databases such as ChemSpider are not available for batch download. Batch download means that a fully annotated SDF file, ASCII file, database dump file (.dmp), or XML file is available. Stop-word lists and ontology dictionaries are important for looking up chemicals.

HMDB Serum Metabolome - http://serummetabolome.ca/scripts/hmdbDownload.cgi
ChEBI Downloads - http://www.ebi.ac.uk/chebi/downloadsForward.do - CHEBI FTP
PUBCHEM FTP - http://pubchem.ncbi.nlm.nih.gov/pc_fetch/pc_fetch-help.html - [FTP]
Emolecules PLUS Download - http://www.emolecules.com/doc/plus/download-database.php
MESH - Medical Subject Headings - http://www.nlm.nih.gov/mesh/filelist.html
BioSemantics ChemList - http://www.biosemantics.org/index.php?page=chemlist
Golm Metabolite Database - http://gmd.mpimp-golm.mpg.de/search.aspx
PubChem Download Service - http://pubchem.ncbi.nlm.nih.gov/pc_fetch/pc_fetch.cgi - [FTP/PUG/SOAP]
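
To give an idea of what loading a downloaded SDF file into an in-house database involves, here is a minimal sketch that extracts the annotation fields from each record. It assumes the standard SDF layout (data headers of the form `> <FIELD>`, values ending at a blank line, records ending with `$$$$`). The function name `iter_sdf_fields` is hypothetical, and the structure block itself is ignored.

```python
import re

# A data header line looks like "> <FIELD_NAME>", optionally with extra tokens.
HEADER = re.compile(r"^>.*?<([^<>]+)>")


def iter_sdf_fields(lines):
    """Yield one {field name: value} dict per SDF record.

    Only the data items are read; the connection table is skipped.
    A trailing record without a "$$$$" terminator is not yielded.
    """
    fields, current = {}, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":  # record terminator
            yield {name: "\n".join(values) for name, values in fields.items()}
            fields, current = {}, None
            continue
        match = HEADER.match(line)
        if match:
            current = match.group(1)
            fields[current] = []
        elif current is not None:
            if line.strip():
                fields[current].append(line)  # value line (may span several lines)
            else:
                current = None  # a blank line closes the current data item


if __name__ == "__main__":
    import sys

    # Usage: python sdf_fields.py downloaded_file.sdf
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for record in iter_sdf_fields(handle):
            print(record)
```

Because the function is a generator, large downloads can be streamed record by record instead of being loaded into memory at once.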

Mass Spectrometry APIs

Database aggregators for mass spectrometry searches. There is no large, freely available mass spectrometry database; the largest freely accessible one is MassBank. However, it is possible to search accurate masses within a given mass accuracy and obtain candidate structure hits. Accurate mass searches have a drawback: when only the accurate mass is used, the orthogonal isotopic pattern filter is lost. Unless all molecular formulas returned by a query are the same, this can only be prevented by searching molecular formulas instead of accurate masses. (See Seven Golden Rules)
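
The ambiguity described above can be made concrete with a small sketch. It computes monoisotopic masses for a hand-picked list of formulas and reports which ones fall inside a ppm window around a query mass. The candidate list, the tolerances, and all function names are illustrative assumptions; a real search would run against one of the databases listed below.

```python
import re

# Monoisotopic masses of the most abundant isotopes (in u).
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376163,
    "S": 31.97207100,
}


def parse_formula(formula):
    """Turn a simple formula such as 'C5H11NO2S' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic element masses of a neutral formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def mass_window(mass, ppm):
    """Return (low, high) bounds for a symmetric ppm tolerance."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def candidates_by_mass(query_mass, ppm, formulas):
    """Keep the formulas whose monoisotopic mass lies inside the ppm window."""
    low, high = mass_window(query_mass, ppm)
    return [f for f in formulas if low <= monoisotopic_mass(f) <= high]


if __name__ == "__main__":
    formulas = ["C5H11NO2S", "C8H7NO2", "C9H11NO"]  # illustrative candidates
    for ppm in (30, 5):
        print(ppm, "ppm:", candidates_by_mass(149.0510, ppm, formulas))
```

With a 30 ppm tolerance the query mass 149.0510 matches both C5H11NO2S and C8H7NO2, which differ by about 3.4 mDa; at 5 ppm only C5H11NO2S remains. The sulfur-containing formula would also show a distinct M+2 isotope peak, which is the kind of orthogonal information a mass-only query discards.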

HUMAN Metabolite MASS Spec API - http://hmdb.ca/search/spectra?type=ms_search
HMDB Serum Metabolome is offered to the public as a freely available resource.
Wishart DS et al., HMDB: the Human Metabolome Database. Nucleic Acids Res. 2007 Jan;35(Database issue):D521-6.
MZedDB - http://maltese.dbs.aber.ac.uk:8888/hrmet/index.html
MZedDB is the Aberystwyth University High Resolution Mass Spectrometry Laboratory database
Metabolite signal identification in accurate mass metabolomics data with MZedDB, an interactive m/z annotation tool utilising predicted ionisation behaviour 'rules' [LINK]
Chemspider MassSpecAPI - http://www.chemspider.com/MassSpecAPI.asmx
Allows querying the ChemSpider DB with SearchByFormula2 and SearchByMass2 using the SOAP protocol
PubChem PUG - http://pubchem.ncbi.nlm.nih.gov/pug/pughelp.html
The Power User Gateway can be used for JAVA, C++, VBA powered automatic access to PubChem.
Example: find all PubChem CIDs with accurate mass (exact mass) between 800 and 800.1 [LINK]
METLIN DB - http://metlin.scripps.edu/metabo_search.php - from SCRIPPS
Allows mass, name, formula, CAS search with positive and negative charge
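
A mass-range query like the PubChem example above (exact mass between 800 and 800.1) can also be phrased as a plain HTTP request. The sketch below does not use PUG itself. It swaps in the NCBI Entrez E-utilities `esearch` endpoint against the `pccompound` database, and the `[ExactMass]` field tag with `low:high` range syntax is an assumption that should be checked against the current Entrez documentation. The helper name `exact_mass_query_url` is hypothetical. Only the URL is built; nothing is sent.

```python
from urllib.parse import urlencode

# NCBI Entrez E-utilities search endpoint (used here instead of PUG).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def exact_mass_query_url(low, high, retmax=100):
    """Build an esearch URL for compounds with exact mass in [low, high].

    The field tag "[ExactMass]" and the range syntax are assumptions.
    """
    params = {
        "db": "pccompound",              # PubChem Compound database in Entrez
        "term": f"{low}:{high}[ExactMass]",
        "retmax": retmax,                # maximum number of CIDs to return
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(exact_mass_query_url(800, 800.1))
```

As the mass spectrometry section argues, a mass-range search of this kind returns candidates with many different molecular formulas, so the result list still needs filtering by formula or isotopic pattern.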

Technology blogs

CouchDB and PubChem
http://depth-first.com/articles/2010/01/20/pubcouch-a-couchdb-interface-to-pubchem

PubChem and CAS Lookup
http://depth-first.com/articles/2008/05/26/simple-cas-number-lookup-and-more-with-chempedia

Hadoop and Map Reduce
http://blog.rguha.net/?p=289

/chemical/structure/ Blog
http://cactus.nci.nih.gov/blog/

Literature

How large is the metabolome? - PLOS ONE (2009)
The project section contains additional tools

Consolidating metabolite identifiers to enable contextual and multi-platform metabolomics data analysis - BMC Bioinformatics (2010)

Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining - Journal of Cheminformatics (2010)