Database aggregators for metabolomics comprise sources, tools, and technologies for performing common and batch-wise metabolite annotation tasks.
Structure Database Technologies and Tools
- MayaChemTools - http://www.mayachemtools.org/index.html (Manish Sud)
* Manipulation of SD, CSV/TSV, Sequence/Alignments, and PDB files
* Analysis of data in SD, CSV/TSV, and Sequence/Alignments files
* Information about data in SD, CSV/TSV, Sequence/Alignments, PDB, and fingerprints files
* Exporting data from Oracle and MySQL tables into text files
* Properties of periodic table elements, amino acids, and nucleic acids
* Elemental analysis
* Generation of fingerprints corresponding to path lengths, MACCS keys, extended connectivity, atom neighborhoods, topological atom pairs, topological atom torsions, topological pharmacophore atom pairs, and topological pharmacophore atom triplets
* Calculation of similarity matrices using a variety of similarity and distance coefficients
- JCHEM Cartridge - http://www.chemaxon.com/jchem/intro/ - JAVA chemical interface to relational database engines
- CDK JAVA API - Chemistry Development Kit - JAVA library
- MDL Symyx Cartridge DIRECT - http://www.symyx.com/products/software/cheminformatics
- OrChem - http://orchem.sourceforge.net/ - an Oracle chemistry plug-in using the Chemistry Development Kit (CDK)
- OSCAR3 and OPSIN - http://sourceforge.net/projects/oscar3-chem/ - OSCAR3 (Open Source Chemistry Analysis Routines) is software for the semantic annotation of chemistry papers. Its modules include OPSIN, a name-to-structure converter.
- TRIPOD - http://tripod.nih.gov/ - I-Tunes for drug discovery
Database Aggregators, Gaggles, Meta Approaches, Mashes
Mashups are web application hybrids that combine different databases under one common interface. Gaggles are software frameworks (programs) that combine different data sources and let users explore the data from different viewpoints. The reason for these approaches is that there is not, and never will be, a fully comprehensive (real-time) collection of data.
- CTS - Chemical Translation Service - http://cts.fiehnlab.ucdavis.edu/
Discovers chemical names in text; converts between CAS, ChEBI, PubChem CID, Formula, HMDB, InChI Code, InChI Key, IUPAC Name, KEGG, LIPID MAPS, Exact Mass, Synonyms, PubChem SID, and SMILES, in batch mode and in both directions
- CACTVS Chemical Resolver (NCI/CADD) - http://cactus.nci.nih.gov/chemical/structure
Chemical Identifier Resolver beta 2 (Documentation)
This resolver, with an 80 million compound database in the back end, can resolve, convert, and show:
Cactvs HASHISY, CAS Registry Number, Chemical Formula, FICTS Identifier, FICuS Identifier, GIF Image, IUPAC Name, Molecular Weight, Names, SD File, SMILES, Standard InChI, Standard InChIKey, TwirlyMol (3D), uuuuu Identifiers
- MetMask - http://sourceforge.net/projects/metmask/files/
Tool for managing chemical identifiers for metabolomics experiments. It can incorporate identifiers from local text files and several online databases, query PubChem, and record all found associations in a local SQLite database.
- Chemspider - http://www.chemspider.com/
ChemSpider hosts a multitude of structures, spectra, and literature references; it includes resolver services and web front ends for many possible conversions. From new kid on the block to chemical database juggernaut, nothing is impossible for ChemSpider(Man).
- Chemicalize - http://www.chemicalize.org/
Annotation of text with chemical structures and structure property predictions [Citric Acid]
- Metabolite Set Enrichment Analysis (MSEA) - http://www.msea.ca/
Identifies and interprets patterns of metabolite concentration changes in a biologically meaningful context;
compound ID conversion for HMDB, PubChem, ChEBI, METLIN, BiGG, Reactome, BioCyc, KEGG with error correction
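Several of the services above are resolvers that can be driven from a script for batch conversion. The following is a minimal sketch for the NCI/CADD resolver, assuming its URL scheme is the base URL listed above followed by `/<identifier>/<representation>`, and assuming representation keywords such as `smiles` and `stdinchikey`. The function names `resolver_url` and `resolve` are hypothetical helpers, not part of any published client.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base URL as listed in this section for the NCI/CADD resolver.
BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def resolver_url(identifier, representation):
    """Build a resolver URL (assumed scheme: base/identifier/representation)."""
    # Percent-encode the identifier so names with spaces or SMILES with
    # characters such as '#' and '/' survive as a single path segment.
    return f"{BASE_URL}/{quote(identifier, safe='')}/{representation}"


def resolve(identifier, representation, timeout=30):
    """Fetch one conversion as text; needs network access (not run here)."""
    with urlopen(resolver_url(identifier, representation), timeout=timeout) as reply:
        return reply.read().decode("utf-8").strip()


# Batch use: build one URL per input name.
names = ["citric acid", "caffeine"]
urls = [resolver_url(name, "stdinchikey") for name in names]
for url in urls:
    print(url)
```

Keeping URL construction separate from the network call makes the batch list easy to inspect, throttle, or cache before any request is sent.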
Metabolite and Small Molecule Database BATCH Downloads
Downloads for in-house databases can be obtained from the resources below. APIs and web interfaces can avoid the download of such large, static database sets. Moreover, an in-house DB must be kept up to date as database IDs become outdated and errors are corrected, which requires costly manual curation. Large databases like ChemSpider are not available for batch download. Batch download means that a fully annotated SDF file, ASCII file, database dump file (.dmp), or XML file is available. Word stop lists and ontology dictionaries are important for the lookup of chemicals.
- HMDB Serum Metabolome - http://serummetabolome.ca/scripts/hmdbDownload.cgi
- ChEBI Downloads - http://www.ebi.ac.uk/chebi/downloadsForward.do - CHEBI FTP
- PUBCHEM FTP - http://pubchem.ncbi.nlm.nih.gov/pc_fetch/pc_fetch-help.html - [FTP]
- Emolecules PLUS Download - http://www.emolecules.com/doc/plus/download-database.php
- MESH - Medical Subject Headings - http://www.nlm.nih.gov/mesh/filelist.html
- BioSemantics ChemList - http://www.biosemantics.org/index.php?page=chemlist
- Golm Metabolite Database - http://gmd.mpimp-golm.mpg.de/search.aspx
- PubChem Download Service - http://pubchem.ncbi.nlm.nih.gov/pc_fetch/pc_fetch.cgi - [FTP/PUG/SOAP]
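Annotated SDF files from batch downloads carry their identifiers in data fields after each structure block. The sketch below shows one way to read those fields into dictionaries before loading them into an in-house database. It assumes the usual SD file layout (a `> <FIELD>` header line, value lines, a blank line, and `$$$$` closing each record); the function name `parse_sdf_fields` is hypothetical, and the one-record sample is hand-written for illustration rather than taken from a real download.

```python
import re

# Data-field header in an SD file, e.g. "> <PUBCHEM_COMPOUND_CID>".
FIELD_HEADER = re.compile(r"^>\s*<([^>]+)>")


def parse_sdf_fields(text):
    """Return one dict of data fields per SD record; structure blocks are skipped."""
    records = []
    fields = {}
    current = None  # name of the field whose value lines are being read
    for line in text.splitlines():
        if line.strip() == "$$$$":  # record terminator
            records.append(fields)
            fields, current = {}, None
            continue
        header = FIELD_HEADER.match(line)
        if header:
            current = header.group(1)
            fields[current] = []
        elif current is not None:
            if line.strip():
                fields[current].append(line.strip())
            else:  # a blank line ends the current field value
                current = None
    # Join multi-line values so each field maps to a single string.
    return [{k: "\n".join(v) for k, v in rec.items()} for rec in records]


# Hand-written sample record (water); the structure block is a placeholder.
SAMPLE = """\
962
  hand-written-sample

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <PUBCHEM_COMPOUND_CID>
962

> <PUBCHEM_IUPAC_INCHIKEY>
XLYOFNOQVPJJNP-UHFFFAOYSA-N

$$$$
"""

records = parse_sdf_fields(SAMPLE)
print(records)
```

For real downloads, a cheminformatics toolkit such as the CDK mentioned earlier would also parse the structure block; this sketch only extracts the identifier fields.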
Mass Spectrometry APIs
Database aggregators for mass spectrometry searches. There is no large, freely available mass spectrometry database; the largest freely accessible one is MassBank. However, it is possible to search accurate masses within a given mass accuracy and obtain candidate structure hits. Accurate mass searches have a drawback: when only the accurate mass is used, the orthogonal isotopic pattern filter is lost. Unless all molecular formulas returned by a query are identical, this can only be prevented by searching molecular formulas instead of accurate masses. (See Seven Golden Rules)
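The difference between the two search modes can be illustrated with a small, self-contained sketch. It assumes a hand-picked candidate list and an arbitrary 10 ppm tolerance; the function names are hypothetical, and the element masses are standard monoisotopic values. A mass-window query returns hits with different molecular formulas, whereas a formula query returns only isomers of one formula.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}

# Hand-picked (name, formula) candidates, for illustration only.
CANDIDATES = [
    ("glucose", "C6H12O6"),
    ("fructose", "C6H12O6"),
    ("theophylline", "C7H8N4O2"),
    ("citric acid", "C6H8O7"),
]


def monoisotopic_mass(formula):
    """Sum element masses for a simple formula such as 'C6H12O6'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def search_by_mass(candidates, query_mass, ppm):
    """Return candidates whose mass lies within +/- ppm of the query mass."""
    delta = query_mass * ppm / 1e6
    return [
        (name, formula)
        for name, formula in candidates
        if abs(monoisotopic_mass(formula) - query_mass) <= delta
    ]


def search_by_formula(candidates, formula):
    """Return candidates with exactly the given molecular formula."""
    return [(name, f) for name, f in candidates if f == formula]


mass_hits = search_by_mass(CANDIDATES, 180.0640, ppm=10)
formula_hits = search_by_formula(CANDIDATES, "C6H12O6")
print("mass hits:   ", mass_hits)
print("formula hits:", formula_hits)
```

With these candidates, the mass query mixes two different formulas (C6H12O6 and C7H8N4O2, about 7 ppm apart), which an isotopic pattern filter could otherwise tell apart; the formula query keeps only the C6H12O6 isomers.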
- HUMAN Metabolite MASS Spec API - http://hmdb.ca/search/spectra?type=ms_search
HMDB Serum Metabolome is offered to the public as a freely available resource.
Wishart DS et al., HMDB: the Human Metabolome Database. Nucleic Acids Res. 2007 Jan;35(Database issue):D521-6.
- MZedDB - http://maltese.dbs.aber.ac.uk:8888/hrmet/index.html
MZedDB the Aberystwyth University High Resolution Mass Spectrometry Laboratory database
Metabolite signal identification in accurate mass metabolomics data with MZedDB, an interactive m/z annotation tool utilising predicted ionisation behaviour 'rules' [LINK]
- Chemspider MassSpecAPI - http://www.chemspider.com/MassSpecAPI.asmx
Allows querying the ChemSpider DB with SearchByFormula2 and SearchByMass2 using the SOAP protocol
- PubChem PUG - http://pubchem.ncbi.nlm.nih.gov/pug/pughelp.html
The Power User Gateway can be used for JAVA, C++, VBA powered automatic access to PubChem.
Example: find all PubChem CIDs with accurate mass (exact mass) between 800 and 800.1 [LINK]
- METLIN DB - http://metlin.scripps.edu/metabo_search.php - from SCRIPPS
Allows mass, name, formula, CAS search with positive and negative charge
Technology blogs
- CouchDB and PubChem
http://depth-first.com/articles/2010/01/20/pubcouch-a-couchdb-interface-to-pubchem
- PubChem and CAS Lookup
http://depth-first.com/articles/2008/05/26/simple-cas-number-lookup-and-more-with-chempedia
- Hadoop and Map Reduce
http://blog.rguha.net/?p=289
- /chemical/structure/ Blog
http://cactus.nci.nih.gov/blog/
Literature
- How large is the metabolome? - PLOS ONE (2009)
The project section contains additional tools
- Consolidating metabolite identifiers to enable contextual and multi-platform metabolomics data analysis - BMC Bioinformatics (2010)
- Automatic vs. manual curation of a multi-source chemical dictionary: the impact on text mining - Journal of Cheminformatics (2010)