Database Aggregators for Metabolomics

Tobias Kind; JANUARY 2010;  Davis, CA, CC-BY

Database Aggregators for Metabolomics contain sources, tools and technologies to perform common and batch-wise tasks for metabolite annotation.

Structure Database Technologies and Tools

  • MayaChemTools (Manish Sud)
      * Manipulation of SD, CSV/TSV, Sequence/Alignments, and PDB files
      * Analysis of data in SD, CSV/TSV, and Sequence/Alignments files
      * Information about data in SD, CSV/TSV, Sequence/Alignments, PDB, and fingerprints files
      * Exporting data from Oracle and MySQL tables into text files
      * Properties of periodic table elements, amino acids, and nucleic acids
      * Elemental analysis
      * Generation of fingerprints corresponding to path lengths, MACCS keys, extended connectivity, atom neighborhoods, topological atom pairs, topological atom torsions, topological pharmacophore atom pairs, and topological pharmacophore atom triplets
      * Calculation of similarity matrices using a variety of similarity and distance coefficients
  • JCHEM Cartridge - Java chemical interface to relational database engines
  • CDK JAVA API - Chemistry Development Kit - Java library
  • MDL Symyx Cartridge DIRECT
  • OrChem - an Oracle chemistry plug-in using the Chemistry Development Kit (CDK)
  • OSCAR3 and OPSIN - OSCAR3 (Open Source Chemistry Analysis Routines) is software for the semantic annotation of chemistry papers. Its module OPSIN is a name-to-structure converter.
  • TRIPOD - iTunes for drug discovery
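
The similarity matrices mentioned for MayaChemTools can be illustrated with a small sketch. This is not MayaChemTools code. It assumes fingerprints are stored as sets of on-bit positions and uses the Tanimoto coefficient as one example of a similarity coefficient. The function names and the toy fingerprints are hypothetical.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient for two fingerprints given as sets of on-bit positions."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty fingerprints count as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_matrix(fps: Dict[str, Set[int]]) -> List[List[float]]:
    """Pairwise similarity matrix; rows and columns follow the dict's key order."""
    names = list(fps)
    return [[tanimoto(fps[row], fps[col]) for col in names] for row in names]


# Hypothetical toy fingerprints (bit positions are made up for illustration).
fingerprints = {
    "mol_a": {1, 4, 7, 9},
    "mol_b": {1, 4, 8},
    "mol_c": {2, 3},
}
matrix = similarity_matrix(fingerprints)
```

Real fingerprints (MACCS keys, extended connectivity and so on) would be generated by a toolkit. Only the comparison step is shown here.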

Database Aggregators, Gaggles, Meta Approaches, Mashups

Mashups are web application hybrids that combine different databases under one common interface. Gaggles are software frameworks (programs) that combine different data sources and let users explore data from different viewpoints. The reason for these approaches is that there is not, and never will be, a totally comprehensive (real-time) collection of data.

  • CTS - Chemical Translation Service
    Discovers chemical names in text; converts between CAS, ChEBI, PubChem CID, Formula, HMDB, InChI Code, InChIKey, IUPAC Name, KEGG, LIPID MAPS, Exact Mass, Synonyms, PubChem SID and SMILES, in either direction and in batch mode
  • CACTVS Chemical Resolver (NCI/CADD)
    Chemical Identifier Resolver beta 2
    This resolver, backed by a database of 80 million chemical compounds, can resolve, convert and show:
    Cactvs HASHISY, CAS Registry Number, Chemical Formula, FICTS Identifier, FICuS Identifier, GIF Image, IUPAC Name, Molecular Weight, Names, SD File, SMILES, Standard InChI, Standard InChIKey, TwirlyMol (3D), uuuuu Identifiers
  • MetMask
    Tool for managing chemical identifiers for metabolomics experiments. It can incorporate identifiers from local text files and several online databases, query PubChem, and record all associations it finds in a local SQLite database.
  • ChemSpider
    ChemSpider hosts a multitude of structures, spectra and literature references, and includes resolver services and web front ends for many possible conversions. From new kid on the block to chemical database juggernaut, nothing is impossible for ChemSpider(Man).
  • Chemicalize
    Annotation of text with chemical structures and structure property predictions (example: Citric Acid)
  • Metabolite Set Enrichment Analysis (MSEA)
    Identifies and interprets patterns of metabolite concentration changes in a biologically meaningful context;
    provides compound ID conversion for HMDB, PubChem, ChEBI, METLIN, BiGG, Reactome, BioCyc and KEGG, with error correction
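
As an illustration of how such a resolver can be scripted for batch conversion, here is a minimal sketch for the CACTVS Chemical Identifier Resolver. It assumes the URL pattern of the public NCI/CADD service (`/chemical/structure/<identifier>/<representation>`) and representation names such as `smiles`, `stdinchikey` and `formula`; both should be checked against the current resolver documentation. The helper names `resolver_url`, `resolve` and `batch_urls` are hypothetical.

```python
from typing import Dict, Iterable
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the public NCI/CADD resolver; verify before use.
BASE_URL = "https://cactus.nci.nih.gov/chemical/structure"


def resolver_url(identifier: str, representation: str) -> str:
    """Build a resolver URL; spaces, '#', '=' etc. are percent-encoded, '/' is kept."""
    return f"{BASE_URL}/{quote(identifier)}/{representation}"


def resolve(identifier: str, representation: str, timeout: float = 10.0) -> str:
    """Fetch one representation as text (performs a network request)."""
    with urlopen(resolver_url(identifier, representation), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def batch_urls(identifiers: Iterable[str], representation: str) -> Dict[str, str]:
    """Map each input identifier to the URL that would convert it."""
    return {name: resolver_url(name, representation) for name in identifiers}


urls = batch_urls(["citric acid", "C#N"], "stdinchikey")
```

Calling `resolve("citric acid", "smiles")` would send the actual request. For larger batches, the requests should be throttled out of courtesy to the service.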

Metabolite and Small Molecule Database BATCH Downloads

Database downloads for in-house databases can be obtained from the resources below. APIs and web interfaces can avoid the download of such large, static data sets. Furthermore, outdated database IDs and error corrections must be kept up to date in an in-house DB, which requires costly manual curation. Large databases like ChemSpider are not available for batch download. Batch download means that a fully annotated SDF file, ASCII file, database dump file (.dmp) or XML file is available. Stop-word lists and ontology dictionaries are important for the lookup of chemicals.
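
To show what loading a batch download into an in-house database can look like, here is a minimal sketch that reads the data fields of an SDF file and stores them in SQLite. It assumes the usual SDF layout: records end with `$$$$`, and each data item starts with a `> <TAG>` line followed by its value and a blank line. The tag names `ID` and `NAME`, the table layout and the sample records are hypothetical; real downloads use their own field names.

```python
import sqlite3
from typing import Dict, Iterator


def iter_sdf_records(text: str) -> Iterator[Dict[str, str]]:
    """Yield the data fields of each SDF record; the molblock itself is skipped."""
    for block in text.split("$$$$"):
        lines = block.strip("\n").splitlines()
        if not any(line.strip() for line in lines):
            continue  # nothing but whitespace after the last record
        fields: Dict[str, str] = {}
        tag = None
        for line in lines:
            if line.startswith(">") and "<" in line:
                tag = line[line.index("<") + 1 : line.rindex(">")]
                fields[tag] = ""
            elif tag is not None:
                if not line.strip():
                    tag = None  # a blank line ends the current data item
                else:
                    fields[tag] = f"{fields[tag]}\n{line}" if fields[tag] else line
        yield fields


def load_into_sqlite(text: str, conn: sqlite3.Connection) -> int:
    """Insert (ID, NAME) pairs into a 'compound' table; returns the record count."""
    conn.execute("CREATE TABLE IF NOT EXISTS compound (id TEXT PRIMARY KEY, name TEXT)")
    rows = [(rec.get("ID"), rec.get("NAME")) for rec in iter_sdf_records(text)]
    conn.executemany("INSERT OR REPLACE INTO compound VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


# Two made-up records with placeholder molblocks, for illustration only.
SAMPLE_SDF = "\n".join([
    "water", "  hypothetical", "", "M  END",
    "> <ID>", "X0001", "", "> <NAME>", "Water", "", "$$$$",
    "ethanol", "  hypothetical", "", "M  END",
    "> <ID>", "X0002", "", "> <NAME>", "Ethanol", "", "$$$$", "",
])

conn = sqlite3.connect(":memory:")
loaded = load_into_sqlite(SAMPLE_SDF, conn)
```

`INSERT OR REPLACE` is used so that re-importing an updated download overwrites stale entries, which addresses the update problem described above only in its simplest form.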


Mass Spectrometry APIs

Database aggregators for mass spectrometry searches. There is no large, freely available mass spectrometry database; the largest freely accessible one is MassBank. However, it is possible to search accurate masses within a given mass accuracy and obtain candidate structure hits. Accurate mass searches have a drawback: when only the accurate mass is used, the orthogonal isotopic pattern filter is lost. Unless all molecular formulas returned by a query are identical, this can only be prevented by searching molecular formulas instead of accurate masses (see Seven Golden Rules).
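
The difference between a mass search and a formula search can be sketched as follows. The example assumes a tiny in-memory compound list (hypothetical, chosen for illustration) and neutral monoisotopic masses computed from standard isotope masses for C, H, N and O. The function names are illustrative and do not belong to any of the services listed here.

```python
import re
from typing import Dict, List, Tuple

# Monoisotopic masses of the most abundant isotopes (only C, H, N, O covered here).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}

# Hypothetical miniature compound table: (name, molecular formula).
COMPOUNDS = [
    ("glucose", "C6H12O6"),
    ("fructose", "C6H12O6"),
    ("theophylline", "C7H8N4O2"),
    ("acetylsalicylic acid", "C9H8O4"),
]


def parse_formula(formula: str) -> Dict[str, int]:
    """Turn 'C6H12O6' into {'C': 6, 'H': 12, 'O': 6}."""
    counts: Dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula: str) -> float:
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def search_by_mass(query_mass: float, ppm: float) -> List[Tuple[str, str]]:
    """Return all compounds whose mass lies within +/- ppm of the query mass."""
    tolerance = query_mass * ppm / 1e6
    return [(name, formula) for name, formula in COMPOUNDS
            if abs(monoisotopic_mass(formula) - query_mass) <= tolerance]


def search_by_formula(formula: str) -> List[Tuple[str, str]]:
    """Return all compounds (isomers) that share exactly this molecular formula."""
    wanted = parse_formula(formula)
    return [(name, f) for name, f in COMPOUNDS if parse_formula(f) == wanted]


mass_hits = search_by_mass(180.0634, ppm=10)    # mixes two different formulas
formula_hits = search_by_formula("C6H12O6")     # isomers of one formula only
```

With a 10 ppm window the mass query returns candidates with two different molecular formulas, whereas the formula query, which can be preceded by isotopic pattern filtering, returns only isomers of the formula already determined.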


