The (rice) metabolome
How Large Is the Metabolome? A Critical Analysis of Data Exchange Practices in Chemistry
Project partners:
Tobias Kind, Martin Scholz, Oliver Fiehn

Results:
Kind T, Scholz M, Fiehn O (2009) How Large Is the Metabolome? A Critical Analysis of Data Exchange Practices in Chemistry. PLoS ONE 4(5): e5440. dx.doi.org/10.1371/journal.pone.0005440
Short Introduction:
As an example, results are presented for rice. The genomes of two rice (Oryza sativa) subspecies, japonica and indica, have been fully sequenced. Several major small-molecule databases were compared for their coverage of known rice metabolites: PubChem, Chemical Abstracts, Beilstein, patent databases, the Dictionary of Natural Products, SetupX/BinBase, and KNApSAcK DB, as well as databases obtained by computational approaches, i.e. RiceCyc, KEGG, and Reactome. More than 5,000 small molecules were retrieved when searching these databases. Unfortunately, genuine rice metabolites were often retrieved together with non-metabolite database entries such as pesticides. The overlap between the compound lists of different databases was very difficult to determine, either because structures were not encoded in a machine-readable format or because compound identifiers were not cross-referenced between databases.
We conclude that present databases are not capable of comprehensively retrieving all known metabolites. Metabolome lists are still mostly restricted to genome-reconstructed pathways. We suggest that providers of (bio)chemical databases cross-reference their database identifiers to PubChem IDs and InChIKeys to enable cross-database queries. In addition, peer-reviewed journal repositories need to mandate the submission of structures and spectra in machine-readable format to allow automated semantic annotation of articles containing chemical structures. Such changes in publication standards and database architectures will enable researchers to compile current knowledge about the metabolome of a species, which may extend to derived information such as spectral libraries, organ-specific metabolites, and cross-study comparisons.
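Once every database entry carries an InChIKey, the cross-database comparison suggested above reduces to set operations. The following Python sketch assumes each database has already been exported as a mapping from a local identifier to an InChIKey (a hypothetical input format; the identifiers `a1`, `b7` and so on are made up, the keys come from the annotation list on this page). It compares either full keys or only the first 14-character block, which encodes the molecular skeleton (connectivity) and ignores stereochemistry and other layers.

```python
from typing import Callable


def skeleton(inchikey: str) -> str:
    """First InChIKey block: the 14-character hash of the molecular skeleton."""
    return inchikey.split("-", 1)[0]


def overlap(db_a: dict[str, str], db_b: dict[str, str], skeleton_only: bool = False):
    """Compare two {local_id: InChIKey} mappings (hypothetical export format).

    Returns (shared, only_a, only_b) as sets of the keys used for matching.
    With skeleton_only=True only the first block is compared, so entries that
    differ only in stereochemistry or other non-skeleton layers are merged.
    """
    norm: Callable[[str], str] = skeleton if skeleton_only else str.strip
    keys_a = {norm(k) for k in db_a.values()}
    keys_b = {norm(k) for k in db_b.values()}
    return keys_a & keys_b, keys_a - keys_b, keys_b - keys_a


if __name__ == "__main__":
    # Toy exports: cholesterol + arginine vs. arginine + malathion.
    db_a = {"a1": "HVYWMOMLDIMFJA-DPAQBDIFBB", "a2": "ODKSFYDXXFIFQN-UHFFFAOYAT"}
    db_b = {"b7": "ODKSFYDXXFIFQN-UHFFFAOYAT", "b9": "JXSJBGJIGXNWCI-UHFFFAOYAK"}
    shared, only_a, only_b = overlap(db_a, db_b)
    print(len(shared), len(only_a), len(only_b))  # 1 1 1
```

Comparing full keys keeps stereoisomers apart; comparing only the first block merges them, which may be preferable when one database records stereochemistry and another does not.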



Call for participation and open discussion:
Please comment on this article, or discuss barriers, problems, obstacles, or projects missing from the article. The focus is not on how many databases exist, but on how such databases can be enriched with experimental, machine-readable data from electronic structure and spectral data submitted directly with publications. The PLoS comment section requires a valid (non-anonymous) login, so please create a PLoS account to share your ideas about this article; the comments tab is at the top of the article page.

Software:
1) TextPad ($$) www.textpad.com
2) MS Excel ($$$) + Visual Basic www.microsoft.com
3) ChemAxon molconvert (free), cxcalc (academic license), JChem full (academic license)
4) ChemAxon Instant JChem (free academic version)
5) EPA EPI Suite (free)
6) Beilstein CrossFire for searching the Beilstein database of organic compounds and properties
7) SciFinder Scholar for searching the CAS database
8) InChI and InChIKey software (free)
Databases and Services (updated):
1) The PubChem database (free) - the whole PubChem DB can be downloaded from the PubChem FTP site
2) The Dictionary of Natural Products ($$$$) (web version)
3) The KEGG database (free)
4) The peptide DB and metabolome DB (free)
5) The MDL Beilstein database ($$$$$)
6) The CAS database ($$$$$ academic or $$$$$$ commercial)
7) The ChemSpider DB (free) - largest information-enhanced DB, with a mass spectrometry API
8) The RiceCyc DB - Rice Metabolic Pathways
9) The Reactome DB
10) The SetupX - biological experiment database
11) The KNApSAcK DB - Species-Metabolite Relationship Database
12) The SureChem patent database
13) The IBM Patent chemical search
14) The MetaCrop DB - a detailed database of crop plant metabolism
15) The LipidMaps DB - LIPID Metabolites And Pathways Strategy
16) The Dr. Duke's Phytochemical and Ethnobotanical Database
17) The NCBI Taxonomy DB
18) The Oryzabase - integrated rice sciences database
19) The IBM Chemical Patent search (Simple) beta
20) The BatchEntrez service to retrieve compounds from PubChem compound IDs
21) The InChIKey resolver from RSC and ChemSpider
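BatchEntrez is a web form. For scripted lookups, PubChem also offers the PUG REST interface, which this sketch uses instead (a swapped-in technique, not the workflow described in the article). It only builds request URLs that ask for the InChIKey of each CID in batches; the batch size of 100 is an arbitrary assumption rather than a documented PubChem limit, and the helper names `chunks` and `property_urls` are hypothetical.

```python
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def chunks(items: list, size: int):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def property_urls(cids, properties=("InChIKey",), fmt="CSV", batch_size=100) -> list[str]:
    """Return one PUG REST 'property' URL per batch of PubChem CIDs."""
    cid_list = [int(c) for c in cids]
    prop_path = ",".join(properties)
    return [
        f"{PUG_REST}/compound/cid/{','.join(map(str, batch))}/property/{prop_path}/{fmt}"
        for batch in chunks(cid_list, batch_size)
    ]


if __name__ == "__main__":
    # CIDs from the annotation list on this page (Bisbynin has no CID).
    cids = [522834, 445354, 5280489, 5368396, 5997, 4004, 2730,
            439924, 4594, 20055178, 5386, 232]
    for url in property_urls(cids):
        # Fetching this URL (e.g. with urllib.request.urlopen) returns CSV rows of CID and InChIKey.
        print(url)
```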
Compound annotations from text (Name; PubChem CID; InChIKey):
2-acetyl-1-pyrroline; CID 522834; DQBQWWSFRPLIAX-UHFFFAOYAG
Vitamin-A; CID 445354; FPIPGXGPPPQFEQ-OVSJKPMPBW
Beta-carotene; CID 5280489; OENHQHLEOONYIE-JLTXGRSLBT
Bisbynin; CID NA; ICHJNTDKHBXTFN-CMZGOGIXBZ [CML] [MOL] [ChemSpider]
Trans-luteine; CID 5368396; KBPHJBAIARWVSC-DKLMTRRABK
Cholesterol; CID 5997; HVYWMOMLDIMFJA-DPAQBDIFBB
Malathion; CID 4004; JXSJBGJIGXNWCI-UHFFFAOYAK
Chlorpyrifos; CID 2730; SBPBAQFWLVIOKP-UHFFFAOYAG
Ribosylnicotinamide; CID 439924; JLEBZPBDRKPWTD-ARWKKGFBBE
Omeprazol; CID 4594; SUBDBMMJDZJVOS-UHFFFAOYAZ
Rhodopinal; CID 20055178; GOJQFVQXKNNAAY-XQHLYSSHBM
Tegafur; CID 5386; WFWLQNSHRPWKFK-UHFFFAOYAE
Arginine; CID 232; ODKSFYDXXFIFQN-UHFFFAOYAT
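The annotation lines above follow a fixed "Name; PubChem CID; InChIKey" pattern, so they can be turned into records with a few lines of Python. The keys listed here use the older 25-character InChIKey layout (a 14-letter block and a 10-letter block); current standard InChIKeys add a third, one-letter block (27 characters), and the pattern below accepts both. This is a minimal sketch; the names `Annotation` and `parse_line` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Legacy keys as listed above: 14 + "-" + 10 letters (25 characters).
# Standard keys add "-" + 1 letter (27 characters).
INCHIKEY = re.compile(r"[A-Z]{14}-[A-Z]{10}(?:-[A-Z])?")


class Annotation(NamedTuple):
    name: str
    cid: Optional[int]  # None when no PubChem CID is available ("CID NA")
    inchikey: str


def parse_line(line: str) -> Annotation:
    """Parse one 'Name; CID nnn; InChIKey [links...]' line."""
    name, cid_field, key_field = (part.strip() for part in line.split(";", 2))
    inchikey = key_field.split()[0]  # drop trailing link labels such as "[CML]"
    if not INCHIKEY.fullmatch(inchikey):
        raise ValueError(f"not an InChIKey: {inchikey!r}")
    cid_text = cid_field.removeprefix("CID").strip()
    cid = int(cid_text) if cid_text.isdigit() else None
    return Annotation(name, cid, inchikey)


if __name__ == "__main__":
    lines = [
        "Cholesterol; CID 5997; HVYWMOMLDIMFJA-DPAQBDIFBB",
        "Bisbynin; CID NA; ICHJNTDKHBXTFN-CMZGOGIXBZ [CML] [MOL] [ChemSpider]",
    ]
    for record in map(parse_line, lines):
        print(record)
```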
Optical Character Recognition and Chemical Structure Recognition:
1) OSRA - Optical Structure Recognition (NIH) (free, open source)
2) Kekule - OCR-optical chemical (structure) recognition (NCI)
3) CLiDE & CLiDE Pro - Chemical literature data extraction tool (Univ. Leeds/SimBioSys/Keymodule)
4) ChemoCR - Tool for Chemical Compound Reconstruction
5) ChemReader - Automated extraction of chemical structure information
Text based semantic annotation tools and projects:
1) Oscar3 - Open Source Chemistry Analysis Routines (open source)
2) Chem-MANTIS - Nomenclature Transformation Integrated System
3) Project Prospect - IUPAC, Ontology, CML, InChI enhanced chemical publications
4) Chemicalize.org - web based annotation service via ChemAxon proxy (name to structure)
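Tools such as Oscar3 have to recognize chemical names in free text, which is hard. If articles carried InChIKeys, as suggested in the introduction, spotting structures would reduce to simple pattern matching. The following sketch is not how any of the listed tools work: it only finds InChIKey-shaped tokens with a regular expression and does not verify that a token is a real key. The function name `find_inchikeys` is hypothetical.

```python
import re

# 14 uppercase letters, hyphen, 10 uppercase letters, optional "-X" block.
INCHIKEY_IN_TEXT = re.compile(r"\b[A-Z]{14}-[A-Z]{10}(?:-[A-Z])?\b")


def find_inchikeys(text: str) -> list[str]:
    """Return InChIKey-like tokens in order of first appearance, without duplicates."""
    return list(dict.fromkeys(m.group(0) for m in INCHIKEY_IN_TEXT.finditer(text)))


if __name__ == "__main__":
    abstract = ("Cholesterol (HVYWMOMLDIMFJA-DPAQBDIFBB) and arginine "
                "(ODKSFYDXXFIFQN-UHFFFAOYAT) were detected; HVYWMOMLDIMFJA-DPAQBDIFBB again.")
    print(find_inchikeys(abstract))
```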
Name-to-structure and structure-to-name converters:
1) Autonom - Beilstein Institute
2) IBM Chemical Annotator - IBM Almaden
3) Lexichem - OpenEye
4) Struct <=> Name - CambridgeSoft
5) Marvin IUPAC Name - ChemAxon
6) ACDName - Structure to Name and Name to Structure ACDLabs
7) NameExpert and Nomenclator - Cheminnovation
8) IUPAC NameIt - BioRad
9) OPSIN - open-source name-to-structure converter (OSCAR3)
Last modified 2009-05-18 04:28 PM