Metabolomics Fiehn Lab

InChIKey - the InChI hash code released

The InChI team at IUPAC and NIST has just released InChIKey, a shorter version of the InChI code (the IUPAC International Chemical Identifier, a unique chemical identifier). It is meant to give search engines such as Google Scholar, Yahoo, Scirus or MSN a better and more reliable way to search chemical information on the internet. Specialized chemistry search engines and databases such as eMolecules, ChemSpider, PubMed and CAS can also use the InChIKey to build reliable chemistry-related web services. The InChIKey is not a replacement for the InChI code but a unique identifier for chemicals on the internet. RSC with Project Prospect and ChemSpider provide an InChIKey resolver service to look up InChIKeys and to generate InChIKeys from structures: InChIKey Resolver.

Why is it important to link information on the web and in publications via the InChIKey? Imagine seven international labs working on a new methylene blue drug to cure malaria. Each uses a different name:

1) methylene blue 
2) Chromosmon 
3) Urolene blue 
4) Swiss Blue
5) Solvent blue 8
6) Bleu de methylene
7) Methylenblau


InChIKey=CXKWCBBOMKCUKX-REWHXWOFAR
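
As a minimal sketch of how a shared key could link these seven names, the Python below groups records by InChIKey. The lab labels and the record layout are hypothetical and exist only for illustration; the names and the key are the ones listed above.

```python
from collections import defaultdict

# InChIKey for methylene blue, as given in this post.
METHYLENE_BLUE_KEY = "CXKWCBBOMKCUKX-REWHXWOFAR"

# The seven names from the list above.
NAMES = [
    "methylene blue",
    "Chromosmon",
    "Urolene blue",
    "Swiss Blue",
    "Solvent blue 8",
    "Bleu de methylene",
    "Methylenblau",
]

# Hypothetical records: (lab label, name used by that lab, InChIKey).
RECORDS = [
    ("Lab %d" % (i + 1), name, METHYLENE_BLUE_KEY)
    for i, name in enumerate(NAMES)
]


def group_by_inchikey(records):
    """Collect all (lab, name) pairs that share the same InChIKey."""
    groups = defaultdict(list)
    for lab, name, key in records:
        groups[key].append((lab, name))
    return dict(groups)


if __name__ == "__main__":
    for key, entries in group_by_inchikey(RECORDS).items():
        print(key, "->", len(entries), "records")
```

Grouping by name would yield seven separate entries here; grouping by key yields one.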


With the InChIKey on all web resources and in all peer-reviewed open source publications, any commercial or non-commercial automated service could connect these publications and proceedings in a semantic manner. Searching for the short InChIKey together with the keyword malaria would return only related publications.

Chemicals can have up to 200 synonyms. SMILES codes that claim to be unique are unique only when produced with the same software, not when the output of 10 different software packages is compared. The PubChem ID could serve as a unique identifier, but it fails if the substance is not yet in PubChem. CAS numbers have the same problem: if a new substance is not yet in the CAS database, there is no CAS number. More problematic is the license, which only allows academic projects to store 5000 CAS numbers. What happens to a project database that has 5000+1 CAS numbers? Cheminformatics and metabolomics projects usually include millions of chemicals. Even worse, all CAS numbers have to be deleted after the project is finished. So CAS numbers are not useful unique identifiers for chemicals.

In conclusion, SMILES codes, chemical names, CAS numbers and PubChem IDs are not suitable candidates. A free, open-source, unique and search-engine-friendly identifier for chemicals that can be generated directly from the molecular structure was therefore urgently needed. Here it is. Watch the video at the Google Tech Talk.

Problem (the InChI is too long for search engine requests and the string gets broken up; example: Angiotensin):
InChI=1/C49H70N14O11/c1-26(2)39(61-42(67)33(12-8-18-55-49(52)53)57-41(66
)32(50)23-38(51)65)45(70)58-34(20-29-14-16-31(64)17-15-29)43(68)62-40(27
(3)4)46(71)59-35(22-30-24-54-25-56-30)47(72)63-19-9-13-37(63)44(69)60-36
(48(73)74)21-28-10-6-5-7-11-28/h5-7,10-11,14-17,24-27,32-37,39-40,64H,8-
9,12-13,18-23,50H2,1-4H3,(H2,51,65)(H,54,56)(H,57,66)(H,58,70)(H,59,71)(
H,60,69)(H,61,67)(H,62,68)(H,73,74)(H4,52,53,55)/f/h56-62,73H,51-53H2

Google output of the InChI search: "29" (and any subsequent words) was ignored because we limit queries to 32 words.
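
The Python sketch below shows one way this limit can be hit. It assumes, purely for illustration, that the search engine splits a query into words at every non-alphanumeric character; the actual tokenizer is not documented here. Under that assumption the 33rd word of the Angiotensin InChI is "29", which agrees with the message quoted above.

```python
import re

# Word limit quoted in the Google message above.
MAX_QUERY_WORDS = 32

# First two lines of the Angiotensin InChI shown above (truncated on purpose).
INCHI_START = (
    "InChI=1/C49H70N14O11/c1-26(2)39(61-42(67)33(12-8-18-55-49(52)53)57-41(66"
    ")32(50)23-38(51)65)45(70)58-34(20-29-14-16-31(64)17-15-29)43(68)62-40(27"
)

INCHIKEY = "InChIKey=JYPVVOOBQVVUQV-UHFFFAOYAR"


def split_into_words(query):
    """Assumed tokenizer: split at every run of non-alphanumeric characters."""
    return [word for word in re.split(r"[^A-Za-z0-9]+", query) if word]


if __name__ == "__main__":
    inchi_words = split_into_words(INCHI_START)
    # Index 32 is the first word beyond the 32-word limit.
    print("first ignored word:", inchi_words[MAX_QUERY_WORDS])
    print("words in InChIKey query:", len(split_into_words(INCHIKEY)))
```

The InChIKey splits into only three words under the same assumption, so it stays far below the limit.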

Solution (a shorter unique representation, i.e. a hash code, for Angiotensin, CID=5890):
InChIKey=JYPVVOOBQVVUQV-UHFFFAOYAR
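
To make the hash-code idea concrete, here is a toy Python sketch that maps an InChI string of any length to a short fixed-length key. It is not the official InChIKey algorithm and will not reproduce the keys shown in this post; the real keys are computed by the InChI software. The function name `toy_key`, the byte-to-letter mapping and the example input are assumptions made for illustration, and the key shape (14 letters, a hyphen, 10 letters) is simply read off the two keys printed here.

```python
import hashlib
import re

# Shape of the two InChIKeys shown in this post: 14 letters, hyphen, 10 letters.
KEY_SHAPE = re.compile(r"^[A-Z]{14}-[A-Z]{10}$")


def toy_key(inchi):
    """Toy hash, NOT the official InChIKey: SHA-256 digest mapped to A-Z."""
    digest = hashlib.sha256(inchi.encode("ascii")).digest()
    # Use 24 of the 32 digest bytes, one uppercase letter per byte.
    letters = "".join(chr(ord("A") + byte % 26) for byte in digest[:24])
    return letters[:14] + "-" + letters[14:]


if __name__ == "__main__":
    example_inchi = "InChI=1/CH4/h1H4"  # short example input (methane)
    key = toy_key(example_inchi)
    print(key, bool(KEY_SHAPE.match(key)))
    # The published key for Angiotensin has the same shape.
    print(bool(KEY_SHAPE.match("JYPVVOOBQVVUQV-UHFFFAOYAR")))
```

The point of the design is that the key length is fixed and contains only letters and one hyphen, however long the input InChI is, so a search engine can treat it as a handful of words.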


Recommendation:

Include the InChI and the InChIKey in every publication so that search engines can gather this information together with related information and start building the chemical semantic web.

Download:

Open Source Code: http://sourceforge.net/projects/inchi
Beta Version: http://www.iupac.org/inchi/download/index.html

News source: Egon Willighagen via cb.openmolecules.net

Created by zwluxx
Last modified 2009-04-02 07:53 PM
 
