InChIKey - the InChI hash code released
The IUPAC International Chemical Identifier team at IUPAC and NIST has just released a shorter version of the InChI code (a unique chemical identifier) called InChIKey. It allows search engines like Google Scholar, Yahoo, Scirus or MSN to search chemical information on the internet better and more reliably. Additionally, specialized chemistry search engines and databases like eMolecules, ChemSpider, PubMed and CAS can use the InChIKey to build reliable chemistry-related web services. The InChIKey is not a replacement for the InChI code but a unique identifier for chemicals on the internet. RSC with Project Prospect and ChemSpider provide an InChIKey resolver service to look up InChIKeys and generate InChIKeys from structures: InChIKey Resolver.
1) methylene blue
3) Urolene blue
4) Swiss Blue
5) Solvent blue 8
6) Bleu de methylene
With the InChIKey on all web resources and in all peer-reviewed open source publications, any commercial or non-commercial automated service could connect these publications and proceedings in a semantic manner. Searching for the short InChIKey together with the keyword Malaria would return only related publications.
Chemicals can have up to 200 synonyms. SMILES codes that claim to be unique are only unique if they are produced with the same software, not when compared with the output of 10 different software packages. The PubChem ID could be used as a unique identifier, but it fails if the substance is not yet in PubChem. CAS numbers have the same problem: if a new substance is not yet in the CAS database, there is no CAS number. More problematic is the license, which only allows the storage of 5000 CAS numbers for academic projects. What happens with a project database that has 5000+1 CAS numbers? Cheminformatics and metabolomics projects usually include millions of chemicals. Even worse, all CAS numbers have to be deleted after the project is finished. So CAS numbers are not useful unique identifiers for chemicals.
In conclusion, SMILES codes, chemical names, CAS numbers and PubChem IDs are not suitable candidates. Therefore a free, open-source, unique and search-engine-friendly identifier for chemicals that can be generated directly from the molecular structure was urgently needed. Here it is. Watch the video at the Google Tech Talk.
Problem (the InChI is too long for search engine requests and the string gets broken into separate words; example: Angiotensin):
Google output of the InChI search: "29" (and any subsequent words) was ignored because we limit queries to 32 words.
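A rough way to see why an InChI runs into the word limit is to count the pieces a search engine might split it into. The sketch below assumes the engine breaks a query at every character that is not a letter or digit; that rule is an assumption for illustration, not Google's documented behaviour. The function name count_query_words is hypothetical, and ethanol is used as a small stand-in molecule rather than Angiotensin.

```python
import re


def count_query_words(query: str) -> int:
    """Count the pieces left after splitting at every non-alphanumeric character.

    Assumption: this approximates how a search engine tokenizes a query.
    Real engines may split differently.
    """
    return len([word for word in re.split(r"[^A-Za-z0-9]+", query) if word])


# Standard InChI of ethanol, a very small molecule.
ETHANOL_INCHI = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

print(count_query_words(ETHANOL_INCHI))
```

Under this assumption even a three-atom skeleton already yields nine words, so the InChI of a peptide with dozens of atoms and several layers can plausibly exceed a 32-word limit.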
Solution (a shorter unique representation aka hash code for Angiotensin, CID=5890):
Include the InChI and the InChIKey in every publication so that search engines can gather this information together with related information and start building the chemical semantic web.
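As an illustration of how an automated service could pick up keys embedded in publications, here is a minimal sketch that scans text for InChIKey-like strings. It assumes the 27-character layout of the standard InChIKey (14 uppercase letters, a hyphen, 10 uppercase letters, a hyphen, 1 uppercase letter); keys produced by the beta software may be laid out differently. The names INCHIKEY_PATTERN and find_inchikeys and the sample sentence are hypothetical.

```python
import re
from typing import List

# Assumed layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter (27 characters).
# This checks the shape only; it cannot tell whether a key belongs to a real compound.
INCHIKEY_PATTERN = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")


def find_inchikeys(text: str) -> List[str]:
    """Return all substrings of text that look like an InChIKey."""
    return INCHIKEY_PATTERN.findall(text)


# Hypothetical sentence from a publication, tagged with the InChIKey of ethanol.
sample = "Ethanol (InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N) was used as the solvent."

print(find_inchikeys(sample))
```

Because the key contains only letters and two hyphens, it survives as a short, fixed-length token that can be combined with ordinary keywords in a query.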
Beta Version: http://www.iupac.org/inchi/download/index.html
Last modified 2009-04-02 07:53 PM