... is a chemical discipline that uses mathematical, statistical and other methods to design or select optimal measurement procedures and experiments and to provide maximum relevant chemical information by analyzing chemical data.

Title Molecular similarity and diversity in chemoinformatics: From theory to applications
Ana G. Maldonado, J.P. Doucet, Michel Petitjean & Bo-Tao Fan
Source Molecular Diversity (2006) 10: 39–79
DOI 10.1007/s11030-006-8697-1
Short Review

This review gives an almost complete overview (it's not a book, but about 40 pages) of structure handling (SMILES, SDF), molecular databases (NCI, MDDR), molecular descriptors and, of course, similarity and dissimilarity calculations for small molecules. This is especially helpful for comparing large drug databases and calculating their overlap, or for creating unique in silico databases with specially designed properties. Different criteria are presented, i.e. pharmaceutical or spectral criteria, which lead to the selection of different descriptors (2D, 3D, Balaban index, logP, CoMFA descriptors, etc.) for dissimilarity calculations.
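Similarity and dissimilarity calculations of this kind are often built on binary fingerprints compared with the Tanimoto coefficient. The sketch below assumes the fingerprints already exist as sets of "on" bit positions (the toy fingerprints are invented) and uses one simple, assumed definition of database overlap: the fraction of molecules in one database whose nearest neighbour in the other reaches a similarity threshold. It illustrates the idea and is not the procedure described in the review.

```python
from typing import FrozenSet, List

# A fingerprint is represented as the set of "on" bit positions. How the bits are
# generated (substructure keys, hashed paths, ...) is outside this sketch.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity |A & B| / |A | B|."""
    union = len(a | b)
    if union == 0:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / union


def dissimilarity(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto distance: 1 - similarity."""
    return 1.0 - tanimoto(a, b)


def overlap(db_a: List[Fingerprint], db_b: List[Fingerprint], threshold: float = 0.85) -> float:
    """Fraction of molecules in db_a whose nearest neighbour in db_b has
    similarity >= threshold (0.85 is an assumed cut-off, not a standard)."""
    if not db_a:
        return 0.0
    hits = sum(
        1 for fa in db_a
        if max((tanimoto(fa, fb) for fb in db_b), default=0.0) >= threshold
    )
    return hits / len(db_a)


if __name__ == "__main__":
    db_a = [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7})]       # toy data
    db_b = [frozenset({1, 2, 3, 4, 8}), frozenset({10, 11})]
    print(tanimoto(db_a[0], db_b[0]))            # 0.8 (4 shared bits / 5 bits in total)
    print(overlap(db_a, db_b, threshold=0.8))    # 0.5 (only the first molecule has a close neighbour)
```

The threshold drives the overlap figure strongly, so in practice it is worth reporting the value used alongside any overlap percentage.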

Title Chemoinformatics methods for systematic comparison of molecules from natural and synthetic sources and design of hybrid libraries
Jürgen Bajorath
Source Molecular Diversity, 5: 305-313, 2000 (published in 2002).
DOI 10.1023/A:1020868022748
Short Review

Drug companies have always turned to "nature" for inspiration, i.e. for new ideas for molecules (diversity sets) and, as a long-term goal, for their use in treating diseases. However, simply comparing synthetic compound libraries with natural-product databases through diversity calculations may not lead to directly usable ideas. Here the author developed a hybrid ‘MetaFocus’ (Metabolite Focused) library from Available Chemicals Directory (ACD) and CRC Dictionary of Natural Products (DNP) molecules. "The strategy relies to a large extent on molecular similarity calculations to combine information from natural and synthetic molecules". Synthetic mimics of natural products that are difficult to synthesize can be used to build new lead molecules.
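A minimal sketch of the kind of similarity search such a hybrid design depends on: for each natural product, rank synthetic compounds by Tanimoto similarity and keep the closest ones as candidate mimics. The function name `synthetic_mimics`, the threshold of 0.7, `top_k`, and the toy fingerprints are all hypothetical; this is not the MetaFocus procedure itself.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # set of "on" bit positions (assumed representation)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """|A & B| / |A | B|; two empty fingerprints count as identical."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def synthetic_mimics(
    natural: Dict[str, Fingerprint],
    synthetic: Dict[str, Fingerprint],
    threshold: float = 0.7,
    top_k: int = 3,
) -> Dict[str, List[Tuple[str, float]]]:
    """For each natural product, return up to top_k synthetic compounds with
    Tanimoto similarity >= threshold, most similar first (hypothetical helper)."""
    mimics: Dict[str, List[Tuple[str, float]]] = {}
    for np_id, np_fp in natural.items():
        scored = [(syn_id, tanimoto(np_fp, syn_fp)) for syn_id, syn_fp in synthetic.items()]
        hits = [pair for pair in scored if pair[1] >= threshold]
        hits.sort(key=lambda pair: pair[1], reverse=True)
        mimics[np_id] = hits[:top_k]
    return mimics


if __name__ == "__main__":
    natural = {"NP1": frozenset({1, 2, 3, 4, 5}), "NP2": frozenset({20, 21, 22})}  # toy data
    synthetic = {
        "S1": frozenset({1, 2, 3, 4}),
        "S2": frozenset({1, 2, 9}),
        "S3": frozenset({1, 2, 3, 4, 5, 6}),
    }
    for np_id, hits in synthetic_mimics(natural, synthetic).items():
        print(np_id, [(s, round(t, 2)) for s, t in hits])
```

Natural products that end up with an empty list are those without close synthetic analogues in the screened set, which is where a hybrid design has the most to add.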

See also: Design of Array-Type Compound Libraries that Combine Information from Natural Products and Synthetic Molecules (DOI). The Dictionary of Natural Products lists ~60,000 natural products (those assigned the V code, as of 2006), which are parent structures without additional modifications. See the Dictionary Subset of the Combined Chemical Dictionary:

Entries  Code  Subset
  33009  B     Organometallic Compounds
   5540  C     Organophosphorus Compounds
   4476  F     Carbohydrates
   2761  G     The Lipid Handbook
   1644  H     Amino Acids and Peptides
   6095  J     Steroids
   2938  K     Marine Natural Products
  16676  M     Commonly Cited Compounds
  31122  N     Inorganic Compounds
   5398  P     Analytical Reagents
   7631  R     Rubber Handbook
  58821  V     Natural Products
   8369  W     Food
  12454  X     Drugs
   1423  Y     Monomers

Title Selection Criteria for Drug-Like Compounds
Ingo Muegge
Source Medicinal Research Reviews, Vol. 23, No. 3, 302-321, 2003
DOI 10.1002/med.10041
Short Review This study shows why the famous Lipinski Rule of Five cannot be used alone to distinguish comprehensively between drugs and non-drugs. Many examples using the MDDR, CMC and ACD databases are given. The author suggests using cascades of multiple filter criteria, combining relevant 2D and 3D descriptors and biological properties with machine learning algorithms, to obtain a fine-grained model.
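A filter cascade of this kind can be sketched as a list of predicates applied in order. In this sketch the first stage counts Rule of Five violations (molecular weight > 500, logP > 5, more than 5 H-bond donors, more than 10 H-bond acceptors) and, as is common practice, tolerates one; the second stage uses Veber-style limits (at most 10 rotatable bonds, polar surface area at most 140 Å²) purely as an example of an extra criterion. Descriptor values are assumed to be precomputed, the compounds are invented, and the machine learning stage is left out; this is not Muegge's actual scheme.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Compound:
    name: str
    mol_weight: float   # Da
    logp: float         # (calculated) octanol/water logP
    h_donors: int
    h_acceptors: int
    rot_bonds: int
    psa: float          # polar surface area, Å^2


def ro5_violations(c: Compound) -> int:
    """Number of Lipinski Rule of Five criteria the compound violates."""
    return sum([c.mol_weight > 500, c.logp > 5, c.h_donors > 5, c.h_acceptors > 10])


def passes_ro5(c: Compound) -> bool:
    return ro5_violations(c) <= 1                # one violation tolerated


def passes_veber(c: Compound) -> bool:
    return c.rot_bonds <= 10 and c.psa <= 140    # example second-stage filter


Filter = Callable[[Compound], bool]


def cascade(compounds: List[Compound], filters: List[Filter]) -> List[Compound]:
    """Apply filters in order; each stage only sees the survivors of the previous one."""
    survivors = list(compounds)
    for keep in filters:
        survivors = [c for c in survivors if keep(c)]
    return survivors


if __name__ == "__main__":
    library = [  # invented descriptor values
        Compound("A", 350.0, 2.5, 2, 5, 4, 80.0),
        Compound("B", 620.0, 6.2, 3, 8, 6, 100.0),   # two Ro5 violations
        Compound("C", 480.0, 3.0, 2, 7, 14, 120.0),  # passes Ro5, fails the flexibility limit
    ]
    print([c.name for c in cascade(library, [passes_ro5, passes_veber])])  # ['A']
```

Keeping each stage as a separate predicate makes it easy to add, reorder or replace criteria, for example with a trained classifier as the final stage.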

Title Property distribution of drug-related chemical databases
Tudor I. Oprea
Source Journal of Computer-Aided Molecular Design, 14: 251-264, 2000.
DOI 10.1023/A:1008130001697
Short Review This paper discusses the Lipinski Rule of Five on several example databases (MDDR, ACD) and gives distribution curves for each of the rule's criteria. Using examples, it shows why the Rule of Five cannot be used to distinguish between drugs and non-drugs. Based on a Pareto analysis, the study also shows which additional filters can be used to separate drug from non-drug molecular space.
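Distribution curves and Pareto-style cut-offs rest on simple cumulative statistics. As a rough, assumed illustration (not Oprea's exact procedure), the sketch computes the property value below which a given fraction of a database falls and the share of compounds violating a single "≤ limit" rule; the molecular weights are made up.

```python
import math
from typing import Sequence


def coverage_cutoff(values: Sequence[float], fraction: float = 0.8) -> float:
    """Smallest observed value v such that at least `fraction` of the data is <= v
    (nearest-rank percentile)."""
    if not values or not 0.0 <= fraction <= 1.0:
        raise ValueError("need non-empty data and 0 <= fraction <= 1")
    ordered = sorted(values)
    # The tiny tolerance keeps e.g. 0.7 * 10 (== 7.000000000000001) from rounding up to 8.
    rank = math.ceil(fraction * len(ordered) - 1e-9)
    return ordered[max(rank, 1) - 1]


def fraction_above(values: Sequence[float], limit: float) -> float:
    """Share of compounds that violate a '<= limit' rule."""
    if not values:
        raise ValueError("empty data")
    return sum(v > limit for v in values) / len(values)


if __name__ == "__main__":
    # Hypothetical molecular weights (Da) for a small "drug" set.
    mw = [180, 250, 310, 340, 420, 460, 510, 530, 610, 780]
    print(coverage_cutoff(mw, 0.8))   # 530: 8 of 10 values are <= 530
    print(fraction_above(mw, 500))    # 0.4: four compounds exceed 500 Da
```

Running the same two functions over each Rule of Five property, and over both a drug and a non-drug database, gives the numbers behind the kind of comparison the paper makes.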

Title Combinatorial Enumeration in Chemistry
Source Chemical Modelling: Applications and Theory, Volume 3 (RSC)
Short Review This is an RSC review of the literature from 2001-2003, raising the question "How many different molecules are possible?". It is not a complete review (350 citations), but it gives deeper insight into combinatorial chemistry and especially into the mathematics behind counting isomers and stereoisomers.
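Counting isomers is a classic application of symmetry-based enumeration. As an assumed, self-contained illustration (not taken from the review), the sketch applies Burnside's lemma, the core of Pólya counting, to substitution patterns on a benzene ring: the number of distinct patterns equals the average number of patterns left unchanged by the 12 symmetry operations of the hexagon (dihedral group D6). It counts constitutional isomers only, ignores stereochemistry, and enumerates patterns by brute force, so it is only practical for small rings.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

Pattern = Tuple[int, ...]  # one entry per ring position: 0 = H, 1 = substituent X


def ring_symmetries(n: int) -> List[Callable[[Pattern], Pattern]]:
    """The 2n operations of the dihedral group D_n (n rotations, n reflections)."""
    ops: List[Callable[[Pattern], Pattern]] = []
    for k in range(n):
        ops.append(lambda p, k=k: tuple(p[(i + k) % n] for i in range(n)))  # rotation
        ops.append(lambda p, k=k: tuple(p[(k - i) % n] for i in range(n)))  # reflection
    return ops


def count_isomers(n_positions: int = 6, n_kinds: int = 2, n_x: Optional[int] = None) -> int:
    """Number of substitution patterns that are distinct under ring symmetry.

    Burnside's lemma: #orbits = (1/|G|) * sum over g in G of #patterns fixed by g.
    If n_x is given, only patterns with exactly n_x substituents of kind 1 are counted;
    that subset is closed under the symmetry operations, so the lemma still applies.
    """
    ops = ring_symmetries(n_positions)
    patterns = [
        p for p in product(range(n_kinds), repeat=n_positions)
        if n_x is None or p.count(1) == n_x
    ]
    fixed = sum(1 for op in ops for p in patterns if op(p) == p)
    return fixed // len(ops)


if __name__ == "__main__":
    print(count_isomers())                             # 13 patterns for C6H(6-k)X(k), k = 0..6
    print([count_isomers(6, 2, k) for k in range(7)])  # [1, 1, 3, 3, 3, 1, 1]
```

The value 3 for two substituents is the familiar ortho/meta/para triple; Pólya's theorem turns the same averaging into a generating function, which avoids enumerating every pattern.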

Title Bad results from good data
Martin Badertscher, Erno Pretsch
Source TrAC Trends in Analytical Chemistry, 25 (11): 1131-1138, Dec 2006
DOI 10.1016/j.trac.2006.09.003
Short Review This article should be a chapter in every analytical textbook and in any book about chemometrics. The daily abuse of "linear regression" with R^2=0.9999999 without any mention of the errors on the x- and y-axes; the pitiless axing of ranges where data points do not fit; the merciless linear extrapolations: all this should end soon. Will it end soon? I doubt it.
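The complaint about unreported errors can be made concrete. A minimal sketch using plain Python and standard textbook formulas: ordinary least squares that reports the standard errors of slope and intercept next to R², plus the standard error of the fitted line at a given x, which grows as x moves away from the mean of the calibration data and so shows why linear extrapolation is risky. OLS assumes x is error-free; when both axes carry errors, `deming_slope` gives the Deming (errors-in-variables) slope, with the error-variance ratio `delta` assumed to be known. The data are invented.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinearFit:
    slope: float
    intercept: float
    se_slope: float
    se_intercept: float
    r_squared: float
    s_res: float   # residual standard deviation
    x_mean: float
    sxx: float
    n: int

    def se_fitted(self, x0: float) -> float:
        """Standard error of the fitted line at x0; grows with |x0 - mean(x)|."""
        return self.s_res * math.sqrt(1 / self.n + (x0 - self.x_mean) ** 2 / self.sxx)


def _sums(x: Sequence[float], y: Sequence[float]):
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    syy = sum((yi - ym) ** 2 for yi in y)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    return n, xm, ym, sxx, syy, sxy


def ols_fit(x: Sequence[float], y: Sequence[float]) -> LinearFit:
    """Ordinary least squares y = a + b*x; assumes the x values carry no error."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least 3 paired points")
    n, xm, ym, sxx, syy, sxy = _sums(x, y)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx
    a = ym - b * xm
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    r2 = 1 - sse / syy if syy > 0 else float("nan")
    return LinearFit(b, a, s / math.sqrt(sxx), s * math.sqrt(1 / n + xm ** 2 / sxx),
                     r2, s, xm, sxx, n)


def deming_slope(x: Sequence[float], y: Sequence[float], delta: float = 1.0) -> float:
    """Deming regression slope for errors in both x and y.
    delta = variance of y errors / variance of x errors (assumed known)."""
    _, _, _, sxx, syy, sxy = _sums(x, y)
    if sxy == 0:
        raise ValueError("slope undefined when x and y are uncorrelated")
    d = syy - delta * sxx
    return (d + math.sqrt(d * d + 4 * delta * sxy * sxy)) / (2 * sxy)


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4]              # invented calibration data
    y = [1.0, 3.1, 4.9, 7.2, 8.8]
    fit = ols_fit(x, y)
    print(f"slope {fit.slope:.3f} ± {fit.se_slope:.3f}, "
          f"intercept {fit.intercept:.3f} ± {fit.se_intercept:.3f}, R^2 {fit.r_squared:.4f}")
    # Uncertainty of the line inside the data range vs. far outside it:
    print(fit.se_fitted(2), fit.se_fitted(20))
    print(deming_slope(x, y, delta=1.0))
```

An R² close to 1 says nothing about how wide the uncertainty becomes at x = 20, which is exactly the extrapolation trap the article describes.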

Today is a cool and nice day. Why? Ask yourself!