Free datasets for QSAR and QSPR modelling

Open and freely available datasets are important for developing, testing and validating new computational methods. In the past, the lack of such datasets seriously hindered innovation, because data had to be collected from the printed literature or was only available in commercial databases with stringent copyright restrictions. For further discussion see Blue Obelisk. Some QSAR journals require submission of raw datasets.

  • Free datasets from QSAR world [LINK]
  • Free datasets from [LINK]

Download isomer structures

Free isomer structures are important for metabolomics, QSAR research and chemistry in general. Structures can be downloaded as 1D, 2D or 3D representations, mostly as SMILES and SDF files.

  • PubChem FTP [LINK] and specifications [LINK] or PUG
  • ChemSpider, a fast-growing open database with numerous APIs [LINK]
  • NCI datasets NCI99, NCI2000 as SMI, SDF [LINK] with head collector [LINK] from the CACTUS group (NCI/CADD)
  • Public Database collection [LCM CIS] at CADD
  • CCCBDB - Computational Chemistry Comparison and Benchmark DataBase [LINK]
  • PDB Ligand Expo - Small molecules as SDF, CIF, PDBML, mmCIF, SMILES from the PDB database [LINK]
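
Since most of the sources above ship SDF files, a rough sketch of how such a file could be read without any chemistry toolkit may be useful. The function names `iter_sdf_records` and `parse_record` are hypothetical helpers written for this page, not part of any library. The sketch assumes V2000 molblocks with the standard fixed-width columns and ignores V3000 records, charges and bond details. For real work a toolkit such as Open Babel, CDK or RDKit is the safer choice.

```python
from typing import Dict, Iterator, List, Tuple

Atom = Tuple[str, float, float, float]  # (element symbol, x, y, z)


def parse_record(lines: List[str]) -> Dict[str, object]:
    """Parse one SDF record (V2000 molblock plus data fields)."""
    if len(lines) < 4:
        raise ValueError("record too short to contain a counts line")
    title = lines[0].strip()
    counts = lines[3]
    # Counts line: atom count in columns 1-3, bond count in columns 4-6.
    n_atoms = int(counts[0:3])
    n_bonds = int(counts[3:6])

    atoms: List[Atom] = []
    for line in lines[4:4 + n_atoms]:
        # Atom line: x, y, z in three 10-character fields, symbol in 32-34.
        x = float(line[0:10])
        y = float(line[10:20])
        z = float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))

    # Data fields follow the molblock: "> <NAME>", value lines, blank line.
    props: Dict[str, str] = {}
    i = 4 + n_atoms + n_bonds
    while i < len(lines):
        line = lines[i]
        if line.startswith(">") and "<" in line:
            name = line[line.index("<") + 1:line.rindex(">")]
            i += 1
            value_lines: List[str] = []
            while i < len(lines) and lines[i].strip():
                value_lines.append(lines[i])
                i += 1
            props[name] = "\n".join(value_lines)
        i += 1

    return {"title": title, "atoms": atoms, "n_bonds": n_bonds,
            "properties": props}


def iter_sdf_records(text: str) -> Iterator[Dict[str, object]]:
    """Yield one parsed dict per record; records end with a '$$$$' line."""
    record: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if record:
                yield parse_record(record)
            record = []
        else:
            record.append(line)
    # Tolerate a final record that lacks the closing delimiter.
    if any(l.strip() for l in record):
        yield parse_record(record)
```

Reading the text of a downloaded file and passing it to `iter_sdf_records` yields one dictionary per structure, which is usually enough to count records, pull out data fields, or filter by element before handing the file to a proper toolkit.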

Database Collector pages

Database collectors are compilations of multiple databases, sorted by field and approach.

  • UNI Jena DB collector [LINK]
  • Thirty-Two Free Chemistry Databases [LINK] by Rich Apodaca (depth-first)
  • Comprehensive list at Indiana EDU [LINK]

3D Structures and X-ray crystallography (small molecules)

3D structures can be used to develop and validate conformer generation software. Most of these structures are not openly accessible, although this is changing. Services like CrystalEye and Reciprocal Net provide free (open) access to such structures.

  • CrystalEye - covers CIF and PDF structures from most/all journals
  • Reciprocal Net - covers a selection of CIF and PDB structures
  • MERCK and MMFF94 validation set on CCL FTP