3D conformer generation is important for chemistry, drug research, and the development of QSAR and QSPR models. There are free and open-source implementations (Frog, SMI23D, Cyndi, TINKER, Balloon, DG-AMMOS) and commercial implementations (CORINA, MARVIN, OMEGA, MOE, SYBYL, CONFORT, CONFLEX). Some of the commercial applications allow free academic use (ChemAxon Marvin, OEChem OMEGA, VeraChem VConf). The Frog code was donated to the OpenBabel project but has not been included yet.

For 2D to 3D conversions, fast force field methods are used, among them UFF and MMFF94 (Merck; validation suite on CCL). Speed matters for conformer generation. Among the fastest tools is CORINA (Gasteiger/BMBF/Uni Erlangen). The CORINA manual tells an interesting story of development, tricks, and pitfalls, and has an extensive benchmark section covering accuracy and conversion success rates [PDF].

A common workflow starts with the generation of molecular formulae, followed by isomer generation with MOLGEN/CDK, conversion from 2D into 3D data, and calculation of tautomers, resonance forms, and stereoisomers. Finally, the lowest-energy conformers of each structure are calculated using force field methods (MM2, MM3, MMFF94), followed by a more accurate energy minimization using semi-empirical methods (AM1, PM3, PM6 or RM1), DFT, or ab initio methods.


Example: 2D to 3D conversion of the hexahelicene PAH (CID: 98863), whose high strain energy bends the whole molecule. Conversion was performed with smi23d; visualization with ChemAxon MView and Marvin Space (Java Webstart).


SMI23D
More information is available on the original smi23d site (Indiana University, Kevin Gilbert, Rajarshi Guha). A controversy regarding patents, discussed on CCL.net, has since been resolved. The code can be optimized with GNU compiler options; a five-fold speed increase was obtained compared to the original options. The aggressive compiler option "-O3" may lead to errors. The SSE3 streaming instructions are inlined and may be fine. A tutorial on how to compile and use the code can be found on Depth-First.

CFLAGS = -g -m32 -O3 -ffast-math -msse3 -mfpmath=sse

Under Windows, smi23d can be compiled in the Cygwin environment. Alternatively, you can use the downloadable ZIP file, which contains the EXE executables and an example file. A batch file converts PAH.SMI into the 3D version of the molecules (opt.sdf). There are still some open issues with different SMILES codes and aromaticity handling: depending on what kind of SMILES or SDF input is supplied, different results may be obtained (June 2008). Conversion time is around one second for the 14 molecules.

smi2sdf -o rough.sdf -p mmxconst.prm pah.smi
mengine -dxi -p mmff94.prm -c mmxconst.prm -o opt.sdf rough.sdf
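The two steps above can also be driven from a script. The following is a minimal Python sketch, assuming that smi2sdf, mengine and the two parameter files are reachable from the working directory; the function names smi23d_commands and run_smi23d are made up for this illustration, while the command-line flags are exactly the ones shown above.

```python
import subprocess


def smi23d_commands(smiles_file, rough_sdf="rough.sdf", opt_sdf="opt.sdf"):
    """Return the two smi23d command lines as argument lists.

    Step 1 (smi2sdf) builds rough 3D coordinates from SMILES,
    step 2 (mengine) optimizes them with the MMFF94 parameters.
    """
    build = ["smi2sdf", "-o", rough_sdf, "-p", "mmxconst.prm", smiles_file]
    optimize = ["mengine", "-dxi", "-p", "mmff94.prm",
                "-c", "mmxconst.prm", "-o", opt_sdf, rough_sdf]
    return [build, optimize]


def run_smi23d(smiles_file):
    """Run both steps in order; raises CalledProcessError if a step fails."""
    for argv in smi23d_commands(smiles_file):
        subprocess.run(argv, check=True)


# Usage (needs the smi23d binaries and parameter files):
# run_smi23d("pah.smi")
```

Keeping the command construction separate from the execution makes it easy to print or log the exact command lines before running them.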

You can later view the results with MView, or by loading the SDF file into the free Instant-JChem. The -dxi options also generate the MMFF94 force field energy, dipole moment, point group, XlogP, moment of inertia, and the thermodynamic quantities dE, dH, S, dG, CP (vibrational energy, vibrational enthalpy, vibrational entropy, vibrational free energy, vibrational heat capacity). Zero-point energy as well as translational, rotational, vibrational, mixing and total energy can be enabled. The following point groups are recognized: C1, C2, C2h, C2v, C3, C3h, C3v, C4, C4h, C4v, C5, C5h, C5v, C6, C6h, C6v, Ci, Cs, D2, D2d, D2h, D3, D3d, D3h, D4, D4d, D4h, D5, D5d, D5h, D6, D6d, D6h, O, S10, S12, S4, S6, S8, T, Td, Th.
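The calculated properties can also be read without a viewer. The sketch below assumes that they are stored as standard SD data items ("> <NAME>" header, value lines, blank line) in opt.sdf, which has not been verified for mengine here. The tag names ENERGY and POINT_GROUP in the usage comment are placeholders, not the names mengine actually writes.

```python
def read_sdf_fields(sdf_text):
    """Collect the SD data items of every record in an SDF string.

    Returns one dict per record, mapping tag name to its value text.
    """
    records = []
    fields = {}
    current = None  # tag whose value lines are being read
    for line in sdf_text.splitlines():
        if line.startswith("$$$$"):
            # End of record: store the collected items and start over.
            records.append({k: "\n".join(v) for k, v in fields.items()})
            fields = {}
            current = None
        elif line.startswith(">"):
            # Data header line such as "> <NAME>".
            start, end = line.find("<"), line.rfind(">")
            if start != -1 and end > start:
                current = line[start + 1:end]
                fields[current] = []
            else:
                current = None
        elif current is not None:
            if line.strip():
                fields[current].append(line)
            else:
                current = None  # a blank line terminates the data item
    return records


# Usage (tag names are hypothetical):
# with open("opt.sdf") as handle:
#     for rec in read_sdf_fields(handle.read()):
#         print(rec.get("ENERGY"), rec.get("POINT_GROUP"))
```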

Download smi23d version for Windows [smi23d-win.zip]

Micro Benchmarks for 14 PAH molecules (PAH.smi)

Intel Xeon Core 2 Duo 2.66 GHz running Apple Mac OS X Leopard
time ./mengine -p mmff94.prm -c mmxconst.prm -o opt.sdf rough.sdf
real 0m0.727s
user 0m0.711s
sys 0m0.012s

AMD Opteron 2.8 GHz running Windows XP 32-bit (an Intel Core 2 Duo 2.0 GHz has the same integer speed)
time ./mengine -p mmff94.prm -c mmxconst.prm -o opt.sdf rough.sdf
real 0m0.967s
user 0m0.671s
sys 0m0.202s
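To compare the two runs per molecule, the wall-clock ("real") times can be divided by the 14 molecules in PAH.smi. A small Python sketch of that arithmetic follows; the helper names parse_time and ms_per_molecule are made up for this illustration.

```python
import re


def parse_time(value):
    """Convert a shell 'time' value such as '0m0.727s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value.strip())
    if match is None:
        raise ValueError("unexpected time format: " + value)
    return int(match.group(1)) * 60 + float(match.group(2))


def ms_per_molecule(real_time, n_molecules):
    """Average wall-clock time per molecule in milliseconds."""
    return parse_time(real_time) * 1000.0 / n_molecules


print(round(ms_per_molecule("0m0.727s", 14), 1))  # Xeon run, about 52 ms
print(round(ms_per_molecule("0m0.967s", 14), 1))  # Opteron run, about 69 ms
```

With only 14 small molecules, program start-up and file I/O make up a noticeable share of these numbers, so they should be read as rough estimates.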



Note: smi23d has problems with nitro groups written in the pentavalent form. They are transformed into an incorrect charged state, in which the nitrogen carries a positive charge but no oxygen carries the balancing negative charge. Example: O=N(=O)CCC1=CC=CC=C1 is transformed into O=[N+](=O)CCC1=CC=CC=C1.
Solution: Do not use the (otherwise allowed) pentavalent nitro form; instead use the charge-separated form with nitrogen(+) and oxygen(-), as in the SMILES [O-][N+](=O)CCC1=CC=CC=C1.
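For larger input files the rewrite can be automated before calling smi2sdf. The following Python sketch works purely on the SMILES text and only recognizes the two common spellings O=N(=O)... and ...N(=O)=O; it is an assumption-laden shortcut, not a substitute for normalization with a real cheminformatics toolkit, and the function name normalize_nitro is made up for this illustration.

```python
import re

# Pentavalent nitro written first in a chain or branch: O=N(=O)...
_LEADING_NITRO = re.compile(r"(^|[(.])O=N\(=O\)")
# Pentavalent nitro written as a substituent: ...N(=O)=O
_TRAILING_NITRO = re.compile(r"N\(=O\)=O")


def normalize_nitro(smiles):
    """Rewrite pentavalent nitro groups into the charge-separated form."""
    smiles = _LEADING_NITRO.sub(r"\1[O-][N+](=O)", smiles)
    smiles = _TRAILING_NITRO.sub("[N+](=O)[O-]", smiles)
    return smiles


print(normalize_nitro("O=N(=O)CCC1=CC=CC=C1"))
# [O-][N+](=O)CCC1=CC=CC=C1
```

SMILES that already use the charge-separated form are left unchanged, so the function can be applied to a whole file without checking each line first.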


ChemAxon cxcalc

ChemAxon tools like Marvin or cxcalc generate conformers using a fragment fusing method [PPT]. The calculated energies are based on the UFF (universal force field) and should not be compared with energies from other force field methods. A more detailed explanation can be found here [PDF] and [PPT]. The program JQuatFit.java, which superimposes the atoms of two molecules by quaternion fit, can be found here [LINK].

There are three modes of operation. The first is to start Marvin View (mview.sh or view.bat). Marvin View is one of the essential tools for handling small structure files. No other tool is as versatile for handling such files and doing quick structure-by-structure calculations.




The second mode is to use the cxcalc batch file (under Mac OS, Windows or Linux).
Example: cxcalc conformers pah.smi -O 0 -m 1 -f sdf >pah-3d-optimized.sdf
This creates an SDF file containing the low-energy conformers from a SMILES file. The approach is comparable to SMI23D, with the advantage that conformers are generated for most molecules (99.9%), whereas SMI23D has problems with certain molecules. In the current development state I would always use ChemAxon first and then cross-check the results with SMI23D.
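A quick way to see whether every molecule was converted is to compare the number of input SMILES with the number of records in the resulting SDF file. The Python sketch below assumes one SMILES per non-empty line and exactly one conformer per molecule in the output (as requested with -m 1 in the example); the function names are made up for this illustration.

```python
def count_smiles(smi_text):
    """Number of non-empty lines, assuming one SMILES per line."""
    return sum(1 for line in smi_text.splitlines() if line.strip())


def count_sdf_records(sdf_text):
    """Number of SDF records, each terminated by a '$$$$' line."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


def conversion_rate(smi_text, sdf_text):
    """Fraction of input molecules that ended up in the SDF output."""
    total = count_smiles(smi_text)
    return count_sdf_records(sdf_text) / total if total else 0.0


# Usage:
# with open("pah.smi") as smi, open("pah-3d-optimized.sdf") as sdf:
#     print(conversion_rate(smi.read(), sdf.read()))
```

The same check works for the opt.sdf file written by SMI23D, which makes it easy to spot molecules that only one of the two programs could convert.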

The third option is programmatic access via Java API functions (the ChemAxon tools are all written in Java and can also be accessed via Microsoft .NET). The code can be found at the ChemAxon forum [LINK] and requires a license (commercial or free academic) for the JChem API.



3D molecule database search (molecular superposition) can be performed with tools like EShape3d (EigenSpectrum Shape Fingerprints), USR (Ultrafast Shape Recognition), 3D Shape Signatures, ROSS (Random Orthogonal Shape Sections), OBFit, any other molecular descriptor, or hybrids thereof. The problem is speed.
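To illustrate why descriptor-based methods such as USR are fast, here is a minimal pure-Python sketch of the USR idea: each molecule is reduced to 12 numbers (three moments of the atom distance distributions to four reference points), and two molecules are compared with a simple distance-based score. It is a simplified reading of the method, not a reference implementation. In particular, the third central moment is used as the skewness term without further normalization, and published implementations differ in such details.

```python
import math


def _moments(values):
    """Mean, variance and third central moment of a list of distances."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, variance, third]


def usr_descriptor(coords):
    """12-number shape descriptor from a list of (x, y, z) atom positions."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    d_ctd = [math.dist(c, centroid) for c in coords]
    closest = coords[d_ctd.index(min(d_ctd))]    # atom closest to centroid
    farthest = coords[d_ctd.index(max(d_ctd))]   # atom farthest from centroid
    d_fct = [math.dist(c, farthest) for c in coords]
    far_far = coords[d_fct.index(max(d_fct))]    # atom farthest from 'farthest'
    d_cst = [math.dist(c, closest) for c in coords]
    d_ftf = [math.dist(c, far_far) for c in coords]
    return _moments(d_ctd) + _moments(d_cst) + _moments(d_fct) + _moments(d_ftf)


def usr_similarity(desc_a, desc_b):
    """Score in (0, 1]; 1 means identical descriptors."""
    diff = sum(abs(a - b) for a, b in zip(desc_a, desc_b)) / len(desc_a)
    return 1.0 / (1.0 + diff)


# Usage with made-up coordinates:
# mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0), (3.1, 1.4, 0.7)]
# print(usr_similarity(usr_descriptor(mol), usr_descriptor(mol)))
```

Because the descriptor depends only on interatomic distances, no superposition of the two molecules is needed, and a database search reduces to comparing short numeric vectors.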


Literature:

  • Ultrafast shape recognition for similarity search in molecular databases. [DOI]
  • A novel hybrid ultrafast shape descriptor method for use in virtual screening [DOI]