A molecular formula can be used to generate its structural isomers (also called constitutional or molecular isomers) with the help of molecular isomer generators, also known as structure generators. You can see some of the 217 C6H6 isomers in the following picture.

For the generation of isomers from molecular formulae, two kinds of generators exist. One is deterministic (as in MOLGEN and the CDK) and the other is stochastic (as in the CDK or Signature). Such molecular isomer generators can produce a huge number of isomers from very simple formulas. For example, the simple formula C12H12 already yields more than 23 million (23,862,255) possible isomers. The generation time on a modest modern PC (Opteron 2.8 GHz) is 45 seconds. For more complex formulas (>500 Da), such deterministic generators render themselves useless because of the time and space needed. However, this was debated in a recent NMR review that used a deterministic approach together with correct 2D NMR assignments. The authors showed that StrucEluc is capable of ranking correct isomer structure candidates up to 1200 Da.

To reduce the number of candidates, certain substructures or hybridization states can be entered as constraints, or mass spectral information such as fragmentation data can be included. One such advanced structure generator is MOLGEN-MS. However, automating such processes is complex, and MOLGEN-MS suffers from the complexity of the possible mass spectral fragmentations and rearrangement reactions. Random or stochastic isomer generators may also help in the future.

Using molecular isomer generators is one route to (total) de novo structure elucidation with mass spectrometry. De novo structure elucidation is usually the domain of 2D NMR. Currently no mass spectral software or approach is available that would allow the calculation of the correct structure of small molecules (< 2000 Da) from mass spectral data alone (peptides are not included because they can be sequenced with MS/MS data).

For more complex isomers (including the heteroatoms N, O, P, S) other than linear and cyclic alkanes, alkenes and alkynes, no mathematical formula exists to calculate the number of structural isomers. One reason is aromatic duplicates, which have to be handled independently of graph-theoretic models. For hydrocarbons themselves, algorithms exist to count the number of linear (tree-like) and cyclic hydrocarbons without generating them (A134818).
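Counting without generating can be illustrated for the simplest case, the acyclic alkanes CnH2n+2. The sketch below is not the algorithm behind the sequence cited above; it is a textbook approach assumed here for illustration: alkyl radicals are counted with the cycle index of S3, and Otter's dissimilarity theorem converts rooted counts into unrooted ones. The function name `alkane_counts` and its helpers are hypothetical, and stereoisomers are ignored.

```python
def alkane_counts(n_max):
    """Return [a(1), ..., a(n_max)], where a(n) is the number of
    constitutional isomers of the alkane CnH2n+2 (no structures generated)."""
    size = n_max + 1  # coefficients kept: x^0 .. x^n_max

    def mul(p, q):
        # Truncated product of two power series.
        out = [0] * size
        for i, a in enumerate(p):
            if a:
                for j, b in enumerate(q):
                    if i + j >= size:
                        break
                    out[i + j] += a * b
        return out

    def stretch(p, k):
        # Substitute x -> x^k, i.e. p(x) becomes p(x^k).
        out = [0] * size
        for i, a in enumerate(p):
            if i * k < size:
                out[i * k] = a
        return out

    # Alkyl radicals: A(x) = 1 + x * Z(S3)[A], built one coefficient at a time.
    A = [1] + [0] * n_max
    for n in range(1, size):
        cube = mul(mul(A, A), A)
        mixed = mul(A, stretch(A, 2))
        third = stretch(A, 3)
        A[n] = (cube[n - 1] + 3 * mixed[n - 1] + 2 * third[n - 1]) // 6

    A2, A3, A4 = stretch(A, 2), stretch(A, 3), stretch(A, 4)
    sq = mul(A, A)

    # Carbon-rooted trees: P(x) = x * Z(S4)[A].
    z4 = [(a + 6 * b + 3 * c + 8 * d + 6 * e) // 24
          for a, b, c, d, e in zip(mul(sq, sq), mul(sq, A2),
                                   mul(A2, A2), mul(A, A3), A4)]
    P = [0] + z4[:-1]

    # Bond-rooted trees: Q(x) = Z(S2)[A - 1]; symmetric bonds: A(x^2) - 1.
    B = [0] + A[1:]
    sym = stretch(B, 2)
    Q = [(a + b) // 2 for a, b in zip(mul(B, B), sym)]

    # Otter: unrooted = vertex-rooted - edge-rooted + symmetric-edge-rooted.
    return [P[n] - Q[n] + sym[n] for n in range(1, size)]
```

For example, `alkane_counts(10)` returns the counts for methane through decane, ending with the 75 isomers of C10H22. The approach works only because alkanes are trees with a simple degree limit; nothing comparable is available once heteroatoms and rings are mixed.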

That means you must always generate the full set of isomers for a given formula and then filter and count them. Most of the information you find in the literature covers only such simple sequences; you may find help at the AT&T On-Line Encyclopedia of Integer Sequences.
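The generate, filter and count procedure can be sketched with a brute-force toy generator. It is only an illustration under stated assumptions and is not how MOLGEN, the CDK or OMG work: atoms have fixed valences (C 4, N 3, O 2), hydrogens are implicit, bond orders run from 0 to 3, and duplicates are removed by trying every atom permutation, which limits it to about four or five heavy atoms. The names `count_isomers` and `VALENCE` are hypothetical.

```python
from itertools import permutations

VALENCE = {"C": 4, "N": 3, "O": 2}  # assumed fixed valences


def count_isomers(heavy_atoms, n_hydrogens):
    """Count constitutional isomers for a list of heavy atoms, e.g. "CCO",
    and a hydrogen count, by exhaustive generation and deduplication."""
    atoms = sorted(heavy_atoms)
    n = len(atoms)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    seen = set()

    def connected(bonds):
        # Depth-first search over all bonds with order > 0.
        adj = {i: set() for i in range(n)}
        for (i, j), order in zip(pairs, bonds):
            if order:
                adj[i].add(j)
                adj[j].add(i)
        stack, visited = [0], {0}
        while stack:
            for nb in adj[stack.pop()]:
                if nb not in visited:
                    visited.add(nb)
                    stack.append(nb)
        return len(visited) == n

    def canonical(bonds):
        # Smallest bond tuple over all element-preserving relabelings.
        order = dict(zip(pairs, bonds))
        best = None
        for perm in permutations(range(n)):
            if any(atoms[perm[i]] != atoms[i] for i in range(n)):
                continue
            key = tuple(order[tuple(sorted((perm[i], perm[j])))]
                        for i, j in pairs)
            if best is None or key < best:
                best = key
        return best

    def extend(k, bonds, free):
        # Assign a bond order to pair k, never exceeding the free valences.
        if k == len(pairs):
            # Filter: leftover valences must match the hydrogen count.
            if sum(free) == n_hydrogens and connected(bonds):
                seen.add(canonical(bonds))
            return
        i, j = pairs[k]
        for order in range(min(3, free[i], free[j]) + 1):
            free[i] -= order
            free[j] -= order
            extend(k + 1, bonds + [order], free)
            free[i] += order
            free[j] += order

    extend(0, [], [VALENCE[a] for a in atoms])
    return len(seen)
```

With these assumptions, `count_isomers("CCCC", 10)` finds the two butanes and `count_isomers("CCO", 6)` finds ethanol and dimethyl ether. The permutation-based duplicate check grows factorially with the atom count, which is why real generators rely on canonical labeling or orderly generation instead.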

Programs

  • MOLGEN - The fastest available structural isomer generator (Kerber, Gruener, Laue, Meringer @ Uni Bayreuth)
  • CDK - the open source implementation using deterministic and stochastic approaches
  • SMOG - a free implementation with multiple and comprehensive constraints mechanisms
  • Assemble - from upstream.ch (Badertscher, Bischofberger, Pretsch)
  • nauty - a graph theory program and the world's fastest isomorphism tester
  • OMG - open source isomer generator based on nauty [PDF]

Literature

Data and Links

  • Number of structural isomers - An Excel file containing the number of all isomers for the elements CHNO (<150 u)
  • Program for calculation of the number of cyclic and tree-like hydrocarbons by Vesa Linja-aho (available on e-mail request)
  • Some Stuff - interesting blog about signature and isomer generation by Gilleain Torrance
  • OSCSG - Open Source Chemical Structure Generator by Julio E. Peironcely (Faulon/Hankemeier collaboration)
  • Canonical Signatures by Jean-Loup Faulon (Fichera/Faulon/Carbonell)