
MOL or SDF file format V2000

Introduction

This cheat sheet describes the V2000 version of the MOL format. The .mol or .sdf file format stores the connection table, which describes the structure of a molecule. Such a file can describe molecules, molecular fragments, substructures, etc.

Quick overview

Each section is explained using the following example. The atom lines are described further down, in the Atom Block section.

6 5 0 0 1 0 3 V2000
 -0.6622 0.5342 0.0000 C 0 0 2 0 0 0
 0.6622 -0.3000 0.0000 C 0 0 0 0 0 0
 -0.7207 2.0817 0.0000 C 1 0 0 0 0 0
 -1.8622 -0.3695 0.0000 N 0 3 0 0 0 0
 0.6220 -1.8037 0.0000 O 0 0 0 0 0 0
 1.9464 0.4244 0.0000 O 0 5 0 0 0 0
 1 2 1 0 0 0
 1 3 1 1 0 0
 1 4 1 0 0 0
 2 5 2 0 0 0
 2 6 1 0 0 0
M CHG 2 4 1 6 -1
M ISO 1 3 13
M END

The counts line

Specifies the following values, in order of appearance:

  • A: Number of atoms (6 in the example).
  • B: Number of bonds (5).
  • L: Number of atom lists.
  • F: Not used / obsolete.
  • C: Chiral flag: 0 = not chiral, 1 = chiral.
  • S: Number of stext entries.
  • M: Number of lines of additional properties (3 in the example).
  • V: File version (V2000 in the example).

The format is:

A B L F C S M V
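As a rough illustration, the counts line of the example can be read into named fields as follows. This is a minimal sketch, not a complete reader: the names `Counts` and `parse_counts_line` are hypothetical, and splitting on whitespace is an assumption that fits the example as printed here. Files written by real tools use fixed-width columns, so adjacent fields may run together and would need column slicing instead.

```python
from typing import NamedTuple


class Counts(NamedTuple):
    """Fields of the counts line, in the order A B L F C S M V."""
    atoms: int
    bonds: int
    atom_lists: int
    chiral: bool
    stext_entries: int
    property_lines: int
    version: str


def parse_counts_line(line: str) -> Counts:
    # Assumption: fields are separated by whitespace, as in the example.
    fields = line.split()
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    # F (fields[3]) is obsolete, so it is read and then discarded.
    a, b, l, _f, c, s, m = (int(value) for value in fields[:7])
    return Counts(a, b, l, c == 1, s, m, fields[7])


counts = parse_counts_line("6 5 0 0 1 0 3 V2000")
print(counts.atoms, counts.bonds, counts.chiral, counts.version)
```

The atom and bond counts tell a reader how many lines to consume for the two blocks that follow.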

The Atom Block

After the counts line come A lines of atom information (6 in the example), one line per atom. The format of each atom line in the atom block is:

X, Y, Z, ATOM SYMBOL, MASS DIFFERENCE, CHARGE, ATOM STEREO PARITY, HYDROGEN COUNT + 1, STEREO CARE BOX, VALENCE.
  • ATOM SYMBOL: Entry in periodic table or L for atom list.
  • MASS DIFFERENCE: -3, -2, -1, 0, 1, 2, 3, 4.
  • CHARGE: 0 = uncharged or value other than these, 1 = +3, 2 = +2, 3 = +1, 4 = doublet radical, 5 = -1, 6 = -2, 7 = -3.
  • ATOM STEREO PARITY (chirality): 0 = not stereo, 1 = odd, 2 = even, 3 = either or unmarked stereo center
  • HYDROGEN COUNT + 1: 1 = H0, 2 = H1, 3 = H2, 4 = H3, 5 = H4.
  • STEREO CARE BOX: 0 = ignore stereo configuration of this double bond atom, 1 = stereo configuration of double bond atom must match.
  • VALENCE: 0 = no marking (default), 1 to 14 = valence of 1 to 14, 15 = zero valence.

For example, the first line of the atom block is:

 -0.6622 0.5342 0.0000 C 0 0 2 0 0 0

This means that a carbon atom sits at position (-0.6622, 0.5342, 0.0000) with a mass difference of 0, no charge (charge field 0), and an atom stereo parity of 2 (even).
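The field order and the charge codes listed above can be sketched in code like this. The names `Atom`, `CHARGE_CODES`, and `parse_atom_line` are hypothetical, and whitespace splitting is again an assumption that matches the example as printed rather than the fixed-width layout of real files. Code 4 (doublet radical) and any unlisted code are mapped to a formal charge of 0 here.

```python
from typing import NamedTuple

# Charge field code -> formal charge, taken from the list above.
# Codes 0 and 4 (doublet radical) carry no formal charge in this sketch.
CHARGE_CODES = {1: 3, 2: 2, 3: 1, 5: -1, 6: -2, 7: -3}


class Atom(NamedTuple):
    x: float
    y: float
    z: float
    symbol: str
    mass_difference: int
    charge: int               # decoded formal charge, not the raw code
    stereo_parity: int
    hydrogen_count_code: int  # "hydrogen count + 1" as stored in the file
    stereo_care: int
    valence: int


def parse_atom_line(line: str) -> Atom:
    # Assumption: whitespace-separated fields, as in the example.
    fields = line.split()
    if len(fields) < 10:
        raise ValueError(f"expected at least 10 fields: {line!r}")
    x, y, z = (float(value) for value in fields[:3])
    symbol = fields[3]
    mass_diff, charge_code, parity, h_code, care, valence = (
        int(value) for value in fields[4:10]
    )
    charge = CHARGE_CODES.get(charge_code, 0)
    return Atom(x, y, z, symbol, mass_diff, charge, parity, h_code, care, valence)


first = parse_atom_line(" -0.6622 0.5342 0.0000 C 0 0 2 0 0 0")
nitrogen = parse_atom_line(" -1.8622 -0.3695 0.0000 N 0 3 0 0 0 0")
print(first.symbol, first.charge, first.stereo_parity)
print(nitrogen.symbol, nitrogen.charge)
```

In the example, the nitrogen's charge code 3 decodes to +1, which agrees with the `M CHG` line in the properties block.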

Bond Block

The Bond Block is made up of bond lines, one line per bond, with the following format:

1 2 t s x r c

Where:

  • 1: First atom number.
  • 2: Second atom number.
  • t: Bond type: 1 = Single, 2 = Double, 3 = Triple, 4 = Aromatic, 5 = Single or Double, 6 = Single or Aromatic, 7 = Double or Aromatic, 8 = Any.
  • s: Bond stereo. For single bonds: 0 = not stereo, 1 = Up, 4 = Either, 6 = Down. For double bonds: 0 = use the x, y, z coordinates from the atom block to determine cis or trans, 3 = cis or trans (either).
  • x: Not used.
  • r: Bond topology: 0 = Either, 1 = Ring, 2 = Chain.
  • c: Reacting center status: 0 = unmarked, 1 = a center, -1 = not a center. Additional values: 2 = no change, 4 = bond made/broken, 8 = bond order changes, 12 = 4 + 8 (both made/broken and changes); 5 = (4 + 1), 9 = (8 + 1), and 13 = (12 + 1) are also possible.
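The bond fields above could be decoded along these lines. This is a sketch under stated assumptions: `Bond`, `parse_bond_line`, and the lookup tables are hypothetical names, fields are split on whitespace, and trailing fields missing from a line are treated as 0 (the example bond lines show six values while the format lists seven). For bond types other than single and double, the stereo code is passed through unchanged, since the list above does not define it.

```python
from typing import NamedTuple

BOND_TYPES = {
    1: "single", 2: "double", 3: "triple", 4: "aromatic",
    5: "single or double", 6: "single or aromatic",
    7: "double or aromatic", 8: "any",
}
SINGLE_BOND_STEREO = {0: "not stereo", 1: "up", 4: "either", 6: "down"}
DOUBLE_BOND_STEREO = {0: "cis/trans from coordinates", 3: "cis or trans (either)"}


class Bond(NamedTuple):
    first_atom: int
    second_atom: int
    bond_type: str
    stereo: str
    topology: int
    reacting_center: int


def parse_bond_line(line: str) -> Bond:
    fields = [int(value) for value in line.split()]
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields: {line!r}")
    # Assumption: missing trailing fields default to 0.
    fields += [0] * (7 - len(fields))
    first, second, type_code, stereo_code, _unused, topology, center = fields[:7]
    # The meaning of the stereo code depends on the bond type.
    if type_code == 1:
        stereo = SINGLE_BOND_STEREO.get(stereo_code, f"code {stereo_code}")
    elif type_code == 2:
        stereo = DOUBLE_BOND_STEREO.get(stereo_code, f"code {stereo_code}")
    else:
        stereo = f"code {stereo_code}"
    bond_type = BOND_TYPES.get(type_code, f"code {type_code}")
    return Bond(first, second, bond_type, stereo, topology, center)


print(parse_bond_line(" 1 3 1 1 0 0"))
print(parse_bond_line(" 2 5 2 0 0 0"))
```

Applied to the example, the second bond line describes a single "up" bond between atoms 1 and 3, and the fourth a double bond between atoms 2 and 5.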

Properties Block

The Properties Block is made up of M lines of additional properties, where M is the number of property lines given in the counts line described above (3 in the example).
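The three property lines of the example can be read with a small helper like the one below. This is a sketch only: `parse_property_line` is a hypothetical name, and the layout it assumes (a property code, then an entry count, then that many pairs of atom number and value) is read off the `M CHG` and `M ISO` lines in the example rather than from a description in this cheat sheet. Other property types may be laid out differently.

```python
from typing import Dict, Tuple


def parse_property_line(line: str) -> Tuple[str, Dict[int, int]]:
    """Return (property code, {atom number: value}) for one 'M ...' line."""
    # Assumption: whitespace-separated fields, as in the example.
    fields = line.split()
    if len(fields) < 2 or fields[0] != "M":
        raise ValueError(f"not a property line: {line!r}")
    code = fields[1]
    if code == "END":
        # 'M END' closes the block and carries no values.
        return code, {}
    if len(fields) < 3:
        raise ValueError(f"missing entry count: {line!r}")
    count = int(fields[2])
    values = [int(value) for value in fields[3:]]
    if len(values) != 2 * count:
        raise ValueError(f"expected {count} atom/value pairs: {line!r}")
    # Even positions are atom numbers, odd positions are their values.
    return code, dict(zip(values[0::2], values[1::2]))


for text in ("M CHG 2 4 1 6 -1", "M ISO 1 3 13", "M END"):
    print(parse_property_line(text))
```

Read this way, the example assigns a charge of +1 to atom 4 and -1 to atom 6, and marks atom 3 with mass 13, which matches the charge and mass difference fields of those atoms in the atom block.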