Molecule Processing Toolkit

Fuse

This article describes a way to automate step 4 of the fusion process shown in the overview, for the case when the transformations described in the last sentence are sufficient, and presents an application called fuse that implements it. The rest of this section lists those transformations. The next section describes how the application checks whether the resulting molecule has inner intersections.

Analysis shows that, from a chemical perspective, the following transformations (conformations) of fragments with R-atoms keep the layout valid.

  1. Flip the fragment, i.e. use its mirror reflection, taking into account the stereochemical aspects of constituent atoms.
    [Figure: flipped fragment]
  2. If the atom adjacent to the R-atom has only 2 adjacent atoms, then there are 2 orientations of the R-bond relative to its adjacent bond at 120° to each other.
    [Figure: 120 degrees conformation]
  3. Otherwise the R-bond has just one orientation, which the application chooses to be along the bisector of the largest angle formed by the remaining adjacent bonds, as shown in the figure in the section Advanced options.
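
Rules 2 and 3 can be illustrated geometrically. The following Python sketch is not taken from the fuse sources; the function name rbond_angles and the convention of passing 2D direction vectors and returning angles in degrees are assumptions made for this example.

```python
import math

def _angle(v):
    """Polar angle of a 2D vector in degrees, in [0, 360)."""
    return math.degrees(math.atan2(v[1], v[0])) % 360.0

def rbond_angles(neighbor_dirs):
    """Candidate R-bond angles (degrees) at the atom adjacent to the R-atom.

    neighbor_dirs: directions from that atom to its remaining neighbors,
    i.e. the R-atom itself is excluded (hypothetical input convention).
    """
    if not neighbor_dirs:
        raise ValueError("at least one remaining bond is required")
    angles = sorted(_angle(v) for v in neighbor_dirs)
    if len(angles) == 1:
        # Rule 2: one remaining bond, so two orientations 120 degrees from it.
        a = angles[0]
        return [(a + 120.0) % 360.0, (a - 120.0) % 360.0]
    # Rule 3: bisector of the largest angle between neighboring bonds.
    best_gap, best_start = -1.0, 0.0
    for i, start in enumerate(angles):
        end = angles[(i + 1) % len(angles)]
        gap = (end - start) % 360.0
        if gap > best_gap:
            best_gap, best_start = gap, start
    return [(best_start + best_gap / 2.0) % 360.0]
```

For a single remaining bond pointing along the x axis the sketch returns two candidates; for two bonds at a right angle it returns the single direction that bisects the 270° gap.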

Thus step 4 of the fusion process can be performed by searching through all combinations of transformations of the fragments until a combination with no intersections is found.
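
The search itself can be sketched as a plain exhaustive loop. This is a minimal illustration, not the application's code: the names transformations and find_layout, the (flipped, angle) representation of a transformation, and the intersects callback are all hypothetical.

```python
from itertools import product

def transformations(rbond_angles):
    """All (flipped, angle) pairs for one fragment: flip or not, times
    every allowed R-bond orientation (hypothetical representation)."""
    return list(product((False, True), rbond_angles))

def find_layout(options1, options2, intersects):
    """Return the first pair of transformations without intersections.

    intersects(t1, t2) is a caller-supplied check that returns True when
    the fused layout built from t1 and t2 has inner intersections.
    Returns None when every combination intersects.
    """
    for t1, t2 in product(options1, options2):
        if not intersects(t1, t2):
            return t1, t2
    return None
```

With at most two flips and two orientations per fragment the search space is tiny, so an exhaustive scan that stops at the first clean combination is sufficient in this sketch.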

Inner intersections

Part of step 4 of the fusion process involves checking that the fragments do not intersect once joined into the final molecule. The approach taken in this application is to represent the molecular graph as a vector image and rasterize it. Each bond is represented as a rectangle with dimensions 1×4. Each fragment is drawn on the canvas in a different color. The color of a cell where fragments overlap is taken to be the logical disjunction (bitwise OR) of the colors of the fragments. Checking for an intersection then amounts to counting the cells whose color differs from the color of every fragment.
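
The idea can be sketched in a few lines of Python. This is a simplified illustration rather than the application's implementation: bonds are drawn as sampled line segments one cell wide instead of true rectangles, the canvas is a dictionary, and the names rasterize and count_mixed_cells are assumptions.

```python
import math

def rasterize(bonds, color, canvas):
    """OR `color` into every cell touched by the bonds of one fragment.

    bonds: list of ((x1, y1), (x2, y2)) segments in cell units.
    canvas: dict mapping (col, row) to the combined color bits.
    """
    for (x1, y1), (x2, y2) in bonds:
        # Sample about four points per cell along the segment.
        steps = max(1, int(4 * max(abs(x2 - x1), abs(y2 - y1))))
        for i in range(steps + 1):
            t = i / steps
            cell = (math.floor(x1 + t * (x2 - x1)),
                    math.floor(y1 + t * (y2 - y1)))
            canvas[cell] = canvas.get(cell, 0) | color

def count_mixed_cells(canvas, fragment_colors):
    """Number of cells whose color is not a pure fragment color."""
    return sum(1 for c in canvas.values() if c not in fragment_colors)

# Usage: a horizontal bond (color 1) crossed by a vertical bond (color 2).
canvas = {}
rasterize([((0.5, 0.5), (3.5, 0.5))], 1, canvas)
rasterize([((2.5, -1.5), (2.5, 1.5))], 2, canvas)
crossing = count_mixed_cells(canvas, {1, 2})
```

Because OR is idempotent, bonds of the same fragment that share a cell leave its color unchanged, so only cells touched by both fragments end up with a mixed color.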

The actual implementation is a bit more complicated. The bond connecting the fragments must be treated differently so that its intersection with adjacent bonds does not count as an inner intersection of the fragments. Thus 3 different colors are used, one for each fragment and one for the bond connecting them, and only intersections between the fragments' own bonds are counted.
[Figure: inner intersections]

The figure above shows the first fragment rasterized in green, the second one in blue, and the bond connecting them in yellow. The red box marks an intersection between the fragments.
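
One way to express the three-color rule is with one bit per color. The sketch below is an assumption about how such a check could look, not the application's code; the constants FRAG1, FRAG2 and LINK and the function name are hypothetical.

```python
# One bit per color (hypothetical encoding).
FRAG1, FRAG2, LINK = 1, 2, 4

def count_inner_intersections(canvas):
    """Count cells covered by bonds of both fragments.

    canvas maps (col, row) to OR-ed color bits. Cells where the connecting
    bond overlaps only one fragment are ignored, because the LINK bit
    alone never makes a cell count.
    """
    return sum(1 for c in canvas.values() if (c & FRAG1) and (c & FRAG2))

# Usage: the connecting bond touches each fragment at its end cells,
# and the fragments' own bonds overlap in a single cell.
canvas = {
    (0, 0): FRAG1,
    (1, 0): FRAG1 | LINK,   # connecting bond meets fragment 1: not counted
    (2, 0): FRAG2 | LINK,   # connecting bond meets fragment 2: not counted
    (3, 0): FRAG1 | FRAG2,  # fragments overlap: counted
}
inner = count_inner_intersections(canvas)
```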

Synopsis

fuse <sdf_in1> <bond_type> <sdf_in2> <sdf_out> [/p1] [/p2] [/g:<no>] [/r:<no>]

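Here

  <sdf_in1>, <sdf_in2>  input SDF files with the molecules to fuse
  <bond_type>           type of the bond that joins the molecules (1 is a single bond)
  <sdf_out>             output SDF file
  /p1, /p2              preserve the orientation of the R-bond of the first or second source molecule (see Advanced options)
  /g:<no>               average bond length in cells, 4 by default (see Fine tuning)
  /r:<no>               diameter of the tube around a bond, 1 by default (see Fine tuning)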

For instance, the following command
fuse ket.sdf 1 amn.sdf prod.sdf
fuses each molecule from file ket.sdf in turn with each molecule in file amn.sdf. The result is saved in file prod.sdf. Molecules are fused with a single bond. The input must be in MDL's V2000 format. The output is also in that format.

Each molecule in the input files must contain exactly one R-atom. The program deletes these atoms and creates a bond between the atoms adjacent to them. The bond type is determined by the corresponding command line argument. Each record in the output SDF file contains 2 fields in addition to the resulting molecule: IDNUMBER and error_measure. IDNUMBER is an integer counter.

error_measure is an integer value that indicates how cleanly the molecules were fused: 0 means that all is ok, whereas a nonzero value (-1, 1, 2, etc.) indicates that the program failed to correctly align the molecules. A positive value means that the bond connecting the molecules was lengthened in order to avoid intersections. The greater the number, the longer the bond. A value of -1 means that the program could not align the molecules without intersections so the new bond has normal length, but the molecule has intersections.

One would typically search for records with a nonzero error_measure value and see how the source molecules could be edited to minimize the intersections during fusion. Most often this means flipping tail parts of the molecule so that they do not occupy the neighborhood of the R-atom.
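
Such a search is easy to script. The sketch below assumes the standard SDF layout, in which a data field header line looks like "> <NAME>", its value follows on the next line, and records end with a "$$$$" line. The function names and the sample text are made up for illustration, and the mol blocks are reduced to a single line.

```python
def read_fields(record):
    """Collect the data fields of one SDF record into a dict."""
    fields = {}
    lines = record.splitlines()
    for i, line in enumerate(lines):
        # A data header looks like "> <NAME>"; the value is on the next line.
        if line.startswith(">") and "<" in line and i + 1 < len(lines):
            name = line[line.index("<") + 1 : line.rindex(">")]
            fields[name] = lines[i + 1].strip()
    return fields

def flagged_records(sdf_text):
    """Return (IDNUMBER, error_measure) for records that need attention."""
    flagged = []
    for record in sdf_text.split("$$$$"):
        fields = read_fields(record)
        if "error_measure" not in fields:
            continue  # trailing text after the last record, or no field
        error = int(fields["error_measure"])
        if error != 0:
            flagged.append((fields.get("IDNUMBER"), error))
    return flagged

# Made-up output with three records; real records start with a mol block.
SAMPLE = "\n".join([
    "M  END", "> <IDNUMBER>", "1", "", "> <error_measure>", "0", "", "$$$$",
    "M  END", "> <IDNUMBER>", "2", "", "> <error_measure>", "-1", "", "$$$$",
    "M  END", "> <IDNUMBER>", "3", "", "> <error_measure>", "2", "", "$$$$",
])
problems = flagged_records(SAMPLE)
```

In this sample the second record was fused with intersections left in place (-1) and the third one with a lengthened bond (2), so both are reported.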

Installation

The application was originally written in C++ under Windows and was recently ported to Linux. Installation is straightforward: check out the latest code and build the fuse program. To obtain the source code, issue the following command:
svn checkout https://molproc.svn.sourceforge.net/svnroot/molproc/fuse
See here for more details.

To build the program, run:
make fuse

The source code comes with unit tests that cover the essential functionality of the class library. The tests use the Boost unit testing framework, so make sure you have libboost_unit_test_framework in your library path before compiling and running the tests. To run the tests, use the following command:
make test

To remove the files generated by the compiler and linker, including the executable file, use the following command:
make clean

Advanced options

In some cases you need to preserve the initial orientation of the R-bond in the source molecule, for instance when the cis/trans topology must be kept unchanged. In other cases the R-atom is connected to a concave molecule fragment, and the application may choose an erroneous configuration.
[Figure: concave fragment]

Command line arguments /p1 and /p2 instruct the application to preserve the orientation of the R-bond of the corresponding source molecule.
fuse ket.sdf 2 amn.sdf prod.sdf /p2

In this example the orientation of the R-bond in the second molecule will be preserved.

Fine tuning

The application rasterizes the vector representation of the molecules in order to find intersections. By default an average bond takes up 4 sequential square cells when drawn horizontally or vertically. This value can be changed with the /g command line argument, e.g. /g:5 makes the average bond length equal to 5 cells. The width of the bond is set by the /r command line argument, e.g. /r:2. It specifies the diameter of a cylindrical tube around the bond and is equal to 1 by default. Intersection of molecule fragments is determined by the intersection of the tubes of their bonds. A ratio of average bond length to tube diameter of 4, which is the default, was found to be satisfactory in the majority of cases.
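
The relation between the two parameters can be illustrated with a short Python sketch. It is an assumption about how the scaling could be computed, not the application's code; the function names are hypothetical, and g and r simply mirror the /g and /r arguments.

```python
def cell_scale(bond_lengths, g=4):
    """Factor converting molecule coordinates to cell units so that the
    average bond spans g cells (g mirrors the /g argument, default 4)."""
    if not bond_lengths:
        raise ValueError("need at least one bond")
    average = sum(bond_lengths) / len(bond_lengths)
    return g / average

def length_to_diameter_ratio(g=4, r=1):
    """Average bond length divided by tube diameter (r mirrors /r)."""
    return g / r
```

With the defaults the ratio is 4; running with /g:5 /r:2 would lower it to 2.5, which makes the tubes thicker relative to the bonds and the intersection test stricter.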