2013-08-14

Ring Detection - 1

In this two-part series, we look at how Ojus Chemistry Toolkit (OCT) currently implements ring detection. In this part, detection of rings and ring systems is described.

Preparation

N.B. OCT's molecule reader converts any hydrogen atoms that it encounters into implicit hydrogens, i.e., the list of atoms and bonds in an OCT molecule never contains a hydrogen atom. Similarly, the number of hydrogens attached to an atom is determined looking at the atom's valence and any charge specified for it.

N.B. The Frerejacque value of the molecule is checked to see if there are any cycles at all, before the following procedure is employed.

When we are interested in finding cycles (and only cycles), we should not look at open branches — those that contain only leaves. Accordingly, we remove them. In reality, all of this computation happens on temporary data structures, not the actual molecule itself.

Removing Terminal Chains

Remember terminal atoms from this old article? Quickly, a terminal atom is one which has a single neighbour. We remove all such atoms. However, removing a terminal atom could make its sole neighbour a terminal atom itself! We have to remove it, too. But, then, removing it … ad infinitum. Here is the pseudocode.

    var removing := true.

    while (removing) {
        removing := false.

        for each atom `a` {
            if `a` is a terminal {
                remove `a`.
                removing := true.
            }
        }
    }

Note that removing an atom entails removing all the bonds in which it participates, etc. Thus, by the time the outer loop exits, all terminal chains will have been removed.

How Many Rings?

One Ring or More?

At this point, if all the atoms have exactly two neighbours, there is only one ring in the molecule. Why? More than one ring implies that a second ring is either fused to the first one, or is independent of it, but connected through a linking chain of atoms. In either case, there should be at least one atom that is a junction — it should have at least three bonds.

Cycle Detection

Case of One Ring

In the case of the molecule having only one ring, we mark that ring as the sole ring. We also create a single ring system with that sole ring. And, we are done!

Case of Multiple Rings

Detection of two or more rings present in the molecule is (rather) evidently more complex than that of a single ring. Over the last decade, or so, I have written four or five distinct algorithms to detect multiple rings in a molecule. All of them employed a recursive depth-first approach. The recursive approach made certain parts of the algorithm easier and clearer, while complicating certain others.

This time, though, I chose to employ a breadth-first approach. The outline is as follows.

    var candidates := new list of paths.
    var path := new list of atoms.

    var a := the first non-junction atom in the molecule.
    if no such atom exists {
        a := the first atom in the molecule.
    }
    path := `path` with `a` appended.
    candidates := `candidates` with `path` appended.

    while (`candidates` is not empty) {
        var test-path := fetch the first path in `candidates`.
        try-path `test-path`.
    }

As you can see, I have a queue into which I inject all candidate paths. A candidate path is potentially incomplete, and may need to be extended by adding a neighbour of the last atom in it, should one such exist. But that is part of processing the path.

Process a Test Path

    var current := last atom in `test-path`.

    for each neighbour `nbr` of `current` {
        if `nbr` was visited already in `test-path` {
            var ring := path from previous occurrence of `nbr` to `current`.
            if `ring` is valid {
                record `ring`.
            }
            continue loop.
        }

        var new-path := `test-path` with `nbr` appended.
        candidates := `candidates` with `new-path` appended.
    }

Thus, we add each path extension to the queue of candidates, at each atom that we encounter. Now, given a candidate ring, how do we validate it? First, the easy case: if the candidate has only three atoms, then it is certainly a genuine ring. Good! But, what if it has more than three?

Preliminary Validation

    if size of `test-path` is 3 {
        answer `true`.
    }

    for each atom `a` in `test-path` {
        if more than 2 neighbours of `a` exist in `test-path` {
            
            answer `false`.
        }
    }

    
    answer `true`.

Ring System Detection

Once we have recorded all rings, we sort them by size. The next task is to partition this set of rings into equivalence classes of ring systems. Two given rings belong to the same ring system if they have at least one bond or at least one atom in common.

Partitioning

The actual task of partitioning the set of rings is quite straightforward.

    var ring-systems := new list.

    outer:
    for each recorded ring `r` {
        for each ring system `rs` {
            if `r` and `rs` share a bond or
               `r` and `rs` share an atom {
                mark `r` as belonging to `rs`.
                rs := `rs` with `r` appended.
                continue `outer`.
            }
        }

        var new-rs := new ring system.
        new-rs := `new-rs` with `r` appended.
        ring-systems := `ring-systems` with `new-rs` appended.
    }

Conclusion

We have, now, identified the rings in the molecule, and have partitioned the set into ring systems. However, this set could contain potentially spurious rings. Eliminating the false candidates is a surprisingly more difficult procedure. We shall look into it in the next post!

No comments: