NAMS - Non-Contiguous Atom Matching Structural Similarity


What is NAMS? NAMS is an algorithm that quantifies molecular similarity using a robust and efficient graph matching algorithm. It compares favourably with other structural similarity approaches, such as molecular fingerprints. It is very flexible, can be used as a generic tool for molecular analysis, and has been used in real research projects on in silico drug discovery. NAMS was published in 2013.

NAMS has a number of advantages over molecular fingerprints. First, it is unambiguous: if two molecules have a similarity of 100%, they are the same molecule, and NAMS can discriminate between stereoisomers and cis-trans isomers. Second, it does not rely on an arbitrary hashing function, which can cause mismatches that are difficult to understand. Finally, its output is easy to interpret because it is based on atom matching: any user can verify which atom in one molecule was matched to which atom in the other.
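
To make the idea of atom matching concrete, here is a minimal Python sketch of a generic atom-matching similarity. It is not the NAMS algorithm: the atom descriptor (element plus number of heavy-atom neighbours), the scores in the hypothetical `atom_similarity` function and the normalisation by self-similarity are all assumptions chosen for illustration. It does show the property described above, namely that the result comes with an explicit atom-to-atom mapping a user can inspect. Because this toy score ignores bonds and stereochemistry, it can return 1.0 for different molecules, which NAMS does not.

```python
import itertools
import math

# Toy atom descriptor (an assumption for illustration): (element symbol,
# number of heavy-atom neighbours). NAMS uses much richer atom and bond
# information than this.
Atom = tuple[str, int]


def atom_similarity(a: Atom, b: Atom) -> float:
    """Hypothetical atom-atom score: 1.0 for same element and degree,
    0.5 for same element only, 0.0 otherwise."""
    if a[0] != b[0]:
        return 0.0
    return 1.0 if a[1] == b[1] else 0.5


def best_matching(mol_a: list[Atom], mol_b: list[Atom]) -> tuple[float, list[tuple[int, int]]]:
    """Find the one-to-one atom mapping with the highest total score.

    Returns (score, pairs), where each pair is (index in mol_a, index in mol_b).
    Exhaustive search over permutations: only feasible for tiny molecules.
    """
    swapped = len(mol_a) > len(mol_b)
    small, large = (mol_b, mol_a) if swapped else (mol_a, mol_b)
    best_score, best_pairs = -1.0, []
    # Each permutation assigns every atom of the smaller molecule to a
    # distinct atom of the larger one.
    for perm in itertools.permutations(range(len(large)), len(small)):
        score = sum(atom_similarity(small[i], large[j]) for i, j in enumerate(perm))
        if score > best_score:
            best_score, best_pairs = score, list(enumerate(perm))
    if swapped:
        # Flip pairs back so they always read (index in mol_a, index in mol_b).
        best_pairs = [(j, i) for i, j in best_pairs]
    return best_score, best_pairs


def similarity(mol_a: list[Atom], mol_b: list[Atom]) -> float:
    """Normalise the cross score by the self scores so identical inputs give 1.0."""
    s_ab, _ = best_matching(mol_a, mol_b)
    s_aa, _ = best_matching(mol_a, mol_a)
    s_bb, _ = best_matching(mol_b, mol_b)
    if s_aa == 0 or s_bb == 0:
        return 0.0
    return s_ab / math.sqrt(s_aa * s_bb)


# Heavy atoms only: ethanol C-C-O and methanol C-O.
ethanol = [("C", 1), ("C", 2), ("O", 1)]
methanol = [("C", 1), ("O", 1)]
print(best_matching(ethanol, methanol))        # (2.0, [(0, 0), (2, 1)])
print(f"{similarity(ethanol, methanol):.3f}")  # 0.816
```

The mapping `[(0, 0), (2, 1)]` says that ethanol's terminal carbon matched methanol's carbon and the two oxygens matched each other, which is the kind of per-atom explanation a fingerprint bit vector cannot give. Exhaustive search grows factorially with atom count, so a practical tool would solve the assignment problem with a polynomial-time method such as the Hungarian algorithm.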

The drawback is that NAMS is slower than molecular fingerprints, and its run time grows faster than linearly with molecule size, so comparing large molecules can take considerably longer. Nevertheless, it is typically fast enough for screening even moderately sized databases. On modest hardware and without any parallel optimization, the C version averages 12 ms per comparison over the ChEMBL database, although this depends strongly on the size of the molecules being compared. On databases consisting mostly of small molecules (MW < 200 Da), each comparison takes 1-3 ms, while very large molecules (MW > 2000 Da) can take much longer.
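
To put these timings in perspective, here is a back-of-the-envelope estimate in Python. The database size of 1,000,000 molecules is a hypothetical round number, not the actual size of ChEMBL, and the estimate assumes serial execution at a fixed cost per comparison, as in the figures quoted above. It also shows why comparing every molecule against every other is far more expensive than screening a single query: the number of pairs grows quadratically.

```python
def screening_time_s(n_molecules: int, ms_per_comparison: float, all_pairs: bool = False) -> float:
    """Estimate serial wall-clock time (seconds) for a similarity screen.

    all_pairs=False: one query compared against every database entry.
    all_pairs=True:  every unordered pair of database entries compared once.
    """
    if all_pairs:
        n_comparisons = n_molecules * (n_molecules - 1) // 2
    else:
        n_comparisons = n_molecules
    return n_comparisons * ms_per_comparison / 1000.0


# Hypothetical database size, chosen only as a round illustrative number.
N = 1_000_000
for label, ms in [("ChEMBL average", 12), ("small molecules, best case", 1), ("small molecules, worst case", 3)]:
    hours = screening_time_s(N, ms) / 3600
    print(f"{label} ({ms} ms): {hours:.2f} h per query")

pairs_days = screening_time_s(10_000, 12, all_pairs=True) / 86400
print(f"All pairs of 10,000 molecules at 12 ms: {pairs_days:.1f} days")

# Output:
# ChEMBL average (12 ms): 3.33 h per query
# small molecules, best case (1 ms): 0.28 h per query
# small molecules, worst case (3 ms): 0.83 h per query
# All pairs of 10,000 molecules at 12 ms: 6.9 days
```

Because each query-versus-database comparison is independent, the work splits naturally across cores or machines, so these serial figures are an upper bound on what a parallel setup would need.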

Since its humble beginnings as a proof-of-concept tool written in Python, NAMS has come a long way. First, it was rewritten in C, which made it about 100x faster. A flexible command-line interface to the core algorithm was also added, making it easy to integrate into other tools. NAMS was then decoupled from the chemistry libraries needed to "understand" a molecule, so that it focuses solely on the graph matching algorithm; each dataset can now be converted into a "NAMS ready file" by a separate Python utility that uses OpenBabel and PyBel.

This page aims to provide the resources a wider community needs to use NAMS. First, it provides the source code of the NAMS tool in C and of the dataset processing tool in Python. Windows binaries are also provided, as well as a makefile for compiling the code on other platforms.

We will also provide tutorials here for using NAMS on its own or within other platforms. This will be an ongoing effort, but the first tutorials for using NAMS within R are already available.