NAMS Without Tears

This document will explain how [NAMS][1] can be used as a stand alone tool for 2D computational chemical analysis.

The setup in R

After NAMS has been installed with atom distance matrix files in the same directory of NAMS, we must set the path to where the NAMS binaries are.

Also, to use NAMS and makenamsdb conveniently, the following two helper functions have been created in R. It is only required libraries rsvg and png to process and view the molecular structures produced by option in makenamsdb

suppressWarnings(library(rsvg))
suppressWarnings(library(png))

call_nams<-function(parms, file_in="", file_db="", file_out="", NAMS_ROOT="NAMS/") {
  cur_dir<-getwd()
  setwd(NAMS_ROOT)
  fname_in<-paste0(cur_dir, "/", file_in)
  fname_db<-paste0(cur_dir, "/", file_db)
  fname_out<-paste0(cur_dir, "/", file_out)
  if(nchar(file_in)>0) p_in<-paste("-in", fname_in) else p_in<-""
  if(nchar(file_db)>0) p_db<-paste("-db", fname_db) else p_db<-""
  if(nchar(file_out)>0) p_out<-paste(">", fname_out) else p_out<-""
  cmd<-paste("nams", p_in, p_db, parms, p_out)
  out<-shell(cmd, wait = T, intern=T)
  setwd(cur_dir)
  return(out)
}

call_makenamsdb<-function(file_in, file_out, parms="", NAMS_ROOT="NAMS/", makepngs=T) {
  cur_dir<-getwd()
  fname_in<-paste0(cur_dir, "/", file_in)
  fname_out<-paste0(cur_dir, "/", file_out)
  setwd(NAMS_ROOT)
  cmd<-paste("python makenamsdb.py", fname_in, fname_out, parms)
  out<-shell(cmd, wait = T, intern=T)
  if(parms=="-writeimages") {
    dirs<-unlist(strsplit(list.dirs(cur_dir), cur_dir))
    if( !("/IMAGES" %in% dirs)) dir.create(paste0(cur_dir,"/IMAGES"))
    svgs<-unlist(strsplit(list.files(pattern="*.svg"), cur_dir))
    if(length(svgs)>0) {
      if(makepngs==T){
        for(fs in svgs) {
          bitmap <-   rsvg(fs, width = 640)
          fpng<-paste0(strsplit(fs, ".svg")[[1]], ".png")
          writePNG(bitmap, fpng, dpi = 144)
        }
      }
      file.rename(from=svgs, to=paste0(cur_dir, "/IMAGES/", svgs))
      if(makepngs==T){
        pngs<-unlist(strsplit(list.files(pattern="*.png"), cur_dir))
        file.rename(from=pngs, to=paste0(cur_dir, "/IMAGES/", pngs))
      }
    }
  }
  setwd(cur_dir)
  return(out)
}

With these helper functions NAMS can then be called from R easily. As a first example let’s call nams, just getting the Command line help instructions with ‘-help’, and print out the result, as it was called from the command line

out<-call_nams("-help")
head(out, 10)

##  [1] "NAMS - Non-contiguous atom matching structural similarity function"           
##  [2] "nams version - 0.98.20160704B"                                                
##  [3] "This program will compare one or several molecules against others producing"  
##  [4] "a similarity score for each pair of molecules"                                
##  [5] "Basic use:"                                                                   
##  [6] "nams -mode [mode] -in [mol_filename] -db [mod db filename]"                   
##  [7] "NOTE: -mode, and -db are required options"                                    
##  [8] "Details"                                                                      
##  [9] "    -in [filename] filename is the name of a file in the NAMS format with one"
## [10] "       or more molecules   "

Encoding SMILES files for usage with NAMS

The makenamsdb python utility must also be installed that is able to convert SMILES files into a NAMS ready file for processing. In this tutorial, it is assumed to be installed in the same directory as NAMS

The usage of makenamsdb just requires a SMILES file and the name of an output file (typically with the .nams extensision). It will read all the molecules within and create a file that is ready to be processed by NAMS. The number of molecules in the SMILES file is arbitrary, but take notice that the NAMS files can become quite large, so it may be advisable to split very large files into smaller chunks, that can be processed separately.

These examples will also create the images (SVG and PNG) associated with the only molecule in the file. We will create two separate NAMS files, one for aspirin, and the other for ibuprofen.

Please note that the molecules should be identified by a single unique integer ID. These are the codes used by NAMS and the identifiers of the molecules in the images produced

out<-call_makenamsdb("aspirin.smi", "aspirin.nams", parms="-writeimages")
out<-call_makenamsdb("ibuprofen.smi", "ibuprofen.nams", parms="-writeimages")

In the following example we will use one single file with the set of 20 amino acids stored in a .smi file. First we examine the data we are going to convert, verifying it is a SMI file

aa.smi<-read.table("aminoacids.smi")
head(aa.smi)

##                           V1 V2
## 1           O=C(O)[C@@H](N)C  1
## 2           SC[C@H](N)C(O)=O  2
## 3      OC(C[C@H](N)C(O)=O)=O  3
## 4    O=C(O)[C@@H](N)CCC(O)=O  4
## 5 N[C@H](C(O)=O)CC1=CC=CC=C1  5
## 6                   O=C(O)CN  6

Now we can convert it. The output produced is a control list generated by makenamsdb, that prints out the molecule id, the molecular weight and whether there was any problem in the structure conversion.

out=call_makenamsdb( "aminoacids.smi", "aminoacids.nams", parms="-writeimages")
head(out)

## [1] "1 16 89.09318 ... Done"  "2 16 121.15818 ... Done"
## [3] "3 22 133.10268 ... Done" "4 23 147.12926 ... Done"
## [5] "5 23 165.18914 ... Done" "6 8 75.0666 ... Done"

The simplest usage of NAMS - checking similarity between one molecule and another

For this we will use NAMS mode 1, which compares a molecule to all the molecules in the database. We will select as base molecule aspirin and the database the nams file for ibuprofen. The JALMS option for NAMS ensures that we will make the full output, including the atom matching matrix between bot molecules.

out<-call_nams("-mode 1 -opts JALMS", file_in="aspirin.nams", file_db="ibuprofen.nams")
for(lin in out) cat(lin, "\n")

## Atom matching matrix 
##          1      2      3      4      5      6      7      8      9     10     11     12     13     14     15  
##   1   1.23   1.24   1.22   0.98   0.62   0.64   1.05   0.64   0.62   1.70   1.44   0.62   0.63   1.34   1.23  
##   2   1.29   1.42   1.39   1.18   0.86   0.83   1.21   0.83   0.86   1.59   3.42   0.60   0.61   1.50   1.29  
##   3   0.48   0.44   0.47   0.43   0.36   0.37   0.45   0.37   0.36   0.57   0.58   1.74   1.61   0.54   0.48  
##   4   0.63   0.65   1.14   1.11   1.11   1.18   1.15   1.18   1.11   1.23   0.91   1.67   1.79   0.88   0.63  
##   5   0.94   1.37   2.00   3.03   3.10   3.14   3.18   3.14   3.10   1.99   1.89   0.63   0.63   1.27   0.94  
##   6   0.84   1.37   1.84   2.86   2.91   2.99   2.88   2.99   2.91   2.02   1.09   0.58   0.56   1.07   0.84  
##   7   0.84   1.37   1.83   2.84   2.85   2.84   2.91   2.84   2.85   1.86   1.04   0.37   0.37   1.10   0.84  
##   8   0.84   1.36   1.83   2.84   2.86   2.95   2.85   2.95   2.86   1.83   1.02   0.37   0.37   1.06   0.84  
##   9   0.91   1.37   1.98   2.97   3.03   3.07   3.16   3.07   3.03   1.98   1.14   0.45   0.45   1.28   0.91  
##  10   1.16   2.11   2.17   3.77   2.90   2.95   3.87   2.95   2.90   3.07   1.54   0.56   0.57   1.44   1.16  
##  11   1.23   1.33   1.86   1.46   1.41   1.42   1.56   1.42   1.41   2.07   3.40   0.49   0.49   1.53   1.23  
##  12   0.48   0.68   0.59   0.53   0.48   0.53   0.58   0.53   0.48   0.78   0.79   1.75   1.63   0.75   0.48  
##  13   0.48   0.68   0.59   0.53   0.48   0.53   0.59   0.53   0.48   0.78   0.79   1.63   1.75   0.76   0.48  
## Atom matching and scores 
##   1  14 ->  1.34 
##   2  11 ->  3.42 
##   3  12 ->  1.74 
##   4   3 ->  1.14 
##   5   5 ->  3.10 
##   6   6 ->  2.99 
##   7   9 ->  2.85 
##   8   8 ->  2.95 
##   9   7 ->  3.16 
##  10   4 ->  3.77 
##  11  10 ->  2.07 
##  12   2 ->  0.68 
##  13  13 ->  1.75 
##  
## Self Similarity:   25 ->  41.879 
## Self Similarity:  251 ->  48.758 
## Similarity: (25 251) ->  30.947 
## Jaccard Score: (25 251) -> 0.51846

The atoms refered above in the NAMS output can be analysed by graphically comparing the chemical structures. The atom IDs appear in red over the chemical structure. In this case, we can see that the IDs for aspirin and Ibuprofen are respectively 25 and 251 [these are actually the ChEMBL IDs for both molecules], so we can get their images and plot them side by side:

  aspirin_id=25
  ibuprofen_id=251
  asp_bm<-readPNG(paste0("IMAGES/", aspirin_id, ".png"), T)
  ibu_bm<-readPNG(paste0("IMAGES/", ibuprofen_id, ".png"), T)
  par(mfrow=c(1,2))
  plot(1:2, type='n', xaxt='n', yaxt='n', xlab="",ylab="", main="Aspirin")
  rasterImage(asp_bm, 1., 1., 2, 2)
  plot(1:2, type='n', xaxt='n', yaxt='n', xlab="",ylab="", main="Ibuprofen")
  rasterImage(ibu_bm, 1., 1., 2, 2)

  par(mfrow=c(1,1))

For larger scale analysis, the NAMS output can be simplified by providing only the simple similarity scores in a table format, where we can verify that aspirin has a 52% similarity to ibuprofen

out<-call_nams("-mode 1 -opts JLS", file_in="aspirin.nams", file_db="ibuprofen.nams")
for(lin in out) cat(lin, "\n")

##     molid1      molid2     ss1     ss2   sim12   Sscore 
##         25       251  41.879  48.758  30.947  0.5185

The same procedure can be executed for comparing a molecule to all molecules in a file. This way we may wish to save the results in a separate file for easier browsing and recovery with R as a data table. As an example we will compare ibuprofen with the 20 amino acids processed beforehand.

out<-call_nams("-mode 1 -opts JLS", file_in="ibuprofen.nams", file_db="aminoacids.nams", file_out="ibu_aa.txt")
ibu.aa<-read.table("ibu_aa.txt", header=T)

http://www.sigmaaldrich.com/life-science/metabolomics/learning-center/amino-acid-reference-chart.html

Now let’s get the amino acids code stored in the aminoacids.txt file to identify which are the most similar aminoacids to ibuprofen

aa.txt<-read.table("aminoacids.txt", header=T)
head(aa.txt)

##   aa_id aa_code                     smiles HP7 HP2
## 1     1     Ala           O=C(O)[C@@H](N)C  41  47
## 2     2     Cys           SC[C@H](N)C(O)=O  49  52
## 3     3     Asp      OC(C[C@H](N)C(O)=O)=O -55 -18
## 4     4     Glu    O=C(O)[C@@H](N)CCC(O)=O -31   8
## 5     5     Phe N[C@H](C(O)=O)CC1=CC=CC=C1 100  92
## 6     6     Gly                   O=C(O)CN   0   0

ord<-order(ibu.aa$Sscore, decreasing=T)

aa_id<-ibu.aa$molid2[ord]
aa_code<-aa.txt$aa_code[aa_id]
ibu.aa.df1<-data.frame(aa_id=aa_id, aa_code, Sscore=ibu.aa$Sscore[ord])
head(ibu.aa.df1)

##   aa_id aa_code Sscore
## 1     5     Phe 0.6862
## 2    20     Tyr 0.6679
## 3    19     Trp 0.5087
## 4     8     Ile 0.4029
## 5    10     Leu 0.3963
## 6     9     Lys 0.3655

It can be seen that obviously Phenylalanin and Tyrosin (in that order) are the amino acids that resemble the most ibuprofen. We can verify it by analising their actual 2D structures, for the 3 best candidates

  ibuprofen_id<- ibu.aa$molid1[1]
  par(mfrow=c(2,2))
  bitmap<-readPNG(paste0("IMAGES/", ibuprofen_id, ".png"), T)
  plot(1:2, type='n', xaxt='n', yaxt='n', xlab="",ylab="", main="Ibuprofen Structure")
  rasterImage(bitmap, 1., 1., 2, 2)

  for(i in 1:3){
    mol_id<-ibu.aa.df1$aa_id[i]
    bitmap<-readPNG(paste0("IMAGES/", mol_id, ".png"), T)
    plot(1:2, type='n', xaxt='n', yaxt='n', xlab="",ylab="", 
         main=paste(ibu.aa.df1$aa_code[i], "Similarity=", Sscore=ibu.aa.df1$Sscore[i]))
    rasterImage(bitmap, 1., 1., 2, 2)
  }

  par(mfrow=c(1,1))

Changing the NAMS parameters

For molecular comparisons, NAMS can use several atom distance matrices that work in providing the similarities between atoms with the NAMS heuristic. By default, NAMS uses a distance matrix gathered from the literature but is able to use several distance matrices that can be set via the ADM parameter. These are

ADM = 0 - A very strict matrix that considers one atom with 100 similarity only with itself and totally different from all others
ADM = 1 - A not so strict matrix, that in fact assumes all atoms are always at the most 90% different from all others. This matrix appeared to circumvent certain numeric problems that may occur if matrix 0 is used
ADM = 2 - The default. A similarity matrix based on data drom the literature
ADM = 3 - A different matrix constructed empyrically from known information about the atoms
ADM = 4 - A very loose matrix, that in fact considers that each atom is identical to all other atoms. This option is useful for verifying strict structural similarity

We can run now the same above procedure, changing the ADM parameter and verifying the differences

out<-call_nams("-mode 1 -opts JLS -ADM 4", file_in="ibuprofen.nams", file_db="aminoacids.nams", file_out="ibu_aa.4.txt")
ibu.aa4<-read.table("ibu_aa.4.txt", header=T)
ord<-order(ibu.aa4$Sscore, decreasing=T)

aa_id<-ibu.aa4$molid2[ord]
aa_code<-aa.txt$aa_code[aa_id]
ibu.aa.df2<-data.frame(aa_id=aa_id, aa_code, Sscore=ibu.aa4$Sscore[ord])
head(ibu.aa.df2)

##   aa_id aa_code Sscore
## 1    20     Tyr 0.7991
## 2     5     Phe 0.7285
## 3     7     His 0.7174
## 4    19     Trp 0.5989
## 5    15     Arg 0.5074
## 6     4     Glu 0.4561

Now, Tyrosine appears much more similar to ibuprofen, which is to be expected given their structures. The 3 best results and scores can be again graphically verified

  ibuprofen_id<- ibu.aa4$molid1[1]
  par(mfrow=c(2,2))
  bitmap<-readPNG(paste0("IMAGES/", ibuprofen_id, ".png"), T)
  plot(1:2, type='n', xaxt='n', yaxt='n', xlab="",ylab="", main="Ibuprofen Structure")
  rasterImage(bitmap, 1., 1., 2, 2)
  for(i in 1:3){
    mol_id<-ibu.aa.df2$aa_id[i]
    bitmap<-readPNG(paste0("IMAGES/", mol_id, ".png"), T)
    plot(1:2, type='n', xaxt='n', yaxt='n', xlab="",ylab="", 
         main=paste(ibu.aa.df2$aa_code[i], "Similarity=", Sscore=ibu.aa.df2$Sscore[i]))
    rasterImage(bitmap, 1., 1., 2, 2)
  }

  par(mfrow=c(1,1))

Another very relevant parameter is the Alpha parameter. This factor accounts for how much NAMS should penalise an atom alignment in a positional mismatch. The higher the value, the lowest the impact of a distant atom in a molecule to the one being compared, thus in effect strengthening impact of the localfunctional groups. A lower value, has an opposing effect, emphasising a more stringent structural comparison. The default value is 2.0, but the full range of real numbers (within reasonable boundaries) can be used. We will repeat the above example with Alpha = 3.0 and will return to the initial ADM=2 matrix

out<-call_nams("-mode 1 -opts JLS -alpha 1.0 -ADM 3", file_in="ibuprofen.nams", file_db="aminoacids.nams", file_out="ibu_aa.a3.txt")
ibu.aa.a3<-read.table("ibu_aa.a3.txt", header=T)
ord<-order(ibu.aa.a3$Sscore, decreasing=T)

aa_id<-ibu.aa.a3$molid2[ord]
aa_code<-aa.txt$aa_code[aa_id]
ibu.aa.df3<-data.frame(aa_id=aa_id, aa_code, Sscore=ibu.aa.a3$Sscore[ord])
head(ibu.aa.df3)

##   aa_id aa_code Sscore
## 1     5     Phe 0.6456
## 2    20     Tyr 0.6407
## 3    19     Trp 0.4778
## 4     8     Ile 0.3467
## 5    10     Leu 0.3259
## 6     9     Lys 0.3212

In this situation Phenylalanine appears again on the top, but with a higher score, emphasising that the structural constraints are weaker. Nonetheless the overal order is very similar, strengthening the idea that, overal, the method is very robust.

[1]: http://pubs.acs.org/doi/abs/10.1021/ci400324u Teixeira, AL. Falcao AO. 2013 Noncontiguous atom matching structural similarity function. Journal of Chemical Information and Modeling. 2013, 53 (10), pp 2511-2524 DOI: 10.1021/ci400324u.

NAMS Without Tears

Andre Falcao

July 7, 2016

The setup in R

Encoding SMILES files for usage with NAMS

The simplest usage of NAMS - checking similarity between one molecule and another

Changing the NAMS parameters