SMARTS matching in JChem v2.3

30-07-2004 07:08:15

Hi, I'm trying out the prerelease JChem v2.3pre2 and have run into the following unexpected behavior:

The rule "[CX3]" matches every single carbon in the benzene molecule. Is this a feature?

ChemAxon a3d59b832c

30-07-2004 07:17:13

It should not. In benzene, all atoms are aromatic, so they match only [cX3].

See the output of jcsearch (no output means no match):

Quote:
$ jcsearch -q '[CX3]' 'c1ccccc1'

$ jcsearch -q '[cX3]' 'c1ccccc1'
c1ccccc1

$ jcsearch -q '[CX3]' 'C1=CC=CC=C1'

$ jcsearch -q '[cX3]' 'C1=CC=CC=C1'
C1=CC=CC=C1

Did you use the search from the API? If so, the most probable cause is that you forgot to aromatize the target molecule. You can do it by calling

Code:
MoleculeGraph.aromatize(true)

The "X3" part of the expression is correct, because every carbon atom of benzene has 3 connections: two in the ring and one to an H. The H may be explicit or implicit; both are counted for query property X.

On the other hand, query property D considers only explicit connections, so the atoms of c1ccccc1 will all match [cD2] but not [cD3].
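The difference between X and D can be sketched in a few lines of plain Java. This is only a toy illustration and not JChem code: the class name ConnectionCounts and its two methods are hypothetical, and each benzene carbon is assumed to have two explicit ring neighbours and one implicit hydrogen, as in c1ccccc1.

```java
public class ConnectionCounts {
    /** D: counts explicit connections only. */
    public static int explicitConnections(int explicitNeighbours) {
        return explicitNeighbours;
    }

    /** X: counts explicit connections plus implicit hydrogens. */
    public static int totalConnections(int explicitNeighbours, int implicitHydrogens) {
        return explicitNeighbours + implicitHydrogens;
    }

    public static void main(String[] args) {
        // A carbon of c1ccccc1: two ring neighbours, one implicit H (assumed values).
        int ringNeighbours = 2;
        int implicitH = 1;
        System.out.println("X = " + totalConnections(ringNeighbours, implicitH));
        System.out.println("D = " + explicitConnections(ringNeighbours));
    }
}
```

With these assumed counts the carbon satisfies X3 and D2, which is consistent with the [cX3] and [cD2] matches described above.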

30-07-2004 07:21:01

Hi Szabolcs,

Yes, I was using the API. I thought that because my input SMARTS explicitly indicates that all carbons in the molecule are aromatic (c's), I would not need to aromatize.

Does this mean that I should always aromatize the molecule, no matter how it's imported?

ChemAxon a3d59b832c

30-07-2004 07:24:33

In general, I suggest aromatizing both the query and target molecules before searching. You can omit this step only when you are 100% sure that the appropriate rings are already in aromatic form, for example in the case of a standardized database.

In your case it seems that the target molecule was not aromatized.

30-07-2004 07:31:36

Hi Szabolcs,

Using the MolSearch API, I'm still getting the same results. Attached is the test program. What am I doing wrong?

Code:
import chemaxon.sss.search.*;
import chemaxon.struc.Molecule;
import chemaxon.formats.*;

public class Test {
    public static void main (String[] argv) throws Exception {
        MolSearch ms = new MolSearch ();
        Molecule target = MolImporter.importMol("c1ccccc1");
        target.aromatize(true);
        ms.setTarget(target);
        Molecule query = MolImporter.importMol("[CX3]");
        query.aromatize(true);
        ms.setQuery(query);
        int hits[][] = ms.findAll();
        System.out.println(hits.length+ " hit(s)");
        for (int i = 0; i < hits.length; ++i) {
            System.out.print("    "+(i+1)+":");
            for (int j = 0; j < hits[i].length; ++j) {
                System.out.print(" "+(hits[i][j]+1));
            }
            System.out.println();
        }
    }
}

ChemAxon a3d59b832c

30-07-2004 07:48:59

The problem is in the query import.

For historical reasons, our "smiles" format still allows some query features. This is why your query SMARTS "[CX3]" was recognized as SMILES, and the aliphatic query flag was not assigned. This recognition will change in the future (probably in the next release after version 2.3).
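The effect of the missing flag can be pictured with a small toy model in plain Java. This is not how JChem is implemented: the class AromaticFlagSketch, the enum Requirement and the method matches are hypothetical names, and the sketch assumes that a query atom without an aromaticity flag simply skips the aromaticity check, which is what the behavior described above suggests.

```java
public class AromaticFlagSketch {
    /** Aromaticity requirement carried by a query atom (hypothetical model). */
    public enum Requirement { ALIPHATIC, AROMATIC, UNSPECIFIED }

    /** Returns true if a target atom with the given aromaticity satisfies the requirement. */
    public static boolean matches(Requirement required, boolean targetIsAromatic) {
        switch (required) {
            case ALIPHATIC:
                return !targetIsAromatic;
            case AROMATIC:
                return targetIsAromatic;
            default:
                // No flag assigned: aromaticity is not checked at all (assumption).
                return true;
        }
    }

    public static void main(String[] args) {
        boolean benzeneCarbonIsAromatic = true;
        // "[CX3]" read in query mode: the aliphatic flag is set.
        System.out.println(matches(Requirement.ALIPHATIC, benzeneCarbonIsAromatic));
        // "[CX3]" read as SMILES: no aliphatic flag is assigned.
        System.out.println(matches(Requirement.UNSPECIFIED, benzeneCarbonIsAromatic));
    }
}
```

In this model the flagged query atom rejects an aromatic benzene carbon, while the unflagged one accepts it, mirroring the unexpected hits reported in the thread.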





You have to read the SMARTS string in query mode:

Code:
MolImporter.setQueryMode(true)

However, it is simpler to use the MolHandler class, so you don't need to bother with streams:

Code:
import chemaxon.util.MolHandler;
...
MolHandler mh = new MolHandler("[CX3]", true);
Molecule query = mh.getMolecule();

The attached version of your test program works fine now:

Quote:
$ java Test
No hits

Code:
import chemaxon.sss.search.*;
import chemaxon.struc.Molecule;
import chemaxon.formats.*;
import chemaxon.util.MolHandler;

public class Test {
    public static void main (String[] argv) throws Exception {
        MolSearch ms = new MolSearch ();
        Molecule target = MolImporter.importMol("c1ccccc1");
        target.aromatize(true);
        ms.setTarget(target);
        // Nonquery mode: wrong
        // Molecule query = MolImporter.importMol("[CX3]");
        // Query mode: OK:
        MolHandler mh = new MolHandler("[CX3]", true);
        Molecule query = mh.getMolecule();
        query.aromatize(true);
        ms.setQuery(query);
        int hits[][] = ms.findAll();
        if(null == hits) {
            System.out.println("No hits");
        } else {
            System.out.println(hits.length+ " hit(s)");
            for (int i = 0; i < hits.length; ++i) {
                System.out.print("    "+(i+1)+":");
                for (int j = 0; j < hits[i].length; ++j) {
                    System.out.print(" "+(hits[i][j]+1));
                }
                System.out.println();
            }
        }
    }
}