SMARTS description: Atom not in ring

User aac7e94ee2

26-05-2006 15:52:11

Hi,


is it possible to define atoms in a SMARTS representation that are not part of a ring?


Regards,


Alex

ChemAxon a3d59b832c

26-05-2006 20:46:24

Hi Alex,





Yes, there is more than one possibility. You can use the R0 or !R SMARTS symbols for that. For example:





[C;R0] - Carbon, not part of a ring


[!R] - any atom except hydrogen, not part of a ring
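To make concrete what R0 and !R test for, here is a minimal sketch in plain Java. It does not use the JChem API; the class and method names are made up for illustration, and the molecule is hand-coded as a list of bonds. It treats an atom as a ring atom when at least one of its bonds lies on a cycle, that is, when the two bonded atoms stay connected after that bond is ignored.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy illustration (plain Java, not the JChem API): an atom satisfies R0
// when none of its bonds lies on a cycle of the molecular graph.
class RingMembership {

    // Builds adjacency lists for atoms 0..n-1 from {a, b} bond pairs.
    static List<List<Integer>> adjacency(int n, int[][] bonds) {
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            adj.add(new ArrayList<>());
        }
        for (int[] b : bonds) {
            adj.get(b[0]).add(b[1]);
            adj.get(b[1]).add(b[0]);
        }
        return adj;
    }

    // True if 'to' can be reached from 'from' without using the direct bond from-to.
    static boolean connectedWithoutBond(List<List<Integer>> adj, int from, int to) {
        boolean[] seen = new boolean[adj.size()];
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(from);
        seen[from] = true;
        while (!stack.isEmpty()) {
            int cur = stack.pop();
            for (int next : adj.get(cur)) {
                if (cur == from && next == to) {
                    continue; // skip the bond under test
                }
                if (next == to) {
                    return true;
                }
                if (!seen[next]) {
                    seen[next] = true;
                    stack.push(next);
                }
            }
        }
        return false;
    }

    // True if at least one bond of 'atom' is a ring bond.
    static boolean inRing(List<List<Integer>> adj, int atom) {
        for (int neighbor : adj.get(atom)) {
            if (connectedWithoutBond(adj, atom, neighbor)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Methylcyclopropane, C1CC1C: atoms 0-2 form the ring, atom 3 is the methyl carbon.
        int[][] bonds = { {0, 1}, {1, 2}, {2, 0}, {2, 3} };
        List<List<Integer>> adj = adjacency(4, bonds);
        for (int i = 0; i < 4; i++) {
            System.out.println("atom " + i + (inRing(adj, i) ? ": ring atom" : ": satisfies R0"));
        }
    }
}
```

In this toy model only the methyl carbon (atom 3) would satisfy [C;R0]; the three ring carbons would not.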





In Marvin the R0 property can be added through the More window, using buttons R+ or R-. Furthermore, SMARTS atom expressions containing !R can be sketched by typing into the text box of the More window and pressing the SMARTS button.





Best regards,


Szabolcs

User aac7e94ee2

26-05-2006 20:53:04

OK then, how can I define a keto group that is not part of a ring system?


Regards,


Alex



ChemAxon a3d59b832c

26-05-2006 21:04:27

alex_boecker wrote:



OK then, how can I define a Keto group which is not part of a ring system?


Just add the property to each of the atoms, like:





[C;R0][C;R0]([C;R0])=[O;R0]





Szabolcs

User aac7e94ee2

31-05-2006 17:43:40

Hi,





I followed your advice and described a double bond that must not be in a ring as:

*[C;R0]=[C;R0]*

I am trying to count the occurrences of C=C double bonds in a molecule using the findFirst and findNext methods. However, if one double bond is present, it is counted twice. Do you know the reason?





Kind regards,





Alex

ChemAxon a3d59b832c

31-05-2006 21:05:16

alex_boecker wrote:
*[C;R0]=[C;R0]*

I am trying to count the occurrences of C=C double bonds in a molecule using the findFirst and findNext methods. However, if one double bond is present, it is counted twice. Do you know the reason?

One reason may be that the orderSensitive flag is set to true.


http://www.chemaxon.com/jchem/doc/api/chemaxon/sss/search/MolSearch.html#setOrderSensitiveSearch(boolean)





If this option is true, a symmetrical query like yours produces multiple hits for the same atoms in different order.





You can use the jcsearch program to check the effect of this option:





Code:
$ jcsearch --allHits --orderSensitive -q '[C;R0]=[C;R0]' CC=CC
    Query has 2 matches:
        Match 1:[    2,   3 ]
        Match 2:[    3,   2 ]
CC=CC

$ jcsearch --allHits -q '[C;R0]=[C;R0]' CC=CC
    Query has 1 match:
        Match 1:[    2,   3 ]
CC=CC
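If you already have order-sensitive hits in hand, the effect of the flag can be imitated by treating each hit as a set of atoms. The following is only a sketch in plain Java, not part of the JChem API; the class and method names are hypothetical, and the hits are hard-coded from the jcsearch output above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration (plain Java, not the JChem API): collapse hits that cover
// the same target atoms in a different order.
class UniqueHits {

    // Returns one representative hit per distinct atom set, in first-seen order.
    static List<int[]> uniqueByAtomSet(int[][] hits) {
        Set<String> seen = new HashSet<>();
        List<int[]> unique = new ArrayList<>();
        for (int[] hit : hits) {
            int[] sorted = hit.clone();
            Arrays.sort(sorted); // [3, 2] and [2, 3] get the same key
            if (seen.add(Arrays.toString(sorted))) {
                unique.add(hit);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        // The two order-sensitive matches reported for CC=CC.
        int[][] orderSensitiveHits = { {2, 3}, {3, 2} };
        System.out.println(uniqueByAtomSet(orderSensitiveHits).size());
    }
}
```

Hits that differ in at least one atom are kept as separate hits, so this only removes permutations of the same atoms.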








jcsearch help page: http://www.chemaxon.com/jchem/doc/user/Jcsearch.html





jcsearch can also count the hits if you use the -t:c command-line option:

Code:
$ jcsearch -t:c --allHits -q '[C;R0]=[C;R0]' CC=CC
1






Best regards,


Szabolcs

User aac7e94ee2

06-06-2006 20:03:49

No, this didn't work. The orderSensitive option is set to false by default; nevertheless, I tried both settings. In both cases the double bond is counted twice.


So assume the following molecule, represented by the SMILES:


CC=C(C)C


and the following SMARTS definition:


*[C;R0]=[C;R0]*


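If I use the following Java code:

import chemaxon.formats.MolImporter;
import chemaxon.sss.search.MolSearch;
import chemaxon.struc.Molecule;
import chemaxon.util.MolHandler;

int k = 0; // hit counter

// Target molecule, imported from SMILES
Molecule smartsTarget = MolImporter.importMol("CC=C(C)C");
smartsTarget.aromatize();

// Query, imported from SMARTS
MolHandler smartsfilter = new MolHandler();
smartsfilter.setMolecule("*[C;R0]=[C;R0]*");
smartsfilter.aromatize();

MolSearch search = new MolSearch();
search.setQuery(smartsfilter.getMolecule());
search.setTarget(smartsTarget);

// Count the first hit, then one further hit if there is one
if (search.findFirst() != null) {
    k++;
}
else return k;
if (search.findNext() != null) {
    k++;
}
else return k;

the double bond is counted twice.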


Further assume the following SMILES within a molecule:

[S2+]([O-])([O-])(N)N

And the SMARTS definition:

[#8-]([#8-])(*)N(*)*

Then again the sulfonamide is counted as two.





Kind regards,





Alex

User f359e526a1

09-06-2006 07:51:55

Hello, we are a bit delayed due to the UGM, I am sure Szabolcs will answer ASAP.

ChemAxon a3d59b832c

09-06-2006 10:49:59

Hi Alex,





Sorry for the late answer; we were very busy with our user conference.

Now I know what the problem is: the extra * symbols (ANY atoms) on the two sides of the double bond. (Earlier I thought those were not part of the SMARTS, but rather formatting characters from the forum engine. I am sorry for this misunderstanding.)

So your SMARTS "*[C;R0]=[C;R0]*" does not match the double bond itself, but the "dihedrals" around the double bond. Depending on the non-hydrogen ligands of the target double bond, you may get zero, one, two or even four hits per double bond: one hit for each combination of a ligand on one carbon with a ligand on the other. Some examples:





Code:
$ jcsearch --allHits -q '*[C;R0]=[C;R0]*' 'CC(C)=C(C)C'
    Query has 4 matches:
        Match 1:[    1,   2,   4,   5 ]
        Match 2:[    1,   2,   4,   6 ]
        Match 3:[    3,   2,   4,   5 ]
        Match 4:[    3,   2,   4,   6 ]
CC(C)=C(C)C

$ jcsearch --allHits -q '*[C;R0]=[C;R0]*' 'CC=C(C)C'
    Query has 2 matches:
        Match 1:[    1,   2,   3,   4 ]
        Match 2:[    1,   2,   3,   5 ]
CC=C(C)C

$ jcsearch --allHits -q '*[C;R0]=[C;R0]*' 'CC=CC'
    Query has 1 match:
        Match 1:[    1,   2,   3,   4 ]
CC=CC

$ jcsearch --allHits -q '*[C;R0]=[C;R0]*' 'CC=C'
<no hits>
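The hit counts in these four runs follow a simple pattern: the number of non-hydrogen ligands on one carbon of the double bond multiplied by the number on the other. The sketch below reproduces that arithmetic in plain Java. It is a toy model, not the JChem API: the class and method names are made up, the bond lists are typed in by hand using the atom numbering of the jcsearch output, and it assumes an order-insensitive search on a single acyclic C=C bond.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (plain Java, not the JChem API) of how many hits the query
// *[C;R0]=[C;R0]* gives for one acyclic C=C bond with order-insensitive search.
class DihedralHits {

    // Heavy-atom neighbors of 'atom', leaving out 'exclude' (the double-bond partner).
    static List<Integer> otherNeighbors(int[][] bonds, int atom, int exclude) {
        List<Integer> result = new ArrayList<>();
        for (int[] b : bonds) {
            if (b[0] == atom && b[1] != exclude) {
                result.add(b[1]);
            }
            if (b[1] == atom && b[0] != exclude) {
                result.add(b[0]);
            }
        }
        return result;
    }

    // One hit per (ligand on c1, ligand on c2) combination.
    static int countHits(int[][] bonds, int c1, int c2) {
        return otherNeighbors(bonds, c1, c2).size() * otherNeighbors(bonds, c2, c1).size();
    }

    public static void main(String[] args) {
        // Bonds between heavy atoms, numbered from 1 in SMILES order.
        int[][] fourLigands = { {1, 2}, {2, 3}, {2, 4}, {4, 5}, {4, 6} }; // CC(C)=C(C)C
        int[][] threeLigands = { {1, 2}, {2, 3}, {3, 4}, {3, 5} };        // CC=C(C)C
        int[][] twoLigands = { {1, 2}, {2, 3}, {3, 4} };                  // CC=CC
        int[][] oneLigand = { {1, 2}, {2, 3} };                           // CC=C

        System.out.println(countHits(fourLigands, 2, 4));
        System.out.println(countHits(threeLigands, 2, 3));
        System.out.println(countHits(twoLigands, 2, 3));
        System.out.println(countHits(oneLigand, 2, 3));
    }
}
```

This is why counting hits of the four-atom query does not count double bonds: the count depends on the substitution pattern around each bond.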








I recommend simply using the SMARTS [C;R0]=[C;R0] to match only the double bond.

For the sulfonamide group, I recommend also removing the ANY atoms (*) from the SMARTS to get only the functional group:





[#8-]([#8-])N





The use of the jcsearch command-line program as above or this page may be helpful when experimenting with SMARTS or other queries:


http://www.chemaxon.com/jchem/examples/sss/index.jsp





You may also find this page at Daylight helpful; it has many functional group SMARTS examples:


http://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html





Kind Regards,


Szabolcs