SMARTS Substructure searches

User 12562e1acd

18-04-2016 20:14:19

In Excel,  I’m using =JCSubstructureMatchCount(J$2,$B31)
to look for specific substructures in SMILES structures.  


I have two specific questions. 


 


I have used a number of SMARTS like

CH (aromatic)    [cH1]
-C (aromatic)    [cH0]
N (aromatic)     n



Where the lowercase letter c or n should mean carbon or nitrogen
in an aromatic context, but JCSubstructureMatchCount is finding them
in any context.


Secondly, I would like to find, for example, a primary
alcohol, but not in a carboxylic acid. ([Cv4][OH1]) finds primary
alcohols in any context.


I would have thought that
([Cv4][OH1])!([CX3](=O)[OX2H1]) would work, but it doesn’t.
Can you suggest a solution?

ChemAxon abe887c64e

20-04-2016 09:10:40

Hi James,


 


Regarding your second question, as a first idea we recommend the following SMARTS expression:


[$([CX4]O)]

The following pages might also help you to create the appropriate SMARTS:


http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html (see the Recursive SMARTS section)


http://www.daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html
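As a rough illustration of the recursive SMARTS idea described on the first page: `$(...)` wraps an environment, and only the first atom of that environment is the atom actually matched. The `!` operator negates a single atom primitive (a `$(...)` term counts as one), which is why a whole group cannot be negated as `(A)!(B)`. The sketch below only assembles such strings; the helper names `in_environment` and `atom_excluding` are hypothetical and not part of any ChemAxon or Daylight API.

```python
def in_environment(env: str) -> str:
    """Wrap an environment so that only its first atom is matched: [$(env)]."""
    return f"[$({env})]"


def atom_excluding(atom: str, excluded_env: str) -> str:
    """Combine atom primitives with a negated recursive environment: [atom;!$(env)]."""
    return f"[{atom};!$({excluded_env})]"


# The expression suggested above: a four-connected carbon bonded to an oxygen.
# A carboxylic acid carbon has only three connections, so it is not matched.
print(in_environment("[CX4]O"))

# Assumed variant (not from the thread): a hydroxyl oxygen whose
# neighbouring atom is not a carbonyl carbon.
print(atom_excluding("OX2H1", "OC=O"))
```

Note that a bare `O` in SMARTS is any aliphatic oxygen, so `[$([CX4]O)]` also matches ether carbons; writing `[OX2H]` in its place would narrow the environment to a hydroxyl.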


 


Best regards,


Krisztina

ChemAxon a3dda216df

20-04-2016 10:22:24

Hi James,


The issue is that [n] is recognized as a simple N in SMILES and SMARTS. Please use [n;a] or [#7;a] instead; see the attached example file.
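In Daylight SMARTS, `a` is the atom primitive for "aromatic" (and `A` for "aliphatic"), so `[#7;a]` states the aromaticity explicitly instead of relying on the lowercase symbol. A minimal sketch that generates this explicit form for the patterns from the first post follows; the helper name `explicit_aromatic` and the symbol table are illustrative assumptions, not a ChemAxon function.

```python
# Aromatic atom symbols of the organic subset and their atomic numbers.
AROMATIC_SYMBOLS = {"b": 5, "c": 6, "n": 7, "o": 8, "p": 15, "s": 16}


def explicit_aromatic(symbol: str, extra: str = "") -> str:
    """Return a bracketed SMARTS atom such as '[#7;a]' for an aromatic symbol.

    `extra` is an optional further primitive, e.g. 'H1' or 'H0'.
    """
    parts = [f"#{AROMATIC_SYMBOLS[symbol]}", "a"]
    if extra:
        parts.append(extra)
    # ';' is the low-precedence AND between atom primitives.
    return "[" + ";".join(parts) + "]"


print(explicit_aromatic("n"))        # aromatic N
print(explicit_aromatic("c", "H1"))  # aromatic CH
print(explicit_aromatic("c", "H0"))  # substituted aromatic C
```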


You can check whether your SMARTS is correct in MarvinSketch: draw the structure and select Edit/Source/View: SMARTS, SMARTS.


Let me know if this helps.


Regards,


Anna

User 12562e1acd

20-04-2016 15:04:37











Hi Anna, 


Thanks! 


This absolutely solved the problem! I have referred to the Daylight documentation and I never saw the "a" functionality. Is this standard SMARTS nomenclature?

User 12562e1acd

20-04-2016 15:07:58

Thanks, Krisztina


That solved the problem. 


Can you explain what the function of $ is in this context?

User 12562e1acd

20-04-2016 15:20:32

Anna and Krisztina, 


A couple of follow-up questions.


1. Similar to the alcohol but not carboxylic acid case, how do I specify an amine but not an amide? I used [NX3;H2], but this also hits amides.


2. How can I specify primary, secondary and tertiary alcohols, e.g. CX3-OH, CX2H-OH, CXH2-OH?


Thanks


Alex 

ChemAxon abe887c64e

21-04-2016 12:52:59

Hi Alex,


I think the answers can be found on the pages I linked in my previous post; I could not give a better answer.


On that page, please look for:


Recursive SMARTS


amine, not amide
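For orientation, here is a hedged sketch of what such patterns can look like. The amine pattern adapts the "amine, not amide" entry on the Daylight examples page to the `[NX3;H2]` case from the question. The alcohol patterns are an assumption of this sketch rather than something stated in the thread: they classify by the hydrogen count on the carbon bearing the OH, which presumes that carbon's other substituents are carbons. The names `PATTERNS` and `alcohol` are hypothetical.

```python
def alcohol(carbon_h: int) -> str:
    """SMARTS for a hydroxyl on a four-connected carbon with `carbon_h` hydrogens."""
    return f"[CX4;H{carbon_h}][OX2H]"


PATTERNS = {
    # NX3 with two hydrogens, excluding nitrogens bonded to a carbonyl carbon.
    "primary amine, not amide": "[NX3;H2;!$(NC=O)]",
    "primary alcohol": alcohol(2),    # CH2-OH
    "secondary alcohol": alcohol(1),  # CH-OH
    "tertiary alcohol": alcohol(0),   # C-OH with no hydrogens on carbon
}

for name, smarts in PATTERNS.items():
    print(f"{name}: {smarts}")
```

Each string can be placed in a query cell and checked in MarvinSketch before it is used in a spreadsheet formula.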


If you need more detailed support, we can assist within the framework of a consultancy project.


Best regards,


Krisztina