SMARTS match behaviour for ambiguous aromaticity

User c2ffbfa8f8

30-09-2010 12:45:44

JChem version used: 5.3.4


Target: C1Cc2ccccc2=NC1


Query: [NX2;R]=[C;R]


Example code:


final Molecule target = MolImporter.importMol("C1CN=c2ccccc2C1", "smiles");
final Molecule query = MolImporter.importMol("[NX2;R]=[C;R]", "smarts");
final MolSearch ms = new MolSearch();
final MolSearchOptions molSearchOptions = new MolSearchOptions();

molSearchOptions.setVagueBondLevel(MolSearchOptions.VAGUE_BOND_OFF);
ms.setSearchOptions(molSearchOptions);
ms.setTarget(target);
ms.setQuery(query);
System.out.println("HIT: " + ms.getMatchCount());

This matches, but I'd expected it NOT to match because the carbon in the target is aromatic. In this case, should I change the query to [NX2;R]=[!c;C;R], or is there a MolSearch option I can use?


 


thanks muchly

ChemAxon a3d59b832c

01-10-2010 10:54:11

Hi Derek,


 


It is the SMARTS import that does not put the aliphatic property on the C atom, because of valence considerations.


(It skips adding the aliphatic property when aromaticity can already be excluded from the existing bonds and the total valence.)


Your target molecule basically has a valence error. (Pentavalent C - note that it is not possible to dearomatize it.)


 


So there is no option for MolSearch to avoid this structure, as the query input already lacks this information.


But the query you proposed should work fine.


 


Best regards,


Szabolcs

User c2ffbfa8f8

01-10-2010 13:51:36

Thanks Szabolcs, that makes sense.

User c2ffbfa8f8

20-10-2010 15:44:07

Hi Szabolcs,


 


I have another structure (this time with correct valence) that I would expect to not match (using the same test):


 


C1CN=c2ccccc2=C1


 


Have tried aromatising the target with general aromaticity but this didn't help.  Have I missed something?


 


thanks muchly


 


Derek

User 73531e86ff

20-10-2010 15:59:27

Here is another example which is causing an issue in a different part of our code.  We think it is related to the same general aromatisation as above.


SMILES: C=c1ccc(cc1)=C1CCCCO1


SMARTS: [OH0X2$(*-C=C)]


Similarly, we'd expect the SMARTS NOT to hit the structure above, because the atom I've highlighted red in the SMARTS (the final C of C=C) is aliphatic and the atom it is matched against in the SMILES is aromatic (with general aromatisation).

ChemAxon 25dcd765a3

21-10-2010 10:47:42

Hi Derek,


In this SMILES


C1CN=c2ccccc2=C1


there are two Carbon atoms (with indexes 4 and 9) which have valence 5.


Each of these carbon atoms has two aromatic bonds, which together count as three electrons, and a double bond, which counts as two, so the total is 5. A carbon atom cannot have a valence of 5. This is a misleading representation of this molecule.


I think this is the source of the problem.
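
As a rough illustration of the counting described above, the sketch below sums bond contributions for one of the two carbons in question. It is only the arithmetic from this post, not JChem's actual valence check; the class, method and constant names are made up for the example, and each aromatic bond is assumed to contribute 1.5.

```java
class ValenceSketch {
    // Assumed bond contributions: two aromatic bonds count 3 together, i.e. 1.5 each.
    static final double AROMATIC_BOND = 1.5;
    static final double DOUBLE_BOND = 2.0;
    static final int MAX_CARBON_VALENCE = 4;

    // Sums the contributions of all bonds on one atom.
    static double valence(double... bondContributions) {
        double sum = 0;
        for (double b : bondContributions) {
            sum += b;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Atom 4 (or 9) in C1CN=c2ccccc2=C1: two aromatic ring bonds plus one double bond.
        double v = valence(AROMATIC_BOND, AROMATIC_BOND, DOUBLE_BOND);
        System.out.println("valence = " + v + ", exceeds 4: " + (v > MAX_CARBON_VALENCE));
        // prints: valence = 5.0, exceeds 4: true
    }
}
```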


However, there exist aromatic representations of molecules which allow such a chemically strange (let's say unaccepted by a chemist) representation. To avoid this situation, the SMARTS importer will mark these atoms with the aliphatic flag.


It will most probably be ready in Marvin 5.4.1.



User 7c177bab3b

22-10-2010 09:08:12

Please could you clarify, do you mean "mark with aromatic flag"?


Taking the Kekule structure through standardize generates the SMILES with "aromatic" carbons


> echo 'C1CN=C2C=CC=CC2=C1' | standardize -c "aromatize"
C1CN=c2ccccc2=C1


and we would expect these to be treated as such in SMARTS matching, i.e. N=[C;R] should not match.


 

ChemAxon a3d59b832c

22-10-2010 09:53:53

Hi all,


Please could you clarify, do you mean "mark with aromatic flag"?

It means that the SMARTS import will put the appropriate query property on the atom: (A), for aliphatic, in this case.


(See: http://www.chemaxon.com/jchem/doc/user/query_features.html#atprop )


 


I attach below two pictures for the current and the future representations of SMARTS N=[C;R].


I confirm that with this change, query N=[C;R] will not match C1CN=c2ccccc2=C1, and indeed all the other query target pairs in this topic will be fixed.


 


Furthermore, a workaround exists that works in the current version as well: Add the recursive SMARTS $(A) to the atoms in question, for example:


[N$(A)]=[C$(A);R]


[OH0X2$(*-C=[C;$(A)])]


[NX2;R;$(A)]=[C;R;$(A)]




 


(It may not be necessary on the N; I just included it there as well for safety.)
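
If many queries need this workaround, the $(A) term could be added programmatically. The following is a minimal sketch, assuming each atom is given as a single bracket atom such as [C;R]; the class and method names are hypothetical, and the helper does not parse full SMARTS strings or nested recursive expressions.

```java
class AliphaticGuard {
    // Appends the recursive SMARTS $(A) to one bracket atom, e.g. "[C;R]" -> "[C;R;$(A)]".
    // The low-precedence ';' (AND) keeps the guard applied to the whole expression,
    // even when the original atom expression contains ',' (OR).
    static String guard(String bracketAtom) {
        if (bracketAtom.length() < 3
                || !bracketAtom.startsWith("[")
                || !bracketAtom.endsWith("]")) {
            throw new IllegalArgumentException(
                    "expected a bracket atom like [C;R]: " + bracketAtom);
        }
        String inner = bracketAtom.substring(1, bracketAtom.length() - 1);
        return "[" + inner + ";$(A)]";
    }

    public static void main(String[] args) {
        // Rebuilds the guarded form of the original query [NX2;R]=[C;R].
        String query = guard("[NX2;R]") + "=" + guard("[C;R]");
        System.out.println(query);
        // prints: [NX2;R;$(A)]=[C;R;$(A)]
    }
}
```

The resulting string would then be passed to MolImporter.importMol(query, "smarts") in the same way as in the example code at the top of this topic.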


You can also try these here: http://www.chemaxon.com/jchem/examples/sss/index.jsp


 


Best regards,


Szabolcs

ChemAxon a3d59b832c

24-01-2011 10:02:41

We have fixed the above issue. None of the above SMARTS searches will match in the coming JChem 5.4.1 version.


5.4.1 will be released later this month.


 


Best regards,


Szabolcs

User 73531e86ff

24-01-2011 10:30:31

Many thanks for the status update.  I will install 5.4.1 and re-run all our unit tests when it is available.


Cheers,


Shane