SMARTS match bug

User 036058eabf

17-03-2008 19:48:27

Hi,


I think I've found a bug in pattern matching using SMARTS.


With certain molecules, the pattern "[A;R0]" matches atoms that are in a ring.





This is an example molecule that generates the error:


[H]N1C(=O)C(=C([H])c2c([H])c(Cl)c([H])c([H])c12)N1C(=O)c2c([H])c([H])c([H])c([H])c2C1=O


(This SMILES string was generated through a Molecule.toFormat("smiles") call.)





I used this pattern:


[A,a][A;R0]


It should match all pairs of atoms in which one atom is not in a ring; however, it matches pairs in which both atoms are in rings.





The Daylight SMARTS matching tool ( http://www.daylight.com/daycgi_tutorials/depictmatch.cgi ) gives different results: there is only one match, C-Cl, which is missed in JChem searching.


Using the pattern "[A,a][A;!R]" is a possible workaround. Both patterns give the same matches using the Daylight tool.
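To illustrate why "[A;R0]" and "[A;!R]" are expected to select the same atoms, here is a minimal sketch of the two ring primitives as predicates over an atom's ring count. It assumes the Daylight reading of SMARTS (R<n> means "member of exactly n rings", a bare R means "member of at least one ring"). The class and method names are made up for illustration and are not part of JChem.

```java
import java.util.function.IntPredicate;

public class RingPrimitives {
    // SMARTS R<n>: the atom is a member of exactly n rings (assumed Daylight semantics).
    static IntPredicate ringCount(int n) {
        return rings -> rings == n;
    }

    // SMARTS R without a number: the atom is a member of at least one ring.
    static IntPredicate inAnyRing() {
        return rings -> rings > 0;
    }

    public static void main(String[] args) {
        IntPredicate r0 = ringCount(0);            // R0
        IntPredicate notR = inAnyRing().negate();  // !R
        // Both predicates agree for every ring count, so the two patterns are equivalent.
        for (int rings = 0; rings <= 3; rings++) {
            System.out.println(rings + " ring(s): R0=" + r0.test(rings)
                    + ", !R=" + notR.test(rings));
        }
    }
}
```

Under these semantics a ring atom can never satisfy R0, which is why a match with both atoms in rings points to a problem in the search rather than in the pattern.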





I've attached an image with the matches highlighted in blue, and a short program that generates that image.


Thank you!


Simone

ChemAxon a9ded07333

18-03-2008 15:49:32

Hi Simone,





Thanks for the report, I will analyse your example and get back to you soon.





Tamas

User 036058eabf

19-03-2008 16:13:07

Another one:


If you use "CCC" as the target molecule and "[CH3]" or "[CH2]" as the query, nothing matches. However, if you change the query to "[C;H3]" or "[C;H2]", it works.


You can try this with the same Java file I attached before, changing the molecule to "CCC" and the query to one of these.





Am I just using logical operators in a wrong way?


Thank you


Simone

ChemAxon a9ded07333

19-03-2008 17:04:30

Quote:
Am I just using logical operators in a wrong way?
No, you are using these SMARTS expressions correctly.


Your additional example confirms my suspicion that hydrogen handling may be the cause of the problem.


For example, if you remove the explicit hydrogens from your first target molecule, you get the correct result: try searching "[A,a][A;R0]" against "N1C(=O)C(=Cc2cc(Cl)ccc12)N1C(=O)c2ccccc2C1=O".





Tamas

ChemAxon a9ded07333

20-03-2008 12:46:48

This bug occurred due to an inconsistency in index-handling of hydrogens.


The bugfix will be available in JChem 5.0.2.1, which is planned for release by tomorrow.





Tamas

ChemAxon a9ded07333

20-03-2008 14:10:04

Regarding the second bug: it seems to have the same origin, but it is caused by a different mechanism.


In your code


Code:
Molecule query = MolImporter.importMol("[CH3]");



is used to import the query molecule.


[CH3] and [C;H3] have the same meaning when you read them as SMARTS expressions, but MolImporter.importMol(String) tries to recognise the string in the simplest possible format, and since [CH3] can also be interpreted as a SMILES expression, it is imported as SMILES.


During SMILES import, hydrogens are removed, so we lose the information on the hydrogen count.
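To make the "simplest format wins" behaviour concrete, here is a toy sketch of how such a recogniser could end up treating "[CH3]" and "[C;H3]" differently. This is not JChem's actual recognition logic: the guessFormat helper and its rule (SMARTS logical operators inside brackets rule out SMILES) are assumptions made purely for illustration.

```java
public class FormatGuess {
    // Hypothetical heuristic, NOT JChem's real recogniser: a string is reported
    // as SMILES unless a bracket atom contains a SMARTS-only logical operator.
    static String guessFormat(String s) {
        boolean inBracket = false;
        for (char c : s.toCharArray()) {
            if (c == '[') {
                inBracket = true;
            } else if (c == ']') {
                inBracket = false;
            } else if (inBracket && ";,!&".indexOf(c) >= 0) {
                // ';', ',', '!' and '&' are not valid inside a SMILES bracket atom.
                return "smarts";
            }
        }
        // Nothing SMARTS-specific was found, so the simpler format is chosen.
        return "smiles";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"[CH3]", "[C;H3]", "[CH2]", "[C;H2]"}) {
            System.out.println(s + " -> " + guessFormat(s));
        }
    }
}
```

With a rule of this kind, "[CH3]" and "[CH2]" fall through to SMILES while "[C;H3]" and "[C;H2]" can only be SMARTS, which is consistent with the behaviour reported in this thread.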





I recommend using MolImporter.importMol(String, String, String) instead, whenever you know the exact format of your import string:


Code:
Molecule query = MolImporter.importMol("[CH3]", "smarts", null);






Tamas

ChemAxon a9ded07333

21-03-2008 12:50:04

JChem Base 5.0.2.1 was released yesterday and can be downloaded from our download page.





Also, I would like to refine my previous statement about removed hydrogens: during SMILES import the hydrogens are not really removed but become implicit, and the search process doesn't use the implicit H count.





Tamas