Isotopes and possible bug

User f698d0529d

19-10-2005 13:27:28

Hi

If I have these three benzenes in a regular Oracle JChem indexed table

SMILES
c1ccccc1
c1cc[12cH]cc1
c1cc[13cH]cc1

and I want to formulate SQL to find compounds which contain (or exactly match) benzene, but are not C13, is there a way to do that? I cannot figure it out. In other words, the result of the query should be the first two SMILES, but not the last.

While trying to figure this out, I have spotted a potential bug, or at least an inconsistency, as well.

select t.SMILES from temp_mcr t where jc_compare(t.SMILES, 'c1c[!13cH]ccc1', 't:s') = 1; -- all three returned

select t.SMILES from temp_mcr t where jc_compare(t.SMILES, 'c1c[!12cH]ccc1', 't:s') = 1; -- only the C13 isotope returned

Thanks

Mark

ChemAxon a3d59b832c

20-10-2005 08:36:57

Custom24 wrote:
Hi

If I have these three benzenes in a regular Oracle JChem indexed table

SMILES
c1ccccc1
c1cc[12cH]cc1
c1cc[13cH]cc1

and I want to formulate SQL to find compounds which contain (or exactly match) benzene, but are not C13, is there a way to do that? I cannot figure it out. In other words, the result of the query should be the first two SMILES, but not the last.

Mark,

I suggest the following query. I am showing it using the jcsearch command line tool ( http://www.chemaxon.com/jchem/doc/user/Jcsearch.html ):

Code:
$ jcsearch -q '[!13c]1[!13c][!13c][!13c][!13c][!13c]1' c1ccccc1 'c1cc[12cH]cc1' 'c1cc[13cH]cc1'
c1ccccc1
c1cc[12cH]cc1

This output means that the first two SMILES matched but the third one did not, as you asked.
(Please note that I removed the 'H' from the atom expressions to allow substitution if benzene is a substructure.)

The --allHits option of jcsearch shows you the mapping from the query atoms to the target atoms, so it is easy to figure out why the other query matches the last structure:

Code:
$ jcsearch --allHits -q 'c1c[!13cH]ccc1' 'c1cc[13cH]cc1'
    Query has 1 match:
        Match 1:[    3,   2,   1,   6,   5,   4 ]
c1cc[13cH]cc1

So, in this case the [!13cH] query atom matched one of the nonisotopic 'c' atoms of the target.
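To make the mapping above easier to read, here is a small Python sketch that pairs each query atom with the target atom it landed on. It assumes that the list printed by --allHits is in query-atom order and holds 1-based target atom numbers; that reading is an assumption of this sketch, not something stated in the jcsearch output itself. The atom label lists are written out by hand from the two SMILES strings.

```python
# Assumption: mapping[i] is the 1-based target atom number matched by
# query atom i+1 (query-atom order). This is a reading of the output,
# not a documented format.
mapping = [3, 2, 1, 6, 5, 4]

# Atom labels written out by hand, in the order they appear in each string.
query_atoms = ["c", "c", "[!13cH]", "c", "c", "c"]    # c1c[!13cH]ccc1
target_atoms = ["c", "c", "c", "[13cH]", "c", "c"]    # c1cc[13cH]cc1

# Pair every query atom with the target atom it was mapped to.
pairs = [(q, target_atoms[t - 1]) for q, t in zip(query_atoms, mapping)]

for i, (q, t) in enumerate(pairs, start=1):
    print(f"query atom {i} {q:9} -> target atom {mapping[i - 1]} {t}")
```

Under that assumption, the constrained query atom (number 3) sits on target atom 1, a plain 'c', while the 13C atom of the target is covered by an unconstrained 'c' of the query.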
Custom24 wrote:
While trying to figure this out, I have spotted a potential bug, or at least an inconsistency, as well.

select t.SMILES from temp_mcr t where jc_compare(t.SMILES, 'c1c[!13cH]ccc1', 't:s') = 1; -- all three returned

select t.SMILES from temp_mcr t where jc_compare(t.SMILES, 'c1c[!12cH]ccc1', 't:s') = 1; -- only the C13 isotope returned

You are right, there is an inconsistency here. With certain complex SMARTS atom expressions (like [!13C]), an isotope query may match nonisotopic atoms if the query refers to the natural (most abundant) isotope. In the case of carbon, this is isotope 12. With simple queries, only explicitly specified isotopes match. Examples:

Simple SMARTS behaviour:
Code:
$ jcsearch -q '[12C]' C [12CH4] [13CH4]
[12CH4]

Complex SMARTS behaviour:
Code:
$ jcsearch -q '[12C,c]' C [12CH4] [13CH4]
C
[12CH4]

We will correct this inconsistency. My bet is that we should choose the behaviour of the simple SMARTS. This seems to be how depictmatch works:
http://www.daylight.com/daycgi_tutorials/depictmatch.cgi
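The two behaviours can be pictured with a toy Python model. This is only a sketch of the idea, not JChem's implementation: the function name, the `strict` switch and the lookup table are invented for illustration. A target atom without an isotope label is represented by `None`; the "strict" policy never lets it match an isotope query, while the "natural" policy treats it as the most abundant isotope.

```python
from typing import List, Optional

# Most abundant isotope per element; only carbon is needed for this sketch.
NATURAL_ISOTOPE = {"C": 12}

def isotope_matches(query_iso: int, target_iso: Optional[int],
                    element: str = "C", strict: bool = True) -> bool:
    """Toy isotope comparison (hypothetical helper, not a JChem call).

    strict=True : an unlabelled target atom never matches an isotope query.
    strict=False: an unlabelled target atom counts as the natural isotope.
    """
    if target_iso is None and not strict:
        target_iso = NATURAL_ISOTOPE[element]
    return target_iso == query_iso

# The three targets from the jcsearch examples, reduced to their carbon isotope.
targets = {"C": None, "[12CH4]": 12, "[13CH4]": 13}

def hits(query_iso: int, strict: bool) -> List[str]:
    """Return the targets whose carbon satisfies the isotope query."""
    return [s for s, iso in targets.items()
            if isotope_matches(query_iso, iso, strict=strict)]

print(hits(12, strict=True))    # ['[12CH4]']
print(hits(12, strict=False))   # ['C', '[12CH4]']
```

The strict policy reproduces the hit list of the simple '[12C]' query, and the natural-isotope policy reproduces the hit list of the complex '[12C,c]' query shown above.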





What do you think?

Best regards,

Szabolcs

User f698d0529d

20-10-2005 10:12:19

Thanks

I am not sure I understand, and I don't want to confuse the issue.

Quote:
With certain complex SMARTS atom expressions (like [!13C]), an isotope query may match nonisotopic atoms if the query refers to the natural (most abundant) isotope. In the case of carbon, this is isotope 12. With simple queries, only explicitly specified isotopes match.

But as I showed,

select t.SMILES from temp_mcr t where jc_compare(t.SMILES, 'c1c[!13cH]ccc1', 't:s') = 1; -- all three returned

and this should not have matched the C13 one. I don't really understand the atom mapping issue, but I would have thought that this is incorrect.

Also, I do not want to introduce the jcsearch tool as a factor. I am only concerned with the jc_compare function. As long as this is consistent, I will be happy. But I don't think it is as simple as you suggest. There is also the exactIsotopeMatching flag. For example,

Code:
select t.SMILES from temp_mcr t where jc_compare(t.SMILES, 'c1c[13c,12c]ccc1', 't:s exactIsotopeMatching:n') = 1; -- all three returned

select t.SMILES from temp_mcr t where jc_compare(t.SMILES, 'c1c[13c,12c]ccc1', 't:s exactIsotopeMatching:y') = 1; -- only c1ccccc1 returned

I would suggest that this is another inconsistency.

ChemAxon a3d59b832c

20-10-2005 12:17:23

Custom24 wrote:
But as I showed,

select t.SMILES from temp_mcr t where jc_compare(t.SMILES, 'c1c[!13cH]ccc1', 't:s') = 1; -- all three returned

and this should not have matched the C13 one. I don't really understand the atom mapping issue, but I would have thought that this is incorrect.

Actually, this behaviour is correct. The query 'c1c[!13cH]ccc1' literally means a benzene substructure where one of the atoms is not isotope 13 and has exactly one hydrogen. Each of the three structures contains at least one such ring atom, and because of the symmetry of the benzene ring the query can be placed so that its constrained atom lands on it, so all three structures conform to this. (Please note that you only put constraints on one atom of the six, and the structure searching algorithm ignores the atom indices during the matching.)
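The symmetry argument can be checked with a toy Python model. This is a sketch only and does not use JChem: ring atoms are reduced to their isotope label (`None` for a plain 'c'), hydrogen counts are ignored, and the function and predicate names are invented for illustration. The matcher tries every rotation and reflection of the six-membered ring, which is what "ignoring the atom indices" amounts to for a single ring.

```python
from typing import Callable, List, Optional

# Toy model: a ring atom is just its isotope label, None = plain 'c'.
Atom = Optional[int]
Predicate = Callable[[Atom], bool]

def not_13(atom: Atom) -> bool:
    """Toy stand-in for a [!13c] query atom (hydrogen count ignored)."""
    return atom != 13

def any_c(atom: Atom) -> bool:
    """Toy stand-in for an unconstrained aromatic 'c' query atom."""
    return True

def ring_matches(query: List[Predicate], target: List[Atom]) -> bool:
    """True if some rotation or reflection of the target ring satisfies
    every query predicate, position by position."""
    n = len(target)
    if len(query) != n:
        return False
    for ring in (target, target[::-1]):
        for shift in range(n):
            rotated = ring[shift:] + ring[:shift]
            if all(pred(atom) for pred, atom in zip(query, rotated)):
                return True
    return False

targets = {
    "c1ccccc1": [None] * 6,
    "c1cc[12cH]cc1": [None, None, None, 12, None, None],
    "c1cc[13cH]cc1": [None, None, None, 13, None, None],
}

one_constrained = [any_c, any_c, not_13, any_c, any_c, any_c]  # like c1c[!13cH]ccc1
all_constrained = [not_13] * 6  # like [!13c]1[!13c][!13c][!13c][!13c][!13c]1

for smiles, ring in targets.items():
    print(smiles,
          ring_matches(one_constrained, ring),
          ring_matches(all_constrained, ring))
```

In this model the query with a single constrained atom matches all three rings, because the constrained position can always be rotated onto a non-13C atom. The query that constrains all six atoms rejects only the ring containing 13C.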
Custom24 wrote:
Also, I do not want to introduce the jcsearch tool as a factor. I am only concerned with the jc_compare function. As long as this is consistent, I will be happy. But I don't think it is as simple as you suggest. There is also the exactIsotopeMatching flag. For example,

Code:
select t.SMILES from temp_mcr t where jc_compare(t.SMILES, 'c1c[13c,12c]ccc1', 't:s exactIsotopeMatching:n') = 1; -- all three returned

select t.SMILES from temp_mcr t where jc_compare(t.SMILES, 'c1c[13c,12c]ccc1', 't:s exactIsotopeMatching:y') = 1; -- only c1ccccc1 returned

I would suggest that this is another inconsistency.

You are right, the two other structures should have been returned for the second query. This inconsistency also seems to be related to the complex SMARTS case. We will fix it.

Best regards,

Szabolcs

ChemAxon a3d59b832c

05-11-2005 15:31:22

JChem 3.1.2 is out, and it contains the fix for the above inconsistencies.

Best regards,

Szabolcs