jc_compare question

User cf4264f752

24-12-2011 18:49:12

Hi all


I have 5 pairs of structures here which match with the duplicate search type but do not match with the full fragment search type. In other words, the query


 


select jc_compare( structure1, structure2, 't:ff charge:i undefinedRAtom:u') 

from chem_structs1 c, chem_structs2 cc

where c.id = 111 and cc.id = 112;

returns 0 


and 


 


select jc_compare( structure1, structure2, 't:d charge:i undefinedRAtom:u') 

from chem_structs1 c, chem_structs2 cc

where c.id = 111 and cc.id = 112;

returns 1


Firstly, I cannot understand how this could happen, because I thought that every duplicate should also match using full fragment search.


The molfiles of those 5 pairs are attached.

User cf4264f752

24-12-2011 18:50:37

other mols

ChemAxon aa7c50abf8

26-12-2011 17:42:23

Hi Alena,


Please, could you specify which structures are in chem_structs1 and which in chem_structs2? Better yet: which are the two structures selected in the where condition (where c.id = 111 and cc.id = 112)?


Please, could you also post the output of the select jchem_core_pkg.getenvironment from dual command?


Thanks


Peter

User cf4264f752

26-12-2011 19:45:13

Hi, Peter

pkovacs wrote:

Please, could you specify which structures are in chem_structs1 and which in chem_structs2? Better yet: which are the two structures selected in the where condition (where c.id = 111 and cc.id = 112)?

c.id                          cc.id
Albendazole.mol               Albendazole1.mol
Choline Alfoscerate.mol       Choline Alfoscerate1.mol
Clemizole hydrochloride.mol   Clemizole hydrochloride1.mol
Tetraphenylporphyrin.mol      Tetraphenylporphyrin1.mol
Transkarbam 12.mol            Transkarbam 12_1.mol

Please, could you also post the output of the select jchem_core_pkg.getenvironment from dual command?

Oracle environment:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

JChem owner: JCHEM_DEVEL

JChem Server environment:
Java VM vendor: Sun Microsystems Inc.
Java version: 1.6.0_27
Java VM version: 20.2-b06
JChem version: 5.7.0
JChem Index version: 5070000
JDBC driver version: 11.1.0.7.0-Production

ChemAxon aa7c50abf8

27-12-2011 17:28:49

Thank you, Alena! We will get back to you on this ASAP.


Peter

ChemAxon a3d59b832c

28-12-2011 12:11:47

Dear Alena,


This behaviour is intentional, even if it looks odd. I'll explain step by step.


 


1. What is the difference between these pairs of molecules?


Please note that all of these pairs of molecules differ in their arrangement of hydrogens.


Some of these different arrangements are related to charged / uncharged versions of the same functional group, others are not.


Please also note that in all cases explicit hydrogens are involved in the differing arrangement.


 


2. Why are these molecule pairs matched by the above duplicate search expression?


It is because of the "ignore charge" option (charge:i). In duplicate search, this setting also switches off hydrogen matching (it forces implicitHMatching:i). The reason is that duplicate search would by default check the number of hydrogens on each atom, and a different number of hydrogens should not prevent matching differently charged forms of the same functional group.


See the documentation for some examples: 


http://www.chemaxon.com/jchem/doc/user/query_searchoptions.html#implicitHmatching_examples


This is the case in some of your molecule pairs, for example:

Transkarbam 12.mol and Transkarbam 12_1.mol

However, some other pairs do not differ in charge. These pairs are matched as a side effect of hydrogen matching being forcibly switched off:

Albendazole.mol and Albendazole1.mol

Tetraphenylporphyrin.mol and Tetraphenylporphyrin1.mol

 


3. Why does full fragment search not match these structures?


In case of full fragment search, there are two relevant differences in behaviour compared to duplicate search:


3.1. The implicitHMatching option is not set to "ignore" in this case. (Full fragment search does not check that the hydrogen counts are equal; see the next point.)


3.2. As in substructure search, explicit hydrogens are treated specially in full fragment search as well. Here they act as a constraint: there must be a hydrogen at that position in the target molecule. See examples here: http://www.chemaxon.com/jchem/doc/user/query_features.html#explH


 


OK, now I think we know all pieces of the puzzle. What to do now?




 


- We will think about how to make the interaction of the "ignore charge" option and hydrogen matching more logical in the future, but that does not help you in the short term. I can see the following options right now, depending on how you would like to match these structures:


- In case of full fragment search, you can experiment with the setting of the implicitHMatching search option. This is the most complete documentation that I could find about it:


--implicitHMatching:d/y/n/i   Describes the matching of implicit and explicit hydrogens.
    Values:
    d   Default: its value is y in almost every cases. There is only one
        exception: its value is n in case of duplicate search against a
        query table in a database.
    y   Implicit and explicit hydrogens can match. In case of duplicate
        search the sum of implicit and explicit hydrogens of the query atom
        and the sum on the matched target atom must equal.
    n   Explicit hydrogens matches only on another explicit hydrogen. The
        number of implicit hydrogens (of the matching atoms) are not checked.
    i   Implicit and explicit hydrogens are ignored. Hydrogens are excluded
        from the matching.
    For a more detailed explanation see: Search options apidoc.


(It is from the jcsearch command-line help: http://www.chemaxon.com/jchem/doc/user/Jcsearch.html )


However, when duplicate search is combined with the "ignore charge" option, you will not be able to change this option, as it is currently forced to the "ignore" value.
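As a rough sketch of how the variants discussed here could be compared side by side, the query below reuses the tables, columns and ids from the original question. The column aliases (dup, ff, ff_ignore_h) are illustrative names only, and adding implicitHMatching:i to the option string in the same key:value form as the other options is an assumption based on how the option is written earlier in this reply. It has not been run against the attached molfiles, so no expected result is given for the third column.

```sql
-- Sketch only: tables, columns and ids are taken from the original question.
-- The aliases dup, ff and ff_ignore_h are illustrative names.
select
  -- Duplicate search; charge:i also forces hydrogen matching off.
  jc_compare(structure1, structure2,
             't:d charge:i undefinedRAtom:u') as dup,
  -- Full fragment search with default hydrogen matching.
  jc_compare(structure1, structure2,
             't:ff charge:i undefinedRAtom:u') as ff,
  -- Assumption: implicitHMatching:i is accepted in the same option string.
  jc_compare(structure1, structure2,
             't:ff charge:i undefinedRAtom:u implicitHMatching:i') as ff_ignore_h
from chem_structs1 c, chem_structs2 cc
where c.id = 111 and cc.id = 112;
```

For the pairs in question, the first two columns correspond to the two queries already reported (1 and 0); the third column shows whether ignoring hydrogens brings the full fragment result in line with the duplicate result.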


 


- As an alternative to the "ignore charge" search option, you may create a modified form of the input molecules with Standardizer. The dehydrogenize and neutralize actions would be useful.


See more details here:


Neutralize action: http://www.chemaxon.com/jchem/doc/user/Standardizer_files/examples/Examples.html#25


Removing explicit H action: http://www.chemaxon.com/jchem/doc/user/Standardizer_files/examples/Examples.html#05


 


This method would work with both the duplicate and the full fragment search types.


 


Best regards,


Szabolcs

User cf4264f752

20-01-2012 14:17:41

Dear Szabolcs, 


Thanks for such a detailed answer. I guess I should use implicitHMatching:i in my case so that full fragment search produces results comparable to the duplicate search results.

ChemAxon a3d59b832c

20-01-2012 14:59:47

Yes, that is correct.


 


Szabolcs