Dear Alena,
This behaviour is intentional, even if it looks odd. I'll explain step by step.
1. What is the difference between these pairs of molecules?
Please note that all of these pairs of molecules differ in their arrangement of hydrogens.
Some of these differences correspond to charged / uncharged versions of the same functional group, others do not.
Please also note that in all cases explicit hydrogens are involved in the difference.
2. Why are these molecule pairs matched by the above duplicate search expression?
This is because of the "ignore charge" option. In duplicate search, this setting also switches off hydrogen matching (it forces implicitHMatching:i). The reason is that duplicate search checks the number of hydrogens on each atom by default, and a differing hydrogen count would otherwise prevent the differently charged forms of a functional group from matching.
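To make this reasoning concrete, here is a toy sketch in Python. It is not JChem code and does not use the JChem API; the `Atom` class, the `atoms_match` function and the example atoms are hypothetical names invented for illustration. It only shows why ignoring charge alone is not enough for an acid / anion pair, and why ignoring hydrogens also lets through atoms that differ only in hydrogen count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """Hypothetical, heavily simplified atom record (not a JChem class)."""
    symbol: str
    charge: int
    h_count: int  # implicit + explicit hydrogens attached to this atom


def atoms_match(q: Atom, t: Atom, ignore_charge: bool, ignore_h: bool) -> bool:
    """Toy atom comparison: element always, charge and H count optionally."""
    if q.symbol != t.symbol:
        return False
    if not ignore_charge and q.charge != t.charge:
        return False
    if not ignore_h and q.h_count != t.h_count:
        return False
    return True


# Oxygen of a carboxylic acid (-OH) and of the corresponding anion (-O-).
acid_oxygen = Atom("O", charge=0, h_count=1)
anion_oxygen = Atom("O", charge=-1, h_count=0)

# Ignoring charge alone is not enough: the hydrogen counts still differ.
print(atoms_match(acid_oxygen, anion_oxygen, ignore_charge=True, ignore_h=False))  # False
# With hydrogens ignored as well, the two forms match.
print(atoms_match(acid_oxygen, anion_oxygen, ignore_charge=True, ignore_h=True))   # True

# Side effect: two uncharged atoms that differ only in hydrogen count match too.
n_two_h = Atom("N", charge=0, h_count=2)
n_one_h = Atom("N", charge=0, h_count=1)
print(atoms_match(n_two_h, n_one_h, ignore_charge=True, ignore_h=True))            # True
```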
See the documentation for some examples:
http://www.chemaxon.com/jchem/doc/user/query_searchoptions.html#implicitHmatching_examples
This is the case in some of your molecule pairs, for example:
However, some other pairs do not differ in charge. These pairs are matched as a side effect of hydrogen matching being forcibly switched off:
3. Why does full fragment search not match these structures?
In case of full fragment search, there are two relevant differences in behaviour compared to duplicate search:
3.1. The implicitHMatching option is not set to "ignore" in this case, because full fragment search does not check hydrogen counts for equality anyway (see the next point).
3.2. As in substructure search, explicit hydrogens are treated specially in full fragment search as well. Here they act as a constraint: the target molecule must have a hydrogen at that position. See examples here: http://www.chemaxon.com/jchem/doc/user/query_features.html#explH
OK, now I think we know all pieces of the puzzle. What to do now?
- We will think about how to make the interaction of the ignore charge option and hydrogen matching more logical in the future, but that does not help you in the short term. I can see the following options right now, depending on how you would like to match these structures:
- In case of full fragment search, you can experiment with the setting of the implicitHMatching search option. This is the most complete documentation that I could find about it:
--implicitHMatching:d/y/n/i Describes the matching of implicit and
explicit hydrogens.
Values:
d Default: its value is y in almost every cases.
There is only one exception: its value is n in case of duplicate
search against a query table in a database.
y Implicit and explicit hydrogens can match. In case of duplicate
search the sum of implicit and explicit hydrogens of the query atom
and the sum on the matched target atom must equal.
n Explicit hydrogens matches only on another explicit hydrogen. The
number of implicit hydrogens (of the matching atoms) are not checked.
i Implicit and explicit hydrogens are ignored. Hydrogens are excluded
from the matching.
For a more detailed explanation see: Search options apidoc.
(This is from the jcsearch command line help: http://www.chemaxon.com/jchem/doc/user/Jcsearch.html )
However, in case of duplicate search with the "ignore charge" option, you will not be able to change this setting, as it is currently forced to the "ignore" value.
- As an alternative to the "ignore charge" search option, you can create a modified form of the input molecules with Standardizer. The dehydrogenize and neutralize actions would be useful here.
See more details here:
Neutralize action: http://www.chemaxon.com/jchem/doc/user/Standardizer_files/examples/Examples.html#25
Removing explicit H action: http://www.chemaxon.com/jchem/doc/user/Standardizer_files/examples/Examples.html#05
This method would work with both duplicate and full fragment search.
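For reference, the four implicitHMatching values quoted above can be read as the following hydrogen check on one matched atom pair in a duplicate search. This is only a rough Python sketch of my reading of the help text, not the actual JChem implementation; the function names and parameters are hypothetical. In particular, requiring equal explicit hydrogen counts for the "n" value is an assumption (based on explicit hydrogens matching only explicit hydrogens and every atom having to be matched in a duplicate search).

```python
def resolve_mode(mode: str, duplicate_against_query_table: bool = False) -> str:
    """Map the default value 'd' to 'y' or 'n' as described in the help text."""
    if mode == "d":
        return "n" if duplicate_against_query_table else "y"
    return mode


def hydrogens_ok(mode: str,
                 q_implicit: int, q_explicit: int,
                 t_implicit: int, t_explicit: int,
                 duplicate_against_query_table: bool = False) -> bool:
    """Toy hydrogen check for one query/target atom pair in duplicate search."""
    mode = resolve_mode(mode, duplicate_against_query_table)
    if mode == "y":
        # Implicit and explicit hydrogens can match each other: compare the sums.
        return q_implicit + q_explicit == t_implicit + t_explicit
    if mode == "n":
        # ASSUMPTION: explicit H matches only explicit H, so the explicit counts
        # must agree; implicit hydrogen counts are not checked.
        return q_explicit == t_explicit
    if mode == "i":
        # Hydrogens are excluded from the matching entirely.
        return True
    raise ValueError(f"unknown implicitHMatching value: {mode!r}")


# Same total hydrogen count, drawn differently (4 implicit vs. 2 implicit + 2 explicit).
print(hydrogens_ok("y", 4, 0, 2, 2))  # True
print(hydrogens_ok("n", 4, 0, 2, 2))  # False
# Different total hydrogen count (e.g. charged vs. uncharged form of a group).
print(hydrogens_ok("y", 1, 0, 0, 0))  # False
print(hydrogens_ok("i", 1, 0, 0, 0))  # True
```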
Best regards,
Szabolcs