JChem search bug

User 870ab5b546

13-11-2010 02:12:38

 


query  [H]C#C, target N.[H]C#CC


Setting setSearchType = substructure, setOrderSensitiveSearch = true, setStereoModel = local, setStereoSearch = true,  exactStereoMatching = false, doubleBondStereoMatchingMode = true, stereoMatchingModel = local, setChargeMatching = ignore, setRadicalMatching = ignore, setIsotopeMatching = ignore, setValenceMatching = false, setVagueBondLevel = 1.


JChem 5.3.3 findFirst() gives results of:


     [2, 1, -2147483640]

     [1, 2, -2147483640]

If I change setChargeMatching, setRadicalMatching, and setIsotopeMatching to exact, JChem 5.3.3 findFirst() gives:


    [1, 2, 3]


What's with the bizarre index when the charge, radical, and isotope matching is ignored?  Am I missing something obvious here?


This bug does not appear to be present in JChem 5.3.8pre.

 

ChemAxon a3d59b832c

14-11-2010 08:42:42

Hi Bob,


 


It is not a bug, but a feature. :)


Please see API documentation for MolSearch.findNext():


(Also referenced from findFirst()'s API doc regarding special return values.)


http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/sss/search/MolSearch.html#findNext%28%29


Returns:
an array containing the atom indexes of the target atoms
that match the query atoms (in the order of the appropriate query atoms)
or null if there are no more hits.

Special atom indexes:



  • In case of explicit query H atoms matching to implied H atoms in
    the target, a negative number is returned. The absolute value of this
    number equals the atom index of the heavy atom bearing the
    implicit hydrogen, or Integer.MIN_VALUE in case of 0 heavy atom index.

  • The same method is used for explicit LP (lone pair) atoms in
    the query. The hit contains the negated number of the target heavy
    atom with the matching lone pair, or Integer.MIN_VALUE
    for 0 index. SearchConstants.HIT_LP is set
    for isolated lone pairs (in which case there is no such target heavy atom).

  • Multicenter atoms (e.g. of multicenter coordinate bonds) are
    not returned, the match array always contains
    SearchConstants.HIT_MULTICENTER
    for these atoms.

  • For R-group queries, R-atom matches are not returned,
    the match array contains SearchConstants.HIT_R
    for these atoms. If the "hitIncludesRNodes" parameter is NOT set then only the
    scaffold atoms are included in the match array. See
    MolSearchOptions.setHitIncludesRNodes(boolean).

  • Undefined R-atom handling depends on parameter "undefinedRAtom", see
    SearchOptions.setUndefinedRAtom(int).
    In case of undefined R-atom matching a group of atoms
    (see SearchOptions.isUndefinedRAtomMatchingGroup()),
    only one matching atom is set in the match array, or
    SearchConstants.HIT_R_EMPTY_MATCH denoting the empty group.
    To get the matching groups call findFirstGroup(), findNextGroup()
    or findAllGroups().

  • Unmapable atoms (e.g. polymer star atoms) are denoted by
    SearchConstants.HIT_UNMAPABLE in the match array.

  • Excluded atoms
    (see setTarget(chemaxon.struc.Molecule,int[])) will not
    appear in the match array at all (their appropriate indexes are
    left out).

  • When the query contains link nodes, the returned array may
    contain more indices than the query atoms. In this case the extra
    atom indices appear at the end and method
    getMatchingQuery() can be
    used to get the most specific matching form of the query.

  • All Superatom S-groups are treated as expanded during the
    search, so atom indices are returned accordingly.

    (I have bolded and underlined the relevant parts for your input.)
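For readers who want to see the first rule in code, here is a minimal sketch of decoding such a negative entry. It follows only the documentation text quoted above. The class and method names are hypothetical and not part of the JChem API. Note that the HIT_* values in SearchConstants are negative too, and this sketch does not try to tell them apart from implicit H references.

```java
public class ImplicitHHitDecoder {

    /** True if the entry is a plain (non-negative) target atom index. */
    public static boolean isAtomIndex(int hit) {
        return hit >= 0;
    }

    /**
     * Heavy atom index encoded by a negative entry, per the quoted rule:
     * the absolute value is the heavy atom index, and Integer.MIN_VALUE
     * stands for heavy atom index 0 (since -0 cannot be negative).
     * Assumption: the caller has already ruled out the HIT_* constants.
     */
    public static int heavyAtomOfImplicitH(int hit) {
        if (hit >= 0) {
            throw new IllegalArgumentException("not a negative hit entry: " + hit);
        }
        return hit == Integer.MIN_VALUE ? 0 : -hit;
    }

    public static void main(String[] args) {
        // Hypothetical hit array: query H matched an implicit H on target atom 3.
        int[] hit = {-3, 3, 4};
        for (int entry : hit) {
            if (isAtomIndex(entry)) {
                System.out.println("atom " + entry);
            } else {
                System.out.println("implicit H on atom " + heavyAtomOfImplicitH(entry));
            }
        }
    }
}
```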


     


    Best regards,


    Szabolcs

    User 870ab5b546

    14-11-2010 17:12:02

    I see, but in the case of query [H]C#C and target N.[H]C#CC, there should be no explicit query H atoms matching to implied H atoms in the target. 

    ChemAxon a3d59b832c

    15-11-2010 08:03:32

    Hi Bob,


     


    You are right, there is no implicit H that should be matching in that case.


    It took a while for me to track down the problem, but here is the story:


     


    In 5.3.2, we introduced a new search option value in response to a support question about duplicate search:


    o New search option value for implicitHMatching: "ignore". "Charge:ignore" search option now forces implicitHMatching:ignore.

    This is the description of this search option:


    --implicitHMatching:d/y/n/i   Describes the matching of implicit and
                                  explicit hydrogens.
          Values:
            d   Default: its value is y in almost every cases.
                There is only one exception: its value is n in case of duplicate
                search against a query table in a database.
            y   Implicit and explicit hydrogens can match. The sum of implicit
                and explicit hydrogens of the query atom and the sum on the
                matched target atom must equal.
            n   Implicit and explicit hydrogens cannot match. The number of
                implicit hydrogens (of the matching atoms) are not checked.
            i   Implicit and explicit hydrogens are ignored.

    In case of i (ignore), all H-s are ignored by the search, and so H atom indexes are not traced. This is why in this case the negative indexes are returned if a hit is requested for an explicit H.


    Later we realized that this setting causes problems in non-duplicate searches (this is exactly what is happening in your case), so in 5.3.6 we stopped forcing the ignore H option for them:


    o Search option charge:i sets implicitHMatching:i only in case of duplicate search type.

    So the workaround in your case for 5.3.3 is to set implicitHMatching to IMPLICIT_H_MATCHING_ENABLED:


    https://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/sss/search/SearchOptions.html#setImplicitHMatching%28int%29


     


    I am sorry for the confusion.


    I will also make the documentation clearer about the implicit H matching option. Let me know if you have any questions relating to this.


     


    Best regards,


    Szabolcs



    User 870ab5b546

    15-11-2010 15:01:41

    Good detective work!


    So, if I set charge matching to exact or default, then I don't need to set implicit H matching, but if I set charge matching to ignore, then I do need to set implicit H matching to default, correct?

    ChemAxon a3d59b832c

    15-11-2010 15:07:25

    Yes, that's correct.


    Szabolcs

    User 870ab5b546

    15-11-2010 18:31:03

    I made the change, and it has no effect.  See this page.  The relevant code is:


    final MolSearchOptions ourSearchOpts = new MolSearchOptions();
    final Molecule targetMol = MolImporter.importMol(target);
    final Molecule queryMol = MolImporter.importMol(query);
    System.out.println("JChemCompare.jsp: target = " + targetMol.toFormat("smarts"));
    System.out.println("JChemCompare.jsp: query = " + queryMol.toFormat("smarts"));
    ourSearchOpts.setSearchType(searchType);
    ourSearchOpts.setOrderSensitiveSearch(orderSensitive);
    ourSearchOpts.setStereoModel(stereoMatchingModel);
    if (searchType != SearchConstants.PERFECT) {
        ourSearchOpts.setStereoSearch(setStereoSearch);
        ourSearchOpts.setExactStereoMatching(exactStereoMatching);
        ourSearchOpts.setDoubleBondStereoMatchingMode(doubleBondStereoMatching);
        ourSearchOpts.setChargeMatching(chargeType);
        ourSearchOpts.setRadicalMatching(radicalType);
        ourSearchOpts.setIsotopeMatching(isotopeType);
        ourSearchOpts.setValenceMatching(valenceType);
        ourSearchOpts.setVagueBondLevel(bondVagueness);
        ourSearchOpts.setImplicitHMatching(SearchConstants.IMPLICIT_H_MATCHING_ENABLED);
    }
    final MolSearch ourSearch = new MolSearch();
    ourSearch.setSearchOptions(ourSearchOpts);
    ourSearch.setTarget(targetMol);
    ourSearch.setQuery(queryMol);
    searchResult = ourSearch.isMatching();
    System.out.println("JChemCompare.jsp: searchResult = " + searchResult);

    Set the target to [#7-].[H]C#C[#6] and the query to [H]C#C.  Set charge matching to ignore and valence matching to false, choose to list all matches' atom index arrays, and choose an order-sensitive search.  


    Any other suggestions?

    ChemAxon a3d59b832c

    17-11-2010 07:26:59

    Hi Bob,


     


    I am sorry, you are right! Unfortunately there is no way to force implicitHMatching in case of charge:ignore.


    (The dependency between the charge and implicitHMatching options is coded more deeply than I had thought, and it does not depend on the order in which the search options are set.)


     


    So the only way to fix it is to upgrade to 5.3.6 or later.


     


    I am sorry again for the inconvenience.


     


    Best regards,


    Szabolcs

    User 870ab5b546

    17-11-2010 14:51:39

    OK.  But I have another question.  You said,


    In case of explicit query H atoms matching to implied H atoms in
    the target, a negative number is returned. The absolute value of this
    number equals the atom index of the heavy atom bearing the
    implicit hydrogen, or Integer.MIN_VALUE in case of 0 heavy atom index.


    My search returns,


     



         [2, 1, -2147483640]

         [1, 2, -2147483640]


    However, Integer.MIN_VALUE is -2147483648.  Why is the search returning Integer.MIN_VALUE + 8?  Seems to me it should return -6, maybe even Integer.MIN_VALUE + 6.
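The arithmetic behind that observation can be checked with a few lines of plain Java. This is only a sketch of the subtraction; the class and method names are made up for illustration, and the subtraction is done in long so it cannot overflow.

```java
public class HitOffset {

    /** Distance of a hit entry from Integer.MIN_VALUE, computed in long. */
    public static long offsetFromMinValue(int hit) {
        return (long) hit - Integer.MIN_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MIN_VALUE);               // -2147483648
        System.out.println(offsetFromMinValue(-2147483640)); // 8
    }
}
```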


     

    ChemAxon 42004978e8

    19-11-2010 16:03:40

    Hi Bob,


     


    I tried query "[H]C#C", target "[#7-].[H]C#C[#6]" and obtained [-2147483641    2    3], which is Integer.MIN_VALUE+7. Did you obtain Integer.MIN_VALUE+8?


    Integer.MIN_VALUE+7 is the hit index used for excluded atoms. When implicitHMatching is set to ignore, the hydrogens are treated as excluded atoms. We will correct the documentation on hydrogen hits when hydrogens are ignored, and the documentation on the hit value of excluded atoms.


    Bye,


    Robert

    User 870ab5b546

    22-11-2010 18:04:50










    rwagner wrote:

    Hi Bob,


    I tried query "[H]C#C", target "[#7-].[H]C#C[#6]" and obtained [-2147483641    2    3], which is Integer.MIN_VALUE+7. Did you obtain Integer.MIN_VALUE+8?



    I got what I reported above.

    ChemAxon 42004978e8

    30-11-2010 16:07:01

    Hi Bob!


     


    Could you please paste me the output of the following command?


    jcsearch --allHits -q '[H]C#C' '[#7-].[H]C#C[#6]' --implicitHMatching:i


    If possible, please also attach the code snippet that handles the retrieved hit indexes, i.e. the part that writes them out after calling findFirst().


    Thanks,


    Robert

    User 870ab5b546

    30-11-2010 18:56:39

     


    bob@epoch-virtual:jchem5.3.3$ ./bin/jcsearch --allHits -q '[H]C#C' '[#7-].[H]C#C[#6]' --implicitHMatching:i
    Query has 1 match:
    Match 1:[ EXCL, 3, 4 ]
    [NH2-].[H]C#CC

    I specified the JChem 5.3.3 version of jcsearch as you can see above, but I don't know enough about Linux to know whether it would have gone for the JChem 5.4.0 version despite my instructions.  If so, please tell me how to tell it to use JChem 5.3.3.  

    ChemAxon a3d59b832c

    01-12-2010 15:54:12

    Hi Bob,


    The first line of the jcsearch output displays the version number, so this command might be useful:


    $ ./jcsearch | head -1
    JChem Search Utility 5.4.0.0, (C) 2000-2010 ChemAxon Ltd.

    However, I think that the constants used are the same for both versions.


    We define these special constant values in the class SearchConstants; all of them start with "HIT_":


    See:


    http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/sss/SearchConstants.html#HIT_R


    (and similar constants below it.)


    In this case, HIT_EXCLUDEDQ should be returned, which has this value:


    public static final int HIT_EXCLUDEDQ = Integer.MIN_VALUE + 7;

     


    Are you sure that you don't increase the indexes by 1 for example for displaying the atom numbers as shown in Marvin?


    Best regards,


    Szabolcs

    User 870ab5b546

    01-12-2010 16:14:25

    Oh, yeah, I display them as +1 so they match the atom numbers that MarvinSketch displays.  Doesn't really work for Integer.MIN_VALUE + 7.
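A minimal sketch of a display helper that avoids this, assuming the only special value that needs a label is the HIT_EXCLUDEDQ value quoted in this thread (Integer.MIN_VALUE + 7). The class and method names are hypothetical; the constant is redefined locally so the example runs without JChem on the classpath. Other negative entries are printed unchanged rather than shifted.

```java
public class HitDisplay {

    // Value quoted in this thread for SearchConstants.HIT_EXCLUDEDQ,
    // copied here so the sketch is self-contained.
    static final int HIT_EXCLUDEDQ = Integer.MIN_VALUE + 7;

    /** Converts one hit entry to a 1-based label; special values are not shifted. */
    public static String toDisplay(int hit) {
        if (hit >= 0) {
            return Integer.toString(hit + 1); // match MarvinSketch atom numbers
        }
        if (hit == HIT_EXCLUDEDQ) {
            return "EXCL";
        }
        return Integer.toString(hit); // any other negative entry: leave as-is
    }

    /** Formats a whole hit array for display. */
    public static String format(int[] hits) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < hits.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(toDisplay(hits[i]));
        }
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        int[] hit = {HIT_EXCLUDEDQ, 2, 3};
        System.out.println(HIT_EXCLUDEDQ + 1); // naive +1 shift: -2147483640
        System.out.println(format(hit));       // [EXCL, 3, 4]
    }
}
```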


    That value still doesn't match what you wrote below:


    In case of explicit query H atoms matching to implied H atoms in
    the target, a negative number is returned. The absolute value of this
    number equals the atom index of the heavy atom bearing the
    implicit hydrogen, or Integer.MIN_VALUE in case of 0 heavy atom index.


    Anyway, this bug motivated me to switch to Marvin 5.4.0 now, so the problem is moot.

    ChemAxon a3d59b832c

    01-12-2010 17:14:46

    Oh yes, I was confused myself :) .


    Previously I did not realize that in this case the H matching is switched off, and query Hydrogens are excluded automatically. So instead of a negative reference to the neighbour atom (etc...), a constant was returned.


     


    Best regards,


    Szabolcs