stereochemistry matching difference in JChem 3.0.7?

User 870ab5b546

01-02-2005 12:17:57

We just upgraded to JChem 3.0.7 from JChem 2.3.4, and now our stereochemistry matching doesn't work anymore. For example, the following structure is not matching against itself:





Code:

  Marvin  02010507162D

 12 12  0  0  0  0            999 V2000
   -0.6832    1.5375    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
   -2.1121    1.5375    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1121    0.7125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3977    0.3000    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
   -1.3977   -0.5250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6832    0.7125    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    0.0313    1.9500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0313    0.3000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3977    1.9500    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
   -1.3977    2.7750    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7458    1.5375    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.0313   -0.5249    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  4  6  1  0  0  0  0
  1  6  1  0  0  0  0
  1  9  1  0  0  0  0
  9  2  1  0  0  0  0
  2  3  2  0  0  0  0
  3  4  1  0  0  0  0
  4  5  1  6  0  0  0
  9 10  1  6  0  0  0
  7 11  3  0  0  0  0
  1  7  1  6  0  0  0
  8 12  3  0  0  0  0
  6  8  1  6  0  0  0
M  APO  2   7   1   8   1
M  STY  2   1 SUP   2 SUP
M  SAL   1  2   7  11
M  SBL   1  1  10
M  SMT   1 CN
M  SAL   2  2   8  12
M  SBL   2  1  12
M  SMT   2 CN
M  END








However, the same structure does match to the structure in which the two stereobonds to CN are replaced with regular bonds. Generally, there seems to be a problem with structures containing three or more stereocenters. Any insights? We have confirmed that the only difference between working code and not-working code is JChem.

ChemAxon a3d59b832c

01-02-2005 15:55:31

bobgr wrote:
We just upgraded to JChem 3.0.7 from JChem 2.3.4, and now our stereochemistry matching doesn't work anymore. For example, the following structure is not matching against itself:





However, the same structure does match to the structure in which the two stereobonds to CN are replaced with regular bonds. Generally, there seems to be a problem with structures containing three or more stereocenters. Any insights? We have confirmed that the only difference between working code and not-working code is JChem.
I just checked it. There seems to be some strange behaviour in Marvin regarding R/S labels; we will examine this further. However, I could not reproduce the search problem with the above molfile:





Code:
$ jcsearch -q stereo.mol stereo.mol
C[C@@H]1C=C[C@H](C)[C@H](C#N)[C@@H]1C#N

$ jcsearch -t:p -q stereo.mol stereo.mol
C[C@@H]1C=C[C@H](C)[C@H](C#N)[C@@H]1C#N






One possible reason: the molecule contains superatom S-groups. It may be beneficial to expand them by calling Molecule.expandSgroups() before searching.





If that does not help, could you tell us which search options you used?

User 870ab5b546

01-02-2005 16:33:39

We are stripping out the shortcut groups before doing the comparison.





I don't know exactly what method we are using for the comparison. I just wanted to know whether you had made any changes to stereochemistry methods in JChem 3.0, because the routines we are using worked when we used JChem 2.x.

ChemAxon a3d59b832c

01-02-2005 17:02:15

bobgr wrote:
We are stripping out the shortcut groups before doing the comparison.
OK. Could you send a molfile after the stripping out?
bobgr wrote:
I don't know exactly what method we are using for the comparison. I just wanted to know whether you had made any changes to stereochemistry methods in JChem 3.0, because the routines we are using worked when we used JChem 2.x.
We made no changes in atom stereo matching since 2.3.4.

User b87faa9c01

01-02-2005 20:16:16

Try this as the expanded S-group mol file:


Code:

  Marvin  02010515012D

 12 12  0  0  0  0            999 V2000
   -0.6832    1.5375    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
   -2.1121    1.5375    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.1121    0.7125    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3977    0.3000    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
   -1.3977   -0.5250    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.6832    0.7125    0.0000 C   0  0  1  0  0  0  0  0  0  0  0  0
    0.0313    1.9500    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0313    0.3000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3977    1.9500    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
   -1.3977    2.7750    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.7458    1.5375    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.0313   -0.5249    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
  4  6  1  0  0  0  0
  1  6  1  0  0  0  0
  1  9  1  0  0  0  0
  9  2  1  0  0  0  0
  2  3  2  0  0  0  0
  3  4  1  0  0  0  0
  4  5  1  6  0  0  0
  9 10  1  6  0  0  0
  7 11  3  0  0  0  0
  1  7  1  6  0  0  0
  8 12  3  0  0  0  0
  6  8  1  6  0  0  0
M  APO  2   7   1   8   1
M  STY  2   1 SUP   2 SUP
M  SAL   1  2   7  11
M  SBL   1  1  10
M  SMT   1 CN
M  SDS EXP  1   1
M  SAL   2  2   8  12
M  SBL   2  1  12
M  SMT   2 CN
M  SDS EXP  1   2
M  END





This Molfile is supposed to be compared to itself, and comes up false.





setStereoSearch is true.


setSearchType is PERFECT.


setExactStereoMatching is true.





Let me know if more info is needed.

ChemAxon a3d59b832c

01-02-2005 21:56:30

Hmm, still works here:


Code:
$ jcsearch --exactStereoSearch:y -t:p -q stereo_exp.mol stereo_exp.mol
C[C@@H]1C=C[C@H](C)[C@H](C#N)[C@@H]1C#N



Is your molecule stored in a database?





If yes, please check whether the "Assume stereo flag" is checked on the table.
(You can do that in JChem Manager (jcman), in the Regenerate menu.)
I would also be interested in the value that getAbsoluteStereo() returns for the JChemSearch object you are using for the search.

User 870ab5b546

02-02-2005 18:27:22

OK, let's try this. See the PNG file below. Each structure in the list can be submitted as a response. Each response is then compared successively with each structure in the list. Response 1 should match strucs 1, 3, and 5, response 2 should match strucs 2, 3, and 5, response 3 should match strucs 3 and 5, response 4 should match strucs 4 and 5, and response 5 should match only struc 5. What we are finding is that all responses match only struc 5.
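
For reference, the expectations above can be written down as a small table and compared with what is observed. This is only a minimal sketch: the class and method names are made up for illustration, and it encodes nothing beyond what is stated in this post.

```java
/** Sketch: tabulate the expected matches and compare them with the observed ones. */
final class MatchMatrix {

    // EXPECTED[r][s] is true when response r+1 should match struc s+1.
    static final boolean[][] EXPECTED = {
        {true,  false, true,  false, true},  // response 1: strucs 1, 3, 5
        {false, true,  true,  false, true},  // response 2: strucs 2, 3, 5
        {false, false, true,  false, true},  // response 3: strucs 3, 5
        {false, false, false, true,  true},  // response 4: strucs 4, 5
        {false, false, false, false, true},  // response 5: struc 5 only
    };

    /** Observed behaviour as reported: every response matches struc 5 only. */
    static boolean[][] observed() {
        boolean[][] m = new boolean[5][5];
        for (int r = 0; r < 5; r++) {
            m[r][4] = true;
        }
        return m;
    }

    /** Counts the cells in which the two matrices disagree. */
    static int countDifferences(boolean[][] a, boolean[][] b) {
        int differences = 0;
        for (int r = 0; r < a.length; r++) {
            for (int s = 0; s < a[r].length; s++) {
                if (a[r][s] != b[r][s]) {
                    differences++;
                }
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        boolean[][] seen = observed();
        for (int r = 0; r < 5; r++) {
            for (int s = 0; s < 5; s++) {
                if (EXPECTED[r][s] != seen[r][s]) {
                    System.out.println("response " + (r + 1) + " vs struc " + (s + 1)
                            + ": expected " + EXPECTED[r][s] + ", observed " + seen[r][s]);
                }
            }
        }
        System.out.println(countDifferences(EXPECTED, seen) + " cells differ");
    }
}
```

Every disagreeing cell is a match that was expected but not found; none is a spurious match.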





Can you see whether you can duplicate our results? I'm not sure which struc is the "query" and which the "target" in your lingo, but I'm sure you can figure it out from the statement above.





One more tidbit: This problem seems to occur only when the structure has three or more stereocenters. In other words, when the MeO group is deleted from strucs 3, 4, and 5, to make strucs A, B, and C, then response A matches A and C, and response B matches B and C.





If you can't duplicate our findings, then possibly the problem is in how we handle the MOL files before doing the comparison. Sam and Mike are going to get the post-processing files to you so you can compare them.

ChemAxon a3d59b832c

02-02-2005 21:43:09

bobgr wrote:
OK, let's try this. See the PNG file below. Each structure in the list can be submitted as a response. Each response is then compared successively with each structure in the list. Response 1 should match strucs 1, 3, and 5, response 2 should match strucs 2, 3, and 5, response 3 should match strucs 3 and 5, response 4 should match strucs 4 and 5, and response 5 should match only struc 5. What we are finding is that all responses match only struc 5.





Can you see whether you can duplicate our results? I'm not sure which struc is the "query" and which the "target" in your lingo, but I'm sure you can figure it out from the statement above.
OK, so the response is the target and the structures in the list are the queries, I suppose.

Be careful here! You would not get these results with the above settings.
exactStereoMatching should be unset for these results. I used exact searching with the default stereochemistry search settings and got the results you described as the expected behaviour.





Could you please check what happens when the chiral flag is turned on for the structures? ( Edit/Absolute stereo(CHIRAL) )


(For me the chiral flag was unchecked in all 5 structures.)





See also my previous question about the absolute stereo flags in the database part.
bobgr wrote:
One more tidbit: This problem seems to occur only when the structure has three or more stereocenters. In other words, when the MeO group is deleted from strucs 3, 4, and 5, to make strucs A, B, and C, then response A matches A and C, and response B matches B and C.





If you can't duplicate our findings, then possibly the problem is in how we handle the MOL files before doing the comparison. Sam and Mike are going to get the post-processing files to you so you can compare them.
OK. May I have those files, please?

User 870ab5b546

02-02-2005 22:20:43

The problem shouldn't be related to the chiral flag, because we compare each response and its mirror image (generated mathematically) to the structure in the database. We don't ever use the chiral flag. If we will accept either enantiomer, we just check the response and its enantiomer. If we don't, we don't.
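
One simple way to produce a mirror image from a 2D molfile is to swap the up and down wedge codes (1 and 6) in the stereo field of each V2000 bond line. The sketch below assumes that approach and fixed-width bond lines; it is not necessarily how the code discussed here does it, the class and method names are hypothetical, and it leaves the atom block untouched.

```java
/** Sketch: mirror a 2D V2000 structure by swapping up and down wedge bonds. */
final class MirrorImage {

    /** Swaps stereo code 1 (up) with 6 (down) in one fixed-width V2000 bond line. */
    static String mirrorBondLine(String line) {
        // Fields are 3 characters wide: atom 1, atom 2, bond type, bond stereo, ...
        String stereo = line.substring(9, 12).trim();
        String swapped;
        if (stereo.equals("1")) {
            swapped = "  6";
        } else if (stereo.equals("6")) {
            swapped = "  1";
        } else {
            return line; // 0 (no stereo) and 4 (either, i.e. squiggly) stay as they are
        }
        return line.substring(0, 9) + swapped + line.substring(12);
    }

    public static void main(String[] args) {
        // A down bond from the molfile posted in this thread becomes an up bond.
        System.out.println(mirrorBondLine("  4  5  1  6  0  0  0"));
        // A plain bond is returned unchanged.
        System.out.println(mirrorBondLine("  2  3  2  0  0  0  0"));
    }
}
```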





I've asked Sam and Mike again to send you those files post-processing. It appears that either we are doing something to the files that is causing the match to fail (can't imagine what; remember, it worked when we used JChem 2.3.4), or there is something wrong with our JChem installation.

User b87faa9c01

03-02-2005 16:05:42

This is the code that matches the expanded, ungrouped structures.





Code:



public static boolean matchExact(String resp_struct,
        String ans_struct, boolean ignoreStereo)
        throws MolFileException {
    boolean preMatch =
            matchExact_JChem(resp_struct, ans_struct, ignoreStereo);
    if (!preMatch) return false;
    // now we know that there is a match
    // if stereo is ignored, never mind the rest
    if (ignoreStereo) return true;

    // If this is a perfect match, there is no further contest
    if (matchPrecise_JChem(resp_struct, ans_struct)) {
        //System.out.println(" precise match found; result true");
        return true;
    }

    // if matched, then check whether the match was made due to the
    // problem with squiggly bond matching
    String[] substMols = getSquigglyReplacedCombinations(ans_struct);
    // no squiggly bonds in author's structure
    if (substMols == null) {
        //System.out.println(" no squiggly bonds; result true");
        return true;
    }
    for (int i = 0; i < substMols.length; i++) {
        // one of the combinations matches
        // that means it could be the (4,2) or (4,3) cells of match matrix.
        if (matchExact_JChem(resp_struct, substMols[i], false)) {
            //System.out.println("match is due to squiggly bonds; res false");
            return false;
        }
    }
    //System.out.println(" squiggly bond test over; result true ");
    return true;
}





// Matches according to the predetermined behaviour of answers/responses
// resp_struct is the response given by the user
// ans_struct is the expected answer
// See the documentation for the behaviour of this function
// with/without ignoreStereo, for single/double bonds
// NOTE: This is not a diagonal match.
public static boolean matchExact_JChem(String resp_struct,
        String ans_struct, boolean ignoreStereo)
        throws MolFileException {

    // NOTE: Following options are set using an extensive testing
    // of possible combinations in MolSearch params to get an
    // expected behavior. If changes are made to this, make sure
    // that expected behavior is unchanged for all input combinations
    // System.out.println("Match exact resp " + resp_struct);
    // System.out.println("Match exact answer " + ans_struct);

    try {

        MolSearch s1 = new MolSearch();
        s1.setSearchType(SearchConstants.EXACT);
        if (ignoreStereo) {
            s1.setStereoSearch(false);
        } else {
            s1.setStereoSearch(true);
            //s1.setDoubleBondStereoMatchingMode(
            //        StereoConstants.DBS_ALL);
            /* Deprecated in JChem 2.2 with the above method */
            s1.setStereoCareChecking(false);
        }
        // additional flags for matching isotopes, radical and charge
        s1.setExactChargeMatching(true);
        s1.setExactIsotopeMatching(true);
        s1.setExactRadicalMatching(true);

        MolHandler target1 = new MolHandler(resp_struct);
        MolHandler qry1 = new MolHandler(ans_struct);
        s1.setTarget(target1.getMolecule());
        s1.setQuery(qry1.getMolecule());
        //System.out.println("Match exact resp " + resp_struct);
        //System.out.println("Match exact struct " + ans_struct);
        boolean res1 = s1.isMatching();
        return res1;

    } catch (MolFormatException e1) {
        System.out.println("Error in matchExact ");
        System.out.println(" MOLFORMAT EXCEPTION FOR either "
                + "of these two ");
        System.out.println(resp_struct + "\n\n\n" + ans_struct);
        e1.printStackTrace();
        throw new MolFileException("MolFile error " + e1.getMessage());
    } catch (SearchException e2) {
        System.out.println("Error in matchExact ");
        e2.printStackTrace();
        throw new MolFileException("MolFile error " + e2.getMessage());
    }
}





// Perfect matching (only diagonal elements in match matrix is yes)
public static boolean matchPrecise_JChem(String resp_struct,
        String ans_struct)
        throws MolFileException {
    try {
        MolSearch s1 = new MolSearch();
        s1.setSearchType(SearchConstants.PERFECT);
        s1.setStereoSearch(true);
        s1.setExactStereoMatching(true);
        MolHandler target1 = new MolHandler(resp_struct);
        MolHandler qry1 = new MolHandler(ans_struct);
        s1.setTarget(target1.getMolecule());
        s1.setQuery(qry1.getMolecule());
        //System.out.println("Match exact resp " + resp_struct);
        //System.out.println("Match exact struct " + ans_struct);
        return s1.isMatching();
    } catch (MolFormatException e1) {
        System.out.println("Error in matchExact ");
        System.out.println(" MOLFORMAT EXCEPTION FOR either "
                + "of these two ");
        System.out.println(resp_struct + "\n\n\n" + ans_struct);
        e1.printStackTrace();
        throw new MolFileException("MolFile error " + e1.getMessage());
    } catch (SearchException e2) {
        System.out.println("Error in matchExact ");
        e2.printStackTrace();
        throw new MolFileException("MolFile error " + e2.getMessage());
    }
}


User 870ab5b546

03-02-2005 16:31:36

Just to explain the part of the code regarding squiggly bonds: we use a stereochemistry-matching matrix that differs from both of the ones in JChem. In one matrix, JChem treats squiggly bonds and unspecified bonds as identical; in the other, it doesn't match bold or hashed to unspecified. We need squiggly bonds to be treated just like bold or hashed. So, if the condition's structure contains a tetrahedral stereocenter, and its configuration is specified as R, S, or a mixture (the last with a squiggly bond), then the stereocenter in the response must be the same for an exact match to occur. If, on the other hand, the condition's structure does not specify the configuration of the stereocenter, the stereocenter in the response may have any configuration.





Here's the algorithm we use to check squiggly bonds (obviously, it looks different in Java):





Code:
Call the condition's structure, A.
Call the number of squiggly bonds in A, s.
Call the response, R.

n = 0
If R matches A then match = yes else match = no
While (match == yes and n < s)  {
   n++
   Convert A to A1 by changing squiggly bond n to an up bond.
   Convert A to A2 by changing squiggly bond n to a down bond.
   If R matches A1 or R matches A2 then match = no
}
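
The pseudocode above can be tried out without JChem using a toy model. In the sketch below, a structure is just a string with one character per stereo bond ('U' up, 'D' down, 'S' squiggly, '-' unspecified), and lenientMatch is a stand-in for a search that treats squiggly and unspecified bonds in the condition as wildcards. Both the encoding and the names are assumptions made for illustration; this is not the production code.

```java
import java.util.function.BiPredicate;

/** Sketch of the squiggly-bond check on a toy string encoding of stereo bonds. */
final class SquigglyCheck {

    /**
     * Returns true if the response matches the condition and the match does not
     * survive replacing any single squiggly bond of the condition by up or down.
     * baseMatch.test(response, condition) stands in for the underlying search.
     */
    static boolean matches(String response, String condition,
                           BiPredicate<String, String> baseMatch) {
        if (!baseMatch.test(response, condition)) {
            return false;
        }
        for (int i = 0; i < condition.length(); i++) {
            if (condition.charAt(i) != 'S') {
                continue;
            }
            String up = replaceAt(condition, i, 'U');
            String down = replaceAt(condition, i, 'D');
            if (baseMatch.test(response, up) || baseMatch.test(response, down)) {
                return false; // the response has a defined bond where a squiggly one is required
            }
        }
        return true;
    }

    static String replaceAt(String s, int index, char c) {
        return s.substring(0, index) + c + s.substring(index + 1);
    }

    /** Toy matcher: 'S' and '-' in the condition accept anything; 'U' and 'D' must be identical. */
    static boolean lenientMatch(String response, String condition) {
        if (response.length() != condition.length()) {
            return false;
        }
        for (int i = 0; i < condition.length(); i++) {
            char q = condition.charAt(i);
            if ((q == 'U' || q == 'D') && response.charAt(i) != q) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matches("US", "US", SquigglyCheck::lenientMatch)); // true
        System.out.println(matches("UU", "US", SquigglyCheck::lenientMatch)); // false
    }
}
```

Passing the matcher in as a BiPredicate keeps the loop independent of whichever search call is actually used.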


ChemAxon a3d59b832c

03-02-2005 16:33:52

Thanks, I am looking at the code now. Could you also send the body of


getSquigglyReplacedCombinations() ? It is missing for the compilation.

User b87faa9c01

03-02-2005 16:49:57

Code:



private static String[] getSquigglyReplacedCombinations(String molStruct) {
    Vector result = new Vector();
    try {
        // go through all the bonds and check for squiggly bonds
        // store all the squiggly bond positions in squigglyPos
        MolHandler mh = new MolHandler(molStruct);
        Molecule mol = mh.getMolecule();
        int[] squigglyPos = new int[20];
        int squigglyCount = 0;
        for (int i = 0; i < mol.getEdgeCount(); i++) {
            MolBond mb = (MolBond) mol.getEdge(i);
            int stereo = mb.getFlags() & MolBond.STEREO_MASK;
            if (stereo == MOLBOND_SQUIGGLY) {
                squigglyPos[squigglyCount++] = i;
            }
        }
        // return null if no squiggly bonds
        if (squigglyCount == 0) return null;
        int two_power = (int) Math.pow(2, squigglyCount);
        for (int bitVect = 1; bitVect < two_power; bitVect *= 2) {
            // flip through rightmost n bits to check the ones
            int[] replacePos = new int[20];
            int pickCt = 0;
            for (int i = 0; i < squigglyCount; i++) {
                int bitVal = (bitVect >>> i) & 1;
                if (bitVal == 1) {
                    replacePos[pickCt++] = squigglyPos[i];
                }
            }
            Vector molList = new Vector(); molList.add(molStruct);
            Vector list = getCombinations(molList, replacePos, pickCt - 1);
            System.out.println("combinations at level = " + list.size());
            result.add(list);
        }

        /*
        // to find all combinations of positions to substitute
        // create bitVectors with values 1..2^squigglyCount-1
        // the ones represent the positions to substitute.
        // those at the zeros positions remain the same
        int two_power = (int) Math.pow(2, squigglyCount);
        for (int bitVect = 1; bitVect < two_power; bitVect++) {
            // flip through rightmost n bits to check the ones
            int[] replacePos = new int[20];
            int pickCt = 0;
            for (int i = 0; i < squigglyCount; i++) {
                int bitVal = (bitVect >>> i) & 1;
                if (bitVal == 1) {
                    replacePos[pickCt++] = squigglyPos[i];
                }
            }
            Vector molList = new Vector(); molList.add(molStruct);
            Vector list = getCombinations(molList, replacePos, pickCt - 1);
            //System.out.println(" combinations at level = " + list.size());
            result.add(list);
        }
        */
    } catch (MolFormatException e) {
        System.out.println("MOLFORMAT EXCEPTION FOR " + molStruct);
        e.printStackTrace();
        return null;
    }
    // get the results from the vector of vectors
    int count = 0;
    for (int i = 0; i < result.size(); i++) {
        Vector list = (Vector) result.get(i);
        count += list.size();
    }
    String[] resultArr = new String[count];
    int ct = 0;
    for (int i = 0; i < result.size(); i++) {
        Vector list = (Vector) result.get(i);
        for (int j = 0; j < list.size(); j++)
            resultArr[ct++] = (String) list.get(j);
    }
    return resultArr;
}
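
A note on the two loops in the method above: the active one steps with bitVect *= 2, so it visits only the masks with a single bit set (one squiggly bond replaced at a time), while the commented-out one steps with bitVect++ and visits every non-empty subset of squiggly bonds. The standalone sketch below only illustrates that difference; the class and method names are made up and it does not touch any molecule.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: which bit masks the two loop variants visit for a given number of squiggly bonds. */
final class SubsetMasks {

    /** Masks visited when the loop variable doubles each step (single-bit masks only). */
    static List<Integer> doublingMasks(int squigglyCount) {
        List<Integer> masks = new ArrayList<>();
        int limit = 1 << squigglyCount; // same value as (int) Math.pow(2, squigglyCount)
        for (int bitVect = 1; bitVect < limit; bitVect *= 2) {
            masks.add(bitVect);
        }
        return masks;
    }

    /** Masks visited when the loop variable is incremented (every non-empty subset). */
    static List<Integer> allMasks(int squigglyCount) {
        List<Integer> masks = new ArrayList<>();
        int limit = 1 << squigglyCount;
        for (int bitVect = 1; bitVect < limit; bitVect++) {
            masks.add(bitVect);
        }
        return masks;
    }

    public static void main(String[] args) {
        System.out.println(doublingMasks(3)); // [1, 2, 4]
        System.out.println(allMasks(3));      // [1, 2, 3, 4, 5, 6, 7]
    }
}
```

The single-bit variant appears consistent with the pseudocode posted earlier in the thread, which changes one squiggly bond per iteration.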


ChemAxon a3d59b832c

04-02-2005 09:44:06

Thanks!





Now I was able to reproduce the bug.





I used the following value for MOLBOND_SQUIGGLY:


Code:
private static final int MOLBOND_SQUIGGLY = (MolBond.UP | MolBond.DOWN);



Is this correct?





Back to the bug: I am investigating it now. In the meantime, I found a workaround: in method matchExact_JChem() insert this:





Code:
s1.setOption(SearchConstants.OPTION_KEEP_QUERY_ORDER, SearchConstants.KEEP_QUERY_ORDER);



before the line


Code:
boolean res1 = s1.isMatching();






What is happening here, in brief: MolSearch in recent JChem versions rearranges the atoms of the query so that matching takes as little time as possible. The rearrangement should preserve parity information, but most likely this part fails. The line above tells MolSearch not to rearrange the query atoms, which also means that the search will be a bit slower.
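
To illustrate why reordering atoms can affect stereo matching: a tetrahedral parity value only has meaning relative to the order in which the neighbours of the stereocenter are listed, so when the neighbours are renumbered by an odd permutation the stored parity has to be flipped. The sketch below is a generic illustration of that rule, not JChem's implementation; the +1/-1 encoding and all names are assumptions.

```java
/** Sketch: keeping a tetrahedral parity consistent when neighbours are reordered. */
final class ParityUnderReordering {

    /** Returns true if the permutation contains an odd number of inversions. */
    static boolean isOddPermutation(int[] perm) {
        int inversions = 0;
        for (int i = 0; i < perm.length; i++) {
            for (int j = i + 1; j < perm.length; j++) {
                if (perm[i] > perm[j]) {
                    inversions++;
                }
            }
        }
        return inversions % 2 == 1;
    }

    /**
     * parity is +1 or -1 relative to the old neighbour order; neighbourPerm[k] is the
     * old index of the neighbour that now sits at position k. Returns the parity
     * relative to the new order.
     */
    static int reorderParity(int parity, int[] neighbourPerm) {
        return isOddPermutation(neighbourPerm) ? -parity : parity;
    }

    public static void main(String[] args) {
        System.out.println(reorderParity(1, new int[] {0, 1, 2, 3})); // 1: order unchanged
        System.out.println(reorderParity(1, new int[] {1, 0, 2, 3})); // -1: one swap
        System.out.println(reorderParity(1, new int[] {1, 2, 0, 3})); // 1: cyclic shift of three
    }
}
```

If a reordering step skipped this correction for some stereocenters, a structure could fail to match itself, which is the kind of symptom reported in this thread.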

User 870ab5b546

05-02-2005 11:14:30

Your workaround worked wonderfully.





Let us know when you have fixed the bug so we can remove the workaround.





(You guys owe us.... ;-) )

ChemAxon a3d59b832c

08-02-2005 17:22:59

bobgr wrote:
Your workaround worked wonderfully.





Let us know when you have fixed the bug so we can remove the workaround.





(You guys owe us.... ;-) )
Yes. Thank you for the bug report.





You can download the current release (3.0.8), which contains the fix.





Please tell us when you find something suspicious again.

User 870ab5b546

08-02-2005 18:52:06

Szabolcs wrote:
Please tell us when you find something suspicious again.
"When"? Goodness, I hope not.