User 870ab5b546
01-02-2005 12:17:57
We just upgraded from JChem 2.3.4 to JChem 3.0.7, and our stereochemistry matching no longer works. For example, the following structure does not match itself:
Code: |
Marvin 02010507162D
12 12 0 0 0 0 999 V2000
-0.6832 1.5375 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
-2.1121 1.5375 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.1121 0.7125 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.3977 0.3000 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
-1.3977 -0.5250 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.6832 0.7125 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
0.0313 1.9500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0313 0.3000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.3977 1.9500 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0
-1.3977 2.7750 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7458 1.5375 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
0.0313 -0.5249 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
4 6 1 0 0 0 0
1 6 1 0 0 0 0
1 9 1 0 0 0 0
9 2 1 0 0 0 0
2 3 2 0 0 0 0
3 4 1 0 0 0 0
4 5 1 6 0 0 0
9 10 1 6 0 0 0
7 11 3 0 0 0 0
1 7 1 6 0 0 0
8 12 3 0 0 0 0
6 8 1 6 0 0 0
M APO 2 7 1 8 1
M STY 2 1 SUP 2 SUP
M SAL 1 2 7 11
M SBL 1 1 10
M SMT 1 CN
M SAL 2 2 8 12
M SBL 2 1 12
M SMT 2 CN
M END
|
However, the same structure does match to the structure in which the two stereobonds to CN are replaced with regular bonds. Generally, there seems to be a problem with structures containing three or more stereocenters. Any insights? We have confirmed that the only difference between working code and not-working code is JChem.
User 870ab5b546
01-02-2005 16:33:39
We are stripping out the shortcut groups before doing the comparison.
I don't know exactly what method we are using for the comparison. I just wanted to know whether you had made any changes to stereochemistry methods in JChem 3.0, because the routines we are using worked when we used JChem 2.x.
User b87faa9c01
01-02-2005 20:16:16
Try this as the expanded S-group mol file:
Code: |
Marvin 02010515012D
12 12 0 0 0 0 999 V2000
-0.6832 1.5375 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
-2.1121 1.5375 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.1121 0.7125 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.3977 0.3000 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
-1.3977 -0.5250 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.6832 0.7125 0.0000 C 0 0 1 0 0 0 0 0 0 0 0 0
0.0313 1.9500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0313 0.3000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.3977 1.9500 0.0000 C 0 0 2 0 0 0 0 0 0 0 0 0
-1.3977 2.7750 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7458 1.5375 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
0.0313 -0.5249 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
4 6 1 0 0 0 0
1 6 1 0 0 0 0
1 9 1 0 0 0 0
9 2 1 0 0 0 0
2 3 2 0 0 0 0
3 4 1 0 0 0 0
4 5 1 6 0 0 0
9 10 1 6 0 0 0
7 11 3 0 0 0 0
1 7 1 6 0 0 0
8 12 3 0 0 0 0
6 8 1 6 0 0 0
M APO 2 7 1 8 1
M STY 2 1 SUP 2 SUP
M SAL 1 2 7 11
M SBL 1 1 10
M SMT 1 CN
M SDS EXP 1 1
M SAL 2 2 8 12
M SBL 2 1 12
M SMT 2 CN
M SDS EXP 1 2
M END
|
When this molfile is compared to itself, the match comes up false.
setStereoSearch is true.
setSearchType is PERFECT.
setExactStereoMatching is true.
Let me know if more info is needed.
User 870ab5b546
02-02-2005 18:27:22
OK, let's try this. See the PNG file below. Each structure in the list can be submitted as a response. Each response is then compared successively with each structure in the list. Response 1 should match strucs 1, 3, and 5, response 2 should match strucs 2, 3, and 5, response 3 should match strucs 3 and 5, response 4 should match strucs 4 and 5, and response 5 should match only struc 5. What we are finding is that all responses match only struc 5.
Can you see whether you can duplicate our results? I'm not sure which struc is the "query" and which the "target" in your lingo, but I'm sure you can figure it out from the statement above.
One more tidbit: This problem seems to occur only when the structure has three or more stereocenters. In other words, when the MeO group is deleted from strucs 3, 4, and 5, to make strucs A, B, and C, then response A matches A and C, and response B matches B and C.
If you can't duplicate our findings, then possibly the problem is in how we handle the MOL files before doing the comparison. Sam and Mike are going to get the post-processing files to you so you can compare them.
User 870ab5b546
02-02-2005 22:20:43
The problem shouldn't be related to the chiral flag, because we compare each response and its mirror image (generated mathematically) to the structure in the database. We don't ever use the chiral flag. If we will accept either enantiomer, we just check the response and its enantiomer. If we don't, we don't.
I've asked Sam and Mike again to send you those files post-processing. It appears that either we are doing something to the files that is causing the match to fail (can't imagine what; remember, it worked when we used JChem 2.3.4), or there is something wrong with our JChem installation.
User b87faa9c01
03-02-2005 16:05:42
This is the code that matches the expanded, ungrouped structures.
Code: |
public static boolean matchExact(String resp_struct,
        String ans_struct, boolean ignoreStereo)
        throws MolFileException {
    boolean preMatch =
            matchExact_JChem(resp_struct, ans_struct, ignoreStereo);
    if (!preMatch) return false;
    // Now we know that there is a match.
    // If stereo is ignored, never mind the rest.
    if (ignoreStereo) return true;
    // If this is a perfect match, there is no further contest.
    if (matchPrecise_JChem(resp_struct, ans_struct)) {
        //System.out.println(" precise match found; result true");
        return true;
    }
    // Otherwise, check whether the match was made only because of
    // the problem with squiggly bond matching.
    String[] substMols = getSquigglyReplacedCombinations(ans_struct);
    // No squiggly bonds in the author's structure.
    if (substMols == null) {
        //System.out.println(" no squiggly bonds; result true");
        return true;
    }
    for (int i = 0; i < substMols.length; i++) {
        // If one of the combinations matches, the match could be
        // the (4,2) or (4,3) cell of the match matrix.
        if (matchExact_JChem(resp_struct, substMols[i], false)) {
            //System.out.println("match is due to squiggly bonds; res false");
            return false;
        }
    }
    //System.out.println(" squiggly bond test over; result true ");
    return true;
}
// Matches according to the predetermined behaviour of answers/responses.
// resp_struct is the response given by the user.
// ans_struct is the expected answer.
// See the documentation for the behaviour of this function
// with/without ignoreStereo, for single/double bonds.
// NOTE: This is not a diagonal match.
public static boolean matchExact_JChem(String resp_struct,
        String ans_struct, boolean ignoreStereo)
        throws MolFileException {
    // NOTE: The following options were chosen by extensive testing
    // of possible combinations of MolSearch params to get the
    // expected behavior. If changes are made here, make sure
    // that the expected behavior is unchanged for all input combinations.
    // System.out.println("Match exact resp " + resp_struct);
    // System.out.println("Match exact answer " + ans_struct);
    try {
        MolSearch s1 = new MolSearch();
        s1.setSearchType(SearchConstants.EXACT);
        if (ignoreStereo) {
            s1.setStereoSearch(false);
        } else {
            s1.setStereoSearch(true);
            //s1.setDoubleBondStereoMatchingMode(
            //        StereoConstants.DBS_ALL);
            /* Deprecated in JChem 2.2 in favor of the method above */
            s1.setStereoCareChecking(false);
        }
        // Additional flags for matching isotopes, radicals, and charges.
        s1.setExactChargeMatching(true);
        s1.setExactIsotopeMatching(true);
        s1.setExactRadicalMatching(true);
        MolHandler target1 = new MolHandler(resp_struct);
        MolHandler qry1 = new MolHandler(ans_struct);
        s1.setTarget(target1.getMolecule());
        s1.setQuery(qry1.getMolecule());
        //System.out.println("Match exact resp " + resp_struct);
        //System.out.println("Match exact struct " + ans_struct);
        boolean res1 = s1.isMatching();
        return res1;
    } catch (MolFormatException e1) {
        System.out.println("Error in matchExact ");
        System.out.println(" MOLFORMAT EXCEPTION FOR either "
                + "of these two ");
        System.out.println(resp_struct + "\n\n\n" + ans_struct);
        e1.printStackTrace();
        throw new MolFileException("MolFile error " + e1.getMessage());
    } catch (SearchException e2) {
        System.out.println("Error in matchExact ");
        e2.printStackTrace();
        throw new MolFileException("MolFile error " + e2.getMessage());
    }
}
// Perfect matching (only the diagonal elements of the match matrix are yes).
public static boolean matchPrecise_JChem(String resp_struct,
        String ans_struct)
        throws MolFileException {
    try {
        MolSearch s1 = new MolSearch();
        s1.setSearchType(SearchConstants.PERFECT);
        s1.setStereoSearch(true);
        s1.setExactStereoMatching(true);
        MolHandler target1 = new MolHandler(resp_struct);
        MolHandler qry1 = new MolHandler(ans_struct);
        s1.setTarget(target1.getMolecule());
        s1.setQuery(qry1.getMolecule());
        //System.out.println("Match precise resp " + resp_struct);
        //System.out.println("Match precise struct " + ans_struct);
        return s1.isMatching();
    } catch (MolFormatException e1) {
        System.out.println("Error in matchPrecise ");
        System.out.println(" MOLFORMAT EXCEPTION FOR either "
                + "of these two ");
        System.out.println(resp_struct + "\n\n\n" + ans_struct);
        e1.printStackTrace();
        throw new MolFileException("MolFile error " + e1.getMessage());
    } catch (SearchException e2) {
        System.out.println("Error in matchPrecise ");
        e2.printStackTrace();
        throw new MolFileException("MolFile error " + e2.getMessage());
    }
}
|
User 870ab5b546
03-02-2005 16:31:36
Just to explain the part of the code regarding squiggly bonds: We use a different stereochemistry-matching matrix from either one in JChem. In one matrix, JChem treats squiggly bonds and unspecified bonds as identical; in the other, it doesn't match bold or hashed to unspecified. We need squiggly bonds to be treated just like bold or hashed. So, if the condition's structure contains a tetrahedral stereocenter, and its configuration is specified as R, S, or a mixture (the last with a squiggly bond), then the stereocenter in the response must have the same configuration for an exact match to occur. If, on the other hand, the condition's structure does not specify the configuration of the stereocenter, the stereocenter in the response may have any configuration.
Here's the algorithm we use to check squiggly bonds (obviously, it looks different in Java):
Code: |
Call the condition's structure, A.
Call the number of squiggly bonds in A, s.
Call the response, R.
n = 0
If R matches A then match = yes else match = no
While (match == yes and n < s) {
    n++
    Convert A to A1 by changing squiggly bond n to an up bond.
    Convert A to A2 by changing squiggly bond n to a down bond.
    If R matches A1 or R matches A2 then match = no
}
|
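For readers who want to trace the pseudocode above, here is a minimal, self-contained Java sketch of the same loop. It is not the code from our application and it does not call JChem. As an assumption made purely for illustration, a structure is encoded as a string with one character per stereo bond ('U' up, 'D' down, 'W' squiggly, '-' unspecified), and `toyMatches` is a hypothetical stand-in for the underlying search, loosely modeled on the matrix in which a squiggly or unspecified bond accepts anything. All class and method names are made up.

```java
import java.util.function.BiPredicate;

public class SquigglyCheckSketch {

    // Toy encoding (assumption): one character per stereo bond.
    // 'U' = up (bold), 'D' = down (hashed), 'W' = squiggly, '-' = unspecified.

    /** Toy base matcher: a squiggly or unspecified bond in the answer accepts anything. */
    static boolean toyMatches(String response, String answer) {
        if (response.length() != answer.length()) return false;
        for (int i = 0; i < answer.length(); i++) {
            char a = answer.charAt(i);
            if (a != 'W' && a != '-' && a != response.charAt(i)) return false;
        }
        return true;
    }

    /** Returns a copy of the answer with its n-th squiggly bond (1-based) replaced. */
    static String replaceSquiggly(String answer, int n, char replacement) {
        char[] bonds = answer.toCharArray();
        int seen = 0;
        for (int i = 0; i < bonds.length; i++) {
            if (bonds[i] == 'W' && ++seen == n) {
                bonds[i] = replacement;
                break;
            }
        }
        return new String(bonds);
    }

    /** The loop from the pseudocode: R is the response, A the condition's structure. */
    static boolean matchWithSquigglyCheck(String r, String a,
            BiPredicate<String, String> matches) {
        int s = (int) a.chars().filter(c -> c == 'W').count();
        boolean match = matches.test(r, a);
        int n = 0;
        while (match && n < s) {
            n++;
            String a1 = replaceSquiggly(a, n, 'U'); // squiggly bond n becomes an up bond
            String a2 = replaceSquiggly(a, n, 'D'); // squiggly bond n becomes a down bond
            if (matches.test(r, a1) || matches.test(r, a2)) match = false;
        }
        return match;
    }

    public static void main(String[] args) {
        BiPredicate<String, String> m = SquigglyCheckSketch::toyMatches;
        System.out.println(matchWithSquigglyCheck("UW", "UW", m)); // true
        System.out.println(matchWithSquigglyCheck("UU", "UW", m)); // false
    }
}
```

In the second call the base match succeeds only because the toy matcher lets the squiggly bond accept an up bond, so the loop rejects it, which is the behavior described above for a response that specifies a configuration where the condition has a squiggly bond.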
ChemAxon a3d59b832c
03-02-2005 16:33:52
Thanks, I am looking at the code now. Could you also send the body of
getSquigglyReplacedCombinations()? I need it to compile the code.
ChemAxon a3d59b832c
04-02-2005 09:44:06
Thanks!
Now I was able to reproduce the bug.
I used the following value for MOLBOND_SQUIGGLY:
Code: |
private static final int MOLBOND_SQUIGGLY = (MolBond.UP | MolBond.DOWN);
|
Is this correct?
Back to the bug: I am investigating it now. In the meantime, I found a workaround: in method matchExact_JChem() insert this:
Code: |
s1.setOption(SearchConstants.OPTION_KEEP_QUERY_ORDER, SearchConstants.KEEP_QUERY_ORDER);
|
before the line
Code: |
boolean res1 = s1.isMatching();
|
In brief, here is what is happening: in recent JChem versions, MolSearch rearranges the atoms of the query so that matching takes the least possible time. The rearrangement should preserve parity information, but most likely that part fails. The inserted setOption line tells MolSearch not to rearrange the atoms of the query, which also means that the search will be a bit slower.
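To illustrate why rearranging atoms can disturb parity at all: in the MDL molfile convention, the parity value of a stereocenter (1 or 2) is defined relative to the atom-number order of its neighbors, so any renumbering that applies an odd permutation to that order must swap 1 and 2. The sketch below shows only this general principle. It is not JChem's internal code, and the class and method names are hypothetical.

```java
public class ParityRenumberSketch {

    /**
     * Returns the parity value that is valid after atoms are renumbered.
     *
     * @param parity       molfile parity of the stereocenter (1 or 2; other values pass through)
     * @param oldNeighbors old atom numbers of the stereocenter's neighbors
     * @param newIndex     newIndex[old] is the atom's number after renumbering
     */
    static int parityAfterRenumbering(int parity, int[] oldNeighbors, int[] newIndex) {
        if (parity != 1 && parity != 2) return parity; // 0 (none) and 3 (either) are unaffected
        // Count neighbor pairs whose relative order changes under the renumbering.
        int inversions = 0;
        for (int i = 0; i < oldNeighbors.length; i++) {
            for (int j = i + 1; j < oldNeighbors.length; j++) {
                boolean oldOrder = oldNeighbors[i] < oldNeighbors[j];
                boolean newOrder = newIndex[oldNeighbors[i]] < newIndex[oldNeighbors[j]];
                if (oldOrder != newOrder) inversions++;
            }
        }
        // An odd number of inversions is an odd permutation, which swaps 1 and 2.
        return (inversions % 2 == 1) ? 3 - parity : parity;
    }

    public static void main(String[] args) {
        int[] neighbors = {1, 2, 3, 4};
        int[] swapTwo = {0, 2, 1, 3, 4};   // atoms 1 and 2 trade numbers (odd permutation)
        int[] cycleThree = {0, 2, 3, 1, 4}; // 1->2, 2->3, 3->1 (even permutation)
        System.out.println(parityAfterRenumbering(1, neighbors, swapTwo));    // 2
        System.out.println(parityAfterRenumbering(1, neighbors, cycleThree)); // 1
    }
}
```

If a reordering step carried the stored parity values over unchanged after an odd permutation like the first one, a structure could fail to match itself, which is consistent with the behavior reported in this thread.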
User 870ab5b546
05-02-2005 11:14:30
Your workaround worked wonderfully.
Let us know when you have fixed the bug so we can remove the workaround.
(You guys owe us.... ;-) )