MolSearch inconsistency

User 870ab5b546

01-05-2011 20:13:59

The code:



/** Determines whether a response molecule (target in ChemAxon parlance)
 * matches to an author's molecule (query). If stereochemistry is not
 * ignored, any stereo bond in the author's molecule must be present in
 * the response molecule, but a nonstereobond in the author's molecule
 * matches to any stereo bond (or none) in the response molecule.
 * (Value of setImplicitHMatching() is set to IMPLICIT_H_MATCHING_ENABLED
 * by default. DISABLED would mean that explicit H atoms in the author's
 * substructure would have to be explicit in the response.)
 * @param respMol a response molecule
 * @param authMol an author's molecule
 * @param stereoType flags for treating stereochemistry
 * @return true if the response molecule matches the author's molecule
 */
public static boolean matchExact(Molecule respMol, Molecule authMol,
        int stereoType) throws MolFileException {
    final String SELF = "MolFunctions.matchExact: ";
    boolean match = false;
    final MolSearchOptions searchOpts = new MolSearchOptions();
    searchOpts.setSearchType(FULL);
    searchOpts.setVagueBondLevel(VAGUE_BOND_OFF);
    // required for comparing nonaromatized aromatic rings
    searchOpts.setStereoModel(STEREO_MODEL_GLOBAL);
    searchOpts.setChargeMatching(CHARGE_MATCHING_EXACT);
    searchOpts.setIsotopeMatching(ISOTOPE_MATCHING_EXACT);
    searchOpts.setRadicalMatching(RADICAL_MATCHING_EXACT);
    searchOpts.setValenceMatching(true);
    setStereoOptions(searchOpts, stereoType);
    final MolSearch search = new MolSearch();
    search.setSearchOptions(searchOpts);
    search.setTarget(respMol);
    search.setQuery(authMol);
    debugPrint(SELF + "stereotype = ", stereoType);
    debugPrintMRV(SELF + "response:\n", respMol);
    debugPrintMRV(SELF + "author structure:\n", authMol);
    try {
        match = search.isMatching();
        debugPrint(SELF + "search result is ", match);
    } catch (SearchException e2) {
        Utils.alwaysPrint("Error in " + SELF);
        e2.printStackTrace();
        throw new MolFileException(ERROR + e2.getMessage());
    } // try
    return match;
} // matchExact(Molecule, Molecule, int)

/** Sets the search options related to stereochemistry.
 * @param searchOpts contains the search options
 * @param stereoType flags for treating stereochemistry
 */
private static void setStereoOptions(MolSearchOptions searchOpts,
        int stereoType) {
    final String SELF = "MolFunctions.setStereoOptions: ";
    final int searchType = searchOpts.getSearchType();
    debugPrint(SELF + "searchType = ",
            (searchType == DUPLICATE ? "DUPLICATE"
            : searchType == FULL ? "FULL" : searchType));
    final boolean ignore2D =
            (stereoType & IGNORE_DBL_BOND_STEREO) != 0;
    final boolean ignore3D =
            (stereoType & IGNORE_TETRAHEDRAL_STEREO) != 0;
    final boolean wavyAnd = (stereoType & WAVY_AND) != 0;
    if (ignore2D) {
        debugPrint(SELF + "ignoring 2D stereochemistry.");
        searchOpts.setDoubleBondStereoMatchingMode(DBS_NONE);
    } else {
        debugPrint(SELF + "pay attention to 2D stereochemistry.");
        searchOpts.setDoubleBondStereoMatchingMode(DBS_ALL);
    } // if ignore2D
    if (ignore3D) {
        debugPrint(SELF + "ignoring 3D stereochemistry.");
        searchOpts.setStereoSearchType(STEREO_IGNORE);
    } else {
        debugPrint(SELF + "pay attention to 3D stereochemistry.");
        if (searchType != DUPLICATE) {
            searchOpts.setStereoSearchType(STEREO_SPECIFIC);
        } // if search initially set to FULL, not DUPLICATE
        if (wavyAnd) {
            searchOpts.setKeepQueryOrder(true); // ChemAxon says it's needed
            debugPrint(SELF + "adding WavyBondMatcher.");
            searchOpts.addUserComparator(new WavyBondMatcher());
        } else debugPrint(SELF + "not adding WavyBondMatcher.");
    } // if ignore3D
} // setStereoOptions(MolSearchOptions, int)

The log output when the molecules are taken from an array and submitted to matchExact():

MolFunctions.setStereoOptions: searchType = FULL
MolFunctions.setStereoOptions: pay attention to 2D stereochemistry.
MolFunctions.setStereoOptions: pay attention to 3D stereochemistry.
MolFunctions.setStereoOptions: adding WavyBondMatcher.
MolFunctions.matchExact: stereotype = 1
MolFunctions.matchExact: response:
<?xml version="1.0" ?>
<cml>
<MDocument>
<MChemicalStruct>
<molecule molID="m1">
<atomArray
atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10"
elementType="C C C C C C C C C C"
x2="54.23687744140625 52.90320810035799 52.90320810035799 54.23687744140625 55.570546782454514 55.570546782454514 56.90421612350277 56.90421612350277 58.237885464551034 58.237885464551034"
y2="32.687014166762616 31.91699722620357 30.376963345085493 29.606946404526447 30.376963345085493 31.91699722620357 32.687014166762616 29.606946404526447 30.376963345085493 31.91699722620357"
/>
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a1 a6" order="1" />
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a6 a5" order="2" />
<bond atomRefs2="a5 a8" order="1" />
<bond atomRefs2="a7 a6" order="1" />
<bond atomRefs2="a7 a10" order="1" />
<bond atomRefs2="a8 a9" order="1" />
<bond atomRefs2="a9 a10" order="1" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
</cml>

MolFunctions.matchExact: author structure:
<?xml version="1.0" ?>
<cml>
<MDocument>
<MChemicalStruct>
<molecule molID="m1">
<atomArray
atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11"
elementType="C C C C C C C C C C H"
x2="54.26373105015538 52.93006170910712 52.93006170910712 54.26373105015538 55.59740039120364 55.59740039120364 56.9310697322519 56.9310697322519 58.26473907330016 58.26473907330016 56.9310697322519"
y2="18.06352453710796 17.293507596548913 15.753473715430836 14.98345677487179 15.753473715430836 17.293507596548913 18.06352453710796 14.98345677487179 15.753473715430836 17.293507596548913 13.443456774871791"
/>
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a1 a6" order="1" />
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a6 a5" order="2" />
<bond atomRefs2="a5 a8" order="1" />
<bond atomRefs2="a7 a6" order="1" />
<bond atomRefs2="a7 a10" order="1" />
<bond atomRefs2="a8 a9" order="1" />
<bond atomRefs2="a8 a11" order="1" />
<bond atomRefs2="a9 a10" order="1" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
</cml>

MolFunctions.matchExact: search result is false

So then I copy the two MRV encodings from the log and submit the structures to matchExact() again through a Web page:

MolFunctions.setStereoOptions: searchType = FULL
MolFunctions.setStereoOptions: pay attention to 2D stereochemistry.
MolFunctions.setStereoOptions: pay attention to 3D stereochemistry.
MolFunctions.setStereoOptions: adding WavyBondMatcher.
MolFunctions.matchExact: stereotype = 1
MolFunctions.matchExact: response:
<?xml version="1.0" ?>
<cml>
<MDocument>
<MChemicalStruct>
<molecule molID="m1">
<atomArray
atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10"
elementType="C C C C C C C C C C"
x2="54.23687744140625 52.90320810035799 52.90320810035799 54.23687744140625 55.570546782454514 55.570546782454514 56.90421612350277 56.90421612350277 58.237885464551034 58.237885464551034"
y2="32.687014166762616 31.91699722620357 30.376963345085493 29.606946404526447 30.376963345085493 31.91699722620357 32.687014166762616 29.606946404526447 30.376963345085493 31.91699722620357"
/>
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a1 a6" order="1" />
<bond atomRefs2="a7 a6" order="1" />
<bond atomRefs2="a6 a5" order="2" />
<bond atomRefs2="a5 a8" order="1" />
<bond atomRefs2="a8 a9" order="1" />
<bond atomRefs2="a9 a10" order="1" />
<bond atomRefs2="a7 a10" order="1" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
</cml>

MolFunctions.matchExact: author structure:
<?xml version="1.0" ?>
<cml>
<MDocument>
<MChemicalStruct>
<molecule molID="m1">
<atomArray
atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11"
elementType="C C C C C C C C C C H"
x2="54.26373105015538 52.93006170910712 52.93006170910712 54.26373105015538 55.59740039120364 55.59740039120364 56.9310697322519 56.9310697322519 58.26473907330016 58.26473907330016 56.9310697322519"
y2="18.06352453710796 17.293507596548913 15.753473715430836 14.98345677487179 15.753473715430836 17.293507596548913 18.06352453710796 14.98345677487179 15.753473715430836 17.293507596548913 13.443456774871791"
/>
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a1 a6" order="1" />
<bond atomRefs2="a7 a6" order="1" />
<bond atomRefs2="a6 a5" order="2" />
<bond atomRefs2="a5 a8" order="1" />
<bond atomRefs2="a8 a9" order="1" />
<bond atomRefs2="a9 a10" order="1" />
<bond atomRefs2="a7 a10" order="1" />
<bond atomRefs2="a8 a11" order="1" />
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument>
</cml>

MolFunctions.matchExact: search result is true

I cannot for the life of me figure out why the molecules give a false result in the first case, and a true result in the second.  Any ideas?  The behavior is consistent.  I notice that the sequences of the bonds in the two cases are different, but that shouldn't matter -- or maybe it does?
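
One way to check whether the two submissions really differ only in bond sequence is to compare the bond lists of the two logged author structures as unordered sets. The following is a plain-Java sketch that does not use JChem; the class and method names are made up for illustration, and the atomRefs2 values are copied from the two logs above.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of JChem or of MolFunctions.
final class BondOrderCheck {

    // atomRefs2 values of the author structure in the first log (result: false).
    static final String[] FIRST_RUN = {
        "a1 a2", "a1 a6", "a2 a3", "a3 a4", "a4 a5", "a6 a5",
        "a5 a8", "a7 a6", "a7 a10", "a8 a9", "a8 a11", "a9 a10"
    };

    // atomRefs2 values of the author structure in the second log (result: true).
    static final String[] SECOND_RUN = {
        "a1 a2", "a2 a3", "a3 a4", "a4 a5", "a1 a6", "a7 a6",
        "a6 a5", "a5 a8", "a8 a9", "a9 a10", "a7 a10", "a8 a11"
    };

    /** Returns the bonds as a set, with the two atom IDs of each bond sorted,
     * so that neither bond sequence nor atom order within a bond matters. */
    static Set<String> normalize(String[] atomRefs) {
        final Set<String> bonds = new TreeSet<>();
        for (final String refs : atomRefs) {
            final String[] pair = refs.trim().split("\\s+");
            Arrays.sort(pair);
            bonds.add(pair[0] + "-" + pair[1]);
        }
        return bonds;
    }

    public static void main(String[] args) {
        final boolean same = normalize(FIRST_RUN).equals(normalize(SECOND_RUN));
        System.out.println("same connectivity: " + same);
    }
}
```

If the two sets are equal, the MRV documents encode the same connectivity, and any difference in the search result has to come from the bond sequence itself or from something outside the structures.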


Here's some additional information about the incorrect matching behavior (the first case).  If I add an H atom anywhere to the target, it matches correctly to the query.  If I add a second H atom to the C of the query that already has one H, the original target does not match it.  If the query has one H atom on one, two, or three allylic C atoms, the target does not match it, but if it has one H atom on all four allylic C atoms, then it does.  However, if the query has one H atom on three allylic C atoms and two on the fourth, then the target does not match it.  So there seems to be some symmetry issue going on here: the symmetry of the H atom substitution pattern in the query matters to whether the target matches.  But not always, as shown by the other submission.

ChemAxon 42004978e8

03-05-2011 09:12:33

Hello Bob,


Please attach the code of WavyBondMatcher as well. It might be the cause of the difference.


Do you get the same results if that matcher is not added? 


Bye,


Robert

User 870ab5b546

03-05-2011 13:08:48

Good suggestion, but I get the same result when I comment out the line that adds WavyBondMatcher.


But here's the code for WavyBondMatcher anyway:


package com.epoch.chem;

import chemaxon.sss.search.MolComparator;
import chemaxon.struc.StereoConstants;

/** Alters normal JChem matching behavior so that a query wavy bond matches
 * only to a target wavy bond, but a query single bond continues to match
 * to any target bond. */
public final class WavyBondMatcher extends MolComparator
        implements StereoConstants {

    private void debugPrint(Object... msg) {
        // Utils.printToLog(msg);
    }

    /** Constructor. */
    public WavyBondMatcher() {
        // intentionally empty
    }

    /** Compares the parity of two atoms to ensure that they match according to
     * the desired behavior.
     * @param queryAtomNum index of the query atom (author's structure)
     * @param targetAtomNum index of the target atom (student's response)
     * @return true if the atoms match or one is implicit H atom
     */
    public boolean compareAtoms(int queryAtomNum, int targetAtomNum) {
        boolean match = true;
        final int qAtomNum = getOrigQueryAtom(queryAtomNum);
        final int tAtomNum = getOrigTargetAtom(targetAtomNum);
        if (qAtomNum != -1 && tAtomNum != -1) {
            final int qLocalParity = query.getLocalParity(qAtomNum);
            final int tLocalParity = target.getLocalParity(tAtomNum);
            match = !(qLocalParity == PARITY_EITHER // query wavy bond present
                    && tLocalParity != PARITY_EITHER); // no target wavy bond
            debugPrint("WavyBondMatcher.compareAtoms: "
                    + "query = ", query, ", target = ", target,
                    "; \nqAtom ", query.getAtom(qAtomNum), qAtomNum + 1,
                    " has local parity ", qLocalParity,
                    "; tAtom ", target.getAtom(tAtomNum), tAtomNum + 1,
                    " has local parity ", tLocalParity,
                    " (PARITY_EITHER = ", PARITY_EITHER, "); atoms ",
                    (match ? "match" : "do not match"));
        } // if neither query nor target atom is implicit H
        return match;
    } // compareAtoms(int, int)

} // WavyBondMatcher

ChemAxon 42004978e8

05-05-2011 08:22:29

Hi Bob,


This happens because the explicit hydrogen causes the double bond in the query to be cis/trans specific, whereas the target molecule is symmetric. In the global stereo model, the latter has no stereo information. What I couldn't reproduce is the matching of the second pair of molecules.


Could you please try the command: 


jcsearch -q <query> <target> -t:f --vagueBond:n --stereoModel:g --charge:e --isotope:e --radical:e --valence:d  --doubleBondStereo:A --keepQueryOrder 


on them and send the result?


Which JChem version are you using? Is it JChem 5.3.8?


Adding further H atoms to the query may make it symmetric again, which enables matching.


Adding an H atom to the target makes the target asymmetric, so it becomes a suitable match for an asymmetric query.


We will get back to you about the issue of asymmetry that is caused only by explicit hydrogens.
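
The symmetry argument can be illustrated with a toy model. The sketch below is not JChem's stereo calculation; it is a minimal stand-in that assumes explicit H atoms are treated as ordinary graph nodes and that a double bond is flagged as cis/trans specific when each of its ends carries two substituents in different symmetry classes. Ring size and bond order are ignored. The class, the method names, and the refinement scheme are hypothetical; only the atom and bond lists of the target and the query come from the logged MRV.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: NOT JChem's stereo perception. All names are hypothetical.
final class ExplicitHSymmetrySketch {

    // Bonds of the logged response (target), as zero-based indices (a1 -> 0).
    static final int[][] TARGET_BONDS = {
        {0, 1}, {0, 5}, {1, 2}, {2, 3}, {3, 4}, {5, 4},
        {4, 7}, {6, 5}, {6, 9}, {7, 8}, {8, 9}
    };
    // The logged author structure (query) adds one explicit H (a11) on a8.
    static final int[][] QUERY_BONDS =
            append(TARGET_BONDS, new int[][] {{7, 10}});
    // Assumed variant: one explicit H on each allylic C (a1, a4, a7, a8).
    static final int[][] FOUR_H_BONDS =
            append(TARGET_BONDS, new int[][] {{0, 10}, {3, 11}, {6, 12}, {7, 13}});

    static int[][] append(int[][] base, int[][] extra) {
        final int[][] all = Arrays.copyOf(base, base.length + extra.length);
        System.arraycopy(extra, 0, all, base.length, extra.length);
        return all;
    }

    /** Ten C atoms followed by the given number of explicit H atoms. */
    static String[] atoms(int hydrogens) {
        final String[] elements = new String[10 + hydrogens];
        Arrays.fill(elements, 0, 10, "C");
        Arrays.fill(elements, 10, elements.length, "H");
        return elements;
    }

    /** Maps equal keys to equal integer class IDs. */
    static int[] toIds(String[] keys) {
        final Map<String, Integer> ids = new HashMap<>();
        final int[] out = new int[keys.length];
        for (int i = 0; i < keys.length; i++) {
            Integer id = ids.get(keys[i]);
            if (id == null) {
                id = ids.size();
                ids.put(keys[i], id);
            }
            out[i] = id;
        }
        return out;
    }

    /** Starts from element + degree and repeatedly splits classes by the
     * sorted classes of the neighbors. */
    static int[] symmetryClasses(String[] elements, int[][] bonds) {
        final int n = elements.length;
        final List<List<Integer>> nbrs = new ArrayList<>();
        for (int i = 0; i < n; i++) nbrs.add(new ArrayList<>());
        for (final int[] b : bonds) {
            nbrs.get(b[0]).add(b[1]);
            nbrs.get(b[1]).add(b[0]);
        }
        final String[] keys = new String[n];
        for (int i = 0; i < n; i++) keys[i] = elements[i] + nbrs.get(i).size();
        int[] cls = toIds(keys);
        for (int round = 0; round < n; round++) {
            for (int i = 0; i < n; i++) {
                final List<Integer> around = new ArrayList<>();
                for (final int j : nbrs.get(i)) around.add(cls[j]);
                Collections.sort(around);
                keys[i] = cls[i] + ":" + around;
            }
            cls = toIds(keys);
        }
        return cls;
    }

    /** True if atom 'end' has exactly two substituents besides 'other' and
     * they fall into different symmetry classes. */
    static boolean endIsDifferentiated(int[][] bonds, int[] cls, int end, int other) {
        final List<Integer> subs = new ArrayList<>();
        for (final int[] b : bonds) {
            if (b[0] == end && b[1] != other) subs.add(cls[b[1]]);
            if (b[1] == end && b[0] != other) subs.add(cls[b[0]]);
        }
        return subs.size() == 2 && !subs.get(0).equals(subs.get(1));
    }

    static boolean looksStereoSpecific(String[] elements, int[][] bonds, int u, int v) {
        final int[] cls = symmetryClasses(elements, bonds);
        return endIsDifferentiated(bonds, cls, u, v)
                && endIsDifferentiated(bonds, cls, v, u);
    }

    public static void main(String[] args) {
        // The double bond a6=a5 is atoms 5 and 4 in zero-based numbering.
        System.out.println("target: " + looksStereoSpecific(atoms(0), TARGET_BONDS, 5, 4));
        System.out.println("query: " + looksStereoSpecific(atoms(1), QUERY_BONDS, 5, 4));
        System.out.println("four H: " + looksStereoSpecific(atoms(4), FOUR_H_BONDS, 5, 4));
    }
}
```

In this toy model only the query with a single explicit H has a double bond whose two ends are both differentiated. The bare target and the variant with one H on every allylic carbon remain symmetric, which is in line with the matching pattern reported earlier in the thread.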


Bye,


Robert

User 870ab5b546

05-05-2011 12:24:23

I'm using JChem 5.4.1.1.


robert-grossmans-ibook-g4:~ bob$ jcsearch -q "C1CCC2=C(C1)CCCC2" "[H]C1CCCC2=C1CCCC2" -t:f --vagueBond:n --stereoModel:g --charge:e --isotope:e --radical:e --valence:d  --doubleBondStereo:A --keepQueryOrder
[H]C1CCCC2=C1CCCC2

Yes, adding H atoms to the target makes them match properly.  Unfortunately, I have no control over how the students draw their structures.  As you point out, the explicit H atom should not affect the cis/trans nature of the double bond.

User 870ab5b546

05-05-2011 12:35:42

rwagner wrote:

What I couldn't reproduce is the matching of the second pair of molecules. 



When I wrote my original post a few days ago, I could reproduce the matching of the second pair of molecules.  Today I can't.  I have no idea why.  Sigh.


Anyway, at least we now know the origin of the bug: the explicit H atom incorrectly causes the double bond to have cis/trans information.

ChemAxon 42004978e8

06-05-2011 06:15:42

Hi Bob,


We will correct the global stereo calculation for structures whose asymmetry is caused solely by hydrogens, and we will notify you when the fix is implemented.


Bye,


Robert