substructure match failure

User 870ab5b546

17-09-2010 17:22:04

The code:


    private MolSearch setupSearchObject(int ignoreFlags) {
        final String SELF = "MechSubstructSearch.setupSearchObject: ";
        final MolSearchOptions searchOpts = new MolSearchOptions();
        searchOpts.setSearchType(SUBSTRUCTURE);
        searchOpts.setOrderSensitiveSearch(true);
        debugPrint(SELF + "search type == SUBSTRUCTURE, ordersensitive = true");
        if ((ignoreFlags & CHARGE_MASK) == 0) {
            debugPrint(SELF + "exact charge matching set for search.");
            searchOpts.setChargeMatching(CHARGE_MATCHING_EXACT);
        }
        if ((ignoreFlags & ISOTOPES_MASK) == 0) {
            debugPrint(SELF + "exact isotope matching set for search.");
            searchOpts.setIsotopeMatching(ISOTOPE_MATCHING_EXACT);
        }
        if ((ignoreFlags & RADSTATE_MASK) == 0) {
            debugPrint(SELF + "exact radical matching set for search.");
            searchOpts.setRadicalMatching(RADICAL_MATCHING_EXACT);
        }
        final MolSearch searchObj = new MolSearch();
        searchObj.setSearchOptions(searchOpts);
        searchObj.setQuery(authMolecule);
        searchObj.setTarget(stageMolecule);
        try {
            debugPrint(SELF + "query = ", authMolecule, ", target = ",
                    stageMolecule, ", searchObj.isMatching() = ",
                    searchObj.isMatching());
        } catch (SearchException e) {
            // ignored: isMatching() is called here only for the debug output
        }
        return searchObj;
    } // setupSearchObject(int)

The target:


<?xml version="1.0" ?>
<cml>
<MDocument>
<MChemicalStruct>
<molecule molID="m1">
<atomArray
atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14"
elementType="O C C C C C C H H H H H H H"
formalCharge="0 0 0 0 0 1 0 0 0 0 0 0 0 0"
x2="-7.074375152587891 -8.40804449363615 -8.40804449363615 -7.074375152587891 -5.740705811539632 -5.740705811539632 -9.74172850577338 -9.74172850577338 -7.074375152587891 -4.4070217994024015 -4.4070217994024015 -10.511720035447272 -11.075412517910612 -8.971736976099491"
y2="1.4919088823101738 0.7218919417511334 -0.8181419393669476 -1.588158879925988 -0.8181419393669476 0.7218919417511334 -1.5881334690408386 1.4918834714250244 -3.128158879925988 -1.5881334690408386 1.4918834714250244 -0.25444945690360754 -2.3581249987147292 -2.9218174811780697"
/>
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a1 a6" order="1" />
<bond atomRefs2="a2 a3" order="2" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a3 a7" order="1" />
<bond atomRefs2="a4 a5" order="2" />
<bond atomRefs2="a5 a6" order="1" />
<bond atomRefs2="a2 a8" order="1" />
<bond atomRefs2="a4 a9" order="1" />
<bond atomRefs2="a5 a10" order="1" />
<bond atomRefs2="a6 a11" order="1" />
<bond atomRefs2="a7 a12" order="1" />
<bond atomRefs2="a7 a13" order="1" />
<bond atomRefs2="a7 a14" order="1" />
</bondArray>
</molecule>
</MChemicalStruct>
<MEFlow id="o2" arcAngle="-254.995522631729" headSkip="0.15"
headLength="0.5" headWidth="0.4" tailSkip="0.25">
<MEFlowBasePoint atomRef="m1.a1" />
<MAtomSetPoint atomRefs="m1.a1 m1.a2" />
</MEFlow>
<MEFlow id="o3" arcAngle="189.9" headSkip="0.15" headLength="0.5"
headWidth="0.4" tailSkip="0.15">
<MAtomSetPoint atomRefs="m1.a2 m1.a3" />
<MAtomSetPoint atomRefs="m1.a3 m1.a4" />
</MEFlow>
<MEFlow id="o4" arcAngle="189.9" headSkip="0.15" headLength="0.5"
headWidth="0.4" tailSkip="0.15">
<MAtomSetPoint atomRefs="m1.a4 m1.a5" />
<MAtomSetPoint atomRefs="m1.a5 m1.a6" />
</MEFlow>
</MDocument>
</cml>

With the following query, I correctly get a substructure match:


<?xml version="1.0" ?>
<cml>
<MDocument>
<MChemicalStruct>
<molecule molID="m1">
<atomArray
atomID="a1 a2 a3 a4 a5 a6"
elementType="O C C C C C"
formalCharge="0 0 0 0 0 1"
x2="-7.074375152587891 -8.40804449363615 -8.40804449363615 -7.074375152587891 -5.740705811539632 -5.740705811539632"
y2="1.4919088823101738 0.7218919417511334 -0.8181419393669476 -1.588158879925988 -0.8181419393669476 0.7218919417511334"
/>
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a2 a3" order="2" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="2" />
<bond atomRefs2="a5 a6" order="1" />
<bond atomRefs2="a1 a6" order="1" />
</bondArray>
</molecule>
</MChemicalStruct>
<MEFlow id="o2" arcAngle="-254.995522631729" headSkip="0.15"
headLength="0.5" headWidth="0.4" tailSkip="0.25">
<MEFlowBasePoint atomRef="m1.a1" />
<MAtomSetPoint atomRefs="m1.a1 m1.a2" />
</MEFlow>
<MEFlow id="o3" arcAngle="189.9" headSkip="0.15" headLength="0.5"
headWidth="0.4" tailSkip="0.15">
<MAtomSetPoint atomRefs="m1.a2 m1.a3" />
<MAtomSetPoint atomRefs="m1.a3 m1.a4" />
</MEFlow>
<MEFlow id="o4" arcAngle="189.9" headSkip="0.15" headLength="0.5"
headWidth="0.4" tailSkip="0.15">
<MAtomSetPoint atomRefs="m1.a4 m1.a5" />
<MAtomSetPoint atomRefs="m1.a5 m1.a6" />
</MEFlow>
</MDocument>
</cml>

MechSubstructSearch.setupSearchObject: search type == SUBSTRUCTURE, ordersensitive = true
MechSubstructSearch.setupSearchObject: exact charge matching set for search.
MechSubstructSearch.setupSearchObject: query = [CH+]1OC=CC=C1, target = [H][C+]1OC([H])=C(C([H])=C1[H])C([H])([H])[H], searchObj.isMatching() = true

However, with this next query, the match is inexplicably failing:

<?xml version="1.0" ?>
<cml>
<MDocument>
<MChemicalStruct>
<molecule molID="m1">
<atomArray
atomID="a1 a2 a3 a4 a5 a6 a7"
elementType="O C C C C C C"
formalCharge="0 0 0 0 0 1 0"
x2="-7.074375152587891 -8.40804449363615 -8.40804449363615 -7.074375152587891 -5.740705811539632 -5.740705811539632 -9.74172850577338"
y2="1.4919088823101738 0.7218919417511334 -0.8181419393669476 -1.588158879925988 -0.8181419393669476 0.7218919417511334 -1.5881334690408386"
/>
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a2 a3" order="2" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="2" />
<bond atomRefs2="a5 a6" order="1" />
<bond atomRefs2="a1 a6" order="1" />
<bond atomRefs2="a3 a7" order="1" />
</bondArray>
</molecule>
</MChemicalStruct>
<MEFlow id="o2" arcAngle="-254.995522631729" headSkip="0.15"
headLength="0.5" headWidth="0.4" tailSkip="0.25">
<MEFlowBasePoint atomRef="m1.a1" />
<MAtomSetPoint atomRefs="m1.a1 m1.a2" />
</MEFlow>
<MEFlow id="o3" arcAngle="189.9" headSkip="0.15" headLength="0.5"
headWidth="0.4" tailSkip="0.15">
<MAtomSetPoint atomRefs="m1.a2 m1.a3" />
<MAtomSetPoint atomRefs="m1.a3 m1.a4" />
</MEFlow>
<MEFlow id="o4" arcAngle="189.9" headSkip="0.15" headLength="0.5"
headWidth="0.4" tailSkip="0.15">
<MAtomSetPoint atomRefs="m1.a4 m1.a5" />
<MAtomSetPoint atomRefs="m1.a5 m1.a6" />
</MEFlow>
</MDocument>
</cml>
MechSubstructSearch.setupSearchObject: search type == SUBSTRUCTURE, ordersensitive = true
MechSubstructSearch.setupSearchObject: exact charge matching set for search.
MechSubstructSearch.setupSearchObject: query = CC1=CO[CH+]C=C1, target = [H][C+]1OC([H])=C(C([H])=C1[H])C([H])([H])[H], searchObj.isMatching() = false

I cannot for the life of me figure out why the second search fails and the first does not.  Furthermore, if I paste these two structures in my JChem comparator here, the match succeeds.  


Any ideas what is going on?  The unpredictability is quite disturbing.


I get this behavior whether I use findFirst() or isMatching().  I get the same behavior in JChem 5.3.3 and JChem 5.3.8pre.

ChemAxon 42004978e8

21-09-2010 19:30:56

Hi Bob,


I tried the non-working query-target pair and couldn't reproduce the error. I even got a match on your website as well (substructure search with exact charge matching; I changed the other settings, but that made no difference).


I used the attached query-target pair from your previous post (in fact, they match both ways).


Please correct me if this is not the query-target pair you used.


Bye,


Robert

User 870ab5b546

22-09-2010 17:11:37

Yes, those are the structures.  And yes, they correctly give a match at our substructure-matching page.  But as you can see from the log, they are not matching within ACE.


I found a solution to the problem.  I needed to set bond vagueness to off.


searchOpts.setVagueBondLevel(VAGUE_BOND_OFF);

With this change, the target matches the query properly.


I will leave it to you to figure out why the default value for bond vagueness caused the two not to match.


    private MolSearch setupSearchObject(int ignoreFlags) {
        final String SELF = "MechSubstructSearch.setupSearchObject: ";
        final MolSearchOptions searchOpts = new MolSearchOptions();
        searchOpts.setSearchType(SUBSTRUCTURE);
        searchOpts.setOrderSensitiveSearch(true);
        final boolean ignoreChg = ((ignoreFlags & CHARGE_MASK) != 0);
        final boolean ignoreIso = ((ignoreFlags & ISOTOPES_MASK) != 0);
        final boolean ignoreRad = ((ignoreFlags & RADSTATE_MASK) != 0);
        searchOpts.setChargeMatching(ignoreChg
                ? CHARGE_MATCHING_IGNORE : CHARGE_MATCHING_EXACT);
        searchOpts.setIsotopeMatching(ignoreIso
                ? ISOTOPE_MATCHING_IGNORE : ISOTOPE_MATCHING_EXACT);
        searchOpts.setRadicalMatching(ignoreRad
                ? RADICAL_MATCHING_IGNORE : RADICAL_MATCHING_EXACT);
        debugPrint(SELF, ignoreChg ? "ignore" : "exact",
                " charge matching, ", ignoreIso ? "ignore" : "exact",
                " isotope matching, ", ignoreRad ? "ignore" : "exact",
                " radical matching set for search.");
        searchOpts.setValenceMatching(false);
        // Bond vagueness off: drawn single and double bonds must match as drawn.
        searchOpts.setVagueBondLevel(VAGUE_BOND_OFF);
        final MolSearch searchObj = new MolSearch();
        searchObj.setSearchOptions(searchOpts);
        searchObj.setQuery(authMolecule);
        searchObj.setTarget(stageMolecule);
        return searchObj;
    } // setupSearchObject(int)
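
For anyone who wants to check the flag handling in setupSearchObject without a JChem installation, here is a minimal, library-free sketch of how ignoreFlags could be decoded into one matching mode per feature. The bit values of the masks, and the names IgnoreFlagsSketch, Feature, Mode, decode and modeFor, are assumptions made up for this illustration; the thread does not show the real constants, and nothing here is JChem API.

```java
import java.util.EnumMap;
import java.util.Map;

// Illustration only: decodes an "ignore" bitmask into per-feature matching modes.
class IgnoreFlagsSketch {
    // Hypothetical bit values; the real mask constants are not shown in the thread.
    static final int CHARGE_MASK = 1 << 0;
    static final int ISOTOPES_MASK = 1 << 1;
    static final int RADSTATE_MASK = 1 << 2;

    enum Feature { CHARGE, ISOTOPE, RADICAL }
    enum Mode { EXACT, IGNORE }

    // A set bit means "ignore this feature"; a clear bit means "match it exactly".
    static Mode modeFor(int ignoreFlags, int mask) {
        return (ignoreFlags & mask) != 0 ? Mode.IGNORE : Mode.EXACT;
    }

    // Keying the result by feature keeps each mode tied to the feature it belongs to.
    static Map<Feature, Mode> decode(int ignoreFlags) {
        Map<Feature, Mode> modes = new EnumMap<>(Feature.class);
        modes.put(Feature.CHARGE, modeFor(ignoreFlags, CHARGE_MASK));
        modes.put(Feature.ISOTOPE, modeFor(ignoreFlags, ISOTOPES_MASK));
        modes.put(Feature.RADICAL, modeFor(ignoreFlags, RADSTATE_MASK));
        return modes;
    }

    public static void main(String[] args) {
        System.out.println(decode(0));
        System.out.println(decode(CHARGE_MASK | RADSTATE_MASK));
    }
}
```

In real code, each decoded mode would then be passed to the matching setter on MolSearchOptions (setChargeMatching, setIsotopeMatching, setRadicalMatching).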

User 870ab5b546

22-09-2010 22:13:22

I updated our JChem comparator page to allow the user to choose the level of bond vagueness.  It turns out that bond vagueness level 1 (the default) is the only vagueness level at which target [H][C+]1OC([H])=C(C([H])=C1[H])C([H])([H])[H] does not match query CC1=CO[CH+]C=C1.  This situation is quite counterintuitive, and I encourage you to investigate it.

ChemAxon a3d59b832c

24-09-2010 07:03:13

Hi Bob,


Thanks for the bug report. We will check what is going on.


Szabolcs

ChemAxon 42004978e8

01-10-2010 09:17:13

Hi Bob,


At vague bond level 1, single ligands on aromatic rings are handled as single-or-aromatic bonds. When such bonds are transformed, the aromatic ring is aromatized as well.


Your search code doesn't perform aromatization. Please aromatize your target and query to get correct matches. You can also use the StandardizedMolSearch object, which performs aromatization implicitly.


Bye,


Robert

User 870ab5b546

01-10-2010 12:46:14

Under the circumstances of this substructure search, we cannot aromatize the substrate.  It's a mechanism question, which requires that the single and double bonds stay exactly where they are placed.


What is weird is the way that the structure fails to match only under vagueness level 1 and only when a C atom is attached to the substructure in the query.


This behavior is completely counterintuitive, so, if it is expected according to your algorithm, I would suggest you reexamine your algorithm.

ChemAxon a3d59b832c

01-10-2010 12:51:54

Hi Bob,


Vague bond levels were introduced to handle cases where aromaticity is ambiguous or uncertain.


If you are searching for specific resonance structures only (no aromatization), then I suggest turning vague bond handling off completely by setting it to level 0.


Let us know if it helps.


Best regards,


Szabolcs

User 870ab5b546

01-10-2010 13:31:44

That is what I did, as I reported above, and it solved the problem.  


But I still think it is a counterintuitive result.  Note that the search succeeds at all bond vagueness levels other than 1.

ChemAxon 42004978e8

05-10-2010 12:40:22

Hi Bob,


Some comments on the current behaviour:


Vague bond level 1 aims to handle query structures with undefined aromaticity, and handling them involves aromatization. Your structure has a single ligand on an aromatic ring, and at this level the ligand bond is treated as vague aromatic (single or aromatic). To identify such structures, the query has to be aromatized.


During vague bond level 1 handling, the query variants for such structures are generated in aromatized form, because otherwise they would not match an aromatized target (and searching against aromatized targets is the intended way of executing a search). The logic is that if you want vague aromaticity handling, you have to be prepared to handle the aromatic query. That means aromatizing the target, which implies aromatizing the query as well.


Vague bond level 0 disregards these vague aromaticity structures. This is the option you need to match bond types without performing aromatization.


The higher vague bond levels (2, 3) produce a match in your case because they treat all ring bonds as single/aromatic or double/aromatic.
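
To make the level-by-level behaviour described in this post easier to follow, here is a toy model in plain Java. It is not ChemAxon's algorithm and does not use JChem. It only encodes the rules as stated in this thread, under the simplifying assumption that a single-bonded ligand on the ring is represented by a boolean flag. The names VagueBondToyModel, Bond, accepted, ringMatches and KEKULE are made up for the illustration.

```java
import java.util.EnumSet;
import java.util.Set;

// Toy model of the rules described in this thread; NOT the real search algorithm.
class VagueBondToyModel {
    enum Bond { SINGLE, DOUBLE, AROMATIC }

    // Ring bonds of the posted structure in the order of the query's bondArray:
    // O-C, C=C, C-C, C=C, C-C(+), O-C(+)
    static final Bond[] KEKULE = {
        Bond.SINGLE, Bond.DOUBLE, Bond.SINGLE, Bond.DOUBLE, Bond.SINGLE, Bond.SINGLE
    };

    // Target bond types that a query ring bond accepts at a given level (0 to 3).
    static Set<Bond> accepted(Bond drawn, int level, boolean ligandOnRing) {
        if (level < 0 || level > 3) {
            throw new IllegalArgumentException("toy model covers levels 0 to 3 only");
        }
        if (level == 0) {
            return EnumSet.of(drawn); // bond types must match exactly as drawn
        }
        if (level == 1) {
            // Assumption taken from the explanation above: a single ligand makes the
            // query ring get aromatized, so its bonds then need aromatic target bonds.
            return ligandOnRing ? EnumSet.of(Bond.AROMATIC) : EnumSet.of(drawn);
        }
        // Levels 2 and 3: ring bonds are treated as single/aromatic or double/aromatic.
        return EnumSet.of(drawn, Bond.AROMATIC);
    }

    // True if every query ring bond accepts the target bond at the same position.
    static boolean ringMatches(Bond[] query, Bond[] target, int level, boolean ligandOnRing) {
        if (query.length != target.length) {
            return false;
        }
        for (int i = 0; i < query.length; i++) {
            if (!accepted(query[i], level, ligandOnRing).contains(target[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int level = 0; level <= 3; level++) {
            System.out.println("level " + level
                    + ": without ligand = " + ringMatches(KEKULE, KEKULE, level, false)
                    + ", with ligand = " + ringMatches(KEKULE, KEKULE, level, true));
        }
    }
}
```

Against a non-aromatized target, this model fails only at level 1 and only when the ligand is present, which is the pattern reported earlier in the thread.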


Bye,


Robert