diastereotopic H atoms

User 870ab5b546

08-02-2007 02:58:57

Doing an order-sensitive search of this molecule against itself and looking at the array matches:





Code:
<?xml version="1.0" ?>
<MDocument>
  <MChemicalStruct>
    <molecule molID="m1">
      <atomArray
          atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11"
          elementType="Br C C C Br H H Cl H H H"
          x2="-7.411250114440918 -6.077570992612882 -4.7438918707848465 -3.4102127489568113 -2.076533627128775 -6.847570992612882 -5.307570992612882 -5.513891870784846 -3.973891870784846 -4.180212748956811 -2.6402127489568112"
          y2="1.7324999570846558 2.502499957084656 1.7324999570846558 2.502499957084656 1.7324999570846562 3.8361790789126915 3.836179078912691 0.39882083525662004 0.3988208352566205 3.836179078912691 3.836179078912691"
          />
      <bondArray>
        <bond atomRefs2="a1 a2" order="1" />
        <bond atomRefs2="a2 a3" order="1" />
        <bond atomRefs2="a3 a4" order="1" />
        <bond atomRefs2="a4 a5" order="1" />
        <bond atomRefs2="a2 a6" order="1">
          <bondStereo>W</bondStereo>
        </bond>
        <bond atomRefs2="a2 a7" order="1">
          <bondStereo>H</bondStereo>
        </bond>
        <bond atomRefs2="a3 a8" order="1">
          <bondStereo>W</bondStereo>
        </bond>
        <bond atomRefs2="a3 a9" order="1">
          <bondStereo>H</bondStereo>
        </bond>
        <bond atomRefs2="a4 a10" order="1">
          <bondStereo>W</bondStereo>
        </bond>
        <bond atomRefs2="a4 a11" order="1">
          <bondStereo>H</bondStereo>
        </bond>
      </bondArray>
    </molecule>
  </MChemicalStruct>
</MDocument>






We get these matches:





{1,2,3,4,5,6,7,8,9,10,11}


{1,2,3,4,5,6,7,8,9,11,10}


{1,2,3,4,5,7,6,8,9,10,11}


{1,2,3,4,5,7,6,8,9,11,10}


{5,4,3,2,1,10,11,8,9,6,7}


{5,4,3,2,1,10,11,8,9,7,6}


{5,4,3,2,1,11,10,8,9,6,7}


{5,4,3,2,1,11,10,8,9,7,6}





Note that the H atoms on the two outside C atoms are treated as equivalent. But they're not; they're diastereotopic. Is there a way of specifying in a search that diastereotopic H atoms should not be matched to one another?





Note that turning off the order-sensitive search won't do the trick, because then we'll randomly get the correct or incorrect matches. For example, I turned off order-sensitive search, then switched the stereochemistry of H10 and H11 in the query, but the match arrays before and after were the same.





We're using exactStereoMatching and the global stereo model.

ChemAxon a3d59b832c

08-02-2007 21:55:52

Hi,





As C2 and C4 have two hydrogens each, they are not stereo centers in the global stereo model. This means that the stereo bonds next to C2 and C4 are treated as plain bonds.





If you would like to distinguish these two, the local stereo model should be used. However, this is not perfect either:





Code:
$ jcsearch --orderSensitive --stereoModel:l --allHits -q bob.mrv bob.mrv
    Query has 4 matches:
        Match 1:[    1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11 ]
        Match 2:[    1,   2,   3,   4,   5,   6,   7,   8,   9,  11,  10 ]
        Match 3:[    1,   2,   3,   4,   5,   7,   6,   8,   9,  10,  11 ]
        Match 4:[    1,   2,   3,   4,   5,   7,   6,   8,   9,  11,  10 ]
[H]C([H])(Br)C([H])(Cl)C([H])([H])Br






The reason for this is in the parity calculation method. We use the standard parity method as described in Appendix A of the CT File Formats document: http://www.mdl.com/solutions/white_papers/ctfile_formats.jsp


However, this method is not prepared for the case when two hydrogens are connected to an atom, as that is not a stereo center. (Hydrogens have a special role in this definition of parity, so that implied hydrogens can be handled the same way as explicit ones.)





If the H atoms are replaced with another atom type, there is only one match, as expected:





Code:
$ jcsearch --orderSensitive --stereoModel:l --allHits -q bob_m.mrv bob_m.mrv
    Query has 1 match:
        Match 1:[    1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11 ]
ClC(I)(C(Br)(I)I)C(Br)(I)I






We will think about how this parity algorithm could be improved. Thank you for pointing this out.





We are also working on a comprehensive stereo model which (we hope) can make global and local stereo models obsolete.





Best regards,


Szabolcs

User 870ab5b546

08-02-2007 22:17:55

The problem is not exclusive to H atoms. Consider this target and query:





Code:
<?xml version="1.0" ?>
<MDocument>
  <MChemicalStruct>
    <molecule molID="m1">
      <atomArray
          atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10"
          elementType="Br C C C Br Br C C C C"
          x2="-5.541666507720947 -4.2079873858929115 -2.874308264064876 -1.5406291422368406 -0.20695002040880484 -3.6443082640648754 -4.2079873858929115 -1.5406291422368406 -5.2969318289201945 -2.629573585264124"
          y2="-0.5249999761581421 0.24500002384185782 -0.5249999761581422 0.24500002384185782 -0.5249999761581418 -1.8586790979861778 1.785000023841858 1.785000023841858 1.333944466869141 1.333944466869141"
          />
      <bondArray>
        <bond atomRefs2="a1 a2" order="1" />
        <bond atomRefs2="a2 a3" order="1" />
        <bond atomRefs2="a3 a4" order="1" />
        <bond atomRefs2="a4 a5" order="1" />
        <bond atomRefs2="a3 a6" order="1">
          <bondStereo>W</bondStereo>
        </bond>
        <bond atomRefs2="a2 a7" order="1">
          <bondStereo>H</bondStereo>
        </bond>
        <bond atomRefs2="a4 a8" order="1">
          <bondStereo>H</bondStereo>
        </bond>
        <bond atomRefs2="a2 a9" order="1">
          <bondStereo>W</bondStereo>
        </bond>
        <bond atomRefs2="a4 a10" order="1">
          <bondStereo>W</bondStereo>
        </bond>
      </bondArray>
    </molecule>
  </MChemicalStruct>
</MDocument>






With order-sensitive search set to false, the matches are:





{1,2,3,4,5,6,7,8,9,10}


{1,2,3,4,5,6,7,10,9,8}


{1,2,3,4,5,6,9,8,7,10}


{1,2,3,4,5,6,9,10,7,8}


{5,4,3,2,1,6,8,7,10,9}


{5,4,3,2,1,6,8,9,10,7}


{5,4,3,2,1,6,10,7,8,9}


{5,4,3,2,1,6,10,9,8,7}





As you can see, C7-C10 are treated as interchangeable. Given the indicated stereochemistry, C7 should match to C8, but not to C9 or C10.





Perhaps you can use an isotopic substitution test to test for diastereotopicity. Make a copy of the original compound, and substitute one of the atoms in question with an element not already present in the compound. (I would use a pseudoatom in the API.) Then make another copy of the original compound, and substitute the other atom in question with the same element. Then see if the two structures are diastereomers. If they are, then the two atoms in question are diastereotopic, and they can't be allowed to match to one another.





To keep from slowing down the algorithms, I suggest you add an option to test for diastereotopicity when generating matches, and have the test off by default.

ChemAxon a3d59b832c

08-02-2007 22:36:03

I guess you still used the global stereo model. Local should be OK for your latter molecule.





Thank you for the suggestion, we will consider this. At first sight it does seem computationally expensive; I hope we can find a lighter solution.

User 870ab5b546

08-02-2007 22:52:57

I tried a search with local parity, and it gave only a single match, even with order-sensitive search set to true. So that result isn't correct, either.





{1,2,3,4,5,6,7,8,9,10}





BTW, the page I am using to test the comparisons is here. It's really quite nice: allows you to set all sorts of flags, and it uses AJAX.

ChemAxon a3d59b832c

08-02-2007 23:18:13

bobgr wrote:
I tried a search with local parity, and it gave only a single match, even with order-sensitive search set to true. So that result isn't correct, either.





{1,2,3,4,5,6,7,8,9,10}
I thought this was the correct result. Could you explain what result you expect and why?

User 870ab5b546

09-02-2007 00:26:14

There is a plane of symmetry that relates the two halves of the molecule.





So I would expect to see,





{1,2,3,4,5,6,7,8,9,10}


{5,4,3,2,1,6,8,7,10,9}





I can imagine one reason why one might not see this match. Atom pairs 1 and 5, 7 and 8, 9 and 10, etc. are enantiotopic, not homotopic. If you say that enantiotopic atoms shouldn't match, fine. But then the two Cl atoms in CHBrCl2 shouldn't match, either, because they are also enantiotopic.





If I erase the CH3 groups in the structure above, I get two matches. But the symmetry of the two structures is the same, so there should be the same number of matches.





The real problem here is that we are trying to use a two-state variable (a boolean, match/no match) to describe a three-state situation (diastereotopic, enantiotopic, homotopic). It's like trying to do a direct comparison between French and German gender.





Maybe you need another flag to determine whether enantiotopic atoms should match.





-- Bob
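
The three-state idea above could be sketched roughly as follows. This is only an illustration: `Topicity`, `TopicityPolicy` and both flags are hypothetical names, not part of JChem or any existing search option, and the sketch assumes that some other code has already classified the relationship between two atoms.

```java
/** Hypothetical three-state classification of two constitutionally equivalent atoms. */
enum Topicity { HOMOTOPIC, ENANTIOTOPIC, DIASTEREOTOPIC }

/** Hypothetical policy object: turns the three states into a match/no-match decision. */
final class TopicityPolicy {
    private final boolean matchEnantiotopic;
    private final boolean matchDiastereotopic;

    TopicityPolicy(boolean matchEnantiotopic, boolean matchDiastereotopic) {
        this.matchEnantiotopic = matchEnantiotopic;
        this.matchDiastereotopic = matchDiastereotopic;
    }

    /** Homotopic atoms always match; the other two cases follow the flags. */
    boolean mayMatch(Topicity relation) {
        switch (relation) {
            case HOMOTOPIC:
                return true;
            case ENANTIOTOPIC:
                return matchEnantiotopic;
            case DIASTEREOTOPIC:
                return matchDiastereotopic;
            default:
                throw new AssertionError(relation);
        }
    }

    public static void main(String[] args) {
        // Allow enantiotopic atoms to match, but never diastereotopic ones.
        TopicityPolicy policy = new TopicityPolicy(true, false);
        for (Topicity t : Topicity.values()) {
            System.out.println(t + " -> " + policy.mayMatch(t));
        }
    }
}
```

With two independent flags, the boolean match decision stays in the search, while the caller chooses how each of the three cases collapses into it.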

ChemAxon a3d59b832c

12-02-2007 13:49:39

Let us think about it.





In my view, we can only handle symmetric atoms (homotopic, enantiotopic and diastereotopic) the same way in searching: either match all of them or match none. (I am talking about the case where the full stereochemical configuration is shown.)





The case of the hydrogens is somewhat different: the current model does not handle the symmetrical cases at all, but we can solve that, probably by introducing a new option.





Is your main purpose to decide whether two symmetrical atoms are homotopic, enantiotopic or diastereotopic? If so, maybe we could handle this question independently of structure searching.

User 870ab5b546

12-02-2007 14:17:26

Szabolcs wrote:
Let us think about it.





In my view, we can only handle symmetric atoms (homotopic, enantiotopic and diastereotopic) the same way in searching: either match all of them or match none. (I am talking about the case where the full stereochemical configuration is shown.)
Yes, of course. If configuration is not shown, then the atoms should match, as they do now.
Quote:
The case of the hydrogens is somewhat different: the current model does not handle the symmetrical cases at all, but we can solve that, probably by introducing a new option.





Is your main purpose to decide whether two symmetrical atoms are homotopic, enantiotopic or diastereotopic? If so, maybe we could handle this question independently of structure searching.
I'm not sure whether that would work. Here's what we're trying to do. We want to compare a pattern of atom mapping of the student with a pattern of atom mapping of the author. The actual numbers aren't as important as the mapping. So, if the student is asked to map each element with a different number, and the author writes,





C C N O C


1 1 2 3 1





then a matched pattern by the student might be,





C C N O C


2 2 3 1 2





Currently, we do this as follows:

    int[] match = findFirst();
    while (match != null) {
        ...
        match = findNext();
    }





We compare the maps of the atoms in each match array and see if we can find an array where the mapped atoms match. The problem is that findNext() is returning matches in which the stereochemistry of the H atoms should cause certain atoms *not* to match.





If your proposed method would work on atoms in two different compounds, then, after getting the match back from findNext(), we could do another test to make sure that none of the matched atoms are diastereotopic. But it seems to me that that test should be incorporated into the search process to begin with.





If your proposed method would only work on atoms in a single compound, then it would probably not suffice for us.
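
The map-pattern comparison described above can be sketched roughly as follows. This is a minimal illustration, not our actual code: `MapPatternCheck` and `samePattern` are hypothetical names, the atom-map numbers are assumed to have been read into plain `int` arrays already, and `match` is assumed to be a 0-based array in which `match[i]` is the target (student) atom matched to query (author) atom `i`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; it does not call any search API itself. */
final class MapPatternCheck {

    /**
     * Returns true if the author's and student's map numbers describe the same
     * pattern under the given match, i.e. the map numbers correspond one-to-one.
     */
    static boolean samePattern(int[] authorMaps, int[] studentMaps, int[] match) {
        Map<Integer, Integer> authorToStudent = new HashMap<>();
        Map<Integer, Integer> studentToAuthor = new HashMap<>();
        for (int i = 0; i < match.length; i++) {
            int a = authorMaps[i];
            int s = studentMaps[match[i]];
            // putIfAbsent returns the earlier pairing, or null if there was none.
            Integer earlierS = authorToStudent.putIfAbsent(a, s);
            Integer earlierA = studentToAuthor.putIfAbsent(s, a);
            if ((earlierS != null && earlierS != s) || (earlierA != null && earlierA != a)) {
                return false; // one map number paired with two different numbers
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] author = {1, 1, 2, 3, 1};   // C C N O C as mapped by the author
        int[] student = {2, 2, 3, 1, 2};  // C C N O C as mapped by the student
        int[] identity = {0, 1, 2, 3, 4};
        System.out.println(samePattern(author, student, identity)); // prints true
        int[] wrong = {2, 2, 3, 1, 1};    // last C carries the number used for O
        System.out.println(samePattern(author, wrong, identity));   // prints false
    }
}
```

In the search loop, a check like this would run once per match array returned by findFirst() and findNext(), stopping at the first array for which it returns true. It can only be as good as the match arrays it is given, which is why matches that swap diastereotopic atoms are a problem.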

ChemAxon a3d59b832c

14-02-2007 11:42:20

Hi Bob,





After much thinking about all the above discussion, let me summarize the findings:





1. For the handling of these symmetrical systems, local parity should be used instead of global parity.





2. The current method cannot handle configurations with two hydrogens on any given stereocenter. This problem could be handled in two ways: a) all explicit hydrogens are replaced with another atom type, e.g. a pseudo atom; b) we introduce an option to consider two explicit hydrogens in the (local) parity calculation.





3. There is an inconsistency in the self-matchings of the molecules "BrC[C@H](Br)CBr |@@:2|" and "CC(C)(Br)C(Br)C(C)(C)Br |@:1,@@:4,6|": the first has two self-matches and the latter only one (local stereo model), even though the two molecules have the same symmetry.





Having played a bit with plastic molecular models (real ball and stick), I think that there should be only one self-match for both molecules when the local stereo model is used. We will look into why the program reports two for the smaller one; it seems to be a bug.





4. A method to decide homotopicity, enantiotopicity and diastereotopicity would not really help, as two molecules must always be handled.





So my question is: Do you think solution 2a and fixing the bug at 3 would be suitable for your needs?





Best regards,


Szabolcs

User 870ab5b546

14-02-2007 13:24:56

Yes, I think if you can treat explicit H atoms as pseudoatoms within the context of the search, it will meet our needs.

ChemAxon a3d59b832c

16-02-2007 14:05:37

bobgr wrote:
Yes, I think if you can treat explicit H atoms as pseudoatoms within the context of the search, it will meet our needs.
Actually, you can do it easily with a slight modification of your method replaceHnoClone:





Code:
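    // Requires: import chemaxon.struc.MolAtom; import chemaxon.struc.Molecule;

    /** Replaces every explicit H atom of mol, in place, with a pseudo atom labeled "PSEUDO_H". */
    public static Molecule replaceHnoClone(Molecule mol) {
        String hydrogenLabel = "H";
        for (int i = 0; i < mol.getNodeCount(); i++) {
            MolAtom atom = (MolAtom) mol.getNode(i);
            if (atom.getSymbol().equalsIgnoreCase(hydrogenLabel)) {
                // As a pseudo atom, the former H no longer gets the special
                // hydrogen treatment in the parity calculation.
                atom.setAtno(MolAtom.PSEUDO);
                atom.setAliasstr("PSEUDO_H");
            }
        } // for each node (atom)
        return mol; // same instance, modified (no clone is made)
    } // replaceHnoClone(Molecule)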






This should be called on the molecules before searching.

User 870ab5b546

16-02-2007 16:36:17

Great idea! It works beautifully. Thanks!

User 870ab5b546

09-09-2008 15:02:28

Can you explain exactly what is the difference between the local and global stereo model matches? In what contexts would one want one or the other?

ChemAxon 42004978e8

09-09-2008 16:46:11

Hi Bob,





The local stereo model specifies stereo information based on the neighbouring atoms only, so it does not check whether two ligands of a stereo center are the same.

This model is useful for substructure search, because it lets you specify that an atom at a given position must carry stereo information even though the query is symmetric.





The global stereo model considers symmetry on the query side, and for stereo centers with symmetrical ligands no stereo information is required on the target side. This option is useful for exact, exact fragment and perfect searches, because there the symmetric query fragment can't match a larger asymmetric stereo target.





The third option (comprehensive) behaves similarly to the local model, but if the target is symmetric, stereo information for the given atom is not required.


(see http://www.chemaxon.com/jchem/doc/user/query_stereochemistry.html#stereomodels)





Best Regards,





Robert