User 870ab5b546
08-02-2007 02:58:57
Doing an order-sensitive search of this molecule against itself and looking at the match arrays:
Code: |
<?xml version="1.0" ?>
<MDocument>
<MChemicalStruct>
<molecule molID="m1">
<atomArray
atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11"
elementType="Br C C C Br H H Cl H H H"
x2="-7.411250114440918 -6.077570992612882 -4.7438918707848465 -3.4102127489568113 -2.076533627128775 -6.847570992612882 -5.307570992612882 -5.513891870784846 -3.973891870784846 -4.180212748956811 -2.6402127489568112"
y2="1.7324999570846558 2.502499957084656 1.7324999570846558 2.502499957084656 1.7324999570846562 3.8361790789126915 3.836179078912691 0.39882083525662004 0.3988208352566205 3.836179078912691 3.836179078912691"
/>
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a2 a6" order="1">
<bondStereo>W</bondStereo>
</bond>
<bond atomRefs2="a2 a7" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a3 a8" order="1">
<bondStereo>W</bondStereo>
</bond>
<bond atomRefs2="a3 a9" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a4 a10" order="1">
<bondStereo>W</bondStereo>
</bond>
<bond atomRefs2="a4 a11" order="1">
<bondStereo>H</bondStereo>
</bond>
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument> |
We get these matches:
{1,2,3,4,5,6,7,8,9,10,11}
{1,2,3,4,5,6,7,8,9,11,10}
{1,2,3,4,5,7,6,8,9,10,11}
{1,2,3,4,5,7,6,8,9,11,10}
{5,4,3,2,1,10,11,8,9,6,7}
{5,4,3,2,1,10,11,8,9,7,6}
{5,4,3,2,1,11,10,8,9,6,7}
{5,4,3,2,1,11,10,8,9,7,6}
Note that the two H atoms on each of the outer C atoms (H6/H7 and H10/H11) are treated as equivalent. But they're not; they're diastereotopic. Is there a way of specifying in a search that diastereotopic H atoms should not be matched to one another?
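For reference, these eight arrays are exactly the self-mappings of the bare constitutional graph (elements and connectivity only), i.e. what you get when no stereo constraint is applied anywhere. The sketch below is a plain-Python backtracking search written for illustration. It is not the JChem API. The atom numbering follows the MRV file above.

```python
from typing import Dict, List, Tuple

# First molecule from the MRV file above (atom numbers are 1-based, as in the match arrays).
ELEMENTS = ["Br", "C", "C", "C", "Br", "H", "H", "Cl", "H", "H", "H"]
BONDS = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (2, 7),
         (3, 8), (3, 9), (4, 10), (4, 11)]


def automorphisms(elements: List[str], bonds: List[Tuple[int, int]]) -> List[List[int]]:
    """Return every element- and connectivity-preserving self-mapping.

    Wedge/hash information is deliberately ignored. Each result is a match
    array: position k holds the atom that query atom k+1 maps onto.
    """
    n = len(elements)
    adj: Dict[int, set] = {i: set() for i in range(1, n + 1)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)

    results: List[List[int]] = []
    mapping: Dict[int, int] = {}

    def extend(i: int) -> None:
        if i > n:
            results.append([mapping[k] for k in range(1, n + 1)])
            return
        used = set(mapping.values())
        for cand in range(1, n + 1):
            if cand in used or elements[cand - 1] != elements[i - 1]:
                continue
            if len(adj[cand]) != len(adj[i]):
                continue
            # Bonded / non-bonded relations to already-mapped atoms must be kept.
            if all((j in adj[i]) == (mapping[j] in adj[cand]) for j in mapping):
                mapping[i] = cand
                extend(i + 1)
                del mapping[i]

    extend(1)
    return results


for match in automorphisms(ELEMENTS, BONDS):
    print(match)
```

Run as is, it prints the same eight arrays, in the same order, as listed above. A stereo-aware search has to prune this set.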
Note that turning off the order-sensitive search won't do the trick, because then we'll randomly get the correct or incorrect matches. For example, I turned off order-sensitive search, then switched the stereochemistry of H10 and H11 in the query, but the match arrays before and after were the same.
We're using exactStereoMatching and the global stereo model.
ChemAxon a3d59b832c
08-02-2007 21:55:52
Hi,
As C2 and C4 have two hydrogens each, they are not stereo centers in the global stereo model. This means that the stereo bonds next to C2 and C4 are treated as plain bonds.
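One way to see why such atoms lose their stereo status: a tetrahedral atom can only be a stereo center if its four neighbours are pairwise non-equivalent. The plain-Python sketch below is an illustration, not JChem's algorithm. It approximates atom equivalence with iterative colour refinement. That is enough for a small acyclic molecule like this one, but it only approximates true symmetry in general.

```python
from typing import Dict, List, Tuple


def symmetry_classes(elements: List[str], bonds: List[Tuple[int, int]]) -> Dict[int, int]:
    """Approximate atom equivalence classes by iterative colour refinement."""
    adj: Dict[int, List[int]] = {i: [] for i in range(1, len(elements) + 1)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    labels = {i: (elements[i - 1], len(adj[i])) for i in adj}
    while True:
        # New label = old label plus the multiset of the neighbours' old labels.
        sig = {i: (labels[i], tuple(sorted(labels[j] for j in adj[i]))) for i in adj}
        rank = {s: r for r, s in enumerate(sorted(set(sig.values())))}
        new = {i: rank[sig[i]] for i in adj}
        if len(set(new.values())) == len(set(labels.values())):
            return new  # no class was split: the partition is stable
        labels = new


def stereocenter_candidates(elements: List[str], bonds: List[Tuple[int, int]]) -> List[int]:
    """Four-connected atoms whose four neighbours are pairwise non-equivalent."""
    cls = symmetry_classes(elements, bonds)
    nbrs: Dict[int, set] = {i: set() for i in cls}
    for a, b in bonds:
        nbrs[a].add(b)
        nbrs[b].add(a)
    return [i for i in cls if len(nbrs[i]) == 4 and len({cls[j] for j in nbrs[i]}) == 4]


ELEMENTS = ["Br", "C", "C", "C", "Br", "H", "H", "Cl", "H", "H", "H"]
BONDS = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (2, 7),
         (3, 8), (3, 9), (4, 10), (4, 11)]
print(stereocenter_candidates(ELEMENTS, BONDS))  # []
print(stereocenter_candidates(["C", "H", "F", "Cl", "Br"],
                              [(1, 2), (1, 3), (1, 4), (1, 5)]))  # [1]
```

On the first molecule no atom qualifies: C2 and C4 each carry two equivalent hydrogens, and C3 carries two equivalent CH2Br groups.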
If you would like to distinguish the two hydrogens on C2 (and those on C4), the local stereo model should be used. However, this is not perfect either:
Code: |
$ jcsearch --orderSensitive --stereoModel:l --allHits -q bob.mrv bob.mrv
Query has 4 matches:
Match 1:[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ]
Match 2:[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 10 ]
Match 3:[ 1, 2, 3, 4, 5, 7, 6, 8, 9, 10, 11 ]
Match 4:[ 1, 2, 3, 4, 5, 7, 6, 8, 9, 11, 10 ]
[H]C([H])(Br)C([H])(Cl)C([H])([H])Br |
The reason lies in the parity calculation method. We use the standard parity method as described in Appendix A of the CT File Formats document: http://www.mdl.com/solutions/white_papers/ctfile_formats.jsp
However, this method was not designed for the case where two hydrogens are connected to the same atom, as such an atom is not a stereo center. (Hydrogens have a special role in this definition of parity, so that implied hydrogens can be handled the same way as explicit ones.)
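To make the role of hydrogen concrete, here is a minimal Python sketch of the parity rule as summarised from that appendix. The neighbours are numbered by increasing atom index, with a hydrogen counted as the highest-numbered atom. Viewed with the highest-numbered neighbour pointing away from the viewer, a clockwise 1, 2, 3 gives parity 1 and a counterclockwise one gives parity 2. Two assumptions are not from the thread. First, 3D positions are approximated from the 2D drawing by setting z = +1 for a wedged neighbour and z = -1 for a hashed one. Second, `hydrogens_last=False` is a hypothetical switch that ranks hydrogens by index like any other atom, which is effectively what relabelling them as another atom type does.

```python
from typing import List, Tuple

# (atom index, element, (x, y, z)); assumed z = +1 for a wedge, -1 for a hash, 0 otherwise.
Neighbour = Tuple[int, str, Tuple[float, float, float]]


def parity(neighbours: List[Neighbour], hydrogens_last: bool = True) -> int:
    """Return 1 (clockwise) or 2 (counterclockwise) for a four-connected centre."""
    if len(neighbours) != 4:
        raise ValueError("parity needs exactly four neighbours")
    if hydrogens_last:
        if sum(1 for nb in neighbours if nb[1] == "H") > 1:
            raise ValueError("two hydrogens: 'H is the highest-numbered atom' is ambiguous")
        ranked = sorted(neighbours, key=lambda nb: (nb[1] == "H", nb[0]))
    else:
        # Hydrogens ranked by index like any other atom (as if relabelled).
        ranked = sorted(neighbours, key=lambda nb: nb[0])
    p1, p2, p3, p4 = (nb[2] for nb in ranked)
    u = [b - a for a, b in zip(p1, p2)]
    v = [b - a for a, b in zip(p1, p3)]
    w = [b - a for a, b in zip(p1, p4)]
    normal = (u[1] * v[2] - u[2] * v[1],
              u[2] * v[0] - u[0] * v[2],
              u[0] * v[1] - u[1] * v[0])
    det = sum(n * d for n, d in zip(normal, w))
    if abs(det) < 1e-9:
        raise ValueError("degenerate geometry")
    # det > 0: atom 4 lies on the side the 1-2-3 normal points to, so a viewer
    # on the opposite side sees 1 -> 2 -> 3 clockwise (parity 1).
    return 1 if det > 0 else 2


# Neighbours of C2 in the first molecule: Br1, C3, H6 (wedge), H7 (hash).
c2 = [(1, "Br", (-7.411, 1.732, 0.0)), (3, "C", (-4.744, 1.732, 0.0)),
      (6, "H", (-6.848, 3.836, 1.0)), (7, "H", (-5.308, 3.836, -1.0))]
print(parity(c2, hydrogens_last=False))  # 2
swapped = c2[:2] + [(6, "H", (-6.848, 3.836, -1.0)), (7, "H", (-5.308, 3.836, 1.0))]
print(parity(swapped, hydrogens_last=False))  # 1
try:
    parity(c2)
except ValueError as err:
    print("undefined:", err)
```

With the hydrogen rule in force, the CH2 center has no defined parity. When the hydrogens are ranked like ordinary atoms, swapping the wedge and hash on H6/H7 flips the parity, so the two hydrogens become distinguishable.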
If the H atoms are replaced with another atom type, there is only one match, as expected:
Code: |
$ jcsearch --orderSensitive --stereoModel:l --allHits -q bob_m.mrv bob_m.mrv
Query has 1 match:
Match 1:[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ]
ClC(I)(C(Br)(I)I)C(Br)(I)I |
We will think about how this parity algorithm could be improved, thank you for pointing this out.
We are also working on a comprehensive stereo model which (we hope) can make global and local stereo models obsolete.
Best regards,
Szabolcs
User 870ab5b546
08-02-2007 22:17:55
The problem is not exclusive to H atoms. Consider this target and query:
Code: |
<?xml version="1.0" ?>
<MDocument>
<MChemicalStruct>
<molecule molID="m1">
<atomArray
atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10"
elementType="Br C C C Br Br C C C C"
x2="-5.541666507720947 -4.2079873858929115 -2.874308264064876 -1.5406291422368406 -0.20695002040880484 -3.6443082640648754 -4.2079873858929115 -1.5406291422368406 -5.2969318289201945 -2.629573585264124"
y2="-0.5249999761581421 0.24500002384185782 -0.5249999761581422 0.24500002384185782 -0.5249999761581418 -1.8586790979861778 1.785000023841858 1.785000023841858 1.333944466869141 1.333944466869141"
/>
<bondArray>
<bond atomRefs2="a1 a2" order="1" />
<bond atomRefs2="a2 a3" order="1" />
<bond atomRefs2="a3 a4" order="1" />
<bond atomRefs2="a4 a5" order="1" />
<bond atomRefs2="a3 a6" order="1">
<bondStereo>W</bondStereo>
</bond>
<bond atomRefs2="a2 a7" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a4 a8" order="1">
<bondStereo>H</bondStereo>
</bond>
<bond atomRefs2="a2 a9" order="1">
<bondStereo>W</bondStereo>
</bond>
<bond atomRefs2="a4 a10" order="1">
<bondStereo>W</bondStereo>
</bond>
</bondArray>
</molecule>
</MChemicalStruct>
</MDocument> |
With order-sensitive search set to false, the matches are:
{1,2,3,4,5,6,7,8,9,10}
{1,2,3,4,5,6,7,10,9,8}
{1,2,3,4,5,6,9,8,7,10}
{1,2,3,4,5,6,9,10,7,8}
{5,4,3,2,1,6,8,7,10,9}
{5,4,3,2,1,6,8,9,10,7}
{5,4,3,2,1,6,10,7,8,9}
{5,4,3,2,1,6,10,9,8,7}
As you can see, C7-C10 are treated as interchangeable. Given the indicated stereochemistry, C7 should match to C8, but not to C9 or C10.
Perhaps you can use an isotopic substitution test to test for diastereotopicity. Make a copy of the original compound, and substitute one of the atoms in question with an element not already present in the compound. (I would use a pseudoatom in the API.) Then make another copy of the original compound, and substitute the other atom in question with the same element. Then see if the two structures are diastereomers. If they are, then the two atoms in question are diastereotopic, and they can't be allowed to match to one another.
To keep from slowing down the algorithms, I suggest you add an option to test for diastereotopicity when generating matches, and have the test off by default.
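The isotopic-substitution test proposed above can be sketched in plain Python for the first molecule of this thread. None of the following assumptions come from JChem. The probe symbol `X` stands in for the pseudoatom. The values in `PARITIES` were derived by hand from the MRV coordinates, taking a wedge as z = +1 and a hash as z = -1 and ranking neighbours by atom index only; treat them as this sketch's assumption. Two structures count as identical if some atom mapping preserves every parity, and as enantiomers if some mapping inverts every parity. Following the usual substitution criterion, identical results are also classified as homotopic and enantiomeric results as enantiotopic.

```python
from typing import Dict, Iterator, List, Tuple

ELEMENTS = ["Br", "C", "C", "C", "Br", "H", "H", "Cl", "H", "H", "H"]
BONDS = [(1, 2), (2, 3), (3, 4), (4, 5), (2, 6), (2, 7),
         (3, 8), (3, 9), (4, 10), (4, 11)]
# Assumed parities (1 or 2), neighbours ranked purely by atom index.
PARITIES = {2: 2, 3: 1, 4: 2}


def adjacency(n: int, bonds: List[Tuple[int, int]]) -> Dict[int, set]:
    adj: Dict[int, set] = {i: set() for i in range(1, n + 1)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def isomorphisms(elems_a: List[str], elems_b: List[str],
                 adj: Dict[int, set]) -> Iterator[Dict[int, int]]:
    """Yield element- and bond-preserving maps A -> B (both share `adj`)."""
    n = len(elems_a)
    mapping: Dict[int, int] = {}

    def extend(i: int) -> Iterator[Dict[int, int]]:
        if i > n:
            yield dict(mapping)
            return
        used = set(mapping.values())
        for c in range(1, n + 1):
            if c in used or elems_b[c - 1] != elems_a[i - 1]:
                continue
            if all((j in adj[i]) == (mapping[j] in adj[c]) for j in mapping):
                mapping[i] = c
                yield from extend(i + 1)
                del mapping[i]

    yield from extend(1)


def is_odd(seq: List[int]) -> bool:
    """True if seq is an odd permutation of sorted(seq) (inversion count)."""
    inversions = sum(1 for x in range(len(seq))
                     for y in range(x + 1, len(seq)) if seq[x] > seq[y])
    return inversions % 2 == 1


def relation(elems_a: List[str], elems_b: List[str], adj: Dict[int, set],
             parities: Dict[int, int]) -> str:
    """Compare A and B, which share bonds and parities (substitution test)."""
    found = enantiomeric = False
    for f in isomorphisms(elems_a, elems_b, adj):
        found = True
        seen = {}
        for c in parities:
            # Parity of f(c) in B, re-expressed in the neighbour order of c in A.
            order = [f[nb] for nb in sorted(adj[c])]
            p = parities[f[c]]
            seen[c] = 3 - p if is_odd(order) else p
        if all(seen[c] == parities[c] for c in parities):
            return "identical"
        if all(seen[c] == 3 - parities[c] for c in parities):
            enantiomeric = True
    if not found:
        return "constitutionally different"
    return "enantiomers" if enantiomeric else "diastereomers"


def topicity(i: int, j: int, probe: str = "X") -> str:
    """Replace atom i (copy A) or atom j (copy B) by `probe` and compare."""
    adj = adjacency(len(ELEMENTS), BONDS)
    a, b = list(ELEMENTS), list(ELEMENTS)
    a[i - 1] = probe
    b[j - 1] = probe
    return {"identical": "homotopic", "enantiomers": "enantiotopic",
            "diastereomers": "diastereotopic"}.get(
                relation(a, b, adj, PARITIES), "not equivalent")


for pair in [(6, 7), (6, 10), (6, 11), (1, 5)]:
    print(pair, topicity(*pair))
```

Under these assumptions H6/H7 come out diastereotopic, H6/H10 enantiotopic, H6/H11 diastereotopic and Br1/Br5 enantiotopic. Because this naive version enumerates every atom mapping, it scales poorly, which is consistent with making such a test optional.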
ChemAxon a3d59b832c
08-02-2007 22:36:03
I suspect you were still using the global stereo model. The local model should work for your second molecule.
Thank you for the suggestion; we will consider it. At first sight it does seem computationally expensive, so I hope we can find a lighter solution.
User 870ab5b546
08-02-2007 22:52:57
I tried a search with local parity, and it gave only a single match, even with order-sensitive search set to true. So that result isn't correct, either.
{1,2,3,4,5,6,7,8,9,10}
BTW, the page I am using to test the comparisons is here. It's really quite nice: it allows you to set all sorts of flags, and it uses AJAX.
User 870ab5b546
09-02-2007 00:26:14
There is a plane of symmetry that relates the two halves of the molecule.
So I would expect to see:
{1,2,3,4,5,6,7,8,9,10}
{5,4,3,2,1,6,8,7,10,9}
I can imagine one reason why one might not see this match. Atom pairs 1 and 5, 7 and 8, 9 and 10, etc. are enantiotopic, not homotopic. If you say that enantiotopic atoms shouldn't match, fine. But then the two Cl atoms in CHBrCl2 shouldn't match, either, because they are also enantiotopic.
If I erase the CH3 groups in the structure above, I get two matches. But the symmetry of the two structures is the same, so there should be the same number of matches.
The real problem here is that we are trying to use a two-state variable (a boolean, match/no match) to describe a three-state situation (diastereotopic, enantiotopic, homotopic). It's like trying to do a direct comparison between French and German gender.
Maybe you need another flag to determine whether enantiotopic atoms should match.
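To make the three-state idea concrete, here is a tiny Python sketch of how such a flag could gate matching. `Topicity`, `may_match` and `match_enantiotopic` are hypothetical names for illustration, not existing JChem options.

```python
from enum import Enum


class Topicity(Enum):
    HOMOTOPIC = "homotopic"
    ENANTIOTOPIC = "enantiotopic"
    DIASTEREOTOPIC = "diastereotopic"


def may_match(rel: Topicity, match_enantiotopic: bool = True) -> bool:
    """Hypothetical option: may two symmetry-related atoms map onto each other?"""
    if rel is Topicity.HOMOTOPIC:
        return True  # interchanged by a proper rotation: always equivalent
    if rel is Topicity.ENANTIOTOPIC:
        return match_enantiotopic  # equivalent only up to a reflection
    return False  # diastereotopic atoms are chemically distinct


for rel in Topicity:
    print(rel.value, may_match(rel), may_match(rel, match_enantiotopic=False))
```

A boolean alone cannot express the middle case; the extra flag decides only the enantiotopic row.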
-- Bob
ChemAxon a3d59b832c
12-02-2007 13:49:39
Let us think about it.
In my view, searching can only treat symmetric atoms (homotopic, enantiotopic and diastereotopic) in the same way: either all of them match or none do. (I am talking about the case where the full stereochemical configuration is shown.)
The case of the hydrogens is somewhat different: the current model does not handle the symmetrical cases at all, but we can solve that, probably by introducing a new option.
Is your main purpose to decide whether two symmetrical atoms are homotopic, enantiotopic or diastereotopic? If so, maybe we could handle this question independently of structure searching.
ChemAxon a3d59b832c
14-02-2007 11:42:20
Hi Bob,
After much thinking about all the above discussion, let me summarize the findings:
1. For the handling of these symmetrical systems, local parity should be used instead of global parity.
2. The current method cannot handle configurations with two hydrogens on the same atom. This problem could be handled in two ways: a) all explicit hydrogens are replaced with another atom type, e.g. a pseudo atom; b) we introduce an option to consider two explicit hydrogens in the (local) parity calculation.
3. There is an inconsistency in the self-matches of the molecules "BrC[C@H](Br)CBr |@@:2|" and "CC(C)(Br)C(Br)C(C)(C)Br |@:1,@@:4,6|": the first has two self-matches and the second only one (local stereo model), even though the two molecules have the same symmetry.
Having played a bit with plastic molecular models (real ball and stick), I think that there should be only one self-match for both molecules when the local stereo model is used. We will look into why the program reports two for the smaller one; it seems to be a bug.
4. A method to decide homotopicity, enantiotopicity and diastereotopicity would not really help, as searching always involves two molecules.
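Option 2a amounts to a simple preprocessing step. Here is a minimal sketch with Python's standard `xml.etree.ElementTree`, assuming the un-namespaced MRV layout shown in this thread. It rewrites each `H` in an `atomArray`'s `elementType` list to a placeholder symbol. The placeholder `X` is hypothetical. This sketch does not show how JChem actually encodes a pseudoatom in MRV; it only illustrates where the substitution happens.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the first MRV file in this thread (coordinates omitted).
MRV = """<MDocument><MChemicalStruct><molecule molID="m1">
<atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11"
           elementType="Br C C C Br H H Cl H H H"/>
</molecule></MChemicalStruct></MDocument>"""


def relabel_hydrogens(mrv_text: str, placeholder: str = "X") -> str:
    """Return the document with every explicit H in atomArray/@elementType relabelled.

    `placeholder` is a hypothetical symbol standing in for a pseudoatom.
    """
    root = ET.fromstring(mrv_text)
    for atom_array in root.iter("atomArray"):
        types = atom_array.get("elementType", "").split()
        atom_array.set("elementType",
                       " ".join(placeholder if t == "H" else t for t in types))
    return ET.tostring(root, encoding="unicode")


print(relabel_hydrogens(MRV))
```

Atom IDs and order are untouched, so match arrays computed on the relabelled copy still refer to the original atoms.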
So my question is: do you think solution 2a, together with fixing the bug in point 3, would meet your needs?
Best regards,
Szabolcs
User 870ab5b546
14-02-2007 13:24:56
Yes, I think if you can treat explicit H atoms as pseudoatoms within the context of the search, it will meet our needs.
User 870ab5b546
16-02-2007 16:36:17
Great idea! It works beautifully. Thanks!
User 870ab5b546
09-09-2008 15:02:28
Can you explain exactly what the difference is between local and global stereo model matches? In what contexts would one want one or the other?
ChemAxon 42004978e8
09-09-2008 16:46:11
Hi Bob,
The local stereo model specifies stereo information based on the neighbouring atoms only, so it does not check whether two ligands of a stereo center are the same.
This model is useful for substructure search, because it lets you require an atom with stereo information at a given position even though the query is symmetric.
The global stereo model considers symmetry on the query side: for stereo centers with symmetrical ligands, no stereo information is required on the target side. This option is useful for exact, exact fragment and perfect searches, because in these a symmetric query fragment cannot match a larger asymmetric stereo target.
The third option, the comprehensive model, behaves similarly to the local model, but if the target is symmetric, stereo information for the given atom is not required.
(see http://www.chemaxon.com/jchem/doc/user/query_stereochemistry.html#stereomodels)
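Read as decision logic for a single stereo atom in the query, the three descriptions above can be paraphrased as the following Python sketch. It reflects one reading of the text, not JChem's implementation. The enum, function and parameter names are made up for illustration.

```python
from enum import Enum


class StereoModel(Enum):
    LOCAL = "local"
    GLOBAL = "global"
    COMPREHENSIVE = "comprehensive"


def target_stereo_required(model: StereoModel, query_has_stereo: bool,
                           query_center_symmetric: bool,
                           target_center_symmetric: bool) -> bool:
    """Must the matched target atom carry matching stereo information?"""
    if not query_has_stereo:
        return False
    if model is StereoModel.LOCAL:
        return True  # neighbours only; ligand symmetry is not examined
    if model is StereoModel.GLOBAL:
        return not query_center_symmetric  # symmetry judged on the query side
    return not target_center_symmetric  # comprehensive: judged on the target side


# A stereo atom whose ligands are symmetric in the query but not in the target:
for model in StereoModel:
    print(model.value, target_stereo_required(model, True, True, False))
```

Under this reading the models only disagree at symmetric centers: with a symmetric query center, `LOCAL` still requires stereo in the target while `GLOBAL` does not.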
Best Regards,
Robert