2 structures give 4 CXSMILES?

User 62a37f4796

04-06-2009 19:07:11

I have a coordination structure problem.


I have both forms of the coordination structures in the attached MRV file.


When I open the file up in MSketch it displays fine.


When I open the file up in a text editor and follow all the bond connections, I end up with two disconnected structures.


However, when I export them as CXSMILES, I get 4 SMILES strings connected with dots:


N[Co:3]1(N)(N)(N)O[Co:1]234(O1)O[Co:2](N)(N)(N)(N)O2.N[Co:4](N)(N)(N)(O3)O4.N[Co:8]1(N)(N)(N)O[Co:5]234(O1)O[Co:6](N)(N)(N)(N)O2.N[Co:7](N)(N)(N)(O3)O4


This doesn't make any sense to me. Is there a bug in the exporter? MBeans seems to do the same thing as MSketch.


I'm interested to see if students can get the wedge bonds correct, so I'd like to retrieve CXSMILES +w if at all possible.


BTW, retrieving SMILES instead of CXSMILES gives me the same problem.


Thanks.


(I know the bond orders don't strictly add up, but coordination bonds do not convey the stereo information correctly.)

ChemAxon 25dcd765a3

05-06-2009 10:11:24

Hi,


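I think you may have misunderstood the SMILES documentation.


A dot between two parts of a SMILES string does not necessarily mean that the molecule has two fragments (two disconnected parts). A ring-closure number opened before the dot and closed after it still creates a bond between the two parts.


Take the following example:


C1CCC.C1CCC


It is equivalent to the following structure written without a dot: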


CCCCCCCC
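
To make the point above concrete, here is a minimal sketch that counts connected fragments in a SMILES string by following bonds, including ring closures that cross a dot. It is not Marvin's parser: the function name `count_components` is made up for illustration, and it only handles a simplified subset of SMILES (bracket atoms, organic-subset atoms, branches, ring closures, dots).

```python
def count_components(smiles: str) -> int:
    """Count connected fragments in a SMILES string (simplified subset)."""
    parent = []        # union-find forest, one entry per atom
    open_rings = {}    # ring-closure label -> atom that opened it
    branch_stack = []  # atoms to return to when ')' closes a branch
    prev = None        # atom the next atom bonds to (None right after a dot)

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[" or ch.isalpha():
            # Find the end of the atom token.
            if ch == "[":
                end = smiles.index("]", i) + 1
            elif smiles[i:i + 2] in ("Cl", "Br"):
                end = i + 2
            else:
                end = i + 1
            atom = len(parent)
            parent.append(atom)
            if prev is not None:
                union(prev, atom)
            prev = atom
            i = end
            continue
        if ch.isdigit() or ch == "%":
            # Ring-closure label: single digit, or '%' plus two digits.
            if ch == "%":
                label, step = smiles[i + 1:i + 3], 3
            else:
                label, step = ch, 1
            if label in open_rings:
                union(prev, open_rings.pop(label))  # closes the bond
            else:
                open_rings[label] = prev            # opens the bond
            i += step
            continue
        if ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch == ".":
            prev = None  # no implicit bond to the next atom
        # Bond symbols (-, =, #, :, /, \) do not change connectivity.
        i += 1
    return len({find(a) for a in range(len(parent))})


print(count_components("C1CCC.C1CCC"))  # 1
print(count_components("CCCC.CCCC"))    # 2
```

Run on the four-part cobalt string from the first post, the same function reports two fragments, which matches what following the bonds in the MRV file by hand gives.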


 


Regarding your other question, according to the CXSMILES documentation (http://www.chemaxon.com/marvin/help/formats/cxsmiles-doc.html):


"Atom indexes relating to wiggly bonds are written after "w:"
followed by a dot character and the wiggly bond index.
The wiggly bonds are separated by commas.

If atomic coordinates are also exported, then UP bonds are written
after "wU:"
DOWN bonds are written after "wD:" in a similar way to
wiggly bond export."


So wedge bonds can be exported if and only if the coordinates are also exported.


Please note that, so far, the most common class (tetrahedral parity) is implemented, but the octahedral class, which is your case, is not.

Andras

User 62a37f4796

08-06-2009 13:29:48













I think I understand what you are saying.


If I understand the implication correctly, though, it means that I cannot rely on MBeans to consistently give me the same CXSMILES (ignoring the actual coordinates) for a given orientation, because I cannot predict how or where MBeans will split the CXSMILES.


Suppose one of my students entered the same set of structures but started with a different orientation, rotated 180 degrees about the y-axis. Would MBeans still break off the same segment into the disconnected/dot portion of the CXSMILES?


When I flipped my structures around a couple of times in MSketch, I got this different CXSMILES for the same set:


N[Co]1(N)(N)(N)O[Co]234(O1)O[Co](N)(N)(N)(N)O2.N[Co](N)(N)(N)(O3)O4.N[Co]1(N)(N)(N)O[Co]234(O1)O[Co](N)(N)(N)(N)O2.N[Co](N)(N)(N)(O3)O4


vs.


 


N[Co:3]1(N)(N)(N)O[Co:1]234(O1)O[Co:2](N)(N)(N)(N)O2.N[Co:4](N)(N)(N)(O3)O4.N[Co:8]1(N)(N)(N)O[Co:5]234(O1)O[Co:6](N)(N)(N)(N)O2.N[Co:7](N)(N)(N)(O3)O4


The segments still seem to be broken up with the same pattern, but the atom numbers have changed.


Algorithmically, how would I be able to tell if a student submitted a "correct" answer? Do any of the export flags influence this behavior?

ChemAxon 25dcd765a3

08-06-2009 16:04:08

Hi,


I don't really understand the connection between splitting the SMILES string and generating the same SMILES.


What I really don't understand is what the problem is with dot-connected SMILES strings.


Generally, the generated SMILES string is independent of the coordinates and atom indexes.


In your case there is an octahedral stereocenter. As octahedral parity is not yet implemented, your SMILES string will always be the same regardless of the stereo configuration, so you will not be able to differentiate the stereoisomers.


The difference between the SMILES strings you mention is that in the second case the cobalt atoms are mapped (so the two molecules are not the same, as the second one has atom maps). Besides the atom maps, I don't see any difference.
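
If the atom maps really are the only difference, one possible workaround when comparing a student's export with a reference string is to strip the `:n` map labels from bracket atoms first. The sketch below is only a string-level illustration under that assumption; `strip_atom_maps` is a hypothetical helper, not a Marvin function, and it does no canonicalization of its own.

```python
import re

# Matches a bracket atom that carries an atom-map label, e.g. [Co:3].
_ATOM_MAP = re.compile(r"\[([^\]:]+):\d+\]")


def strip_atom_maps(smiles: str) -> str:
    """Return the SMILES string with atom-map labels removed from bracket atoms."""
    return _ATOM_MAP.sub(r"[\1]", smiles)


# First two dot-separated parts of the strings posted in this thread.
mapped = "N[Co:3]1(N)(N)(N)O[Co:1]234(O1)O[Co:2](N)(N)(N)(N)O2.N[Co:4](N)(N)(N)(O3)O4"
unmapped = "N[Co]1(N)(N)(N)O[Co]234(O1)O[Co](N)(N)(N)(N)O2.N[Co](N)(N)(N)(O3)O4"

print(strip_atom_maps(mapped) == unmapped)  # True
```

This only helps when both strings were otherwise written in the same atom order; it does not recover any stereo information.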


 

User 62a37f4796

09-06-2009 12:18:06










volfi wrote:

> I don't really understand the connection between splitting the SMILES string and generating the same SMILES.
>
> What I really don't understand is what the problem is with dot-connected SMILES strings.


The problem is: does it always split off the same segment and in the same order? If it doesn't reproducibly do so, then a "unique" SMILES is not really unique but degenerate.


> Generally, the generated SMILES string is independent of the coordinates and atom indexes.


I should think so, but I did not create the atom mappings in the SMILES I posted. Marvin created those atom indexes, and since they are different depending on the orientation of the molecule on the canvas, I don't really consider them to be unique SMILES and I can't use them to compare student answers against my own.


> In your case there is an octahedral stereocenter. As octahedral parity is not yet implemented, your SMILES string will always be the same regardless of the stereo configuration, so you will not be able to differentiate the stereoisomers.


Is there or will there be an effort to include this important geometry? We use it very frequently in (transition metal) coordination chemistry.


> The difference between the SMILES strings you mention is that in the second case the cobalt atoms are mapped (so the two molecules are not the same, as the second one has atom maps). Besides the atom maps, I don't see any difference.


I did not create atom maps. Those are the indices being created and returned by Marvin. Apparently the canonicalizer is sensitive to the orientation of the structure on the canvas, which seems a bad thing to me.


 



 

ChemAxon 25dcd765a3

10-06-2009 10:17:45

> The problem is: does it always split off the same segment and in the same order?


Yes.


> If it doesn't reproducibly do so, then a "unique" SMILES is not really unique but degenerate.


It does.


> I should think so, but I did not create the atom mappings in the SMILES I posted. Marvin created those atom indexes, and since they are different depending on the orientation of the molecule on the canvas I don't really consider them to be unique SMILES and I can't use them to compare student answers against my own.


Could you please tell me how you generated the atom indexes in the SMILES string?


I tried to reproduce it, without success. I tried rotating and translating the molecule, changing the coordinates, and changing the atom indexes.


> Is there or will there be an effort to include this important geometry? We use it very frequently in (transition metal) coordination chemistry.


Yes, this is an important feature, but its development depends on our schedule.


> I did not create atom maps. Those are the indices being created and returned by Marvin. Apparently the canonicalizer is sensitive to the orientation of the structure on the canvas, which seems a bad thing to me.


I could not reproduce this either. What should I do to generate the second SMILES string with the indexes? Also, could you attach an example where the canonicalizer is sensitive to the orientation of the molecule?



User 62a37f4796

10-06-2009 19:06:34










volfi wrote:



> Could you please tell me how you generated the atom indexes in the SMILES string?


I didn't. All I did was export the CXSMILES with the standard flags.


> I tried to reproduce it, without success. I tried rotating and translating the molecule, changing the coordinates, and changing the atom indexes.


I started with the MRV I originally posted; selected the contents; did Edit >> Transform >> Horizontal Flip and Edit >> Transform >> Vertical Flip; then adjusted the wedge bonds so front and back conform.


> > Is there or will there be an effort to include this important geometry? We use it very frequently in (transition metal) coordination chemistry.
>
> Yes, this is an important feature, but its development depends on our schedule.


We are also having problems with square planar geometry, which doesn't seem to be recognized when generating the CXSMILES, but that's another topic.


> > I did not create atom maps. Those are the indices being created and returned by Marvin. Apparently the canonicalizer is sensitive to the orientation of the structure on the canvas, which seems a bad thing to me.
>
> I could not reproduce this either. What should I do to generate the second SMILES string with the indexes? Also, could you attach an example where the canonicalizer is sensitive to the orientation of the molecule?


Attached is the MRV for the flipped structures.


I already posted the CXSMILES I obtained inline, but I attached the files generated by Marvin anyway.

ChemAxon 25dcd765a3

10-06-2009 21:08:11

Thank you for the attached files.


Here are my results:


molconvert smiles /tmp/ZumChem7_21.AE.066.flipped.mrv
N[Co]1(N)(N)(N)O[Co]234(O1)O[Co](N)(N)(N)(N)O2.N[Co](N)(N)(N)(O3)O4.N[Co]1(N)(N)(N)O[Co]234(O1)O[Co](N)(N)(N)(N)O2.N[Co](N)(N)(N)(O3)O4


The result is exactly the same as in your attached ZumChem7_21.AE.066.flipped.cxsmiles.


molconvert smiles /tmp/ZumChem7_21.AE.066.mrv.mrv
N[Co]1(N)(N)(N)O[Co]234(O1)O[Co](N)(N)(N)(N)O2.N[Co](N)(N)(N)(O3)O4.N[Co]1(N)(N)(N)O[Co]234(O1)O[Co](N)(N)(N)(N)O2.N[Co](N)(N)(N)(O3)O4


The result is exactly the same as in your attached ZumChem7_21.AE.066.flipped.cxsmiles, but differs from ZumChem7_21.AE.066.mrv.cxsmiles.


So I was still not able to generate different SMILES strings, as both MRV files gave the same result as ZumChem7_21.AE.066.flipped.cxsmiles.


How can I generate ZumChem7_21.AE.066.mrv.cxsmiles?

User 62a37f4796

24-06-2009 16:18:57













This is coming from the applet, not from molconvert.

ChemAxon 25dcd765a3

25-06-2009 06:50:12

Could you please tell me the version of the MarvinSketch applet?

User 62a37f4796

25-06-2009 13:30:52













That's a good question, actually.


I originally exported it from 5.1.4 and got the confusing results.


I just exported it from 5.2.2 and got the exact same output from both MRV files.


I guess it was fixed somewhere along the line.


Thanks!