Unexpected behaviour

User ee6724032a

27-07-2009 13:54:33

Hi,


We stumbled upon some unexpected behaviour of the cartridge during an import of the structures in the attachment.


When we checked the structures and corresponding smiles after the import with JCMan we noticed:
1. First structure has 'c:15' flag in the smiles field which is, according to our chemists, not correct.
2. Second structure has 't:15' flag, which is not only incorrect, but also makes this structure fail to match the first one in a duplicate search.
3. Third structure yields '[H]\N=C\N' smiles, which makes standardizer fail to remove all 'non-wedged' explicit hydrogens. Also, MarvinSketch marks the double bond as having E configuration.


FYI: We are using JChem 5.2.03.1 on Oracle 11.1.0.7.0, JDBC 11.1.0.6.0-Production+.


Thanks,
Val

ChemAxon a9ded07333

27-07-2009 14:19:57

Hi Valeriy,


Since your problem is probably not cartridge but import related, I moved the topic to the Marvin forum. My colleagues will answer you soon.


Regards,
Tamás

ChemAxon 25dcd765a3

29-07-2009 15:05:16

Hi,


1. First structure has 'c:15' flag in the smiles field which is, according to our chemists, not correct.



Why do you think it is not correct?


As far as I can see, it is a cis double bond. (The two ligands of the bond are on the same side.)



2. Second structure has 't:15' flag, which is not only incorrect, but also makes this structure fail to match the first one in a duplicate search.


Yes, you are right: even if the molecule is drawn so that the 2D depiction suggests a trans double bond, it is definitely a cis double bond. I think a standardizer action is needed to correct the stereo information on the double bond.


I think it is worth mentioning that (cx)smiles import has this correction algorithm included, so if you import it and export it again, the double bond will be cis. (Of course this is not a solution, just a workaround until the problem is resolved.)


3.
Third structure yields '[H]\N=C\N' smiles, which makes
standardizer fail to remove all 'non-wedged' explicit hydrogens. Also,
MarvinSketch marks the double bond as having E configuration.


Yes, it is also a bug. We will fix this.


Thank you for the warnings.


Andras


User ee6724032a

30-07-2009 12:34:57

Hi Andras,


Here is what one of our chemists commented on the first structure:
"A double bond that is part of a ring system by definition cannot have a Z- or E-configuration.
This naming is used for double bonds on which the atoms at the ends of the double bond have two possible positions. When the double bond is in a ring, these positions are fixed, i.e. there is no possibility for isomerism.
This is also demonstrated by the fact that it is impossible to come up with a chemically reasonable "E" or "trans" structure in this particular case."


About the second structure you said,


I think a standardizer action is needed to correct the stereo information on the double bond.


As you can see in the SD file, there is no explicit stereo information. I wonder if it's possible to clean the stereo information generated by JChem and keep the explicitly defined information intact.


Thank you,
Val

ChemAxon 25dcd765a3

31-07-2009 08:47:59

Hi,


"A double bond that is part of a ring system by definition cannot have a Z- or E-configuration.
This naming is used for double bonds on which the atoms at the ends of the double bond have two possible positions. When the double bond is in a ring, these positions are fixed, i.e. there is no possibility for isomerism.
This is also demonstrated by the fact that it is impossible to come up with a chemically reasonable "E" or "trans" structure in this particular case."



I absolutely agree with these statements.


Where have you noticed the Z or E configuration?


The flags in the extended part of the cxsmiles do not define E or Z configuration; they just describe whether two specific ligands of the double bond are in cis or trans configuration. Please note that, depending on the smiles indexing, the index of a real cis or Z double bond may appear after the 't' flag.
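
To make the flag syntax concrete, here is a minimal Python sketch that splits a cxsmiles string such as the ones quoted in this thread into its plain smiles part and its cis/trans flags. It is not ChemAxon code: the function name is hypothetical, and the parsing assumes the extended part is a comma-separated list enclosed in '|' characters, with 'c:' or 't:' starting a list of double bond indices.

```python
def parse_ct_flags(cxsmiles):
    """Return (smiles, flags) for a 'SMILES |...|' string.

    flags maps 'c', 't' or 'ctu' to a list of double bond indices.
    Assumption: tokens are comma-separated and a 'key:' prefix starts a new list.
    """
    smiles, _, ext = cxsmiles.strip().partition(" ")
    flags = {}
    ext = ext.strip()
    if len(ext) >= 2 and ext.startswith("|") and ext.endswith("|"):
        key = None
        for token in ext[1:-1].split(","):
            if ":" in token:
                # A token such as 't:11' switches to a new flag type.
                key, _, token = token.partition(":")
            if key in ("c", "t", "ctu") and token.isdigit():
                flags.setdefault(key, []).append(int(token))
    return smiles, flags


smiles, flags = parse_ct_flags("[H]\\N=C\\N |c:15|")
print(smiles, flags)
```

The flag only says which side two particular ligands are on; whether that corresponds to Z or E still depends on ligand priorities, which this sketch does not look at.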


As you can see in the SD file, there is no explicit stereo information.


Sorry, but I have to disagree with this. As far as I can see, the double bond stereo information is defined explicitly: it is defined by the x, y, z coordinates of the double bond and the corresponding ligands.


So the standardizer would need to convert incorrectly defined double bond stereo information to correct information by changing the coordinates of the double bond. I think this is not such a simple task; it would be better to draw the molecule with correct double bond stereo information.


Why do you think the double bond stereo information is not defined explicitly?


Please let me know your opinion.


Andras

User 5ca4f21417

31-07-2009 10:55:31

Dear Andras,


I'm the chemist that Valery keeps referring to.


Thanks for all your help so far!


I agree with your statements about the coordinates in the sd-file defining the "cis"-configuration for this structure. Apparently, that is also the reason a "cis" indicator pops up in the smarts string, right?


However, because there is no "trans"-isomer possible in this case, the "cis" indication is meaningless!


Now, it may look like this is a harmless "quirk", but in fact it gives us big problems when we import large numbers (millions) of structures into the ChemAxon database. When duplicate structures are imported, of which one has been given a "cis" indicator and the other has not, the database considers them "different" (i.e., not duplicate), while in fact they ARE 100% duplicate.


Therefore, it would be best in this case if the database only created a "cis" flag for structures that are REALLY "cis", i.e. for which a real "trans" isomer also exists!


Would it be possible to change this in the database?


Herman

ChemAxon 25dcd765a3

01-08-2009 21:02:09

Dear Herman,


Apparently, that is also the reason a "cis" indicator pops up in the smarts string, right?


However, because there is no "trans"-isomer possible in this case, the "cis" indication is meaningless!


Yes, this is true. But we should also consider the import speed. Suppose we did as you suggest and did not write out cis/trans information that can be calculated. In that case we would have to calculate this information during import, which slows the import down. In this case, for example, we would have to search for rings in the molecule, which is a slow process.


So our philosophy is to keep as much information as possible.


To handle these cases the structures should be standardized.


http://www.chemaxon.com/product/standardizer.html


There are plenty of cases where two structures are the same but are given in different representations. (The best known is probably the nitro group.) The standardizer is a tool to convert all these different representations to a standard form.


So the standardizer should correct the structures and not the import/export.


As far as I know, there is no standardizer action to convert these specific double bonds to CIS, but I have already raised this.


Andras

ChemAxon d76e6e95eb

03-08-2009 08:44:01

We will build this double bond checking feature into the upcoming Structure Checker, that will be part of Marvin, so the import functions can call it.

ChemAxon 25dcd765a3

03-08-2009 11:36:21

Hi,


3. Third structure yields '[H]\N=C\N' smiles,
which makes standardizer fail to remove all 'non-wedged' explicit
hydrogens. Also, MarvinSketch marks the double bond as having E
configuration.


This bug is fixed. The fix will appear in the next release.


Thank you for the report again.


Andras

User 5ca4f21417

03-08-2009 12:27:17

volfi wrote:

So the standardizer should correct the structures and not the import/export.


As far as I know, there is no standardizer action to convert these specific double bonds to CIS, but I have already raised this.


Dear Andras,


We have the standardizer, but I don't think that would solve the problem. The structure (in the sd-file) is fine as it is, and therefore there is nothing to standardize!


Even if we did "standardize", we would then import the structures into the database, and AGAIN on import a smiles string would be created showing "false" stereochemistry. This problem can therefore not be solved by any standardizer, only by changing something in the smiles conversion during the import procedure.


Having said all this, I understand that it is not easy to change this in the import procedure in a way that doesn't cost a lot of extra import time.


Therefore, let's leave this issue for the time being.


I think my colleague Val will post some more interesting examples for you!

User ee6724032a

03-08-2009 12:42:44

Hi,


We stumbled upon another case connected to cis/trans flags in generated smiles.
The first structure, shown in mol1.png, has smiles 'CCCOc1ccc(cc1)C1=NNC(=O)CC1 |t:11|' and the second structure, shown in mol2.png, has 'CCCOc1ccc(cc1)C1=NNC(=O)CC1'.
The problem is the same as in the previous case: JCManager treats these two as unique structures. Strangely enough, Instant JChem's Overlap Analysis with the 'Duplicate' search type matches the second structure (without trans flag) only to itself, but matches the first one to both structures, with and without trans flag.


We appreciate your advice on the matter, but when it comes to Standardizer, we would not like to use it to correct all the double bonds, and it's close to impossible to make it selective enough to cover only this case.


Thank you,
Val

ChemAxon 25dcd765a3

05-08-2009 12:47:30

Hi Val,


The second structure shown on mol2.png has a CTUNKNOWN double bond (the CIS/TRANS information cannot be determined because bond angle is close to 180 degrees).


In the first structure the double bond stereo information can be properly determined.


Andras

ChemAxon d76e6e95eb

05-08-2009 12:50:33

Sorry to argue, but the stereo configuration in this case seems unambiguous. It is a double bond in a small ring.

ChemAxon 25dcd765a3

05-08-2009 12:59:42

As the cis/trans detection function does not use ring information (for speed reasons), just the coordinates of the atoms, it cannot differentiate between the two cases: double bond in a ring and double bond not in a ring. See attached picture.
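
As an illustration only, a coordinate-based check of this kind could look like the Python sketch below. It is not the actual JChem implementation: the function names, the restriction to 2D coordinates and the 5 degree tolerance are all assumptions made for the example. It looks at which side of the double bond each ligand lies on and reports 'unknown' when a ligand is nearly collinear with the bond.

```python
import math


def signed_sine(origin, toward, ligand):
    """Sine of the angle at `origin` between origin->toward and origin->ligand."""
    dx, dy = toward[0] - origin[0], toward[1] - origin[1]
    vx, vy = ligand[0] - origin[0], ligand[1] - origin[1]
    norm = math.hypot(dx, dy) * math.hypot(vx, vy)
    return 0.0 if norm == 0 else (dx * vy - dy * vx) / norm


def cis_trans_from_coords(lig1, atom1, atom2, lig2, tol_deg=5.0):
    """Classify lig1-atom1=atom2-lig2 as 'cis', 'trans' or 'unknown' from 2D coordinates."""
    tol = math.sin(math.radians(tol_deg))  # hypothetical tolerance
    s1 = signed_sine(atom1, atom2, lig1)
    s2 = signed_sine(atom2, atom1, lig2)
    if abs(s1) < tol or abs(s2) < tol:
        return "unknown"  # a ligand is (nearly) collinear with the double bond
    # The two reference directions are opposite, so ligands on the
    # same side of the bond give sines of opposite sign.
    return "cis" if s1 * s2 < 0 else "trans"


print(cis_trans_from_coords((-0.5, 0.87), (0.0, 0.0), (1.0, 0.0), (1.5, 0.87)))
```

Since the only inputs are four coordinate pairs, a ring double bond and an acyclic double bond with the same local geometry get the same answer, which is the limitation described above.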

ChemAxon 25dcd765a3

06-08-2009 16:16:36

Hi,


I have checked the IUPAC recommendations (Pure Appl. Chem., Vol. 78, No. 10, pp. 1897–1970, 2006. doi:10.1351/pac200678101897) about your problems.


1. First structure has 'c:15' flag in the smiles field which is, according to our chemists, not correct.
2. Second structure has 't:15' flag, which is not only incorrect, but also makes this structure fail to match the first one in a duplicate search.


This problem is mentioned on page 1959:


ST-4.3 Double bonds in rings
In some cases, it may be necessary to draw a double bond viewed from a non-standard orientation. This is particularly true in perspective drawings and in depictions of bridged ring systems. Whenever possible in such cases, the local configuration of the double bond as drawn should match the actual configuration intended. That is, a cis double bond with two substituents should always be drawn with both of those substituents on the same side of the bond, regardless of the orientation of the bond itself.


That said, a viewer should generally expect that any double bond within a ring containing six or fewer atoms will be in a cis configuration relative to the ring. In small rings of this type, the depiction of a trans bond when it could otherwise be drawn as cis only adds extra confusion for the viewer.


Based on the second paragraph, we are considering including some ring detection in the cis-trans configuration detection algorithm.
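
A rough Python sketch of what such a ring check might look like is given below. It is only an illustration under stated assumptions, not the planned implementation: the molecule is represented as a plain adjacency dictionary, the function names are hypothetical, and the limit of six atoms is taken from the IUPAC text quoted above. It finds the smallest ring containing a given bond with a breadth-first search that is not allowed to use the bond itself.

```python
from collections import deque


def smallest_ring_size(adjacency, u, v):
    """Number of atoms in the smallest ring containing bond u-v, or None if acyclic."""
    dist = {u: 0}
    queue = deque([u])
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if node == u and nbr == v:
                continue  # do not walk along the bond under test
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                if nbr == v:
                    # Path length in bonds plus the closing bond equals
                    # the number of atoms in the ring.
                    return dist[nbr] + 1
                queue.append(nbr)
    return None


def cis_implied_by_ring(adjacency, u, v, max_ring=6):
    """True if bond u-v lies in a ring of at most max_ring atoms."""
    size = smallest_ring_size(adjacency, u, v)
    return size is not None and size <= max_ring


# Six-membered ring with the double bond between atoms 0 and 1.
ring6 = {0: [1, 5], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 0]}
print(smallest_ring_size(ring6, 0, 1), cis_implied_by_ring(ring6, 0, 1))
```

A check like this would only need to run for double bonds that carry a cis/trans flag, which limits the extra cost on import.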


Andras