Explicit aromatic bonds in canonicalised structures

User 6f58eb8616

25-08-2009 16:58:23

Hi,


We have noticed a few changes to the canonicalisation algorithm of JChem between versions 5.1.3_2 and 5.2.3_1. The one we're not sure about is that explicit aromatic bonds now appear between aromatic atoms when we generate a unique SMILES, e.g.:


c1ccccc1c1ccccc1     ->     c1ccc(cc1):c1ccccc1


From this thread ( http://www.chemaxon.com/forum/viewpost17128.html&highlight=explicit+aromatic+bond#17128 ) I can see that it is ChemAxon's policy to assume an aromatic bond between two aromatic atoms, but should this really be explicit in the canonicalised form (especially when it shouldn't be an aromatic bond)? We are just using the unique SMILES option, toFormat("smiles:u").
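For anyone who wants to flag affected output without pinning the exact canonical string, here is a minimal sketch in plain Java (no JChem calls). The class and method names are made up for illustration. It relies only on the SMILES rule that ':' outside square brackets is the aromatic bond symbol, whereas ':' inside a bracket atom introduces an atom class.

```java
final class ExplicitAromaticBondCheck {

    /** Returns true if ':' occurs as a bond symbol, i.e. outside a [...] bracket atom. */
    static boolean hasExplicitAromaticBond(String smiles) {
        boolean inBracket = false;
        for (int i = 0; i < smiles.length(); i++) {
            char ch = smiles.charAt(i);
            if (ch == '[') {
                inBracket = true;
            } else if (ch == ']') {
                inBracket = false;
            } else if (ch == ':' && !inBracket) {
                return true; // ':' between atoms is an explicit aromatic bond
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasExplicitAromaticBond("c1ccc(cc1):c1ccccc1")); // true
        System.out.println(hasExplicitAromaticBond("c1ccc(cc1)-c1ccccc1")); // false
        System.out.println(hasExplicitAromaticBond("[cH:1]1ccccc1"));       // false, ':' is an atom class here
    }
}
```

This is a plain string scan, so it only reports whether the symbol is present. It does not say whether the bond ought to be aromatic.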


Any help much appreciated.


Derek

ChemAxon 25dcd765a3

25-08-2009 18:14:18

Hi Derek,


This is very strange.


If I convert the SMILES c1ccccc1c1ccccc1 with Marvin 5.2.3_1, I get the following SMILES:


c1ccc(cc1)-c1ccccc1


In this case it is obvious that the two aromatic carbons are connected by a single bond and definitely not by an aromatic one.
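To make the notation concrete, here is a rough sketch of how a SMILES writer might choose the bond symbol between two atoms. It is not ChemAxon's code; the class, enum and method names are invented for illustration. It follows the general SMILES convention that a bond between two aromatic atoms is read as aromatic unless written otherwise, so a single bond between them has to be spelled out with '-'.

```java
final class BondSymbolSketch {

    enum BondType { SINGLE, DOUBLE, TRIPLE, AROMATIC }

    /** Returns the SMILES bond symbol to write, or "" when the bond can stay implicit. */
    static String bondSymbol(boolean fromAromatic, boolean toAromatic, BondType type) {
        boolean bothAromatic = fromAromatic && toAromatic;
        switch (type) {
            case SINGLE:
                // Between two aromatic atoms an omitted symbol would be read as aromatic.
                return bothAromatic ? "-" : "";
            case AROMATIC:
                // Implicit between aromatic atoms; ':' is only needed otherwise.
                return bothAromatic ? "" : ":";
            case DOUBLE:
                return "=";
            case TRIPLE:
                return "#";
            default:
                throw new IllegalArgumentException("Unsupported bond type: " + type);
        }
    }

    public static void main(String[] args) {
        // Ring-to-ring bond in biphenyl: single bond between two aromatic carbons.
        System.out.println("c1ccc(cc1)" + bondSymbol(true, true, BondType.SINGLE) + "c1ccccc1");
    }
}
```

With these rules the biphenyl link is written as '-', which matches the c1ccc(cc1)-c1ccccc1 output above.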


So could you tell me how to reproduce the SMILES with the explicit aromatic bond?


I think this is a bug and we should fix it ASAP.


Thank you


Andras

User 6f58eb8616

26-08-2009 09:03:13

Hi Andras,


I've just found out what is causing the explicit aromatic bond: it's the "c" import option on the SMILES. So basically this test does not pass:


        Molecule s = null;
        try {
            // Import with the "c" option
            s = MolImporter.importMol("c1ccccc1c1ccccc1","smiles:c");
        } catch (MolFormatException e) {
            throw new TranslationException(" Could not import this molecule, " + e.getMessage(),e);
        }
        // Fails: the unique SMILES comes back as c1ccc(cc1):c1ccccc1
        assertEquals("c1ccc(cc1)-c1ccccc1",s.toFormat("smiles:u"));


Whereas this test will pass:


        Molecule s = null;
        try {
            s = MolImporter.importMol("c1ccccc1c1ccccc1","smiles");
        } catch (MolFormatException e) {
            throw new TranslationException(" Could not import this molecule, " + e.getMessage(),e);
        }
        assertEquals("c1ccc(cc1)-c1ccccc1",s.toFormat("smiles:u"));


 


So something has changed in the JChem API between these two versions. Also, just out of interest, when should we be using the "c" import option and when shouldn't we? I was under the impression we should be using it at all times.


Thanks muchly


Derek


ChemAxon 25dcd765a3

26-08-2009 13:47:39

Hi,


I looked into it and found that on 27 Apr I moved the changes from the development head to the development branch, but somehow the documentation was left out.

So the change was released in Marvin/JChem 5.2.2.



I'm sorry about that.

The documentation which is missing from (http://www.chemaxon.com/marvin/help/formats/smiles-doc.html#ioptions ) is the following:









c      Ignore fixing of double bond stereo information in small rings,
       and also ignore fixing of aromatic bonds to aliphatic if necessary.

       Double bonds in small rings (ring size < 8) are imported
       automatically with CIS stereo information. If the c option is set,
       the double bond stereo information is not changed to CIS
       during the import.

       By default, the bond between two aromatic atoms is aromatic. But this
       is not true, e.g., in the case of biphenyl, where the bond connecting
       the two aromatic rings is single. If biphenyl is represented with
       the SMILES string "c1ccc(cc1)c1ccccc1", then it is necessary to
       set the bond between the two rings to single.
       If the molecule is exported by ChemAxon tools,
       the single bond between two aromatic atoms is always
       written explicitly to avoid any confusion, so fixing
       aromatic bonds to aliphatic can be skipped.
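As a rough illustration of what "fixing aromatic bonds to aliphatic" could involve, the sketch below uses one simple heuristic: a bond between two aromatic atoms that is not part of any ring cannot be aromatic, so it is a candidate for being set to single. This is an assumption made for illustration only, not a description of the actual import code, which would also have to deal with ring bonds that are not aromatic. The class and method names are hypothetical, and the molecule is a hand-built adjacency list for biphenyl rather than a parsed SMILES.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class AromaticBondFixSketch {

    /** True if atoms a and b are still connected when the direct a-b bond is ignored. */
    static boolean isRingBond(List<List<Integer>> adjacency, int a, int b) {
        boolean[] seen = new boolean[adjacency.size()];
        Deque<Integer> queue = new ArrayDeque<>();
        seen[a] = true;
        queue.add(a);
        while (!queue.isEmpty()) {
            int current = queue.poll();
            for (int next : adjacency.get(current)) {
                if (current == a && next == b) {
                    continue; // skip the bond under test
                }
                if (next == b) {
                    return true; // reached b by another path, so a-b closes a ring
                }
                if (!seen[next]) {
                    seen[next] = true;
                    queue.add(next);
                }
            }
        }
        return false;
    }

    /** Biphenyl as in c1ccccc1c1ccccc1: ring atoms 0-5, ring atoms 6-11, link 5-6. */
    static List<List<Integer>> biphenyl() {
        int[][] bonds = {
            {0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0},
            {6, 7}, {7, 8}, {8, 9}, {9, 10}, {10, 11}, {11, 6},
            {5, 6}
        };
        List<List<Integer>> adjacency = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            adjacency.add(new ArrayList<>());
        }
        for (int[] bond : bonds) {
            adjacency.get(bond[0]).add(bond[1]);
            adjacency.get(bond[1]).add(bond[0]);
        }
        return adjacency;
    }

    public static void main(String[] args) {
        List<List<Integer>> mol = biphenyl();
        System.out.println("0-1 ring bond: " + isRingBond(mol, 0, 1)); // true, stays aromatic
        System.out.println("5-6 ring bond: " + isRingBond(mol, 5, 6)); // false, candidate for single
    }
}
```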






I suggest not using the 'c' option.


Andras