SMILES generation

User c31567e5e3

02-06-2008 20:40:54

Greetings, I have a question about how (unique) SMILES are generated. I have two Molecule instances, and calling toFormat("smiles:u") on them yields the following:





CS(=O)(=O)Nc1cccc(c1)-c2nc(-NCc3ccccc3)c4ccccc4n2


CS(=O)(=O)Nc1cccc(c1)-c2nc(NCc3ccccc3)c4ccccc4n2





Can you tell me what the difference between the two instances is? More generally, are there specific rules for when an explicit single bond is written instead of an implicit one?





Thanks for any pointers,


Trung





PS: I'm using JChem 3.2.12.

ChemAxon e08c317633

03-06-2008 07:33:55

I moved this topic to the "Structure editing, viewing and file formats" forum. My colleagues will answer you soon.





Zsolt

ChemAxon 25dcd765a3

03-06-2008 20:26:56

Hi,





I guess the two molecules just look identical, but they are not.


Could you please attach the molecules in their original format so I can point out the difference?





Thank you


Andras

User c31567e5e3

03-06-2008 21:34:10

These two molecules are just different tautomeric forms. After some debugging, I found that the second explicit single bond in the following





CS(=O)(=O)Nc1cccc(c1)-c2nc(-NCc3ccccc3)c4ccccc4n2





was due to an invalid stereo annotation: I had mistakenly used MolBond.setFlags() instead of MolBond.setType(). With this fix, both instances generate identical SMILES:





CS(=O)(=O)Nc1cccc(c1)-c2nc(NCc3ccccc3)c4ccccc4n2


CS(=O)(=O)Nc1cccc(c1)-c2nc(NCc3ccccc3)c4ccccc4n2





For the purpose of standardization/canonicalization, this is good enough for me, though I'm still curious about the general conditions under which explicit single bonds are generated.





Trung

ChemAxon d76e6e95eb

04-06-2008 07:23:19

The default bond type between two aliphatic atoms is a single bond; the default bond type between two aromatic atoms is an aromatic bond.





This is why the single bond must be specified explicitly between two aromatic atoms in biphenyl-type systems.
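
The rule above can be sketched in a few lines of Java. This is only a toy model of the decision a SMILES writer makes for each bond, not the JChem implementation: the class BondSymbolSketch, the enum BondOrder and the method bondSymbol are hypothetical names made up for illustration, and the handling of an aromatic bond next to a non-aromatic atom is an assumption.

```java
// Toy model of the "when is a bond symbol written" rule.
// All names here are hypothetical; this is not the JChem API.
final class BondSymbolSketch {

    enum BondOrder { SINGLE, DOUBLE, TRIPLE, AROMATIC }

    /**
     * Returns the SMILES bond symbol to write between two atoms,
     * or an empty string when the bond can be left implicit.
     */
    static String bondSymbol(boolean firstAromatic, boolean secondAromatic, BondOrder order) {
        boolean bothAromatic = firstAromatic && secondAromatic;
        switch (order) {
            case SINGLE:
                // Between two aromatic atoms the implied bond is aromatic,
                // so a single bond has to be spelled out with '-'.
                return bothAromatic ? "-" : "";
            case AROMATIC:
                // Implicit only where aromatic is the default (assumption:
                // otherwise the ':' symbol is written).
                return bothAromatic ? "" : ":";
            case DOUBLE:
                return "=";
            case TRIPLE:
                return "#";
            default:
                throw new IllegalArgumentException("unknown bond order: " + order);
        }
    }

    public static void main(String[] args) {
        // Ring-to-ring bond of a biphenyl-type system, as in c1ccccc1-c2ccccc2
        System.out.println("[" + bondSymbol(true, true, BondOrder.SINGLE) + "]");
        // Aromatic carbon to aliphatic nitrogen, as in c(NCc3ccccc3)
        System.out.println("[" + bondSymbol(true, false, BondOrder.SINGLE) + "]");
    }
}
```

Under this sketch, the bond joining the two aromatic rings gets an explicit "-", while the bond from the aromatic carbon to the aliphatic nitrogen stays implicit, which matches the corrected SMILES shown earlier in the thread.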

ChemAxon 25dcd765a3

04-06-2008 09:08:35

Exactly

User c31567e5e3

04-06-2008 12:20:23

OK, thanks. I don't recall reading about this in the Daylight documentation on SMILES. So according to this rule, by removing the explicit single bond as in





CS(=O)(=O)Nc1cccc(c1)-c2nc(NCc3ccccc3)c4ccccc4n2


CS(=O)(=O)Nc1cccc(c1)c2nc(NCc3ccccc3)c4ccccc4n2





the single bond connecting two aromatic rings is now an aromatic bond? Clearly this is not the case here. Is this just a convention (perhaps for readability) or is there a deeper reason behind it?





Thanks,


Trung

ChemAxon 25dcd765a3

04-06-2008 12:32:59

As you have checked, we import the following SMILES


CS(=O)(=O)Nc1cccc(c1)c2nc(NCc3ccccc3)c4ccccc4n2


with single bonds between the aromatic rings.





Our philosophy is the following:


A single bond is written explicitly if it is between two aromatic atoms.


This helps readability and also ensures that other (third-party) SMILES importers get the same structure as the one that was exported.





Andras