User c31567e5e3
02-06-2008 20:40:54
Greetings, I have a question about how (unique) SMILES are generated. I have two Molecule instances, and toFormat("smiles:u") yields the following for them:
CS(=O)(=O)Nc1cccc(c1)-c2nc(-NCc3ccccc3)c4ccccc4n2
CS(=O)(=O)Nc1cccc(c1)-c2nc(NCc3ccccc3)c4ccccc4n2
Can you tell me what's the difference between the two instances? More generally, are there specific rules for which explicit single bonds are generated instead of implicit ones?
Thanks for any pointer,
Trung
ps: I'm using JChem 3.2.12
ChemAxon e08c317633
03-06-2008 07:33:55
I moved this topic to the "Structure editing, viewing and file formats" forum. My colleagues will answer you soon.
Zsolt
ChemAxon 25dcd765a3
03-06-2008 20:26:56
Hi,
I guess the two molecules just look identical, but they are not.
Could you please attach the molecules in their original format so I can point out the difference?
Thank you
Andras
User c31567e5e3
03-06-2008 21:34:10
These two molecules are just different tautomer forms. After some debugging, I found that the second explicit single bond in the following
CS(=O)(=O)Nc1cccc(c1)-c2nc(-NCc3ccccc3)c4ccccc4n2
was due to an invalid stereo annotation, a mistake on my part: I used MolBond.setFlags() instead of MolBond.setType(). With this fix, both instances generate identical SMILES:
CS(=O)(=O)Nc1cccc(c1)-c2nc(NCc3ccccc3)c4ccccc4n2
CS(=O)(=O)Nc1cccc(c1)-c2nc(NCc3ccccc3)c4ccccc4n2
For the purpose of standardization/canonicalization, this is good enough for me, though I'm still curious about the general conditions under which explicit single bonds are generated.
Trung
ChemAxon d76e6e95eb
04-06-2008 07:23:19
The default bond type between two aliphatic atoms is a single bond; the default bond type between two aromatic atoms is an aromatic bond.
This is why the single bond must be specified explicitly between two aromatic atoms in biphenyl-type systems.
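The rule described here can be sketched as a small decision function. This is only an illustration of the convention, not ChemAxon's actual exporter code; the class BondSymbolRule, the enum Kind and the method bondSymbol are hypothetical names made up for this sketch.

```java
// Illustrative sketch only: hypothetical names, not part of the JChem API.
class BondSymbolRule {
    /** Bond kinds distinguished by this sketch. */
    enum Kind { SINGLE, DOUBLE, TRIPLE, AROMATIC }

    /**
     * Returns the SMILES bond symbol a writer following the rule would emit.
     * An empty string means the bond is left implicit.
     */
    static String bondSymbol(boolean aromaticA, boolean aromaticB, Kind kind) {
        boolean bothAromatic = aromaticA && aromaticB;
        switch (kind) {
            case SINGLE:
                // Between two aromatic atoms the implicit default would be
                // read as aromatic, so the single bond must be written out.
                return bothAromatic ? "-" : "";
            case AROMATIC:
                // Aromatic is the default between two aromatic atoms.
                return "";
            case DOUBLE:
                return "=";
            case TRIPLE:
                return "#";
            default:
                throw new IllegalArgumentException("Unknown bond kind: " + kind);
        }
    }

    public static void main(String[] args) {
        // Ring-to-ring bond in a biphenyl-type system: explicit "-".
        System.out.println("[" + bondSymbol(true, true, Kind.SINGLE) + "]");
        // Aromatic carbon to aliphatic nitrogen: left implicit.
        System.out.println("[" + bondSymbol(true, false, Kind.SINGLE) + "]");
    }
}
```

Under this rule the c(-N...) bond in the first SMILES of the original post would not get an explicit symbol, since N is aliphatic there, which fits the later finding that it came from a wrongly set bond flag.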
User c31567e5e3
04-06-2008 12:20:23
OK, thanks. I don't recall reading about this in the Daylight documentation on SMILES. So according to this rule, by removing the explicit single bond as in
CS(=O)(=O)Nc1cccc(c1)-c2nc(NCc3ccccc3)c4ccccc4n2
CS(=O)(=O)Nc1cccc(c1)c2nc(NCc3ccccc3)c4ccccc4n2
the single bond connecting two aromatic rings is now an aromatic bond? Clearly this is not the case here. Is this just a convention (perhaps for readability) or is there a deeper reason behind it?
Thanks,
Trung
ChemAxon 25dcd765a3
04-06-2008 12:32:59
As you have checked, we import the following SMILES
CS(=O)(=O)Nc1cccc(c1)c2nc(NCc3ccccc3)c4ccccc4n2
with single bonds between the aromatic rings.
Our philosophy is the following:
The single bond is written explicitly if it is between two aromatic atoms.
This helps readability and also ensures that other (third-party) SMILES importers get the same structure as the one exported.
Andras
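For anyone comparing exported strings like the two in the first post, a plain text scan is enough to see where explicit single bonds were written. The sketch below is not a SMILES parser and uses no JChem calls; ExplicitSingleBonds and positions are hypothetical names. It assumes that a '-' inside square brackets is a charge sign and that any other '-' is a bond symbol.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a character scan, not a real SMILES parser.
class ExplicitSingleBonds {
    /** Returns the 0-based string indices of explicit single-bond symbols. */
    static List<Integer> positions(String smiles) {
        List<Integer> found = new ArrayList<>();
        boolean inBracket = false;
        for (int i = 0; i < smiles.length(); i++) {
            char ch = smiles.charAt(i);
            if (ch == '[') {
                inBracket = true;       // bracket atom starts, e.g. [O-]
            } else if (ch == ']') {
                inBracket = false;      // bracket atom ends
            } else if (ch == '-' && !inBracket) {
                found.add(i);           // '-' outside brackets is a bond
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String first = "CS(=O)(=O)Nc1cccc(c1)-c2nc(-NCc3ccccc3)c4ccccc4n2";
        String second = "CS(=O)(=O)Nc1cccc(c1)-c2nc(NCc3ccccc3)c4ccccc4n2";
        System.out.println(positions(first));
        System.out.println(positions(second));
    }
}
```

The first string reports one position more than the second: the '-' in front of the aliphatic N, which is the bond that carried the wrong flag.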