SMILES string that JChem handles poorly

User a11e9761d6

10-11-2008 21:28:23

We have a SMILES string:

Cn/1cccc\c1=N\c2cccc[n+]2C

that we insert into our database via JChem Base. Once inserted, the cd_smiles and cd_structure columns are as follows:

cd_smiles: Cn1-cccc\c1=N\c1cccc[n+]1C

cd_structure: Cn/1cccc\c1=N\c2cccc[n+]2C

When this second SMILES (the cd_smiles) is imported, it resolves to a different unique structure, even after dearomatizing and removing explicit Hs.

Neither of these structures appears to have a correct SMILES. A correct SMILES would be:

N(c1[n+](cccc1)C)=C2\C=C/C=C\N2C

or

Cn1ccccc1=Nc1cccc[n+]1C

(ChemSpider, for example, is able to find the correct structure when a search for the original SMILES is run).

We suspect it is a structure cleaning issue from the original SMILES because there is a forward slash after the first 'n'. It seems that JChem should either clean this SMILES correctly or simply reject it as an invalid SMILES string. Is this a known issue? Any recommendations for how to handle this?
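As a stopgap on our side, a pre-import check along these lines could flag SMILES that carry a directional bond symbol directly on a ring-closure label (the "n/1" pattern above). This is only a rough sketch in plain Python: the function name find_ring_closure_stereo is our own invention, it does not use or reproduce any JChem API, and it only detects the pattern without judging whether the SMILES is valid.

```python
import re

# A directional bond symbol ('/' or '\') immediately followed by a
# ring-closure label (a single digit, or '%' plus two digits).
_RING_CLOSURE_STEREO = re.compile(r"([/\\])(%\d{2}|\d)")


def find_ring_closure_stereo(smiles):
    """Return (position, symbol, ring_label) for each directional bond
    written directly in front of a ring-closure label.

    Hypothetical helper for screening input; it does not parse SMILES.
    """
    return [
        (m.start(), m.group(1), m.group(2))
        for m in _RING_CLOSURE_STEREO.finditer(smiles)
    ]


if __name__ == "__main__":
    original = r"Cn/1cccc\c1=N\c2cccc[n+]2C"
    alternative = r"Cn1ccccc1=Nc1cccc[n+]1C"
    print(find_ring_closure_stereo(original))     # [(2, '/', '1')]
    print(find_ring_closure_stereo(alternative))  # []
```

Raw strings are used so that the backslashes in the SMILES are not treated as Python escape sequences. Anything the check flags could be held back for manual review before it goes into the database.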

Thanks,

Krishna Dole

ChemAxon 25dcd765a3

11-11-2008 14:08:06

Dear Krishna,

Thank you for the report.

We have also found this problem and are fixing it as soon as possible.

It has no connection to the cleaning procedure.

Andras

User a11e9761d6

11-11-2008 18:30:25

Thanks volfi. I'm glad to hear you are working on this. How will we know when this problem has been fixed?

ChemAxon 25dcd765a3

12-11-2008 14:32:52

I will definitely post an update in this thread.

Andras

ChemAxon 25dcd765a3

18-11-2008 13:41:16

Dear Krishna,

The bug is fixed.

Because it changes SMILES import/export (so clients using databases will need to regenerate their database tables), the fix will appear in Marvin 5.2.

Andras