Same SMILES string generated for different compounds?

User ddf03b522f

17-06-2007 16:23:25

I have the following two compounds in an SD file:

name: 4-Amino-N-cyclohexyl-benzenesulfonamide
formula: C12H18N2O2S
SMILES string generated by JChem: Nc1ccc(cc1)S(=O)(=O)NC2CCCCC2

name: 4-Amino-N-phenyl-benzenesulfonamide
formula: C12H12N2O2S
SMILES string generated by JChem: Nc1ccc(cc1)S(=O)(=O)Nc2ccccc2

As you can see, these are two different compounds; the only difference between the SMILES strings is the upper-case NC2CCCCC2 in the first compound. MarvinView can tell the two SMILES strings apart, but if I store them in a database and do an exact search, the database retrieves both compounds because it is not case-sensitive when searching. Is there any way around this?

ChemAxon 9c0afc9aaf

18-06-2007 08:59:05

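Hi,

The two SMILES strings describe two significantly different compounds. Upper and lower case are significant in SMILES: lower-case letters denote aromatic atoms. Here the lower-case c's describe an aromatic (phenyl) ring, whereas the upper-case C's describe a saturated ring with single bonds (cyclohexyl).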
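As a rough illustration of the point above, the following Python sketch compares the two strings from the question using plain string operations only. The helper name `carbon_counts` is hypothetical, and counting characters is a naive shortcut that happens to work for these two strings (it would miscount elements such as Cl); it is not a SMILES parser.

```python
# The two SMILES strings quoted in the question.
cyclohexyl = "Nc1ccc(cc1)S(=O)(=O)NC2CCCCC2"
phenyl = "Nc1ccc(cc1)S(=O)(=O)Nc2ccccc2"


def carbon_counts(smiles):
    """Return (aromatic, aliphatic) carbon counts by naive character counting.

    Hypothetical helper for illustration only: it assumes the string contains
    no two-letter element symbols starting with 'C' (e.g. Cl) and no bracket atoms.
    """
    return smiles.count("c"), smiles.count("C")


print(carbon_counts(cyclohexyl))  # (6, 6): one aromatic ring, one saturated ring
print(carbon_counts(phenyl))      # (12, 0): two aromatic rings

# A case-sensitive comparison tells the strings apart ...
print(cyclohexyl == phenyl)                  # False
# ... but folding case makes them collide.
print(cyclohexyl.lower() == phenyl.lower())  # True
```

The last two lines show why a case-insensitive text comparison cannot distinguish the two compounds.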





Please see the "Aromaticity" section of this document for more information:

http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

Quote:
if i store these smile strings in a database and do an exact search the database will retrieve both compounds because a database is not case sensitive when doing a search. is there any other way round this?

I suggest using our API to search for duplicates or exact matches: this is the fastest and most efficient method, and you do not have to do any "hacking".
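For readers who want to reproduce the text-comparison behaviour described in the question, here is a minimal sketch using Python's built-in `sqlite3` module. The database engine used in the original question is not stated, so SQLite is an assumption chosen only because it ships with Python; the table name `compounds` and its columns are hypothetical. This is not the JChem API recommended above. Plain string matching is also not a structure search, since one compound can be written as several different SMILES strings.

```python
import sqlite3

# In-memory database with a hypothetical table holding the two compounds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (name TEXT, smiles TEXT)")
conn.executemany(
    "INSERT INTO compounds VALUES (?, ?)",
    [
        ("4-Amino-N-cyclohexyl-benzenesulfonamide", "Nc1ccc(cc1)S(=O)(=O)NC2CCCCC2"),
        ("4-Amino-N-phenyl-benzenesulfonamide", "Nc1ccc(cc1)S(=O)(=O)Nc2ccccc2"),
    ],
)

query = "Nc1ccc(cc1)S(=O)(=O)NC2CCCCC2"

# Case-insensitive collation: both rows match, as described in the question.
nocase_hits = conn.execute(
    "SELECT name FROM compounds WHERE smiles COLLATE NOCASE = ? ORDER BY name",
    (query,),
).fetchall()

# Binary (case-sensitive) collation: only the cyclohexyl compound matches.
binary_hits = conn.execute(
    "SELECT name FROM compounds WHERE smiles COLLATE BINARY = ?",
    (query,),
).fetchall()

print(len(nocase_hits))  # 2
print(binary_hits)       # [('4-Amino-N-cyclohexyl-benzenesulfonamide',)]
conn.close()
```

Other database engines expose case sensitivity differently (for example through column collations), so the exact syntax above should not be assumed to carry over.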


Please let me know whether you are using JChem Base or JChem Cartridge, and what the aim of this filtering is (e.g. preventing the registration of duplicates), and then I can provide some further advice.





Best regards,

Szilard