Unique SMILES problem

ChemAxon 587f88acea

11-07-2005 16:21:40

Hi,





I have the same problem that was posted previously at the following link:





http://www.chemaxon.com/forum/ftopic387.html





The final reply, dated Jan 25, 2005, says the issue is fixed and the change is in CVS. I have just downloaded the newest version, 3.5.8, and ran the same test:





Code:



> echo "[C:1]NC" | ./molconvert smiles
[CH3:1]NC
> echo "CN[C:1]" | ./molconvert smiles
CN[CH3:1]








and it failed: the two outputs differ even though both inputs describe the same molecule. I also tried it from within my own code:





Code:



   MolExporter exporter = new MolExporter(System.err, "smiles:u");
   Molecule molecule1 = MolImporter.importMol("[C:1]NC");
   Molecule molecule2 = MolImporter.importMol("CN[C:1]");
   exporter.write(molecule1);
   exporter.write(molecule2);








and got the following as output:





Code:



[CH3:1]NC
CN[CH3:1]








I also have more complicated (non-symmetrical) examples where the atom order in the unique SMILES output is inconsistent. Isn't the unique SMILES for a given molecule supposed to be unique?

ChemAxon 43e6884a7a

11-07-2005 16:42:10

This has been fixed, but the fix will be released only with the next major version (Marvin 4.0), which is coming soon. Sorry about the confusion.


Concerning the other issue, please send us some examples of problematic structures.

ChemAxon 587f88acea

11-07-2005 19:04:13

Here is an example of the other issue:





Code:



Molecule moleculeWithMapNos = makeRandomMolecule();
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
MolExporter molExporter = new MolExporter(outputStream, "smiles:u");
molExporter.write(moleculeWithMapNos);
outputStream.close();

System.err.println(outputStream.toString());  // write unique smiles with map numbers!
Molecule uniqueSmilesMolecule = MolImporter.importMol(outputStream.toString());
for (int i = 0; i < uniqueSmilesMolecule.getAtomCount(); i++) {
   MolAtom molAtom = uniqueSmilesMolecule.getAtom(i);
   molAtom.setAtomMap(0);
}

// rewrite unique smiles without map numbers!
outputStream = new ByteArrayOutputStream();
molExporter = new MolExporter(outputStream, "smiles:u");
molExporter.write(uniqueSmilesMolecule);
outputStream.close();
System.err.println(outputStream.toString());








Here I build a Molecule with map numbers and export it as unique SMILES, then import that SMILES into another molecule, remove the map numbers, and export it as unique SMILES again. For a random compound this produces the following:





Code:



[CH3:32][CH2:31][C:33]1=[CH:34][CH:35]=[C:36]([CH:37]=[CH:38]1)[CH:27]2[S:28][CH:29]([CH2:30][S:25][CH:26]2[C:5]3=[CH:6][N:1]=[CH:2][CH:3]=[CH:4]3)[C:19]4=[CH:20][CH:21]=[C:22]([C:23](=[CH:24]4)[CH:15]5[CH2:16][CH2:17][CH2:18][CH:13]([CH3:14])[CH2:12]5)[CH:8]6[CH2:9][CH2:10][CH2:11][S:7]6





CCC1=CC=C(C=C1)C2SC(CSC2C3=CN=CC=C3)C4=CC=C(C5CCCS5)C(=C4)C6CCCC(C)C6








When I strip the atom map numbers and hydrogens from the first string and put the two side by side, I get the following:





Code:



CCC1=CC=C(C=C1)C2SC(CSC2C3=CN=CC=C3)C4=CC=C(C(=C4)C5CCCC(C)C5)C6CCCS6
CCC1=CC=C(C=C1)C2SC(CSC2C3=CN=CC=C3)C4=CC=C(C5CCCS5)C(=C4)C6CCCC(C)C6








They are obviously different! My case is somewhat different from the previous posting, since here the unique SMILES strings differ even once the map numbers are omitted.
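For anyone who wants to repeat the comparison without editing the strings by hand, here is a minimal sketch of the "strip map numbers and hydrogens" step as plain text processing. The class SmilesMapStripper and its method stripAtomMaps are hypothetical names of mine, not part of the ChemAxon API. It assumes every bracket atom is a neutral, non-isotopic, organic-subset atom whose hydrogen count equals the default valence, as in the output above; it does not canonicalize anything, it only removes the brackets.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the ChemAxon API.
final class SmilesMapStripper {

    // Matches a mapped bracket atom such as [CH3:32], [CH:34], [C:5] or [N:1].
    // Group 1 is the element symbol; the optional hydrogen count and the
    // ":map" suffix are dropped. Assumption: no charges, isotopes or
    // aromatic (lowercase) atoms occur inside the brackets.
    private static final Pattern MAPPED_ATOM =
            Pattern.compile("\\[([A-Z][a-z]?)(?:H\\d*)?:\\d+\\]");

    // Returns the SMILES text with every mapped bracket atom replaced
    // by its bare element symbol. Unmatched text is left untouched.
    static String stripAtomMaps(String smiles) {
        return MAPPED_ATOM.matcher(smiles).replaceAll("$1");
    }

    public static void main(String[] args) {
        // The two outputs from the first test in this thread.
        System.out.println(stripAtomMaps("[CH3:1]NC"));
        System.out.println(stripAtomMaps("CN[CH3:1]"));
    }
}
```

Applied to the mapped string above, this gives the first of the two SMILES in the comparison, so the difference from the second one comes from the exporter's atom ordering and not from the text cleanup.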





As further evidence that something is really wrong here, I imported the SMILES string WITH map numbers into another molecule, removed the map numbers, exported it as unique SMILES, and got the second SMILES above. This suggests that the map numbers themselves are not what interferes with your algorithm; my guess is that the internal MolAtom array indices make the difference.





Do you think this will be fixed in 4.0? Thanks.

ChemAxon a3d59b832c

12-07-2005 09:22:56

Hello,





I tried your code and structures, and Marvin 4.0 will work as expected.





You can download a pre-release version from here:





www.chemaxon.com/shared/alpha





Best regards,





Szabolcs