molconvert an sdf to smiles

User 2938454c70

21-01-2009 09:32:02

I get weird output where the SMILES string has explicit (numbered) hydrogens. I tried running with smiles:-H, to no avail. Attached are my input SDF and my output SMILES file.

ChemAxon 25dcd765a3

21-01-2009 21:41:27

Hi,

The atom maps are stored in the SMILES string; that is probably what is bothering you.

Mapped H atoms are not removed by the -H option.

Clear the atom maps from your molecule by setting every atom map to 0.

Andras

User 2938454c70

22-01-2009 06:50:10

What is it that I am supposed to do? Give a different switch to the molconvert application, or do some pre-processing?

ChemAxon 8b644e6bf4

27-01-2009 15:47:07

Hi,

It seems that molconvert has no switch to remove atom mappings. You can remove them in msketch (right-click on each atom, then select map - off). Using this manual method, you can check whether removing the mappings is a suitable solution.

It is faster to remove the mappings using the MolAtom.setAtomMap() method (see http://www.chemaxon.com/marvin/help/developer/beans/api/chemaxon/struc/MolAtom.html#setAtomMap(int) ). As an example, the following code reads molecules from standard input, uses the above-mentioned method to remove the maps, and writes the molecules to the output in SMILES format:

Code:

import chemaxon.formats.MolImporter;
import chemaxon.struc.Molecule;
import java.io.IOException;

/**
 * Read molecules from System.in, remove atom maps and write the modified
 * molecules to System.out in SMILES format
 */
public class ClearMaps {

    public static void main( String [] args ) throws IOException {
        MolImporter mi = new MolImporter( System.in );
        Molecule m = new Molecule();
        while ( mi.read(m) ) {
            for( int i = 0; i < m.getAtomCount(); i++ ) {
                m.getAtom(i).setAtomMap(0);
            }
            System.out.println( m.toFormat( "smiles" ) );
        }
    }
}

With the maps removed, the given test.smi produces:

Code:

$ cat test.smi | java ClearMaps
[H]O[C@@]1(C)N([H])C=C2CN=C(C3=CC=CC=C3F)C3=C(C=CC(Cl)=C3)N12
[H]O[C@H]1N=C(C2=CC=CC=C2)C2=C(C=CC(Cl)=C2)N2C(C)=NN=C12
[H]O[C@H]1N=C(C2=C(F)C=CC=C2)C2=C(C=CC(Cl)=C2)N2C(C)=NC=C12

Molconvert can be used to remove the remaining explicit H atoms:

Code:

$ cat test.smi | java ClearMaps | molconvert smiles:-H -
C[C@]1(O)NC=C2CN=C(C3=CC=CC=C3F)C3=C(C=CC(Cl)=C3)N12
CC1=NN=C2[C@@H](O)N=C(C3=CC=CC=C3)C3=C(C=CC(Cl)=C3)N12
CC1=NC=C2[C@@H](O)N=C(C3=C(F)C=CC=C3)C3=C(C=CC(Cl)=C3)N12
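
If no ChemAxon classes are on the classpath, a rough text-level alternative to ClearMaps is to strip the map numbers from the SMILES strings directly. The sketch below is not part of the ChemAxon API: the class name StripAtomMaps and the method stripAtomMaps are hypothetical, and it assumes that each atom map is written as ":" followed by digits at the end of a bracket atom, for example [H:1]. It is plain string processing, not a SMILES parser, so the result should still be checked with a real toolkit.

```java
import java.util.regex.Pattern;

public class StripAtomMaps {

    // Matches a bracket atom that ends in an atom map (":" followed by digits)
    // and captures everything before the colon. Colons outside brackets
    // (aromatic bonds) are not touched.
    private static final Pattern MAPPED_ATOM =
            Pattern.compile("\\[([^\\]:]+):\\d+\\]");

    /** Removes the ":n" map suffix from every bracket atom in a SMILES string. */
    public static String stripAtomMaps(String smiles) {
        return MAPPED_ATOM.matcher(smiles).replaceAll("[$1]");
    }

    public static void main(String[] args) {
        // Hypothetical mapped input, not taken from the attached file.
        System.out.println(stripAtomMaps("[H:1][O:2][C@H:3]([CH3:4])Cl"));
    }
}
```

For the hypothetical input above this prints [H][O][C@H]([CH3])Cl. The brackets are kept on purpose: inside brackets the hydrogen count is explicit, so dropping them at text level could change the molecule. The explicit H atoms are still present as well and would have to be removed in a separate step, as with molconvert smiles:-H above.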











Regards,


Gabor

ChemAxon 8b644e6bf4

27-01-2009 16:04:41

Hi,

I would like to note that our Standardizer (see http://www.chemaxon.com/product/standardizer.html ) can easily be configured to remove atom mappings.

If you have any questions about the Standardizer, please do not hesitate to ask them in its forum: http://www.chemaxon.com/forum/forum49.html

Regards,

Gabor

ChemAxon d76e6e95eb

27-01-2009 16:14:48

Yes, it is very easy with Standardizer:

Code:
standardize input.sdf -c "unmap" -o output.smiles

User 2938454c70

28-01-2009 11:14:55

The atom mapping (the numbers) does indeed disappear, but the brackets remain (see attached).

Any way to get rid of these as well?

ChemAxon d76e6e95eb

28-01-2009 11:45:43

Those brackets are not related to the maps; according to the SMILES standard, they are required for explicit hydrogens, stereo marks and charges. If you do not want the many explicit hydrogens in your file, just remove them with Molconvert or Standardizer. If you want to clear the stereo marks as well, you can easily do that with Standardizer, though you will lose the important stereo information.
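
To see which of these reasons applies to each remaining bracket, a small text scan can help. The following is a minimal sketch in plain Java, not ChemAxon code: the class BracketReasons and its explain method are hypothetical names, and the checks are simple string tests (symbol H, "@", "+" or "-", a trailing ":n"), not a full SMILES parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketReasons {

    // Any bracket atom: "[" ... "]".
    private static final Pattern BRACKET_ATOM = Pattern.compile("\\[([^\\]]+)\\]");
    // Optional isotope digits, then the symbol H (but not He, Hg, and so on).
    private static final Pattern HYDROGEN = Pattern.compile("^\\d*H(?![a-z])");

    /** Returns one "atom: reasons" entry per bracket atom, in order of appearance. */
    public static List<String> explain(String smiles) {
        List<String> result = new ArrayList<>();
        Matcher m = BRACKET_ATOM.matcher(smiles);
        while (m.find()) {
            String body = m.group(1);
            // Look at the atom without a trailing ":n" map, if one is present.
            String core = body.replaceFirst(":\\d+$", "");
            List<String> reasons = new ArrayList<>();
            if (HYDROGEN.matcher(core).find()) reasons.add("explicit hydrogen");
            if (core.contains("@")) reasons.add("stereo mark");
            if (core.contains("+") || core.contains("-")) reasons.add("charge");
            if (!core.equals(body)) reasons.add("atom map");
            if (reasons.isEmpty()) reasons.add("other");
            result.add("[" + body + "]: " + String.join(", ", reasons));
        }
        return result;
    }

    public static void main(String[] args) {
        // First line of the ClearMaps output shown earlier in this thread.
        String smiles = "[H]O[C@@]1(C)N([H])C=C2CN=C(C3=CC=CC=C3F)C3=C(C=CC(Cl)=C3)N12";
        for (String line : explain(smiles)) {
            System.out.println(line);
        }
    }
}
```

For that line the scan reports two explicit hydrogens and one stereo mark, and no atom map. Removing the explicit hydrogens leaves only the stereo bracket, which matches the molconvert smiles:-H output earlier in the thread.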