Code to convert mol2 to SMILES

User 0f28873a29

27-02-2008 14:32:57

Hi:





I need to convert a mol2 structure to SMILES in a program that uses the Marvin API. My code looks like this:





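import chemaxon.formats.MolImporter;
import chemaxon.struc.Molecule;

// Read every structure in the mol2 file and print it as SMILES
MolImporter importerMol2 = new MolImporter(name);
Molecule mol = null;
while ((mol = importerMol2.read()) != null) {
    this.molCount++;
    // u: unique SMILES, a: aromatize, -H: drop explicit hydrogens
    System.out.println(mol.toFormat("smiles:u,a-H"));
}
importerMol2.close();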





The mol2 structure is:





@<TRIPOS>MOLECULE
ZINC00391820
34 35 0 0 0
SMALL
USER_CHARGES
N-[(4-hydroxyphenyl)methyleneamino]-2-(2-methylimidazol-1-yl)-acetamide
@<TRIPOS>ATOM
1 C1 2.5016 0.6732 -2.5295 C.3 1 <0> -0.1150
2 C2 2.9517 0.4640 -1.1066 C.cat 1 <0> 0.3298
3 C3 4.0489 -0.2472 0.6101 C.2 1 <0> 0.0090
4 C4 3.1067 0.5965 1.0727 C.2 1 <0> 0.0303
5 N1 2.4066 1.0434 -0.0153 N.pl3 1 <0> -0.4027
6 C5 1.2803 1.9801 0.0005 C.3 1 <0> 0.0811
7 C6 -0.0144 1.2089 0.0087 C.2 1 <0> 0.5138
8 O1 0.0021 -0.0041 0.0020 O.2 1 <0> -0.4867
9 N2 -1.1906 1.8669 0.0178 N.am 1 <0> -0.5742
10 N3 -2.3943 1.1500 0.0197 N.2 1 <0> -0.2467
11 C7 -3.5269 1.7835 0.0285 C.2 1 <0> 0.1491
12 C8 -4.7935 1.0291 0.0304 C.ar 1 <0> -0.0878
13 C9 -4.7770 -0.3681 0.0169 C.ar 1 <0> -0.0550
14 C10 -5.9634 -1.0691 0.0242 C.ar 1 <0> -0.1537
15 C11 -7.1744 -0.3889 0.0336 C.ar 1 <0> 0.1382
16 C12 -7.1957 0.9999 0.0417 C.ar 1 <0> -0.1511
17 C13 -6.0142 1.7092 0.0399 C.ar 1 <0> -0.0626
18 O2 -8.3415 -1.0840 0.0345 O.3 1 <0> -0.4976
19 H1 3.0473 1.5109 -2.9637 H 1 <0> 0.1240
20 H2 2.6988 -0.2285 -3.1092 H 1 <0> 0.1207
21 H3 1.4332 0.8885 -2.5446 H 1 <0> 0.1099
22 H4 4.7739 -0.7787 1.2087 H 1 <0> 0.2259
23 H5 2.9349 0.8668 2.1042 H 1 <0> 0.2231
24 H6 1.3211 2.6123 -0.8866 H 1 <0> 0.1516
25 H7 1.3381 2.6027 0.8933 H 1 <0> 0.1586
26 H8 -1.2038 2.8368 0.0232 H 1 <0> 0.3913
27 H9 -3.5416 2.8634 0.0344 H 1 <0> 0.1287
28 H10 -3.8359 -0.8977 0.0047 H 1 <0> 0.1377
29 H11 -5.9516 -2.1490 0.0179 H 1 <0> 0.1363
30 H12 -8.1396 1.5246 0.0490 H 1 <0> 0.1391
31 H13 -6.0317 2.7890 0.0454 H 1 <0> 0.1365
32 H14 -8.6831 -1.2802 -0.8486 H 1 <0> 0.3994
33 N4 3.9326 -0.3058 -0.7240 N.pl3 1 <0> -0.4817
34 H15 4.5234 -0.8686 -1.3472 H 1 <0> 0.4804
@<TRIPOS>BOND
1 1 2 1
2 1 19 1
3 1 20 1
4 1 21 1
5 2 5 1
6 2 33 2
7 3 4 2
8 3 22 1
9 3 33 1
10 4 5 1
11 4 23 1
12 5 6 1
13 6 7 1
14 6 24 1
15 6 25 1
16 7 8 2
17 7 9 am
18 9 10 1
19 9 26 1
20 10 11 2
21 11 12 1
22 11 27 1
23 12 17 ar
24 12 13 ar
25 13 14 ar
26 13 28 1
27 14 15 ar
28 14 29 1
29 15 16 ar
30 15 18 1
31 16 17 ar
32 16 30 1
33 17 31 1
34 18 32 1
35 33 34 1





And the SMILES output is:


CC1=NC=CN1C[C+](=O)[N-]\N=C\c1ccc(O)cc1





When I draw the 2D representation of this SMILES in msketch, it shows a carbon with a positive charge (fig). When I do the same with the original SMILES, this carbon does not have the charge.





Why does my program put this positive charge on the carbon atom?





Thanks in advance.

ChemAxon 25dcd765a3

27-02-2008 15:00:06

Hi,





You seem to be mixing something up, or I don't understand the question.


Your original molecule is shown in original.png.


The output of the SMILES conversion is:


CC1=NC=CN1C[C+](=O)[N-]\N=C\c1ccc(O)cc1


depicted on smiles.png.


As far as I can see, all the charges are on the same atoms in the original file and in the SMILES.


But neither the SMILES nor the original file has anything to do with the smiles2.png you attached.





Andras

User 0f28873a29

27-02-2008 15:18:25

Sorry, I think you did not understand the question:





The original file is a mol2 file. When I convert it to SMILES with my script (which uses the Marvin API), I obtain this SMILES: CC1=NC=CN1C[C+](=O)[N-]\N=C\c1ccc(O)cc1. This SMILES has a positive charge on the carbon atom. When I draw the SMILES in msketch, it shows this carbon atom with four bonds and the positive charge (fig).


Is this SMILES incorrect?





I looked up the original SMILES of the molecule in the ZINC database, and it does not have this problem.





Thanks for your quick answer.

ChemAxon 25dcd765a3

27-02-2008 17:07:41

The SMILES you wrote is correct:


CC1=NC=CN1C[C+](=O)[N-]\N=C\c1ccc(O)cc1


This is not the same molecule as the one shown in smiles2.png, which shows a totally different molecule.


The above SMILES string is correct and, moreover, it is exactly the same molecule as the one in the mol2 file.


If I import the mol2 file into msketch, the positive charge on the carbon appears there too, just as it does in the SMILES.





These are the things I don't understand:


- I don't understand the connection between the SMILES string and smiles2.png.


- I don't understand the connection between the original mol2 file and smiles2.png.


- The original mol2 file also has a positive charge on the carbon atom; can you confirm this?


- After depiction, do you also get the same molecule that I got, namely original.png?





Andras

User 0f28873a29

27-02-2008 18:45:39

Hi:


You are right: the picture that I posted is incorrect, and your picture is correct. But my question does not change. When I enter the SMILES CC1=NC=CN1C[C+](=O)[N-]\N=C\c1ccc(O)cc1, it shows a positive charge on a carbon atom that has four bonds. Is this correct?





- When I enter the SMILES in other software, such as ChemSketch from ACD/Labs, it returns an error.





Thanks for everything.

ChemAxon 25dcd765a3

28-02-2008 12:53:12

Hi,
Quote:
It shows a positive charge on a carbon atom that has four bonds. Is this correct?


Chemically I would draw your original molecule without any charges, which would generate:


CC1=NC=CN1CC(=O)N\N=C\c1ccc(O)cc1


I think your question is: why do we generate charges if there should not be any?


The answer is simple: because you have set charges there (see original.png attached earlier).


So if you draw methane with two positive charges (see attached png), the export will keep these charges even if that is not correct at all: [H][C++]([H])([H])[H]





So you started from a molecule with incorrect charges, and after the file format conversion the charges remained. You should start from a correct structure.
Quote:
- When I enter the SMILES in other software, such as ChemSketch from ACD/Labs, it returns an error.


Sure: the atomic valences are correct without the given charges, and these charges corrupt them.


If you view it with msketch, you will notice the red underline on the problematic atoms, which indicates the valence error.
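

To make the valence argument concrete, here is a minimal plain-Java sketch. It is not Marvin's valence checker; it assumes a deliberately simplified rule for carbon only (a neutral carbon takes bond orders summing to 4, and each unit of formal charge removes one slot). The class and method names are made up for this illustration. The charged carbon in C[C+](=O)[N-] has a single bond to C, a double bond to O and a single bond to N, so its bond orders sum to 4.

```java
// Simplified carbon-only valence check; an illustration, not Marvin's model.
final class CarbonValenceSketch {

    // Assumed rule: neutral carbon holds bond orders summing to 4; each unit
    // of formal charge (positive or negative) removes one bonding slot.
    static int maxBondOrderSum(int formalCharge) {
        return Math.max(0, 4 - Math.abs(formalCharge));
    }

    // True when the atom carries more bonds than the assumed rule allows.
    static boolean hasValenceError(int bondOrderSum, int formalCharge) {
        return bondOrderSum > maxBondOrderSum(formalCharge);
    }

    public static void main(String[] args) {
        // [C+] in C[C+](=O)[N-]: 1 (C) + 2 (=O) + 1 (N) = 4 bonds, charge +1
        System.out.println(hasValenceError(4, +1));
        // The same carbon without the charge, as in CC(=O)N
        System.out.println(hasValenceError(4, 0));
        // Methane drawn with two positive charges: [H][C++]([H])([H])[H]
        System.out.println(hasValenceError(4, +2));
    }
}
```

Under this rule the charged carbon and the doubly charged methane are flagged, while the uncharged carbon is not.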





Andras

User 0f28873a29

28-02-2008 14:57:08

Hi, thanks for your quick answer.





I ran msketch with a file without charges, and it shows the same problem, but on a different carbon atom.


When I process the mol2 file without charges in another program (Cliff), it produces the SMILES without problems.





Thanks in advance.





PS: This is a ZINC file; it was generated with the OpenEye suite.

ChemAxon 25dcd765a3

28-02-2008 15:26:56

This 1_nocharges.mol2 also contains charges, but on another C and N atom.


On the right side of your attached image you can see that the carbon is even underlined in red.


So the problem remains as the charges are still there.





I wonder whether the root of the problem is not in the export but rather in our mol2 file import.


Do the charges appear if you load 1_nocharges.mol2 into other programs?





I examined the 1_nocharges.mol2 file and I don't see a charge definition on any atom.


So, as far as I understand the mol2 file definition, none of the atoms should get a charge after import.


So we will check our mol2 import.





Now it seems that our mol2 import is buggy.





Andras

User 0f28873a29

28-02-2008 16:15:03

Thanks for your quick answers.


Do you have an estimate of the time needed to solve the problem?





Thanks for everything.

User 0f28873a29

29-02-2008 13:40:33

Hi Andras:





I suppose the error occurs during the interpretation of the atom types of the molecule. In this particular case the carbon atom has the C.cat type.





Does the ChemAxon API have a function to clear the atom types of a molecule and recalculate them from its bonds and atoms?





Thanks for your answers.

ChemAxon 25dcd765a3

05-03-2008 09:32:20

Hi,





I have discussed it with my colleagues.


You are right, the problem is with the atom types. As you wrote, C.cat is a carbon cation.


So there is no bug in the import.
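

To check up front which atoms in a file carry the C.cat type, a minimal sketch in plain Java (no Marvin calls) could scan the @<TRIPOS>ATOM records. It assumes the usual mol2 ATOM layout, in which the atom type is the sixth whitespace-separated field; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Scans mol2 text for atoms of a given SYBYL type; an illustration only.
final class Mol2AtomTypeScan {

    // Returns the atom names (second field) whose type (sixth field) matches.
    static List<String> atomsOfType(String mol2Text, String sybylType) {
        List<String> names = new ArrayList<>();
        boolean inAtomBlock = false;
        for (String line : mol2Text.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("@<TRIPOS>")) {
                // Every record header switches the ATOM block on or off
                inAtomBlock = trimmed.equals("@<TRIPOS>ATOM");
                continue;
            }
            if (!inAtomBlock || trimmed.isEmpty()) {
                continue;
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length >= 6 && fields[5].equals(sybylType)) {
                names.add(fields[1]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Two ATOM lines and one BOND line taken from the file in this thread
        String sample = String.join("\n",
                "@<TRIPOS>ATOM",
                "1 C1 2.5016 0.6732 -2.5295 C.3 1 <0> -0.1150",
                "2 C2 2.9517 0.4640 -1.1066 C.cat 1 <0> 0.3298",
                "@<TRIPOS>BOND",
                "1 1 2 1");
        System.out.println(atomsOfType(sample, "C.cat"));
    }
}
```

For the two sample lines this reports only C2, the atom typed C.cat in the original file.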





I think you should remove the charges through the API:


Code:

// Reset the formal charge of every atom in molecule m
for (int i = 0; i < m.getAtomCount(); i++) {
    MolAtom a = m.getAtom(i);
    a.setCharge(0);
}








So start from your original molecule (attached startmol.mol2).


Removing the charges results in startmol_nocharge.mol2.


This molecule still has a problem with a badly added explicit H.


Removing the explicit H atoms solves the problem:


CC1=NC=CN1CC(=O)N\N=C\c1ccc(O)cc1





See my attached test code.








Code:
java Test startmol.mol2


CC1=NC=CN1CC(=O)N\N=C\c1ccc(O)cc1








I hope this helps.





Andras

User 0f28873a29

05-03-2008 15:38:41

Thanks for everything.

User 0f28873a29

06-03-2008 21:42:14

Hi Andras:


I have another question:


The SYBYL mol2 format looks like this (example from the SYBYL home page):





# Name: benzene
# Creating user name: tom
# Creation time: Wed Dec 28 00:18:30 1988

# Modifying user name: tom
# Modification time: Wed Dec 28 00:18:30 1988

@<TRIPOS>MOLECULE
benzene
12 12 1 0 0
SMALL
NO_CHARGES


@<TRIPOS>ATOM
1 C1 1.207 2.091 0.000 C.ar 1 BENZENE 0.000
2 C2 2.414 1.394 0.000 C.ar 1 BENZENE 0.000
3 C3 2.414 0.000 0.000 C.ar 1 BENZENE 0.000
4 C4 1.207 -0.697 0.000 C.ar 1 BENZENE 0.000
5 C5 0.000 0.000 0.000 C.ar 1 BENZENE 0.000
6 C6 0.000 1.394 0.000 C.ar 1 BENZENE 0.000
7 H1 1.207 3.175 0.000 H 1 BENZENE 0.000
8 H2 3.353 1.936 0.000 H 1 BENZENE 0.000
9 H3 3.353 -0.542 0.000 H 1 BENZENE 0.000
10 H4 1.207 -1.781 0.000 H 1 BENZENE 0.000
11 H5 -0.939 -0.542 0.000 H 1 BENZENE 0.000
12 H6 -0.939 1.936 0.000 H 1 BENZENE 0.000
@<TRIPOS>BOND
1 1 2 ar
2 1 6 ar
3 2 3 ar
4 3 4 ar
5 4 5 ar
6 5 6 ar
7 1 7 1
8 2 8 1
9 3 9 1
10 4 10 1
11 5 11 1
12 6 12 1
@<TRIPOS>SUBSTRUCTURE
1 BENZENE 1 PERM 0 **** **** 0 ROOT








When I run my program with your API function to export to a mol2 file:





// Find the requested structure, clear its formal charges and print it as mol2
while ((mol = importerMol2.read()) != null) {
    if ((name.matches(mol.getName())) && (importerMol2.tell() == position)) {
        for (int i = 0; i < mol.getAtomCount(); i++) {
            MolAtom a = mol.getAtom(i);
            a.setCharge(0);
        }
        System.out.println(mol.toFormat("mol2"));
    }
}
importerMol2.close();





This is the output:





@<TRIPOS>MOLECULE
ZINC00031164
27 28
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1 C1 -2.7325 7.1245 -0.1055 C.ar
2 C2 -2.7307 5.7395 -0.1760 C.ar
3 C3 -1.5705 5.0630 0.1674 C.ar
4 C4 -0.4636 5.7859 0.5710 C.ar
5 N1 -0.5000 7.1030 0.6216 N.ar
6 C5 -1.5860 7.7795 0.3019 C.ar
7 C6 -1.5186 3.5580 0.1076 C.3
8 H1 -2.5327 3.1587 0.1266 H
9 C7 -0.8234 3.1228 -1.1844 C.3
10 C8 -0.7548 1.5938 -1.2312 C.3
11 C9 -0.0127 1.0858 0.0080 C.3
12 C10 -0.7306 1.5801 1.2659 C.3
13 H2 -3.6190 7.6840 -0.3653 H
14 H3 -3.6120 5.2009 -0.4913 H
15 H4 0.4428 5.2661 0.8442 H
16 H5 -1.5793 8.8579 0.3595 H
17 H6 0.1861 3.5332 -1.2105 H
18 H7 -1.3872 3.4882 -2.0427 H
19 H8 -0.2219 1.2804 -2.1289 H
20 H9 -1.7645 1.1833 -1.2450 H
21 H10 0.0021 -0.0041 0.0020 H
22 H11 1.0097 1.4637 0.0003 H
23 H12 -1.7483 1.1900 1.2802 H
24 H13 -0.1959 1.2321 2.1497 H
25 N2 -0.7649 3.0552 1.2721 N.3
26 H14 -1.2188 3.3793 2.1342 H
27 H15 0.1956 3.4160 1.2346 H
@<TRIPOS>BOND
1 1 6 ar
2 1 2 ar
3 1 13 1
4 2 3 ar
5 2 14 1
6 3 4 ar
7 3 7 1
8 4 5 ar
9 4 15 1
10 5 6 ar
11 6 16 1
12 7 8 1
13 7 9 1
14 7 25 1
15 9 10 1
16 9 17 1
17 9 18 1
18 10 11 1
19 10 19 1
20 10 20 1
21 11 12 1
22 11 21 1
23 11 22 1
24 12 23 1
25 12 24 1
26 12 25 1
27 25 26 1
28 25 27 1
@<TRIPOS>SUBSTRUCTURE
1 noname 1





This output differs from the SYBYL example in ways that are very important for programs like DOCK: the two blank lines below SMALL and NO_CHARGES are missing, and so are the three numbers that follow the atom and bond counts (the first of them is the number of substructures).
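

Until the export is changed, one possible stopgap is to post-process the counts line of the exported text. The sketch below is plain Java and not part of the Marvin API; the class and method names are hypothetical. It pads a counts line such as "27 28" to the five fields used in the SYBYL example (atoms, bonds, substructures, features, sets). The substructure count is supplied by the caller and is used only when the field is absent. Whether DOCK accepts the padded line is untested here.

```java
// Pads a mol2 counts line to five fields; a post-processing illustration only.
final class Mol2CountsPad {

    // countsLine must contain at least the atom count. Missing fields are
    // filled with numSubst (third field) or 0; existing fields are kept.
    static String padCounts(String countsLine, int numSubst) {
        String[] fields = countsLine.trim().split("\\s+");
        int[] counts = {0, 0, numSubst, 0, 0};
        for (int i = 0; i < fields.length && i < counts.length; i++) {
            counts[i] = Integer.parseInt(fields[i]);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < counts.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(counts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Counts line from the exported file, which has one substructure
        System.out.println(padCounts("27 28", 1));
        // A line that is already complete is left unchanged
        System.out.println(padCounts("12 12 1 0 0", 1));
    }
}
```

The two blank lines after the charge-type line would have to be inserted separately, for example when writing the header lines back out.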





Thanks in advance.

ChemAxon 25dcd765a3

07-03-2008 18:37:24

My colleague will reply to your format question.





However, I don't really understand how you can have a charge on the N atom if you have removed them all.





Andras

User 0f28873a29

10-03-2008 14:53:31

Hi:


My objective is to process the ZINC files. These files contain a 3D structure and the charges. First, I want to generate the SMILES for each mol2 file; this is where the first problem, with the charges, arises. Second, I want to write out the mol2 of some of the compounds, and there the second problem occurs (the missing lines).





Well, these are my problems.





Thanks for everything.





Yasset.

ChemAxon 7c2d26e5cf

11-03-2008 15:31:31

We will check both mol2 import and export.