Code to convert mol2 to SMILES
User 0f28873a29
27-02-2008 14:32:57
Hi:
I need to convert a mol2 structure to SMILES in a program using the Marvin API. My code looks like this:
MolImporter importerMol2 = new MolImporter(name);
Molecule mol = null;
while ((mol = importerMol2.read()) != null) {
    this.molCount++;
    System.out.println(mol.toFormat("smiles:u,a-H"));
}
The structure of mol2 is:
@<TRIPOS>MOLECULE
ZINC00391820
34 35 0 0 0
SMALL
USER_CHARGES
N-[(4-hydroxyphenyl)methyleneamino]-2-(2-methylimidazol-1-yl)-acetamide
@<TRIPOS>ATOM
1 C1 2.5016 0.6732 -2.5295 C.3 1 <0> -0.1150
2 C2 2.9517 0.4640 -1.1066 C.cat 1 <0> 0.3298
3 C3 4.0489 -0.2472 0.6101 C.2 1 <0> 0.0090
4 C4 3.1067 0.5965 1.0727 C.2 1 <0> 0.0303
5 N1 2.4066 1.0434 -0.0153 N.pl3 1 <0> -0.4027
6 C5 1.2803 1.9801 0.0005 C.3 1 <0> 0.0811
7 C6 -0.0144 1.2089 0.0087 C.2 1 <0> 0.5138
8 O1 0.0021 -0.0041 0.0020 O.2 1 <0> -0.4867
9 N2 -1.1906 1.8669 0.0178 N.am 1 <0> -0.5742
10 N3 -2.3943 1.1500 0.0197 N.2 1 <0> -0.2467
11 C7 -3.5269 1.7835 0.0285 C.2 1 <0> 0.1491
12 C8 -4.7935 1.0291 0.0304 C.ar 1 <0> -0.0878
13 C9 -4.7770 -0.3681 0.0169 C.ar 1 <0> -0.0550
14 C10 -5.9634 -1.0691 0.0242 C.ar 1 <0> -0.1537
15 C11 -7.1744 -0.3889 0.0336 C.ar 1 <0> 0.1382
16 C12 -7.1957 0.9999 0.0417 C.ar 1 <0> -0.1511
17 C13 -6.0142 1.7092 0.0399 C.ar 1 <0> -0.0626
18 O2 -8.3415 -1.0840 0.0345 O.3 1 <0> -0.4976
19 H1 3.0473 1.5109 -2.9637 H 1 <0> 0.1240
20 H2 2.6988 -0.2285 -3.1092 H 1 <0> 0.1207
21 H3 1.4332 0.8885 -2.5446 H 1 <0> 0.1099
22 H4 4.7739 -0.7787 1.2087 H 1 <0> 0.2259
23 H5 2.9349 0.8668 2.1042 H 1 <0> 0.2231
24 H6 1.3211 2.6123 -0.8866 H 1 <0> 0.1516
25 H7 1.3381 2.6027 0.8933 H 1 <0> 0.1586
26 H8 -1.2038 2.8368 0.0232 H 1 <0> 0.3913
27 H9 -3.5416 2.8634 0.0344 H 1 <0> 0.1287
28 H10 -3.8359 -0.8977 0.0047 H 1 <0> 0.1377
29 H11 -5.9516 -2.1490 0.0179 H 1 <0> 0.1363
30 H12 -8.1396 1.5246 0.0490 H 1 <0> 0.1391
31 H13 -6.0317 2.7890 0.0454 H 1 <0> 0.1365
32 H14 -8.6831 -1.2802 -0.8486 H 1 <0> 0.3994
33 N4 3.9326 -0.3058 -0.7240 N.pl3 1 <0> -0.4817
34 H15 4.5234 -0.8686 -1.3472 H 1 <0> 0.4804
@<TRIPOS>BOND
1 1 2 1
2 1 19 1
3 1 20 1
4 1 21 1
5 2 5 1
6 2 33 2
7 3 4 2
8 3 22 1
9 3 33 1
10 4 5 1
11 4 23 1
12 5 6 1
13 6 7 1
14 6 24 1
15 6 25 1
16 7 8 2
17 7 9 am
18 9 10 1
19 9 26 1
20 10 11 2
21 11 12 1
22 11 27 1
23 12 17 ar
24 12 13 ar
25 13 14 ar
26 13 28 1
27 14 15 ar
28 14 29 1
29 15 16 ar
30 15 18 1
31 16 17 ar
32 16 30 1
33 17 31 1
34 18 32 1
35 33 34 1
And the SMILES output is:
CC1=NC=CN1C[C+](=O)[N-]\N=C\c1ccc(O)cc1
When I draw the 2D representation of this SMILES in the msketch program, the structure has a carbon with a positive charge (fig). When I do the same with the original SMILES, this carbon does not have the charge.
Why does my program put this positive charge on the carbon atom?
Thanks in advance.
ChemAxon 25dcd765a3
27-02-2008 15:00:06
Hi,
You seem to be mixing something up, or I don't understand the question.
Your original molecule is shown in original.png.
The output of the smiles conversion is:
CC1=NC=CN1C[C+](=O)[N-]\N=C\c1ccc(O)cc1
depicted on smiles.png.
As far as I can see, all the charges are on the same atoms in the original file and in the SMILES.
But neither the SMILES nor the original file has anything to do with the smiles2.png you attached.
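One way to check which atoms carry a formal charge in a SMILES string is to list its charged bracket atoms. The following is a minimal sketch in plain Java (it does not use the Marvin API). The class and method names are hypothetical, and the regular expression only looks for a + or - sign inside square brackets, so it is not a full SMILES parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChargedAtoms {
    // Matches a bracket atom containing at least one '+' or '-' sign, e.g. [C+] or [N-].
    private static final Pattern CHARGED = Pattern.compile("\\[[^\\]]*[+-][^\\]]*\\]");

    // Returns the charged bracket atoms in the order they appear in the SMILES string.
    static List<String> chargedAtoms(String smiles) {
        List<String> found = new ArrayList<>();
        Matcher m = CHARGED.matcher(smiles);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The SMILES produced by the conversion discussed in this thread.
        String smiles = "CC1=NC=CN1C[C+](=O)[N-]\\N=C\\c1ccc(O)cc1";
        System.out.println(chargedAtoms(smiles)); // prints [[C+], [N-]]
    }
}
```

For a SMILES without charged bracket atoms, such as CCO, the method returns an empty list.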
Andras
User 0f28873a29
27-02-2008 15:18:25
Sorry, you didn't understand the question:
The original file is a mol2 file. When I convert the mol2 to SMILES format with my script (using the Marvin API), I obtain this SMILES: CC1=NC=CN1C[C+](=O)[N-]\N=C\c1ccc(O)cc1. This SMILES has a positive charge on the carbon atom. When I draw the SMILES in the msketch program, it shows this carbon atom with four bonds and the positive charge (fig).
Is this SMILES incorrect?
I looked up the original SMILES of the molecule in the ZINC database and it doesn't have this problem.
Thanks for your quick answer.
ChemAxon 25dcd765a3
27-02-2008 17:07:41
The SMILES you wrote is correct:
CC1=NC=CN1C[C+](=O)[N-]\N=C\c1ccc(O)cc1
This is not the same molecule as the one you show in smiles2.png, which is a totally different molecule.
The above SMILES string is correct and, moreover, it is exactly the same molecule that you have in the mol2 file.
The positive charge on the carbon also appears when I import the mol2 file into msketch, just as it appears in the SMILES.
The following points are unclear to me:
- I don't understand the connection between the SMILES string and smiles2.png.
- I don't understand the connection between the original mol2 file and smiles2.png.
- The original mol2 file also has a positive charge on the carbon atom; can you confirm this?
- After depiction, do you get the same molecule that I got, namely original.png?
Andras
User 0f28873a29
27-02-2008 18:45:39
Hi:
You are right, the picture that I posted is incorrect. Your picture is correct. But my question does not change. When I enter the SMILES CC1=NC=CN1C[C+](=O)[N-]\N=C\c1ccc(O)cc1
it shows a positive charge on a carbon atom that has four bonds. Is this correct?
- When I enter the SMILES in other software, such as ChemSketch from ACD/Labs, it returns an error.
Thanks for everything.
ChemAxon 25dcd765a3
28-02-2008 12:53:12
Hi,
Quote: |
It shows a positive charge on a carbon atom that has four bonds. Is this correct?
|
Chemically, I would draw your original molecule without any charges, which would generate:
CC1=NC=CN1CC(=O)N\N=C\c1ccc(O)cc1
I think your question is: why do we generate charges if there shouldn't be any?
The answer is simple: because you have set charges there (see original.png attached earlier).
So if you draw methane with 2 positive charges (see attached png), the export will keep these charges even though this is not correct at all: [H][C++]([H])([H])[H]
So you started from a molecule with incorrect charges, and after the change of file format the charges remained. You should start from a correct structure.
Quote: |
- When I enter the SMILES in other software, such as ChemSketch from ACD/Labs, it returns an error.
|
Sure, as the atomic valences are correct without the given charges, and these charges corrupt them.
If you view it with msketch, you will notice the red underline on the problematic atoms indicating the valence error.
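The valence argument can be checked by hand from the BOND records of the mol2 file posted at the start of the thread. The carbonyl carbon (atom 7, C6), which corresponds to the [C+](=O) atom in the SMILES, has three bonds: 6-7 single, 7-8 double and 7-9 amide. The sketch below, in plain Java rather than the Marvin API, sums these bond orders. The class name is hypothetical, and counting an "am" bond as 1 and an "ar" bond as 1.5 is a simplifying assumption.

```java
import java.util.HashMap;
import java.util.Map;

final class BondOrderSum {
    // Numeric order for a mol2 bond type. Treating "ar" as 1.5 and everything
    // else that is not "2" or "3" (including "am") as 1 is a simplification.
    static double order(String type) {
        switch (type) {
            case "2": return 2.0;
            case "3": return 3.0;
            case "ar": return 1.5;
            default: return 1.0;
        }
    }

    // Sums bond orders per atom id from BOND record lines: "bond_id origin target type".
    static Map<Integer, Double> sums(String[] bondLines) {
        Map<Integer, Double> total = new HashMap<>();
        for (String line : bondLines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 4) {
                continue; // skip malformed lines
            }
            double o = order(f[3]);
            total.merge(Integer.parseInt(f[1]), o, Double::sum);
            total.merge(Integer.parseInt(f[2]), o, Double::sum);
        }
        return total;
    }

    public static void main(String[] args) {
        // The three bonds of atom 7 (C6) from the mol2 file in this thread.
        String[] bonds = { "13 6 7 1", "16 7 8 2", "17 7 9 am" };
        System.out.println("Bond order sum for atom 7: " + sums(bonds).get(7)); // 4.0
    }
}
```

The sum is already 4, the usual valence of a neutral carbon, so an additional +1 formal charge on that atom is consistent with the valence error that msketch underlines.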
Andras
User 0f28873a29
28-02-2008 14:57:08
Hi, thanks for your quick answer.
I ran msketch with a file without charges, and it shows the same problem, but on a different carbon atom.
When I process the mol2 without charges with another program (Cliff), I obtain the SMILES without problems.
Thanks in advance.
PS: This is a ZINC file; it was generated with the OpenEye suite.
ChemAxon 25dcd765a3
28-02-2008 15:26:56
This 1_nocharges.mol2 also contains charges, but on another C and N atom.
You can see in your attached image, on the right side, that the carbon is even underlined in red.
So the problem remains, as the charges are still there.
I wonder if the root of the problem is not in the export but rather in our mol2 import.
Do the charges appear if you load 1_nocharges.mol2 into other programs?
I examined the 1_nocharges.mol2 file and I don't see a charge definition on any atom.
So, as far as I understand the mol2 file definition, none of the atoms should get a charge after import.
So we will check our mol2 import.
For now it seems that our mol2 import is buggy.
Andras
User 0f28873a29
28-02-2008 16:15:03
Thanks for your quick answers.
Do you have an estimate of how long it will take to solve the problem?
Thanks for everything.
User 0f28873a29
29-02-2008 13:40:33
Hi Andras:
I suppose that the error occurs during the interpretation of the atom types of the molecule. In this particular case, the carbon atom has a C.cat type.
Does the ChemAxon API have any function to clear the atom types of a molecule and recalculate them from its bonds and atoms?
Thanks for your answers.
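To see which atoms in a mol2 file carry a particular SYBYL atom type such as C.cat, the ATOM records can be scanned directly. This is a minimal sketch in plain Java, independent of the Marvin API. The class and method names are hypothetical, and it assumes whitespace-separated ATOM fields in the order atom_id, atom_name, x, y, z, atom_type.

```java
import java.util.ArrayList;
import java.util.List;

final class AtomTypeScan {
    // Returns "atom_id atom_name" for every ATOM record whose type equals the given type.
    static List<String> atomsOfType(String mol2, String type) {
        List<String> hits = new ArrayList<>();
        boolean inAtoms = false;
        for (String line : mol2.split("\\R")) {
            String t = line.trim();
            if (t.startsWith("@<TRIPOS>")) {
                // Track whether the current record block is the ATOM block.
                inAtoms = t.equals("@<TRIPOS>ATOM");
                continue;
            }
            if (!inAtoms || t.isEmpty()) {
                continue;
            }
            String[] f = t.split("\\s+");
            if (f.length >= 6 && f[5].equals(type)) {
                hits.add(f[0] + " " + f[1]);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Two ATOM records and one BOND record taken from the file in this thread.
        String mol2 = String.join("\n",
            "@<TRIPOS>ATOM",
            "1 C1 2.5016 0.6732 -2.5295 C.3 1 <0> -0.1150",
            "2 C2 2.9517 0.4640 -1.1066 C.cat 1 <0> 0.3298",
            "@<TRIPOS>BOND",
            "1 1 2 1");
        System.out.println(atomsOfType(mol2, "C.cat")); // prints [2 C2]
    }
}
```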
ChemAxon 25dcd765a3
05-03-2008 09:32:20
Hi,
I have discussed it with my colleagues.
You are right, the problem is with the atom types. As you wrote, C.cat is a carbon cation.
So there is no bug in the import.
I think you should remove the charges through the API:
Code: |
for (int i = 0; i < m.getAtomCount(); i++) {
    MolAtom a = m.getAtom(i);
    a.setCharge(0);
}
|
So we start from your original molecule (attached startmol.mol2).
Removing the charges results in startmol_nocharge.mol2.
This molecule still has a problem: the explicit H atoms are added incorrectly.
Removing the explicit H atoms solves the problem:
CC1=NC=CN1CC(=O)N\N=C\c1ccc(O)cc1
See my attached test code.
Code: |
java Test startmol.mol2
CC1=NC=CN1CC(=O)N\N=C\c1ccc(O)cc1
|
I hope this helps.
Andras
User 0f28873a29
05-03-2008 15:38:41
Thanks for everything.
User 0f28873a29
06-03-2008 21:42:14
Hi Andras:
I have another question:
The SYBYL mol2 format looks like this (example from the SYBYL home page):
# Name: benzene
# Creating user name: tom
# Creation time: Wed Dec 28 00:18:30 1988
# Modifying user name: tom
# Modification time: Wed Dec 28 00:18:30 1988
@<TRIPOS>MOLECULE
benzene
12 12 1 0 0
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1 C1 1.207 2.091 0.000 C.ar 1 BENZENE 0.000
2 C2 2.414 1.394 0.000 C.ar 1 BENZENE 0.000
3 C3 2.414 0.000 0.000 C.ar 1 BENZENE 0.000
4 C4 1.207 -0.697 0.000 C.ar 1 BENZENE 0.000
5 C5 0.000 0.000 0.000 C.ar 1 BENZENE 0.000
6 C6 0.000 1.394 0.000 C.ar 1 BENZENE 0.000
7 H1 1.207 3.175 0.000 H 1 BENZENE 0.000
8 H2 3.353 1.936 0.000 H 1 BENZENE 0.000
9 H3 3.353 -0.542 0.000 H 1 BENZENE 0.000
10 H4 1.207 -1.781 0.000 H 1 BENZENE 0.000
11 H5 -0.939 -0.542 0.000 H 1 BENZENE 0.000
12 H6 -0.939 1.936 0.000 H 1 BENZENE 0.000
@<TRIPOS>BOND
1 1 2 ar
2 1 6 ar
3 2 3 ar
4 3 4 ar
5 4 5 ar
6 5 6 ar
7 1 7 1
8 2 8 1
9 3 9 1
10 4 10 1
11 5 11 1
12 6 12 1
@<TRIPOS>SUBSTRUCTURE
1 BENZENE 1 PERM 0 **** **** 0 ROOT
When I run my program with your API function to export to a mol2 file:
while ((mol = importerMol2.read()) != null) {
    if ((name.matches(mol.getName())) && (importerMol2.tell() == position)) {
        for (int i = 0; i < mol.getAtomCount(); i++) {
            MolAtom a = mol.getAtom(i);
            a.setCharge(0);
        }
        System.out.println(mol.toFormat("mol2"));
    }
}
importerMol2.close();
This is the output:
@<TRIPOS>MOLECULE
ZINC00031164
27 28
SMALL
NO_CHARGES
@<TRIPOS>ATOM
1 C1 -2.7325 7.1245 -0.1055 C.ar
2 C2 -2.7307 5.7395 -0.1760 C.ar
3 C3 -1.5705 5.0630 0.1674 C.ar
4 C4 -0.4636 5.7859 0.5710 C.ar
5 N1 -0.5000 7.1030 0.6216 N.ar
6 C5 -1.5860 7.7795 0.3019 C.ar
7 C6 -1.5186 3.5580 0.1076 C.3
8 H1 -2.5327 3.1587 0.1266 H
9 C7 -0.8234 3.1228 -1.1844 C.3
10 C8 -0.7548 1.5938 -1.2312 C.3
11 C9 -0.0127 1.0858 0.0080 C.3
12 C10 -0.7306 1.5801 1.2659 C.3
13 H2 -3.6190 7.6840 -0.3653 H
14 H3 -3.6120 5.2009 -0.4913 H
15 H4 0.4428 5.2661 0.8442 H
16 H5 -1.5793 8.8579 0.3595 H
17 H6 0.1861 3.5332 -1.2105 H
18 H7 -1.3872 3.4882 -2.0427 H
19 H8 -0.2219 1.2804 -2.1289 H
20 H9 -1.7645 1.1833 -1.2450 H
21 H10 0.0021 -0.0041 0.0020 H
22 H11 1.0097 1.4637 0.0003 H
23 H12 -1.7483 1.1900 1.2802 H
24 H13 -0.1959 1.2321 2.1497 H
25 N2 -0.7649 3.0552 1.2721 N.3
26 H14 -1.2188 3.3793 2.1342 H
27 H15 0.1956 3.4160 1.2346 H
@<TRIPOS>BOND
1 1 6 ar
2 1 2 ar
3 1 13 1
4 2 3 ar
5 2 14 1
6 3 4 ar
7 3 7 1
8 4 5 ar
9 4 15 1
10 5 6 ar
11 6 16 1
12 7 8 1
13 7 9 1
14 7 25 1
15 9 10 1
16 9 17 1
17 9 18 1
18 10 11 1
19 10 19 1
20 10 20 1
21 11 12 1
22 11 21 1
23 11 22 1
24 12 23 1
25 12 24 1
26 12 25 1
27 25 26 1
28 25 27 1
@<TRIPOS>SUBSTRUCTURE
1 noname 1
This output has differences from the SYBYL example that are very important for programs like DOCK: the two blank lines under SMALL, and the three numbers that follow the atom and bond counts (the first of which is the number of substructures).
Thanks in advance.
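In the Tripos mol2 layout, the counts line of the MOLECULE record can hold up to five integers: num_atoms, num_bonds, num_subst, num_feat and num_sets. If a downstream program insists on all five, the exported line could be padded in a post-processing step. Below is a minimal sketch in plain Java, assuming that missing feature and set counts are 0 and that the caller supplies the number of SUBSTRUCTURE records. The helper padCounts is hypothetical and not part of the Marvin API, and whether DOCK accepts the result has not been verified here.

```java
final class Mol2Counts {
    // Pads the counts line of the MOLECULE record to five integers:
    // num_atoms num_bonds num_subst num_feat num_sets.
    static String padCounts(String countsLine, int numSubst) {
        String[] f = countsLine.trim().split("\\s+");
        // Defaults used only for fields that are missing from the input line.
        int[] counts = { 0, 0, numSubst, 0, 0 };
        for (int i = 0; i < f.length && i < counts.length; i++) {
            counts[i] = Integer.parseInt(f[i]); // values already present are kept
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < counts.length; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(counts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "27 28" is the counts line of the exported file, which has one SUBSTRUCTURE record.
        System.out.println(padCounts("27 28", 1)); // prints 27 28 1 0 0
    }
}
```

A counts line that already has five fields, such as "34 35 0 0 0", is returned with its values unchanged.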
ChemAxon 25dcd765a3
07-03-2008 18:37:24
My colleague will reply to your format question.
However, I don't really understand how you can have a charge on the N atom if you have removed them all.
Andras
User 0f28873a29
10-03-2008 14:53:31
Hi:
My objective is to process the ZINC files. These files contain a 3D structure and the charges.
First of all, I want to generate the SMILES for each mol2 file; this caused the first problem, with the charges.
Second, I want to write out the mol2 of some of the compounds, and there the second problem occurs (the missing lines).
Well, these are my problems.
Thanks for everything.
Yasset.
ChemAxon 7c2d26e5cf
11-03-2008 15:31:31
We will check both mol2 import and export.