Question about gz files

User 0f28873a29

23-02-2008 15:41:28

Hi:


I have some questions about the ChemAxon tools; perhaps these are developer questions.


- First of all, I want to import a gzipped mol2 file into my own database. From each molecule record, such as

@<TRIPOS>MOLECULE
DSD01234

I want to take the molecule name field (DSD01234 here) and insert it into the database.





- I also want to take the offset of each molecule in the file and insert it as well.





Are these things possible?





Thanks for everything.





Yasset.

ChemAxon a3d59b832c

25-02-2008 14:58:10

Hi,





I moved this question to the database forum. Can you give an example file that you would like to put in the database?





Thanks,


Szabolcs

User 0f28873a29

26-02-2008 01:02:52

This is an example of the gz file. I need to know how I can read this kind of file with Marvin.





Thanks for everything.

ChemAxon a3d59b832c

26-02-2008 11:33:00

Currently, the mol2 format is not supported in JChem databases. You need to convert the files to mrv or sdf first, for example with molconvert:





Code:
molconvert sdf compounds.mol2.gz -o converted.sdf






Then you will be able to import the sdf file into the database. In the longer term, JChem will support all file formats that Marvin (and so molconvert) understands.





Unfortunately, importing the embedded molecule name into a separate database field is not yet directly supported. This will be added later; in the meantime it can be solved with a little programming. See: http://www.chemaxon.com/forum/ftopic3559.html
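One possible form of that "little programming" is sketched below in plain Java. It is not taken from the linked topic and does not use the JChem API. It assumes that the converted SDF file carries the mol2 molecule name in the first (title) line of each record, and it copies that title into an SDF data field so that it can be imported as an ordinary field. The class name SdfNameField and the field name NAME are made up for this illustration.

```java
class SdfNameField {
    /**
     * Copies the title line of every SDF record into a data field.
     * Assumption: records end with "$$$$" and the first line of each
     * record is the molecule title.
     */
    static String addNameField(String sdf, String field) {
        StringBuilder out = new StringBuilder();
        String title = null; // title of the record currently being read
        for (String line : sdf.split("\r?\n")) {
            if (title == null) {
                title = line.trim(); // first line of a record
            }
            if (line.equals("$$$$")) {
                if (!title.isEmpty()) {
                    // SDF data item: header line, value line, blank line
                    out.append("> <").append(field).append(">\n")
                       .append(title).append("\n\n");
                }
                title = null; // the next line starts a new record
            }
            out.append(line).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String record = "DSD01234\n  demo\n\n"
                + "  0  0  0  0  0  0            999 V2000\n"
                + "M  END\n$$$$\n";
        System.out.print(addNameField(record, "NAME"));
    }
}
```

The data item is written just before the "$$$$" terminator, which is where SDF data fields belong. Whether the title line really holds the mol2 name after conversion should be checked on a converted file first.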





Best regards,


Szabolcs

ChemAxon 9c0afc9aaf

26-02-2008 15:41:22

Hi,
Quote:



- I also want to take the offset of each molecule in the file and insert it as well.

Currently this is not supported.


However, the import preserves the order, and the cd_id column (primary key) values in the database table reflect this order (integers starting from 1, incremented by 1).





The cd_id values differ from the position in the file in only two ways:


- The cd_id values reflect the order of insertion, not the offset in the file. The two differ when some structures are skipped, either by duplicate filtering or by the "--skip" command-line option.





- If multiple files are imported, the subsequent indexes will not start again from 1, but continue the sequence.
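The two differences can be illustrated with a toy model in plain Java. This is not JChem code. It assumes that duplicates are detected by simple string equality within one file and that the skipped positions are given explicitly; the real duplicate filter compares chemical structures against the table, and the exact meaning of "--skip" is not modelled here. The class name CdIdModel is made up for this illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CdIdModel {
    /**
     * Returns, for each position in the file, the id it would receive,
     * or 0 if the record at that position was not inserted.
     * firstId is 1 for the first file and continues for later files.
     */
    static int[] assignIds(List<String> records, Set<Integer> skipped,
                           boolean filterDuplicates, int firstId) {
        int[] ids = new int[records.size()];
        Set<String> seen = new HashSet<>();
        int next = firstId;
        for (int i = 0; i < records.size(); i++) {
            if (skipped.contains(i)) {
                continue; // explicitly skipped record
            }
            if (filterDuplicates && !seen.add(records.get(i))) {
                continue; // duplicate of an earlier record
            }
            ids[i] = next++; // ids follow insert order, not file position
        }
        return ids;
    }

    public static void main(String[] args) {
        Set<Integer> none = Collections.emptySet();
        int[] first = assignIds(Arrays.asList("A", "B", "A", "C"), none, true, 1);
        System.out.println(Arrays.toString(first));
        // A second file continues the sequence instead of restarting at 1.
        int[] second = assignIds(Arrays.asList("D"), none, true, 4);
        System.out.println(Arrays.toString(second));
    }
}
```

In this model the fourth record of the first file receives id 3 because the third record is a duplicate, and the only record of the second file receives id 4.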





Does this help?





In the long run, we are considering adding the ability to specify an indexing expression for an auto-calculated Chemical Terms field.


We are thinking of an expression that combines a prefix (e.g. company name), the table name, and the cd_id value to create a unique identifier for each imported structure. Would this solve your problem?





Alternatively, you may use our programming API to write a simple import method that deals with the offset in the file. Let me know if you need some tips (e.g. key classes to use) for this.
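As a rough illustration of such an import method, the sketch below uses only the Java standard library, not the ChemAxon API. It scans a gzipped mol2 stream and records, for each @<TRIPOS>MOLECULE tag, the byte offset of the tag and the molecule name. Two assumptions apply: the offset is measured in the decompressed data (gzip does not allow seeking in the compressed file), and the name is taken to be the first non-blank line after the tag. The class name Mol2Index is made up for this illustration.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

class Mol2Index {
    static final String TAG = "@<TRIPOS>MOLECULE";

    /** Molecule name plus byte offset of its tag in the decompressed data. */
    static final class Entry {
        final String name;
        final long offset;
        Entry(String name, long offset) { this.name = name; this.offset = offset; }
    }

    static List<Entry> index(InputStream gzipped) throws IOException {
        List<Entry> entries = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(new GZIPInputStream(gzipped))) {
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            long pos = 0;        // bytes read so far
            long lineStart = 0;  // offset at which the current line began
            long tagOffset = -1; // offset of a tag still waiting for its name
            int b;
            do {
                b = in.read();
                if (b != -1) pos++;
                if (b == '\n' || b == -1) {
                    String text = line.toString("US-ASCII").trim();
                    if (text.equals(TAG)) {
                        tagOffset = lineStart;
                    } else if (tagOffset >= 0 && !text.isEmpty()) {
                        // assumption: first non-blank line after the tag is the name
                        entries.add(new Entry(text, tagOffset));
                        tagOffset = -1;
                    }
                    line.reset();
                    lineStart = pos;
                } else {
                    line.write(b);
                }
            } while (b != -1);
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java Mol2Index compounds.mol2.gz");
            return;
        }
        try (InputStream file = new FileInputStream(args[0])) {
            for (Entry e : index(file)) {
                System.out.println(e.offset + "\t" + e.name);
            }
        }
    }
}
```

The resulting (name, offset) pairs could then be stored next to each structure by whatever insert code is already in use.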





Best regards,





Szilard

User 0f28873a29

27-02-2008 15:43:18

Thanks for your answer.





I am now using your API to insert the molecules into our in-house database. The only problem is generating SMILES from the mol2 file. For example, take this molecule:





@<TRIPOS>MOLECULE
ZINC00391820
34 35 0 0 0
SMALL
USER_CHARGES
N-[(4-hydroxyphenyl)methyleneamino]-2-(2-methylimidazol-1-yl)-acetamide
@<TRIPOS>ATOM
1 C1 2.5016 0.6732 -2.5295 C.3 1 <0> -0.1150
2 C2 2.9517 0.4640 -1.1066 C.cat 1 <0> 0.3298
3 C3 4.0489 -0.2472 0.6101 C.2 1 <0> 0.0090
4 C4 3.1067 0.5965 1.0727 C.2 1 <0> 0.0303
5 N1 2.4066 1.0434 -0.0153 N.pl3 1 <0> -0.4027
6 C5 1.2803 1.9801 0.0005 C.3 1 <0> 0.0811
7 C6 -0.0144 1.2089 0.0087 C.2 1 <0> 0.5138
8 O1 0.0021 -0.0041 0.0020 O.2 1 <0> -0.4867
9 N2 -1.1906 1.8669 0.0178 N.am 1 <0> -0.5742
10 N3 -2.3943 1.1500 0.0197 N.2 1 <0> -0.2467
11 C7 -3.5269 1.7835 0.0285 C.2 1 <0> 0.1491
12 C8 -4.7935 1.0291 0.0304 C.ar 1 <0> -0.0878
13 C9 -4.7770 -0.3681 0.0169 C.ar 1 <0> -0.0550
14 C10 -5.9634 -1.0691 0.0242 C.ar 1 <0> -0.1537
15 C11 -7.1744 -0.3889 0.0336 C.ar 1 <0> 0.1382
16 C12 -7.1957 0.9999 0.0417 C.ar 1 <0> -0.1511
17 C13 -6.0142 1.7092 0.0399 C.ar 1 <0> -0.0626
18 O2 -8.3415 -1.0840 0.0345 O.3 1 <0> -0.4976
19 H1 3.0473 1.5109 -2.9637 H 1 <0> 0.1240
20 H2 2.6988 -0.2285 -3.1092 H 1 <0> 0.1207
21 H3 1.4332 0.8885 -2.5446 H 1 <0> 0.1099
22 H4 4.7739 -0.7787 1.2087 H 1 <0> 0.2259
23 H5 2.9349 0.8668 2.1042 H 1 <0> 0.2231
24 H6 1.3211 2.6123 -0.8866 H 1 <0> 0.1516
25 H7 1.3381 2.6027 0.8933 H 1 <0> 0.1586
26 H8 -1.2038 2.8368 0.0232 H 1 <0> 0.3913
27 H9 -3.5416 2.8634 0.0344 H 1 <0> 0.1287
28 H10 -3.8359 -0.8977 0.0047 H 1 <0> 0.1377
29 H11 -5.9516 -2.1490 0.0179 H 1 <0> 0.1363
30 H12 -8.1396 1.5246 0.0490 H 1 <0> 0.1391
31 H13 -6.0317 2.7890 0.0454 H 1 <0> 0.1365
32 H14 -8.6831 -1.2802 -0.8486 H 1 <0> 0.3994
33 N4 3.9326 -0.3058 -0.7240 N.pl3 1 <0> -0.4817
34 H15 4.5234 -0.8686 -1.3472 H 1 <0> 0.4804
@<TRIPOS>BOND
1 1 2 1
2 1 19 1
3 1 20 1
4 1 21 1
5 2 5 1
6 2 33 2
7 3 4 2
8 3 22 1
9 3 33 1
10 4 5 1
11 4 23 1
12 5 6 1
13 6 7 1
14 6 24 1
15 6 25 1
16 7 8 2
17 7 9 am
18 9 10 1
19 9 26 1
20 10 11 2
21 11 12 1
22 11 27 1
23 12 17 ar
24 12 13 ar
25 13 14 ar
26 13 28 1
27 14 15 ar
28 14 29 1
29 15 16 ar
30 15 18 1
31 16 17 ar
32 16 30 1
33 17 31 1
34 18 32 1
35 33 34 1





With my script:





while ((mol = importerMol2.read()) != null) {
    this.molCount++;
    // Print each molecule as SMILES, using the export options u, a and -H
    System.out.println(mol.toFormat("smiles:u,a-H"));
}


I generate a SMILES string like this: CC1=NC=CN1C[C+](=O)[N-]\N=C\c1ccc(O)cc1


Is this positive charge on the carbon atom correct?

Is this a valid SMILES string?


Thanks in advance.

ChemAxon a3d59b832c

28-02-2008 08:29:41

Hi,





Could you upload the mol2 source as an attachment? The forum engine removed all the white space, so the pasted text cannot be imported into Marvin for a closer look.





Thanks,


Szabolcs

User 0f28873a29

28-02-2008 12:50:17

This is the mol2 file....

ChemAxon a3d59b832c

29-02-2008 14:18:27

OK, I can see that you already discussed this issue with our colleagues in this other topic:





http://www.chemaxon.com/forum/ftopic3578.html