Standardizer use

User 1da2e6555b

22-07-2013 05:15:22

I'm using Marvin 6.0.0, JChem 6.0.0 and Standardizer 6.0.0. I have a problem with the results file produced by a Standardizer run. Whether I choose SMILES or SDF as the output format, I lose the compound numbering that I had in the input SMILES file. More precisely, each SMILES string in the input file is followed by the corresponding compound number, and in the result file this numbering has disappeared. Could you help me keep the numbering, or tell me where I can see it?


 


Simona

ChemAxon 5433b8e56b

22-07-2013 07:41:53

Hi Simona,


I have moved your question to the Standardizer-related forum topic. My colleagues will answer your question soon.


Regards,
Istvan 

ChemAxon afdac7b783

22-07-2013 10:26:59

Hi Simona, 


Could you send us an example file and describe the expected result?


Best regards,


Viktoria

User 1da2e6555b

24-07-2013 11:34:56

Hi Viktoria,


Here is an example. My initial SMILES file gives the compound number after each structure:


C=O    1
CC(C)[C@@H](C)\C=C\[C@H](C)[C@H]1CC[C@@H]2\C(CCC[C@]12C)=C\C=C1\C[C@@H](O)CCC1=C    7
Clc1ccc(cc1)C(c1ccc(Cl)cc1)C(Cl)(Cl)Cl    20
OC(=O)c1c(Cl)ccc(Cl)c1Cl    24
O=[As](C)([O-])C.[Na+]    30
[O-]C(COC1=NC(Cl)=C(Cl)C=C1Cl)=O.CC[NH+](CC)CC    31
Oc1ccc(Cl)cc1C(=O)Nc1ccc(cc1Cl)N(=O)=O    50
CCCCOCCOCCOCc1cc2OCOc2cc1CCC    60
CC[Hg]Sc1ccccc1C(=O)O    61
CCN(CC)CC    62
CCOCCOCCOC(C)Oc1ccc2OCOc2c1    70
C1CN1c1nc(nc(n1)N1CC1)N1CC1    80


When I run Standardizer with SMILES as the output format as well, I get the following result:


C=O
CC(C)[C@@H](C)\C=C\[C@H](C)[C@H]1CC[C@@H]2\C(CCC[C@]12C)=C\C=C1/C[C@@H](O)CCC1=C
Clc1ccc(cc1)C(c1ccc(Cl)cc1)C(Cl)(Cl)Cl
OC(=O)c1c(Cl)ccc(Cl)c1Cl
C[As](C)([O-])=O
[O-]C(=O)COc1nc(Cl)c(Cl)cc1Cl
Oc1ccc(Cl)cc1C(=O)Nc1ccc(cc1Cl)N(=O)=O
CCCCOCCOCCOCc1cc2OCOc2cc1CCC
OC(=O)c1ccccc1
CCOCCOCCOC(C)Oc1ccc2OCOc2c1
C1CN1c1nc(nc(n1)N1CC1)N1CC1


The number written after each SMILES string, which is important to me, has disappeared, so after the filtering process I cannot tell which compounds remained and which did not. This example contains only a few compounds, but for large databases this is a real problem. I would appreciate your help in solving it.


 


Simona

ChemAxon afdac7b783

24-07-2013 14:57:12

Hi Simona,


With the standardize command-line tool, you can export the property fields of your input file by using a SMILES export option, provided that your input file contains a header line.


An example command is the following:


$ standardize -c "removefragment" forum-test.smiles -f smiles:T*
#SMILES name    ID_numb
C=O             1
CC(C)[C@@H](C)\C=C\[C@H](C)[C@H]1CC[C@@H]2\C(CCC[C@]12C)=C\C=C1\C[C@@H](O)CCC1=C         7
Clc1ccc(cc1)C(c1ccc(Cl)cc1)C(Cl)(Cl)Cl          20
OC(=O)c1c(Cl)ccc(Cl)c1Cl                24
C[As](C)([O-])=O                30
CC[NH+](CC)CC           31
Oc1ccc(Cl)cc1C(=O)Nc1ccc(cc1Cl)N(=O)=O          50
CCCCOCCOCCOCc1cc2OCOc2cc1CCC            60
CC[Hg]Sc1ccccc1C(O)=O           61
CCN(CC)CC               62
CCOCCOCCOC(C)Oc1ccc2OCOc2c1             70
C1CN1c1nc(nc(n1)N1CC1)N1CC1             80

Please find attached the input file with your data.
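
For readers who do not have the attachment: the essential difference from the original file is the header line. A minimal Python sketch of adding one to a headerless SMILES file is given here. The function name add_header and the two-column header "#SMILES ID_numb" are assumptions for illustration, modeled on the header printed in the output above; the exact header syntax that Standardizer accepts should be checked against the attached file.

```python
def add_header(lines, header=("#SMILES", "ID_numb")):
    """Return tab-separated SMILES lines with a header row prepended.

    Each input line is expected to look like "<SMILES><whitespace><ID>".
    The header names are an assumption taken from the forum example,
    not a documented Standardizer requirement.
    """
    out = ["\t".join(header)]
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and any existing header/comment line
        parts = line.split(None, 1)  # a SMILES string contains no whitespace
        smiles = parts[0]
        compound_id = parts[1].strip() if len(parts) > 1 else ""
        out.append(f"{smiles}\t{compound_id}")
    return out


if __name__ == "__main__":
    sample = ["C=O    1", "CCN(CC)CC    62"]
    print("\n".join(add_header(sample)))
```

The returned lines can be written to a .smiles file and passed to standardize in the same way as forum-test.smiles in the command above.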


Best regards,


Viktoria

User 1da2e6555b

25-07-2013 06:58:32

Hi Viktoria,


Thank you very much for your help. Is the command line the only way to see the compound ID after filtering with Standardizer? Can the more user-friendly stand-alone Standardizer program not be used for this purpose? I don't usually work with the command line.


Simona

ChemAxon afdac7b783

25-07-2013 11:21:33

If your input file is a *.smiles file, which has a header (with property names, like the one I sent you earlier), then you can use the more friendly Standardizer application to run your standardization. You need to select the output filetype as MDL SDfile (*.sdf). The generated output file will contain the fields of your input file along with a new StandardizerResult field. 


I am also sending you the SDF output for the above-mentioned input file, produced in Standardizer's stand-alone application with a simple "Remove fragment" action.
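
Once the identifiers are present in the SDF output, they can be compared with the input to see which compounds remained. The following is only a minimal Python sketch. It assumes that the ID field is named ID_numb, as in the header shown earlier, and that the data items follow the usual SDF layout (a "> <FIELD>" line, the value lines, a blank line, and "$$$$" between records). The helper names read_sdf_fields and missing_ids are hypothetical, and the sketch does not interpret the contents of the StandardizerResult field.

```python
def read_sdf_fields(text):
    """Return a list with one dict of data fields per SDF record."""
    records, fields, name = [], {}, None
    for line in text.splitlines():
        if line.strip() == "$$$$":          # end of the current record
            records.append(fields)
            fields, name = {}, None
        elif line.startswith(">"):          # data header, e.g. ">  <ID_numb>"
            start = line.find("<")
            end = line.find(">", start + 1)
            name = line[start + 1:end] if 0 <= start < end else None
            if name is not None:
                fields[name] = ""
        elif name is not None:
            if line.strip():                # value line of the open field
                fields[name] = fields[name] + "\n" + line if fields[name] else line
            else:                           # a blank line closes the field
                name = None
    if fields:                              # last record without "$$$$"
        records.append(fields)
    return records


def missing_ids(input_ids, records, field="ID_numb"):
    """Return the input IDs that do not occur in any output record."""
    seen = {rec.get(field) for rec in records}
    return [i for i in input_ids if i not in seen]
```

For example, calling missing_ids with the list of IDs from the input file and the records read from the output SDF returns the IDs of the compounds that are no longer present.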


Best regards,


Viktoria

User 1da2e6555b

26-07-2013 05:12:55

Hi Viktoria,

Thank you for your suggestions. They were very useful.

Best regards,
Simona