Heavy atom count

User 2466ee5d97

05-07-2010 17:19:43

Is it possible to do a heavy atom count using cxcalc?


I can see atomcount and counts of aliphatic or aromatic atoms, but not heavy atoms.

ChemAxon e08c317633

12-07-2010 09:47:55

No, it isn't, but it can be done with Evaluator.


Example:


Subtract the hydrogen count, atomCount('1') (atoms with atomic number 1), from the total atom count:


$ evaluate -e "atomCount() - atomCount('1')" "NCCCO"
5

For more see https://www.chemaxon.com/marvin/help/chemicalterms/EvaluatorFunctions.html#atomcount
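The heavy atom count is the total atom count minus the hydrogen count. As a worked check of the NCCCO example: it is 3-aminopropan-1-ol, C3H9NO, so if both terms count implicit hydrogens consistently, 14 atoms minus 9 hydrogens gives 5. The Python sketch below applies the same subtraction to a plain molecular formula string. It is only an illustration, not a ChemAxon API; heavy_atom_count is a hypothetical name, and it assumes a simple formula without parentheses, charges or isotope labels.

```python
import re

# An element symbol (capital letter plus optional lowercase letter)
# followed by an optional count, e.g. "C3", "H9", "N", "Cl2".
_ELEMENT = re.compile(r"([A-Z][a-z]?)(\d*)")


def heavy_atom_count(formula: str) -> int:
    """Total atoms minus hydrogens, mirroring atomCount() - atomCount('1').

    Assumes a simple formula such as "C3H9NO" with no parentheses,
    charges or isotope labels.
    """
    total = 0
    hydrogens = 0
    for symbol, count in _ELEMENT.findall(formula):
        n = int(count) if count else 1  # a missing count means 1 atom
        total += n
        if symbol == "H":
            hydrogens += n
    return total - hydrogens
```

For example, heavy_atom_count("C3H9NO") returns 5, matching the Evaluator result for NCCCO.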


Zsolt

User 2466ee5d97

12-07-2010 09:55:52

Many thanks for this,


That will do nicely.


It might still be worth adding a heavy atom count (HAC) option to cxcalc.

User 2466ee5d97

12-07-2010 16:08:59

At the moment I'm using


cxcalc input_file -o output_file logp logd -H 7.4 pka -b 1 -a 1 mass acceptorcount donorcount polarsurfacearea rotatablebondcount


to give a results.tab file containing the data. What would be the best way to add the HAC to that file?



ChemAxon e08c317633

15-07-2010 19:09:44

If it's OK to get a semicolon-separated list as output instead of a tab-separated list, then Chemical Terms Evaluator will do the job:


evaluate nci10.smiles -e "logp(); logd('7.4'); apka('1'); bpka('1'); mass(); acceptorcount(); donorcount(); psa(); rotatablebondcount(); atomCount()-atomCount('1')"
1.42;1.42;;-7.61;122.12;2;0;34.14;0;9
6.22;6.22;;0.77;332.49;2;0;25.78;3;20
2.15;0.28;2.72;-8.56;218.55;5;1;111.87;2;14
0.71;0.61;7.99;5.83;145.14;4;2;81.7;1;9
2.09;2.09;;2.04;223.23;3;1;60.16;0;17
4.61;0.13;3.61;-5.43;490.1;5;2;83.83;1;27
1.54;1.54;;-0.73;235.67;3;0;37.38;1;16
3.37;3.37;;-7.42;267.24;4;0;79.96;1;20
0.41;0.29;7.89;1.11;116.12;4;2;65.18;1;8
5.11;5.11;;-8.08;262.29;0;0;0;3;19

cxcalc cannot perform arithmetic on calculation results, so it cannot subtract the hydrogen count from the atom count.
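If a tab-separated results.tab is still wanted, the semicolon-separated Evaluator output can be converted afterwards. A minimal Python sketch: semicolon_to_tab and the HEADER column labels are hypothetical names chosen here to follow the order of the expression above; Evaluator itself does not print a header. Empty fields (for example the missing acidic pKa values) are kept as empty cells so the columns stay aligned.

```python
import csv
import io
from typing import List

# Hypothetical column labels, in the same order as the -e expression above.
HEADER: List[str] = [
    "logP", "logD_pH7.4", "acidic_pKa", "basic_pKa", "mass",
    "acceptors", "donors", "PSA", "rotatable_bonds", "HAC",
]


def semicolon_to_tab(text: str, header: List[str]) -> str:
    """Convert semicolon-separated Evaluator output to tab-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            # split(";") keeps empty fields, so columns stay aligned
            writer.writerow(line.split(";"))
    return out.getvalue()
```

For example, save the Evaluator output to output.txt, then write semicolon_to_tab(open("output.txt").read(), HEADER) to results.tab.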


I hope this helps,


Zsolt

User 2466ee5d97

15-07-2010 19:51:08

Many thanks


 


Chris

User 2466ee5d97

20-07-2010 22:20:15

Hi,



evaluate myfile.sdf -e "name(); logp(); logd('7.4'); apka('1'); bpka('1'); atomCount(); mass(); acceptorcount(); donorcount(); psa(); rotatablebondcount(); atomCount()-atomCount('1')" -o output.txt



Running this on a file of 30,000 structures, I get this error:


 


/Applications/ChemAxon/MarvinBeans/bin/evaluate: line 60: [: too many arguments
/Applications/ChemAxon/MarvinBeans/bin/evaluate: line 63: [: too many arguments
Exception in thread "main" chemaxon.nfunk.jep.ParseException: Error while evaluating expression:
name(); logp(); logd('7.4'); apka('1'); bpka('1'); atomCount(); mass(); acceptorcount(); donorcount(); psa(); rotatablebondcount(); atomCount()-atomCount('1')
    chemaxon.marvin.io.MolExportException: Name generation failed: java.lang.ArrayIndexOutOfBoundsException: 18
    at chemaxon.nfunk.jep.JEP.getValueAsObject(Unknown Source)
    at chemaxon.jep.ChemJEP.evaluate(Unknown Source)
    at chemaxon.jep.Evaluator.main(Unknown Source)
Caused by: chemaxon.nfunk.jep.ParseException: chemaxon.marvin.io.MolExportException: Name generation failed: java.lang.ArrayIndexOutOfBoundsException: 18
    at chemaxon.jep.function.Plugin.run(Unknown Source)
    at chemaxon.nfunk.jep.EvaluatorVisitor.visit(Unknown Source)
    at chemaxon.nfunk.jep.ASTFunNode.jjtAccept(Unknown Source)
    at chemaxon.nfunk.jep.EvaluatorVisitor.getValue(Unknown Source)
    at chemaxon.nfunk.jep.JEP.getValueAsObject(Unknown Source)
    at chemaxon.jep.ChemJEP.evaluate(Unknown Source)
    at chemaxon.jep.Evaluator.main(Unknown Source)
Caused by: chemaxon.marvin.plugin.PluginException: chemaxon.marvin.io.MolExportException: Name generation failed: java.lang.ArrayIndexOutOfBoundsException: 18
    at chemaxon.marvin.calculations.IUPACNamingPlugin.getPreferredIUPACName(Unknown Source)
    at chemaxon.marvin.calculations.IUPACNamingPlugin.run(Unknown Source)
    at chemaxon.jep.function.Plugin.run(Unknown Source)
    at chemaxon.nfunk.jep.EvaluatorVisitor.visit(Unknown Source)
    at chemaxon.nfunk.jep.ASTFunNode.jjtAccept(Unknown Source)
    at chemaxon.nfunk.jep.EvaluatorVisitor.getValue(Unknown Source)
    at chemaxon.nfunk.jep.JEP.getValueAsObject(Unknown Source)
    at chemaxon.jep.ChemJEP.evaluate(Unknown Source)
    at chemaxon.jep.Evaluator.main(Unknown Source)
Caused by: chemaxon.marvin.io.MolExportException: Name generation failed: java.lang.ArrayIndexOutOfBoundsException: 18
    at chemaxon.marvin.io.formats.name.NameExport.convert(Unknown Source)
    at chemaxon.marvin.calculations.IUPACNamingPlugin.getName(Unknown Source)
    ... 9 more
Caused by: chemaxon.marvin.io.formats.name.nameexport.IUPACNamer$Error: Name generation failed: java.lang.ArrayIndexOutOfBoundsException: 18
    at chemaxon.marvin.io.formats.name.nameexport.NamingCentral.getName(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.IUPACNamerThread.run(Unknown Source)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 18
    at chemaxon.marvin.io.formats.name.nameexport.GeneralFusedRingSystem.recognize(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.GeneralSpiroNamer.recognize(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.GeneralSpiroNamer.nameRingSystem(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.UnbranchedSpiroNamer.getAllNames(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.UnbranchedSpiroNamer.computePartNames(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.GeneralSpiroNamer.getPartNames(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.GeneralSpiroNamer.getModifiers(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.Part.suffixCount(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.Part.getAnionCount(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.Part.getSeniorityClass(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.ParentFinder.compare(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.ParentFinder.computeParent(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.ParentFinder.parent(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.TopologyAnalyser.getNextFragment(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.TopologyAnalyser.analyse(Unknown Source)
    at chemaxon.marvin.io.formats.name.nameexport.NamingCentral.analyse(Unknown Source)


 


 


Is this a Java memory issue?

User 2466ee5d97

21-07-2010 06:52:11

If I remove the name() function, it runs OK.

ChemAxon e08c317633

21-07-2010 13:15:07

The IUPAC name generator has some limitations; for example, it cannot generate names for some complex fused ring structures. In such cases the name() Chemical Terms (CT) function throws an exception. To ignore the error and continue processing with the next structure, use the


-g, --ignore-error                  continue with next molecule on error

command-line option.


Examples:


evaluate -g -e "logP(); name()" mols.smiles

evaluate --ignore-error -e "logP(); name()" mols.smiles 2>errormessages.txt

The 2>errormessages.txt part redirects the error messages (standard error) to the file errormessages.txt. This works in Unix shells; the Windows cmd shell accepts the same 2> syntax.
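When evaluate is called from a script, the -g option and the error log can also be handled by the calling program. A minimal Python sketch, assuming the evaluate launcher is on the PATH; build_evaluate_cmd, run_evaluate and show_command are hypothetical helpers, and only the -g and -e options shown in this thread are used. Passing the arguments as a list means the semicolons and parentheses in the expression need no shell quoting.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_evaluate_cmd(input_file: str, expressions: Sequence[str],
                       ignore_errors: bool = True) -> List[str]:
    """Build the argument list for the Evaluator command line.

    The expressions are joined with "; " into a single -e argument.
    """
    cmd = ["evaluate"]
    if ignore_errors:
        cmd.append("-g")  # continue with next molecule on error
    cmd += ["-e", "; ".join(expressions), input_file]
    return cmd


def show_command(cmd: List[str]) -> str:
    """Return the equivalent, correctly quoted shell command (for logging)."""
    return shlex.join(cmd)


def run_evaluate(input_file: str, expressions: Sequence[str],
                 error_log: str = "errormessages.txt") -> str:
    """Run evaluate, returning stdout and writing stderr to error_log.

    Assumes the ChemAxon 'evaluate' launcher is on the PATH (not checked).
    """
    cmd = build_evaluate_cmd(input_file, expressions)
    with open(error_log, "w") as err:
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=err,
                                text=True, check=False)
    return result.stdout
```

For example, show_command(build_evaluate_cmd("mols.smiles", ["logP()", "name()"])) gives evaluate -g -e 'logP(); name()' mols.smiles, the same call as the first example above.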


Zsolt

User 2466ee5d97

21-07-2010 13:30:44

Many thanks for the rapid response :-)


I did wonder if that was the issue.


The only reason I'm using it is to provide an easy way to check the results for a particular record. Is there any way to include the compound ID from the SDF file, or maybe a SMILES representation?


Thanks


Chris

ChemAxon e08c317633

21-07-2010 13:58:04










drc_007 wrote:

... is there any way to include the compound ID from the sdf file, or maybe a SMILES representation?



Yes, there is.


Example:


$ evaluate -e "field('id'); molString('smiles'); name()" demo10.sdf
1;CC1=CC(=O)C=CC1=O;2-methylcyclohexa-2,5-diene-1,4-dione
2;S(SC1=NC2=C(S1)C=CC=C2)C1=NC2=CC=CC=C2S1;2-(1,3-benzothiazol-2-yldisulfanyl)-1,3-benzothiazole
3;OC1=C(Cl)C=C(C=C1[N+]([O-])=O)[N+]([O-])=O;2-chloro-4,6-dinitrophenol
4;[O-][N+](=O)C1=CNC(=N)S1;5-nitro-2,3-dihydro-1,3-thiazol-2-imine
5;NC1=CC2=C(C=C1)C(=O)C1=C(C=CC=C1)C2=O;2-amino-9,10-dihydroanthracene-9,10-dione
6;OC(=O)C1=C(C=CC=C1)C1=C2C=CC(=O)C(Br)=C2OC2=C1C=CC(O)=C2Br;2-(4,5-dibromo-6-hydroxy-3-oxo-3H-xanthen-9-yl)benzoic acid
7;CN(C)C1=C(Cl)C(=O)C2=C(C=CC=C2)C1=O;2-chloro-3-(dimethylamino)-1,4-dihydronaphthalene-1,4-dione
8;CC1=C(C2=C(C=C1)C(=O)C1=CC=CC=C1C2=O)[N+]([O-])=O;2-methyl-1-nitro-9,10-dihydroanthracene-9,10-dione
9;CC(=NO)C(C)=NO;N-[3-(hydroxyimino)butan-2-ylidene]hydroxylamine
10;C1=CC=C(C=C1)P(C1=CC=CC=C1)C1=CC=CC=C1;triphenylphosphane

In the example, the 1st column contains the content of the "id" SDfile field, the 2nd column contains the SMILES representation of the molecule, and the 3rd column contains the IUPAC name. The input file is attached.
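With the ID in the first column, the output can be indexed by compound ID for spot checks. A minimal Python sketch, assuming each line has the three columns of the example above (ID; SMILES; name); parse_id_smiles_name and Record are hypothetical names, not part of Marvin. Lines with fewer than three fields are skipped rather than guessed at.

```python
from typing import Dict, NamedTuple


class Record(NamedTuple):
    smiles: str
    name: str


def parse_id_smiles_name(text: str) -> Dict[str, Record]:
    """Index output of field('id'); molString('smiles'); name() by ID."""
    records: Dict[str, Record] = {}
    for line in text.splitlines():
        # maxsplit=2 keeps the name intact even if it contains a semicolon
        parts = line.split(";", 2)
        if len(parts) < 3:
            continue  # blank or incomplete line
        mol_id, smiles, name = parts
        records[mol_id] = Record(smiles, name)
    return records
```

For example, parse_id_smiles_name(output)["10"].name would return "triphenylphosphane" for the output shown above.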


Zsolt

User 2466ee5d97

21-07-2010 14:08:19

Many thanks

User 2466ee5d97

13-10-2010 08:08:30













If the ID is "10E-593", it seems to be treated as a number. Is that expected?

ChemAxon e08c317633

13-10-2010 09:20:16










drc_007 wrote:







If the 'ID' is "10E-593" it seems to be treated as a number?


It should be treated as a string. Could you give an example?


Zsolt

ChemAxon e08c317633

13-10-2010 17:25:37

Chris, you are right: if the content of the ID field can be read as a number, it is converted to a number.


We will provide a method to read the content of a field as a string (around Marvin 5.4.1).
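The ID in this example happens to be valid scientific notation: 10E-593 means 10 x 10^-593, far below the smallest positive double (about 4.9 x 10^-324), so if the text is read as a double it silently becomes 0. Until a string accessor is available, an ID column can be screened for values at risk. A minimal Python sketch; numeric_looking_ids is a hypothetical helper, and Python's float() is only a stand-in for Evaluator's conversion, whose exact rules are not given in this thread (float() also accepts forms such as "nan", "inf" and "1_000").

```python
from typing import Iterable, List


def numeric_looking_ids(ids: Iterable[str]) -> List[str]:
    """Return the IDs that would parse as floating-point numbers.

    float() is an approximation of a double conversion, so treat the
    result as a warning list, not an exact model of Evaluator.
    """
    risky = []
    for mol_id in ids:
        try:
            float(mol_id)  # "10E-593" parses (and underflows to 0.0)
        except ValueError:
            continue  # not numeric, so it stays a string
        risky.append(mol_id)
    return risky
```

For example, numeric_looking_ids(["10E-593", "CPD-001", "42"]) flags "10E-593" and "42".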


Zsolt


 

User 2466ee5d97

13-10-2010 17:39:32

Thanks for confirming it so quickly.

User 2466ee5d97

28-11-2014 13:59:32

I'm trying to find definitions of all the terms. If I use


evaluate -l


it provides a link to http://www.chemaxon.com/marvin/help/chemicalterms/EvaluatorTables.html, but that page does not seem to exist.

ChemAxon d51151248d

28-11-2014 15:04:04

Dear Chris, 


We have fixed this issue. The correct link to the function tables is:


https://docs.chemaxon.com/display/chemicalterms/Functions+by+Categories


Newer versions of our product will contain the fix.


Best wishes, 


Daniel

User 2466ee5d97

28-11-2014 16:19:38

Hi,


 


Thanks for that, very useful.


I have a historical script that calls psa(). I'm guessing this is now topologicalPolarSurfaceArea()?

ChemAxon e08c317633

01-12-2014 09:04:04

psa() still works. Some functions have both a short and a long name; here psa() is the short name and topologicalPolarSurfaceArea() is the long one.

User 2466ee5d97

01-12-2014 10:44:39

Thanks


 


Chris

User 2466ee5d97

12-12-2014 10:12:17

I was trying to calculate the basic pKa of


1-Benzyl-4-methylpiperazine


http://www.chemicalize.org/structure/#!mol=CN1CCN%28CC1%29Cc2ccccc2&source=fp


Shouldn't the nitrogen bearing the methyl group be the more basic one?

User 851ac690a0

12-12-2014 12:00:38

Hi,


 


Yes, the "N-methyl" nitrogen should be slightly more basic than the "N-benzyl" nitrogen.


The micro ionization constant (pKa = 8.39) of the "N-benzyl" nitrogen is slightly over-predicted, as shown in the attached figure. It should be below the micro pKa (8.14) of the "N-methyl" nitrogen.


The two macro pKa values, 8.59 and 2.9, are also shown in the figure.
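The micro and macro values are linked. Assuming both micro pKa values describe the first protonation of the neutral molecule (at the N-benzyl and the N-methyl nitrogen, respectively), the first macro pKa satisfies 10^pKa(macro) = 10^pKa(micro, N-benzyl) + 10^pKa(micro, N-methyl), because the two monocations are summed in the macroscopic equilibrium. That assumption cannot be checked here, since the attached figure is not reproduced. A minimal Python sketch (macro_pka_from_micro is a hypothetical helper):

```python
import math
from typing import Iterable


def macro_pka_from_micro(micro_pkas: Iterable[float]) -> float:
    """First basic macro pKa from parallel micro pKa values.

    Assumes each micro pKa is for protonation of the same neutral base at
    a different site, so 10**pKa_macro = sum(10**pKa_micro).
    """
    return math.log10(sum(10.0 ** pk for pk in micro_pkas))
```

With the values above, macro_pka_from_micro([8.39, 8.14]) is about 8.58, which agrees with the reported 8.59 to within the rounding of the displayed micro values. Note that the macro value depends only on the sum, so swapping which nitrogen gets the higher micro pKa would not change it.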


 


Thanks for reporting this bug; it will be fixed as soon as possible.


 


Jozsi