Help with training Marvin - ChemAxon Forum Archive

User 2466ee5d97

02-01-2010 17:10:24

I thought I might have a look at training Marvin using a data set of measured properties I have. In particular, I have a list of experimental pKa values.

The instructions are rather sparse...

"Knowledge base for the logP calculation can be generated with the following command:

    cxcalc -T logP -t LOGP -o logPparameters.txt trainingset.sdf

The logP after the -T command line option specifies the plugin calculation for which the training data should be generated, trainingset.sdf is the input file that contains the training set, the experimental logP values are read from the SDFile property field named LOGP, and the output is written to the file logPparameters.txt.

The logP plugin reads the configuration file from the file marvin/config/logPparameters.txt. To enable the access to your knowledge base, the created logPparameters.txt file has to be copied to the marvin/config directory.

After these steps the "User defined" method in logP and logD calculation will use the trained logP parameters."
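The quoted instructions expect each record of the training SDF to carry its measured value in a data field (LOGP in the example). Below is a minimal Python sketch of attaching such a field to the records of an existing SD file. The helper name `tag_sdf_records` is hypothetical and not part of Marvin. The sketch assumes one measured value per record, in file order, and relies only on the standard SD file layout (a `> <NAME>` header line, the value, a blank line, then the `$$$$` record terminator).

```python
def tag_sdf_records(sdf_text, values, field="LOGP"):
    """Return SDF text with one `field` data item appended to every record.

    Hypothetical helper: `values` must hold one measured value per record,
    in the same order as the records appear in `sdf_text`.
    """
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":      # record terminator
            records.append(current)
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):
        records.append(current)         # last record lacked a terminator

    if len(records) != len(values):
        raise ValueError(
            f"{len(records)} records but {len(values)} values supplied"
        )

    out = []
    for lines, value in zip(records, values):
        out.extend(lines)
        # Data item: header line, value line, blank line, then terminator.
        out.extend([f"> <{field}>", str(value), "", "$$$$"])
    return "\n".join(out) + "\n"
```

Reading a file, calling `tag_sdf_records(text, measured_values)`, and writing the result to trainingset.sdf would give an input of the shape the quoted command reads. Whether the same approach applies to a PKA field is a question for the training documentation, not something this sketch settles.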

Can I train pKa? I assume I need to create an SDF file with the structures (2D/3D?) and a property field called PKA? Do I need to include multiple tautomers?

Where do I need to put the resulting file? I cannot find a marvin/config folder; do I need to create it? (I'm using Mac OS X 10.5.)

From the description it is not clear whether ONLY the training set is used for future calculations, or whether the new knowledge base is a combination of ChemAxon and user data.

To stop using it, do I need to delete/move the parameters.txt file?

ChemAxon e08c317633

04-01-2010 11:18:55

Hi,

You can find more detailed information here: http://www.chemaxon.com/marvin/help/calculations/calc_training.html

Zsolt

User 2466ee5d97

04-01-2010 13:41:37

Thanks for this. Just to be certain:

Adding extra pKa data extends the built-in prediction.

In contrast, it sounds like using my own logP data REPLACES the built-in prediction model?

ChemAxon e08c317633

04-01-2010 14:05:08

drc_007 wrote:

Adding extra pKa data extends the built-in prediction.

In contrast, it sounds like using my own logP data REPLACES the built-in prediction model?

Yes, that's right.

In case of logP training there is an "--add-built-in-training-set" cxcalc command line option, which adds ChemAxon's built-in training set to the user's training set. This is the way to extend the built-in logP prediction model.

Example:

    cxcalc -T logP -t LOGP --add-built-in-training-set -o logPparameters.txt trainingset.sdf

Zsolt
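The two logP training commands in this thread differ only in the --add-built-in-training-set option: without it the user's data replaces the built-in model, with it the built-in training set is added to the user's. A minimal Python sketch of assembling either command follows. The helper `build_training_command` and its parameter names are hypothetical; only the cxcalc options quoted in the thread are used, and nothing is assumed about pKa training.

```python
import shlex


def build_training_command(training_sdf, output_file="logPparameters.txt",
                           field="LOGP", extend_builtin=False):
    """Assemble the cxcalc logP training command as an argument list.

    Hypothetical helper; it only arranges options quoted in this thread.
    extend_builtin=False -> user data replaces the built-in logP model.
    extend_builtin=True  -> built-in training set is added to the user's.
    """
    cmd = ["cxcalc", "-T", "logP", "-t", field]
    if extend_builtin:
        cmd.append("--add-built-in-training-set")
    cmd += ["-o", output_file, training_sdf]
    return cmd


def as_shell_line(cmd):
    """Render the argument list as a single shell command line."""
    return shlex.join(cmd)
```

Passing the list to `subprocess.run(cmd, check=True)` would run the training, provided cxcalc is installed and on the PATH. The resulting parameters file still has to be copied to the marvin/config directory, as the quoted instructions describe.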