I'm looking for the model basis of logP and the related estimators (e.g. logPVG, logPKLOP, etc.) so that I can describe them in a publication. I would like to know:
1) how many molecular fragments does each estimator consider (and if possible, what are the fragments?)
2) which training set/how many chemicals were used to train each model
The only description of logP that I could find is too vague for a scientific reference:
"logP calculations are based on a pool of fragments predefined
in the calculator. This set is based on the data set in references 1. Every fragment is assigned a unique name and a value. logP plugin handled only one fragment set until version 5.1.2, above, it was extended with two additional sets. The sets are based on a published data set (see reference 2) and the PhysProp© database."
Appreciate your help...
...how many molecular fragments does each estimator consider (and if possible, what are the fragments?)
The logP calculators are based on atom types. The atom types are described in reference 1; approximately 100 different atom types are defined in that article. The original set of atom types was extended with a couple of ionized atom types, and certain atom types were further subclassified. The atom type parameter set was also supplemented with structural effects such as hydrogen bonding and delocalization.
The actual parameter set contains approximately 200 parameters, about twice as many as the original one.
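To make the idea concrete, here is a minimal Python sketch of an atom-type based estimate, assuming the usual additive form of atom-contribution methods: each atom adds the value of its atom type, and structural effects add correction terms. The atom-type labels, the correction name, and every numeric value below are hypothetical placeholders, not values from reference 1 or from the actual parameter set, and `estimate_logp` is an illustrative function, not part of the calculator.

```python
from collections import Counter

# Hypothetical atom-type contributions. Labels and numbers are placeholders
# for illustration only; they are NOT taken from any published parameter set.
ATOM_CONTRIBUTIONS = {
    "C_aliphatic": 0.5,
    "C_aromatic": 0.25,
    "O_hydroxyl": -0.5,
    "H_on_carbon": 0.125,
}

# Hypothetical structural corrections (e.g. a hydrogen-bond effect).
CORRECTIONS = {
    "intramolecular_h_bond": 0.5,
}


def estimate_logp(atom_types, corrections=()):
    """Sum per-atom contributions, then add structural correction terms.

    atom_types:  iterable of atom-type labels, one entry per atom.
    corrections: iterable of correction labels that apply to the molecule.
    Raises KeyError if a label is missing from the parameter tables.
    """
    atom_counts = Counter(atom_types)
    total = sum(ATOM_CONTRIBUTIONS[label] * count
                for label, count in atom_counts.items())
    total += sum(CORRECTIONS[label] for label in corrections)
    return total


# A made-up molecule: 2 aliphatic carbons, 1 hydroxyl oxygen, 4 hydrogens.
example_atoms = ["C_aliphatic"] * 2 + ["O_hydroxyl"] + ["H_on_carbon"] * 4
print(estimate_logp(example_atoms))
print(estimate_logp(example_atoms, corrections=["intramolecular_h_bond"]))
```

In this picture, the growth from roughly 100 to roughly 200 parameters corresponds to more entries in the two lookup tables (extra ionized and subclassified atom types, plus the structural corrections), while the summation itself stays the same.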
which training set/how many chemicals were used to train each model
The number of training molecules used for each model is: logPVG, 1500; logPKLOP, 1600; logPPHYS, 8000.
All references relevant to the above atom-based logP models are given at the bottom of this site.
Additional data about the ionized atom types are given at this site.