As Alex requested more input, I will compare the logP accuracy of JChem 3.2 versus KOWWIN. The values are taken from the free EPA EPI Suite (http://www.epa.gov/opptintr/exposure/pubs/episuite.htm), which contains > 10k experimental logP values. The KOWWIN algorithm is based on Meylan, W.M. & Howard, P.H. (1995). Atom/fragment contribution method for estimating octanol–water partition coefficients. Journal of Pharmaceutical Sciences 84, 83–92.
KOWWIN is not the very best method, but it is reliable and, most importantly, free. ClogP (Biobyte) and ACDLogP cost big $$$-$$$$$ for academia, and unless you want enhanced results for zwitterionic molecules and error bars, you are fine with KOWWIN and JChem logP.
The following graphic shows the improvements for JChem from 3.14 to 3.2. It's also attached as a download.
I don't include the error values for x and y, because the outcome is pretty clear. JChem improved a lot but is not better than KOWWIN. (n=16,000)
ChemAxon Marvin 3.14 logP - R^2 = 0.7333
ChemAxon Marvin 3.2 logP - R^2 = 0.8032
KOWWIN logP - R^2 = 0.9532
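For anyone who wants to reproduce numbers like these on their own data, here is a minimal sketch. It assumes the R^2 values above are the squared Pearson correlation between experimental and calculated logP (what a spreadsheet trendline reports); the thread does not say how they were computed. The function name and the sample values are made up for illustration and are not taken from the EPI Suite data.

```python
def r_squared(experimental, predicted):
    """Squared Pearson correlation between two equally long lists of logP values."""
    n = len(experimental)
    if n != len(predicted) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")

    mean_x = sum(experimental) / n
    mean_y = sum(predicted) / n

    # Sums of squares and cross products around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(experimental, predicted))
    sxx = sum((x - mean_x) ** 2 for x in experimental)
    syy = sum((y - mean_y) ** 2 for y in predicted)

    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when one list is constant")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical values, for illustration only
    exp_logp = [1.0, 2.0, 3.0, 4.0]
    calc_logp = [1.0, 3.0, 2.0, 4.0]
    print(round(r_squared(exp_logp, calc_logp), 4))
```

Note that this measures correlation only; a method with a constant offset would still score well, so looking at the absolute errors as well is a good idea.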
Another issue for extremely large datasets is speed.
KOWWIN takes 9 seconds on this dataset.
JChem takes 2 minutes on this dataset, about 13 times slower.
This is not due to Java (the Java disadvantage is only 0-20%).
So for extremely large datasets (>10^9 structures) this is certainly an issue.
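To put the timings in perspective, here is a rough back-of-the-envelope sketch. It assumes the timed dataset is the 16,000-structure set mentioned above and that run time scales linearly with the number of structures on a single machine; neither is stated explicitly, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 86_400


def projected_seconds(n_measured, seconds_measured, n_target):
    """Linearly extrapolate a measured run time to a larger dataset."""
    return seconds_measured * n_target / n_measured


if __name__ == "__main__":
    n_measured = 16_000   # assumed size of the timed dataset
    n_target = 10**9      # the "extremely large" case
    for name, seconds in (("KOWWIN", 9), ("JChem", 120)):
        total = projected_seconds(n_measured, seconds, n_target)
        print(f"{name}: {total / SECONDS_PER_DAY:.1f} days")
```

Under these assumptions, 10^9 structures would take roughly 6.5 days with KOWWIN and roughly 87 days with JChem.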
Thank you for this test.
Do you have any information about the training set of KOWWIN?
Thanks a lot for all this information, but to tell the truth it is not fair to compare prediction methods based on published data. Most developers of prediction tools select molecules for their training sets from these publicly available databases to make sure that these charts look good.
When you test the prediction methods with in-house data you usually get much larger errors than those on these charts.
We have received a lot of very positive feedback from users who compared our method with others on in-house data.
Let's not argue about the importance of octanol-water logP. Well, it is important for us because it generates significant revenue. :-)
"I would not assume that somebody fakes the data"
I wouldn't either; you misunderstood something. Still, in-house databases usually contain structures that are not similar enough to the ones with published data. The training sets of the logP prediction methods are not independent enough from the publicly available data. As a result, the performance of the methods on in-house data is much worse than on public data. Companies that buy logP prediction software are aware of that, so they compare the performance of the methods on in-house data. We have many very significant users who compared our method with others before choosing our software.
One more comment: our software calculates pKa during logP prediction to determine whether the molecule is zwitterionic. In the case of zwitterionic molecules the calculation is more difficult due to the equilibria between the different ionic species. As a result, the calculation is much slower. We are considering making this optional, because it is only useful for a few percent of the structures.