Methodology details of pKa calculator?

User 9cadc86c7c

17-08-2016 13:26:19

Hello to all,


After some communication with the ChemAxon support team about the details of the pKa algorithm, I thought it would be a good idea to make the questions public.  So, here goes - hope you can answer them!

Which partial-charge algorithm did you use?  Is it Gasteiger-Marsili
by any chance, or an in-house ChemAxon algorithm?

Same for polarisability – how did you calculate it?

What are “structure-specific” increments and how do you define
ionisable sites?  A ChemAxon poster mentioned some simple definitions of acidic
and basic groups, but are these actually used in the regression model? 
Surely if it’s site-specific then you’d need to take into account the
functional group (as well as surrounding atoms to account for steric effects)?

What sort of compounds did you train the regression model on?

How does this relate to cxtrain when a “local model” is
trained?  This is very important as we have several unique chemical series
which one won’t find in public datasets.


In addition, do you have any published (or private) examples of cxtrain-pKa
being used?  My experience with our compounds showed little benefit, but I
suspect that I didn’t use cxtrain properly, and I've found no evidence in the
literature so far of cxtrain being used for pKa.  Seeing as it’s a regression
model based on descriptors as well-respected as partial charges, I find it
somewhat hard to believe that training did not work in our case...


Thanks in advance,


ChemAxon d51151248d

18-08-2016 13:30:33

Hi Ed,

Here are my answers to your questions:

  1. We use the Gasteiger partial charge calculation method, which we extended with some parameters. We had to heavily extend the original set of atom types of the Gasteiger method and provide parameters for them. For example, we can handle molecules with metal atoms, or ionic molecules, which the original method was incapable of. We also had to tweak some of the calculations for delocalized structures.

  2. We calculate polarizability essentially with the methods in our reference.

    In this case we also had to extend the basic set of atom types, and in some cases partial charges have to be included in the calculation.

  3.  Structure-specific increments are increments that are attributed to different structure-specific properties that should be taken into account during calculation, for example intra-molecular H-bonds.

    We define ionizable sites by the equilibrium between the conjugate acid and its base, i.e. by the classical conjugate acid-base pair definition. Yes, those simple acidic and basic groups are used in the regression model.

  4.  We trained our regression model on literature data, e.g. Perrin's book or drug compendiums.

  5.  When you train with an experimental dataset, ionizable sites are identified and a suitable regression equation is chosen for calculating pKa.

  6.  We only have some individual examples showing how cxtrain is used, not a full publication.

  7.  Well, cxtrain should work for your molecules as well, but without the settings and your data I can't tell you more.
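
To make the conjugate acid-base pair definition from answer 3 concrete, here is a minimal sketch of the textbook equilibrium HA <-> A- + H+, where pKa = -log10([A-][H+]/[HA]). It is not the pKa calculator's implementation; the function names are hypothetical, and the example value (4.76, the commonly quoted pKa of acetic acid) is only for illustration. It shows how a single site's pKa translates into the fraction of that site that is deprotonated at a given pH.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Fraction of a single site present as the conjugate base A- at this pH.

    Follows from pKa = pH - log10([A-]/[HA]) (Henderson-Hasselbalch),
    assuming one independent ionizable site and ideal behaviour.
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def fraction_protonated(pka: float, ph: float) -> float:
    """Fraction of the same site present as the conjugate acid HA."""
    return 1.0 - fraction_deprotonated(pka, ph)


if __name__ == "__main__":
    example_pka = 4.76  # illustrative value (acetic acid), not calculator output
    for ph in (2.0, 4.76, 7.4):
        base = fraction_deprotonated(example_pka, ph)
        print(f"pH {ph:5.2f}: {base:.4f} deprotonated, {1.0 - base:.4f} protonated")
```

At pH equal to the pKa the two forms are present in equal amounts, which is why the pKa of each site is the natural quantity to predict per ionizable group.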

I hope this helps,