User 21b7e0228c
07-04-2011 10:17:33
We set out to develop a
tool to help us assign literature pKa values to the specific deprotonation
process they are likely to correspond - see joined. For each molecule, we wish
to generate the succession of proton jumps:
>start from its
absolutely protonated µspecies at some GodForsaken pH=-10, predict the first
pKa and locate the first proton to depart,
>then calculate the
resulting deprotonated µspecies (by moving to the pH past this pKa threshold
and RECALCULATING THE NEW MAJORITARY MS - by an independent majormsplugin).
>set this as the current
µspecies , recalculate the first acidic pKa of the new current µspecies, make
the twice deprotonated µspecies the current one, etc, until there is nothing
more to deprotonate.
The reasons for manually
jogging thru the pH range, albeit the tools support some automatic pH scanning,
are twofold
A. From the doc, I never
undestood how the automatic pH scan is supposed to work. Nor, for the matter,
did I find any explanation of the differences btw "static" &
"dynamic" calculation modes - I found out by trial & error that
"macro dynamic" is the config I need to use (for I want a first
acidic pKa OF THE PROVIDED µspecies as is, not some generic first acidic pKa of
the molecule EXEMPLIFIED by this µspecies - i.e. calculating the pKa of
[O-]C(=O)CC(=O)O should return the deprotonation of the second -COOH, feeling
the effect of the neighboring anion - so the second pKa of the maleic acid
should expectedly become the "first" acidic pKa of the monomaleate.
This works fine (but still does not solve the problem of API docs - which are
as bad as if I would have written
them!!).
B. We want to output, for
each involved µspecies, its smiles numbered such as to have the reaction
(proton departure) center pinpointed consistently to the atom sequence in the
smiles!! For example, with maleic acid:
>> echo
"O=C(O)CC(=O)O" | cxcalc pKa -a 2 -b 0 -m macro -P dynamic
id apKa1 apKa2 atoms
1 2.43
5.92
3,7
This is not what we need,
because it does not say that atom 3 became [O-] when atom 7 loses its H.
However,
>> echo
"O=C(O)CC(=O)O" | java Utils/pKaAssign
1 2.428 1 OC(=O)CC(O)=O
1 5.919 1 OC(=O)CC([O-])=O
This is fine, cause it says
that the first pKa at 2.4 happens on atom 1 of the associated µspecies, and the
second at 5.9 is also at atom #1 of the associated smiles, which has been
renumbered wrt first. We do not care that atom 1 of the first process is now
atom #7 in the second process - what we want is having, for each process, the
correct marker of the proton departure center, with the other atoms in their
properly associated protonation states.
Now, the joint piece of
code works very well - for most of the molecules (and furthermore, predictions
ARE close to the experimental data, so it really helps decrypting the Who's
Who). Yet, when it fails, it does it memorably and schizophrenically, for
molecules one would not expect - and the cherry on the cake is that cxcalc does
NOT fail with the same compounds!
Consider the following
pyridinecarboxylic acid:
>> echo
"OC(=O)c1ccc[nH+]c1" | cxcalc pKa -a 2 -b 0 -m macro -P dynamic
id apKa1 apKa2 atoms
1 2.79
4.19 1,8
which is fine - we have an
acid going to COO- at 2.8 (atom 1), then Py+ going neutral at 4.2 (atom 8). Our
code, however, says:
>> echo
"OC(=O)c1ccc[nH+]c1" | java Utils/pKaAssign
1 2.786 8 OC(=O)c1ccc[nH+]c1
1 2.786 8 [O-]C(=O)c1ccc[nH+]c1
Debugging showed me that
the getpKa calls return the proper values 2.8 and 4.2 BUT... mismatched. Unlike
cxcalc, which does it properly, the API thinks that 4.2 is the carboxyl and 2.8
the Py+! So far, so good - that would be "just" a misprediction: but
let's follow up. The code decides (in light of the mismatch) that the first
deportonation at 2.8 will be the one of the Py+, so the first output line is
"consistent". Then, the code sets the pH at slighly above 2.8 (say 3
- check code for gory details) and asks "so, what's the majoritary form at
pH=3". If it were wrong, but consistent, it should have said "well,
it's the HOOC-Py, because I just lost the most acidic group, Py+". Well...
here is where schizophrenia kicks in - because the major µspecies calculator is
not fooled, and returns CO[O-]Py+!! So this will be the input of the next round
of pK calculation: which is the first acidic pKa? Guess which - well, the Py+
at 2.8 (strangely, the apparition of an anion in the neighborhood does not seem
to affect the output). Curios to know what is the pKa of the -COOH in this
configuration? Well, NaN, of course - don't you see it's an anion, you fool??
Huh!! These API routines
have well-considered personal opinions, but they strongly disagree with them!!
The adventure happened with the 5.3 release. Enjoy!!