I have been using cxcalc to generate ensembles of reasonable protonation states of compounds for a certain pH range. So far, I have not found a straightforward way to do that. Because I would like to do this for thousands or millions of compounds, I cannot do it with the GUI. I could automate the GUI, but I would rather circumvent it and work from the command line.
Is there a protocol to efficiently generate such an ensemble and save the 3D structures?
Starting from a SMILES string or a 3D structure of a compound, I would like to end up with a file of energy-minimized 3D structures that contains the compound in all protonation states that occur with a certain probability (e.g. 80% or more) at a certain pH, or alternatively in a certain pH range.
This can be done with the cxcalc command-line application.
1. Create the major microspecies (major protonation state) of a molecule at a given pH:
cxcalc majorms -H 11 demo10.sdf
The same at pH 7.4, with the output written to the SDF file out1.sdf:
cxcalc -o out1.sdf majorms -H 7.4 demo10.sdf -f sdf
2. Create lowest energy 3D conformers from the previously created major microspecies:
cxcalc -o out2.sdf leconformer out1.sdf -f sdf
The output file (out2.sdf) contains the lowest energy 3D structures of the major microspecies taken from out1.sdf.
On Unix/Linux systems, the two steps can be combined using a pipe:
cxcalc majorms -H 7.4 demo10.sdf -f sdf | cxcalc -o out.sdf leconformer
The input can be in any file format we support. I used SDF in the example, but it can also be SMILES.
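Since the question mentions thousands or millions of compounds, the piped command above could be wrapped in a small shell function and run over a whole directory of input files. This is only a sketch: the function names protonate_and_minimize and protonate_all and the "_3d.sdf" output naming are made up for illustration, and only the cxcalc options already shown above are used.

```shell
#!/bin/sh
# Sketch: run the majorms | leconformer pipeline on one input file.
# protonate_and_minimize is a hypothetical helper name, not part of cxcalc.
protonate_and_minimize() {
    ph="$1"       # pH for the major microspecies, e.g. 7.4
    infile="$2"   # input structure file, e.g. demo10.sdf
    outfile="$3"  # output SDF with the lowest energy conformers
    cxcalc majorms -H "$ph" "$infile" -f sdf | cxcalc -o "$outfile" leconformer
}

# Sketch: process every .sdf file in a directory and write <name>_3d.sdf
# next to each input. The naming scheme is an assumption for illustration.
protonate_all() {
    ph="$1"
    dir="$2"
    for infile in "$dir"/*.sdf; do
        [ -e "$infile" ] || continue                  # glob matched nothing
        case "$infile" in *_3d.sdf) continue ;; esac  # skip earlier outputs
        protonate_and_minimize "$ph" "$infile" "${infile%.sdf}_3d.sdf"
    done
}
```

It could then be called as, for example, protonate_all 7.4 ./compounds. For very large sets it may be more efficient to put many structures into one input file instead of starting cxcalc once per compound.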
Terrific! What happens when two different protonation states occur with roughly the same probability?
E.g. three possible states with 51%, 48% and 1% probability at a certain pH.
As I understand it, majorms would generate the state with 51%.
Is it possible to generate an ensemble of structures that occur with, e.g., more than 20%, or at least more than 0%?
The "majorms" calculation returns the microspecies with the highest distribution. E.g. if there are microspecies with distributions of 45%, 35% and 20%, it will return the microspecies with the 45% distribution.
The "microspeciesdistribution" calculation computes the microspecies distribution. It returns the list of microspecies with their distributions at the given pH:
cxcalc microspeciesdistribution --pH 4.0 input.sdf
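To keep only the microspecies above a cutoff (for example 20%), the reported distributions can be filtered afterwards. The sketch below assumes the result has already been brought into a simple tab-separated table with one microspecies per line, an identifier in the first column and the distribution in percent in the second. That layout is an assumption for illustration; the actual output format of microspeciesdistribution is not shown in this thread and will probably need some reshaping first. The function name filter_by_distribution is likewise made up.

```shell
# Sketch: print the lines whose second column (distribution in %) is at
# least the given cutoff. Input on stdin: "identifier<TAB>percent".
# filter_by_distribution is a hypothetical helper, not a cxcalc feature.
filter_by_distribution() {
    cutoff="$1"
    # Adding 0 forces a numeric comparison; non-numeric fields count as 0.
    awk -F '\t' -v cutoff="$cutoff" '($2 + 0) >= (cutoff + 0)'
}
```

With the 51%, 48% and 1% example from the question and a cutoff of 20, only the first two states would be kept.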