Difficulties establishing pKa training set

I am using the latest versions of Instant JChem and MarvinSketch (6.0 and 6.0.0, respectively), and am experiencing great difficulties in developing pKa correction libraries. The pKa predictor plugin in MarvinSketch functions properly, but after multiple attempts at developing a pKa correction library, I still see no available option in the drop-down menu titled "Correction library" on the predictor screen.


In essence, the "step-by-step directions" do not mesh with what I see on-screen in the GUI in JChem, and I have been unable to locate where to access the cxtrain command line in either (thus preventing me from exploring that route to library generation).


I have been able to generate tabular data with structures in JChem for Excel, and have successfully imported the data into Instant JChem with associated experimental pKa values and atom ID labels (the fields labeled "pKa1" and "ID1", respectively, as instructed), but that is as far as I have been able to progress.
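For readers following along, here is a minimal Python sketch of how the two fields mentioned above ("pKa1" and "ID1") would appear as data items at the end of one SDF record. The data-item layout is standard SDF. The helper name `sd_data_items`, the pKa value, and the atom index are illustrative assumptions, not values from the actual training file.

```python
def sd_data_items(fields):
    """Format a dict of SD data items and terminate the record.

    Each item is a '>  <NAME>' header line, the value, then a blank line;
    '$$$$' closes the record. The molfile block itself is not produced here.
    """
    lines = []
    for name, value in fields.items():
        lines.append(f">  <{name}>")
        lines.append(str(value))
        lines.append("")  # blank line ends the data item
    lines.append("$$$$")
    return "\n".join(lines) + "\n"


# Illustrative values only: one experimental pKa and the atom index it belongs to.
record_tail = sd_data_items({"pKa1": 4.76, "ID1": 4})
print(record_tail)
```

The output would be appended after the molfile block of the corresponding structure; a second ionizable site would presumably use "pKa2" and "ID2" in the same way, though that naming is an assumption extrapolated from the "1" suffix.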


Are there updated step-by-step instructions that mesh with the current version(s) available, and/or where does one - *exactly* - locate the functional command prompt to be utilized with cxtrain?


Edit: Running aforementioned software on a 64-bit, Windows 7-based machine (appropriate installations of ChemAxon software confirmed)


Edit (2): Running training module in Instant JChem affords error "[pKa correction library file name] not valid"

I downloaded the "marvinbeans" installer from ChemAxon's site. It is an "exe" file that can be installed on Windows.

After installing "marvinbeans", I opened the "bin" directory, which contains the runnable scripts (Windows batch files); see the attached figure "cxtrain1.png". Among them is the "cxtrain" program, which can be used for training.


I then opened the Windows command prompt ("cmd"), as shown in the figure "cxtrain2.png".


Next, I ran the pKa training command: "cxtrain pka -i firstpKaModel d:/molekula/pKaT.sdf". This is also shown in the second figure.
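If you would rather script this step than type it at the prompt, the following Python sketch assembles the same command line. The arguments ("pka", "-i", the model name, and the SDF path) are copied from the command quoted above; the helper names `cxtrain_pka_command` and `run_training` are hypothetical, and the sketch assumes the Marvin "bin" directory is on your PATH.

```python
import shutil
import subprocess


def cxtrain_pka_command(model_name, sdf_path, executable="cxtrain"):
    """Build the argument list for the pKa training command quoted in this thread."""
    return [executable, "pka", "-i", model_name, sdf_path]


def run_training(model_name, sdf_path, executable="cxtrain"):
    """Run the training command, assuming the executable can be found on PATH."""
    resolved = shutil.which(executable)  # on Windows this also finds .bat files
    if resolved is None:
        raise FileNotFoundError(f"{executable} not found on PATH")
    command = cxtrain_pka_command(model_name, sdf_path, executable=resolved)
    return subprocess.run(command, check=True)


# Only prints the command; call run_training(...) yourself to execute it.
cmd = cxtrain_pka_command("firstpKaModel", "d:/molekula/pKaT.sdf")
print(" ".join(cmd))
```

Keeping the command construction separate from the execution makes it easy to check what will be run before anything touches the installation.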


Finally, I opened MarvinSketch to check whether the newly created pKa model was available. This is shown in the figure "cxtrain3.png".


I hope this information helps.


I agree that the documentation is very poor and difficult to understand, so ChemAxon should pay more attention to improving the quality of the calculator documentation.



Thanks for your instructive and informative response, Jozsi. I will re-attempt creating the library with your suggestions and update the thread with those results shortly.


Leave it to bench chemists to have difficulty with computational software, hah.




Edit: Again, excellent instructions Jozsi. Thanks. I ran the cxtrain command as instructed, and MarvinSketch correctly identified the training library. However, upon utilization of the correction library, MarvinSketch predicted an aromatic C-H proton to be highly acidic (pKa approx 6), which I know does not agree with physical reality. I suspect this arose from an improper atom identification number in the input file.

To be clear (so that I understand correctly), we are to label the acidic *hydrogens* with column field name "ID1", corresponding to the experimental pKa in the column labeled "pKa1", correct? (As opposed to labeling, say, the -OH oxygen in a carboxylic acid, we label the -OH hydrogen, right?) This is the impression I am under from viewing the pKa training data set from ChemAxon's website (where an acidic -NH hydrogen is labeled/numbered).

Finally, in the case of acidic methylene units (such as the methylene group in diethyl malonate or acetylacetone), if we are labeling atom numbers of hydrogen atoms, as described above, do we pick one of the two acidic -CH2- protons and number/report pKa only once, for one H?


Relevant to the above, the images below show: (1, "JChemXLS_data") the "raw" tabular data generated using JChem for Excel, (2, "Mol_structs") the successful-appearing export to an appropriate .sdf file, and (3 "pka_trainedpredict") the post-training anomalously acidic aryl C-H hydrogen.


Your expectation is correct: an aromatic C-H is not that acidic.

Again, the documentation was misleading. In the atom index columns ("ID1", etc.), the index of the heavy atom should be specified, that is, the atom that carries the acidic hydrogen. See the attached figure. Only one pKa can be assigned to a given heavy atom.
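To convert labels that were placed on hydrogens, something along the lines of the Python sketch below could be used. It maps a hydrogen's atom index to the index of the heavy atom it is bonded to, which is the value the "ID" columns expect according to the explanation above. The function name `heavy_atom_for`, the 1-based numbering, and the acetic acid atom ordering are assumptions made for illustration; check them against the atom numbers displayed in MarvinSketch.

```python
def heavy_atom_for(atom_index, symbols, bonds):
    """Return the index of the heavy atom that should be labeled.

    symbols: element symbols in atom order (index 1 is symbols[0]).
    bonds:   pairs of 1-based atom indices.
    A heavy atom is returned unchanged; a hydrogen is replaced by its neighbor.
    """
    if symbols[atom_index - 1] != "H":
        return atom_index
    for a, b in bonds:
        if a == atom_index:
            return b
        if b == atom_index:
            return a
    raise ValueError(f"hydrogen {atom_index} is not bonded to any atom")


# Acetic acid with only the acidic hydrogen drawn explicitly (assumed ordering):
# 1 = CH3 carbon, 2 = carboxyl carbon, 3 = carbonyl O, 4 = hydroxyl O, 5 = acidic H
symbols = ["C", "C", "O", "O", "H"]
bonds = [(1, 2), (2, 3), (2, 4), (4, 5)]

print(heavy_atom_for(5, symbols, bonds))  # index of the hydroxyl oxygen
```

For an acidic methylene group, both hydrogens map to the same carbon, so the group gets a single ID and a single pKa, consistent with the rule that only one pKa can be assigned to a heavy atom.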