I have an academic license for ChemAxon's Screen and am using the 'generatemd' command to generate hashed fingerprints, with SMILES strings as input. What kind of atom typing does 'generatemd' use internally to produce the fingerprints? And is there a way to obtain this information, if it is not proprietary?
There is no typing other than the element type of the atom; that is, all nitrogens are of the same type, and so on.
Does this answer your question?
Thanks for the reply. I have another question regarding 'generatemd'. Is there a standard hashing scheme used to generate these fingerprints, or is that information proprietary? I am writing a paper in which I use ChemAxon-generated fingerprints and want to describe the method. Are there any details that can be made available on the exact hashing scheme used for the generated patterns? There is not much information on this in the user manuals.
We decided to make details of fingerprint generation publicly available. However, we have no suitable documentation (e.g. a white-paper) available at the moment.
The fingerprint page http://www.chemaxon.com/jchem/doc/user/fingerprint.html will soon be extended; I'll let you know here in the forum area when it is done.
In the meantime, could you please give us some insight into your work that involves our fingerprint?
ChemAxon's chemical fingerprint: a brief overview (a more detailed and formal description will be provided elsewhere later)
type: topological hashed (folded) binary fingerprint
- each atom type is associated with a prime number
- similarly, each bond type is associated with a prime number
- these prime numbers are combined according to a recursive formula (f) along a path of given length
- various such formulae are applied depending on the number of bits to be set; the simplest is the product of the individual primes
- in practice, only two formulae are used (f1, f2), and if more than one bit is to be set per path, the second value (f2) is perturbed by semi-random numbers (giving f2', f2'', etc.)
- the calculation of f yields the index of a bit position to be set in the fingerprint (if there are f1, f2, etc., all corresponding bits are set)
- then the fingerprint is folded, i.e. the remainder of f divided by a prime (the largest prime smaller than the predefined fingerprint length) is what is actually used to set the bit
- the fingerprint of a structure is obtained by setting the fi bits for all possible paths
- rings are also considered when the fi values are calculated (in practice, rings are associated with yet another prime number, which is used the same way as the atom and bond primes)
- the situation is more complicated when query properties are involved (e.g. with SMARTS); that procedure is not outlined here
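The outline above can be turned into a toy implementation to make the steps concrete. This is a minimal sketch under stated assumptions, not ChemAxon's code: the prime tables, the order-sensitive second formula (h = h*131 + p), the semi-random offsets and the default parameters are all hypothetical choices made for illustration. Only the overall scheme follows the description: element-only atom typing, primes combined along every path, two formulae per path, and folding by the largest prime below the fingerprint length. Ring primes and query features are omitted.

```python
import random

# Hypothetical prime tables, for illustration only; the real ChemAxon
# assignments are not given here. Atoms are typed by element only.
ATOM_PRIMES = {"C": 2, "N": 3, "O": 5, "S": 7, "Cl": 11}
BOND_PRIMES = {1: 13, 2: 17, 3: 19, 4: 23}  # 4 = aromatic (assumed encoding)


def largest_prime_below(n):
    """Return the largest prime strictly smaller than n."""
    for k in range(n - 1, 1, -1):
        if all(k % d for d in range(2, int(k ** 0.5) + 1)):
            return k
    raise ValueError("no prime below n")


def simple_paths(adj, max_bonds):
    """Yield every simple path (list of atom indices) with 0..max_bonds bonds."""
    def extend(path):
        yield path
        if len(path) - 1 < max_bonds:
            for nb in adj[path[-1]]:
                if nb not in path:
                    yield from extend(path + [nb])
    for start in adj:
        yield from extend([start])


def ordered_hash(seq):
    """Hypothetical order-sensitive recursive formula: h = h * 131 + p."""
    h = 0
    for p in seq:
        h = h * 131 + p
    return h


def fingerprint(atoms, bonds, length=1024, max_bonds=7, bits_per_path=2, seed=7):
    """atoms: element symbols; bonds: {(i, j): bond_type} with i < j."""
    adj = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    modulus = largest_prime_below(length)  # folding prime
    rng = random.Random(seed)  # fixed seed: same "semi-random" offsets on every call
    offsets = [rng.randrange(1, modulus) for _ in range(max(0, bits_per_path - 2))]
    fp = [0] * length
    for path in simple_paths(adj, max_bonds):
        # Interleave atom and bond primes along the path.
        seq = [ATOM_PRIMES[atoms[path[0]]]]
        for a, b in zip(path, path[1:]):
            seq.append(BOND_PRIMES[bonds[(min(a, b), max(a, b))]])
            seq.append(ATOM_PRIMES[atoms[b]])
        f1 = 1
        for p in seq:
            f1 *= p  # simplest formula: product of primes (direction-independent)
        # Smaller value of both directions, so f2 does not depend on traversal order.
        f2 = min(ordered_hash(seq), ordered_hash(seq[::-1]))
        values = [f1, f2] + [f2 + off for off in offsets]  # f2', f2'', ...
        for v in values[:bits_per_path]:
            fp[v % modulus] = 1  # fold: remainder by the largest prime below length
    return fp


ethanol_fp = fingerprint(["C", "C", "O"], {(0, 1): 1, (1, 2): 1})
print(sum(ethanol_fp), "bits set")
```

Because a product of primes is the same whichever way a path is read, f1 is canonical by construction; it cannot, however, distinguish paths containing the same labels in a different order, which is one reason an order-sensitive second formula is useful. Taking the minimum over both traversal directions keeps f2 canonical too, so renumbering the atoms gives the same fingerprint.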
Hope this helps,
I have a question about a ChemAxon poster from 2003: "Virtual screening using fingerprints Part I. A hybrid approach to ..." http://www.chemaxon.com/conf/A_hybrid_approach_to_pharmacophore_point_perception.pdf
In this poster you use Gasteiger charges to help define anions/cations/donors/acceptors/etc. Are you sure that these charges are actually relevant for determining whether an atom is an anion, cation, donor or acceptor?
I question this because:
the charge of the amine-N (pKa=8) in aniline Nc1ccccc1 is -0.35 (protonated: 0.26)
BUT the charge of the amine-N (pKa=15!) in trinitroaniline Nc1c(N(=O)O)cccc1 is similar: -0.35 (protonated: 0.26)
It seems there is no correlation between the charge and the pKa and/or the atom type (anion/cation/donor/acceptor), and that these charges are irrelevant for atom typing. Am I right?
Thanks for your help,
Sorry for reporting the wrong pKa values earlier (how very sloppy of me!).
Here are the correct ones (even more extreme):
aniline-NH+ pKa ~5 (can be protonated)
trinitroaniline-NH+ pKa ~-9 (! should never be protonated!)
The charges are the same.
You're right: there is no correlation between charge and pKa. However, as I understand it, neither the rules given at the top of the right-hand column on page 2 nor the text imply that, or do they?
Perhaps I am missing something; where do you think the problem lies in the explanation given on the poster you refer to?
Anyway, we are aware of the limitations of the simple method presented on that old poster, so a while ago we introduced new pharmacophore point type definitions based on the more sophisticated calculations available in our new calculator plugins. For instance, donor and acceptor properties are perceived directly by the HBDA plugin, which takes protonation at the given pH into account and even detects intramolecular H-bonds.
Thus there is no need to rely on the obscure ion charge, which, as you pointed out, can cause problems.
(See the attached XML file which is the configuration file of the pharmacophore point type mapper program.)
Just one more comment regarding the pKa and charge values you quoted. I reckon you did not calculate them with the Marvin calculators, as I got different values. The pKa of the aniline N is 4.64, which agrees with the value you mentioned; however, for trinitroaniline I got 2.43, which means that it is also protonated at pH 7. Regarding charges, those are -0.16 and -0.14, respectively.
What is the source of the figures you quoted in your post? Are those pKa values experimental?
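To put these pKa values in perspective, the fraction of an amine present in its protonated (conjugate acid) form at a given pH follows from the Henderson-Hasselbalch equation, f = 1 / (1 + 10^(pH - pKa)), where pKa refers to the conjugate acid (the NH+ pKa quoted above). A minimal sketch, assuming a single basic site and ideal solution behaviour:

```python
def fraction_protonated(pka, ph=7.0):
    """Fraction of a base present in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# pKa values quoted in this thread (conjugate acid of the amine N).
for label, pka in [("aniline, quoted", 5.0), ("trinitroaniline, quoted", -9.0),
                   ("aniline, Marvin", 4.64), ("trinitroaniline SMILES, Marvin", 2.43)]:
    print(f"{label:32s} pKa {pka:6.2f}  protonated at pH 7: {fraction_protonated(pka):.2e}")
```

With this relation, a base whose conjugate acid has a pKa of 5 is about 1% protonated at pH 7, and each pKa unit lower reduces that fraction roughly tenfold.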
Thank you for the useful comments.
The poster first uses the pH and pKa to determine the charges. The rules on the poster then imply a relationship between these charges and, primarily, anion/cation and, secondarily, Hacc/Hdon abilities. So, via these partial charges, an indirect relationship with pKa and atom type is implied. It is good to know that the method has been updated.
For my post I computed partial charges with openbabel (as I assumed its Gasteiger-Marsili charges were identical to Marvin's), but the poster shows that the two charge calculations clearly differ: openbabel reports that the N in n1ccccc1 has -0.26 (poster: -0.25) and that the N in [n+]1ccccc1 has -0.36 (poster: -0.05); see attachment. Is Marvin buggy, or is openbabel? (This might be a plus for Marvin.)
Sorry that I was so sloppy with my first question: the SMILES of trinitroaniline should (of course) have been:
Nc1c(N(=O)O)cc(N(=O)O)cc(N(=O)O)1 (pKa=-9). Maybe that changes your computed pKa value and partial charge for trinitroaniline (2.43 and -0.14)? As you point out, there is a difference between the pKa values Marvin computed for the two compounds, but it is not as large as the difference between the experimental values I gave earlier (or was that because of my wrong SMILES string?).
Thank you for your clear reply, I look forward to your answers,
OK, now I get your point. The poster uses the ion charge concept, that is, the charge of the atom in the most probable ionized (protonated or deprotonated) form of the structure. The donor rule says that every Q-H atom (that is, a heteroatom bearing a hydrogen) and the =O atom of the carboxyl group is a hydrogen bond donor, except if the atom has a negative charge, in which case it has been deprotonated.
Simple, but it gives fairly good predictions in most typical cases.
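The donor rule described above can be written down as a small predicate. This is a minimal sketch, assuming each atom already carries its hydrogen count and ion charge in the most probable ionized form at the chosen pH (computing that form is the hard part and is not shown); the Atom record and its field names are hypothetical, not a ChemAxon API.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    """Hypothetical atom record; all values refer to the most probable ionized form."""
    element: str
    h_count: int
    ion_charge: float
    carboxyl_oxo: bool = False  # True for the =O atom of a carboxyl group


def is_hbond_donor(atom: Atom) -> bool:
    # A negatively charged atom is treated as deprotonated, hence not a donor.
    if atom.ion_charge < 0:
        return False
    heteroatom = atom.element not in ("C", "H")
    # Q-H (heteroatom bearing a hydrogen) or the carboxyl =O atom.
    return (heteroatom and atom.h_count > 0) or atom.carboxyl_oxo
```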
Regarding the prediction of pKa and charges: please feel free to try the calculator plugins online (http://www.chemaxon.com/demosite/marvin/index.html). Draw (or paste) a structure, then click the Tools menu and select a property from the list. Please report any wrong predictions you encounter!
The charge model used is an enhanced version of the Gasteiger-Marsili method. According to validation studies Marvin's charge and pKa predictions are very accurate. You may find the links below useful:
Details of the pKa calculation: http://www.chemaxon.com/marvin/chemaxon/marvin/help/pKa.html
Some aspects of the charge prediction: http://www.chemaxon.com/marvin/chemaxon/marvin/help/Charge.html
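Since the discussion turns on why two Gasteiger-Marsili implementations disagree, a sketch of the plain (unenhanced) PEOE iteration may help. It assumes the textbook form: orbital electronegativity chi(q) = a + b*q + c*q^2, charge flowing along each bond from the less to the more electronegative atom, normalised by the donor's cation electronegativity (with the special value 20.02 conventionally used for hydrogen) and damped by a factor of 1/2 per iteration. The parameter table covers only water for brevity, the numbers are the commonly quoted ones and should be checked against a reference, and the atom-type keys are hypothetical. Real programs differ in atom typing, parameters, iteration count and extra corrections (Marvin's model is described above as an enhanced version), which is one plausible reason why they report different values.

```python
# Gasteiger-Marsili (a, b, c) parameters as commonly quoted for the original
# method; treat the numbers as assumptions and check them against a reference.
PARAMS = {
    "H": (7.17, 6.24, -0.56),
    "O.sp3": (14.18, 12.92, 1.39),
}
H_CATION_CHI = 20.02  # special denominator conventionally used for hydrogen


def gasteiger_marsili(types, bonds, n_iter=6, damping=0.5):
    """Plain PEOE charges; types: parameter key per atom, bonds: (i, j) pairs."""
    params = [PARAMS[t] for t in types]
    q = [0.0] * len(types)
    for k in range(1, n_iter + 1):
        # Orbital electronegativity chi(q) = a + b*q + c*q^2 at the current charges.
        chi = [a + b * qi + c * qi * qi for (a, b, c), qi in zip(params, q)]
        dq = [0.0] * len(types)
        for i, j in bonds:
            # Electron density flows from the less to the more electronegative atom.
            donor, acceptor = (i, j) if chi[i] < chi[j] else (j, i)
            if types[donor] == "H":
                chi_plus = H_CATION_CHI
            else:
                a, b, c = params[donor]
                chi_plus = a + b + c  # donor's electronegativity at q = +1
            transfer = (chi[acceptor] - chi[donor]) / chi_plus * damping ** k
            dq[donor] += transfer     # donor becomes more positive
            dq[acceptor] -= transfer  # acceptor becomes more negative
        q = [qi + d for qi, d in zip(q, dq)]
    return q


water = gasteiger_marsili(["O.sp3", "H", "H"], [(0, 1), (0, 2)])
print([round(x, 3) for x in water])
```

Each bond moves equal and opposite amounts of charge, so the total charge of the molecule is conserved at every iteration.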
Thank you again. The charges computed by Marvin are simply different from those computed by openbabel.
The pKa module of Marvin performs well for some compounds that I tested. Unfortunately, the pKa help page gives no information about how the pKa values are computed. If the pKa computation is based on the partial charges, then I am sure there is a bug in openbabel. Is this the case?
I am glad you found Marvin's predictions accurate. Good luck in using Marvin!