I am trying to cluster an SDF file of 600 chemicals using Ward and Jarp, in order to see the differences between the two approaches.
However, the different parameters used in the pharma-frag.xml and cfp.xml files are not clear to me. Can anyone explain them? Which parameters can I change, what would the result be, and why?
Moreover, I used the --T option when creating my fingerprints in order to see the statistics for my chemicals. But I want to know whether my parameters are right. Can somebody explain what the different terms in the statistics file are? What are the lower and upper limits for each term, and why? I looked on the website and found some interesting things, but I think I have not yet understood everything well.
Thank you very much in advance for any answers you can give me.
Thank you for your interest in our products. We advise trying the tools with the default parameter configuration first, and trying some customisation only after gaining some experience with the software.
For instance, cluster centroids are directly comparable even using the default pharmacophore and the default topological fingerprint configurations.
Regarding your questions:
|However, things are not clear for me concerning the different parameters used in the pharma-frag.xml and cfp.xml files. Can anyone explain me? Which parameter can I change and what would be the result and why? |
Parameters in the pharmacophore configuration file are explained in this document: http://www.chemaxon.com/jchem/doc/user/PMapper.html#ruleexample.
Parameters used in the configuration of the chemical topological fingerprint are best explained here: http://www.chemaxon.com/jchem/doc/user/fingerprint.html#effect; details of the XML configuration file are discussed in this document: http://www.chemaxon.com/jchem/doc/user/GenerateMD.html#config.
You can change any parameter in the configuration file; the effect of some of them is explained in the documents referred to above.
|Can somebody explain what are the different terms in the statistic file? |
- Number of molecules: that's clear, right?
- Number of bits set:
Average: average number of bits set to 1 over all fingerprints generated
A very important value: it should not exceed ~60-70%, as the fingerprints then become less rich in information and cannot properly represent distinct molecules.
Maximum: which molecule's fingerprint had the largest number of 1 bits, and how many that was (as a percentage)
Minimum: as above, but for the lowest bit count.
- Density function: the distribution of 1 bits on a finer scale. The two values above give the absolute range; this histogram describes how that range is populated by the various fingerprints. Is that clear?
For example, "20%-30% 40.00%" means that in 40% of the generated fingerprints, between 20% and 30% of all fingerprint bits (e.g. of 1024) were set to 1.
- Cell frequencies:
index: fingerprint bit index from 0 to 1023 (if your fingerprint is 1024 bits long)
freq: frequency count of that bit position: how many times (in how many fingerprints) it was set to 1
%: same as freq, but as a relative frequency in percent rather than an absolute count
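To make the terms above concrete, here is a minimal Python sketch of how such statistics could be computed from a set of binary fingerprints. It is only an illustration and not the code the tool itself uses: the function name `fingerprint_stats`, the representation of a fingerprint as a set of bit indices, and the choice to put a boundary value such as exactly 20% into the "20%-30%" bucket are all assumptions made for this example.

```python
from collections import Counter


def fingerprint_stats(fingerprints, length):
    """Summarise binary fingerprints (hypothetical helper, for illustration only).

    fingerprints: list of sets, each holding the indices of the bits set to 1
    length: total number of bits per fingerprint (e.g. 1024)
    """
    n = len(fingerprints)

    # Percentage of bits set to 1 in each fingerprint.
    fill = [100.0 * len(fp) / length for fp in fingerprints]

    # Density function: share of fingerprints falling into each 10% fill bucket.
    # Assumption: a boundary value goes into the upper bucket; 100% goes into the last one.
    buckets = Counter(min(int(p // 10), 9) for p in fill)
    density = {
        f"{10 * b}%-{10 * (b + 1)}%": 100.0 * count / n
        for b, count in sorted(buckets.items())
    }

    # Cell frequencies: for each bit index, in how many fingerprints it is set to 1,
    # both as an absolute count (freq) and as a percentage of all fingerprints (%).
    counts = Counter(bit for fp in fingerprints for bit in fp)
    cells = [(i, counts[i], 100.0 * counts[i] / n) for i in range(length)]

    return {
        "molecules": n,
        "average": sum(fill) / n,
        "maximum": max(fill),
        "minimum": min(fill),
        "density": density,
        "cells": cells,
    }


# Toy example: four 10-bit fingerprints.
stats = fingerprint_stats([{0, 1}, {1, 2}, {1, 2, 3, 4, 5}, {1}], length=10)
```

In this toy example bit 1 is set in every fingerprint, so its cell frequency is 100%. A bit like that carries no information for distinguishing molecules, which is the same reason a very high average bit count is undesirable.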
|What are the lower and upper limit for each term? |
Which terms do you mean? Please be more specific.
Thank you very much Miklos for your answers.
I also kept reading up on this elsewhere, and now everything is clear.