User f52820d97e

10-07-2006 17:23:52

Hi,

Here is my latest analysis with a slightly bigger database (8146 structures) than before (see http://www.chemaxon.com/forum/ftopic1417.html).

This time I chose the 1024-7-3 parameters for the chemical hashed fingerprints and ran the Jarvis-Patrick clustering with various parameters. I collected some statistics on the ratio of clusters to singletons, and did some population analysis on the different clusters. Apart from one big cluster resulting from combinatorial chemistry, I get clusters of various sizes, although there is still a fair number of "size 2" clusters (basically singletons?), which concerns me a bit... I guess now is the time to dig into them...
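For reference, the kind of statistics I mean can be sketched as follows. This is only a minimal illustration: it assumes the clustering result has been reduced to one cluster id per structure, and the `labels` list and the function name are hypothetical, not the output format of any particular tool.

```python
from collections import Counter


def cluster_size_stats(labels):
    """Summarise a clustering given one cluster id per structure.

    `labels` is a hypothetical input format (one id per structure).
    """
    sizes = Counter(labels)               # cluster id -> number of members
    histogram = Counter(sizes.values())   # cluster size -> number of clusters
    singletons = histogram.get(1, 0)
    clusters = sum(n for size, n in histogram.items() if size > 1)
    ratio = clusters / singletons if singletons else float("inf")
    return {
        "clusters": clusters,              # clusters with at least 2 members
        "singletons": singletons,
        "cluster_singleton_ratio": ratio,
        "size_2_clusters": histogram.get(2, 0),
        "largest_cluster": max(sizes.values()) if sizes else 0,
    }


# Toy example: 7 structures in 4 groups
stats = cluster_size_stats([0, 0, 0, 1, 1, 2, 3])
```

The "size 2" count is reported separately because those are the clusters I want to inspect by hand.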

I am inclined to choose t=0.3 and c=0.6 to reproduce the default parameters from Daylight (sorry to mention the competition... which I want to give up!). Although the metrics are not the same, from reading the documentation I concluded that at least c=0.6 is right:

- fixed families of 16 nearest neighbours are constructed

- two compounds A and B cluster together if 10 of their nearest neighbours are in common

- 9 (10 minus A itself) divided by 15 (16 minus B itself) gives 0.6, hence the parameter c...
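The arithmetic above can be written out as a short sketch. It assumes that c is the minimum ratio of common neighbours and that the two compounds themselves are left out of the count, which is my reading of the documentation rather than a confirmed definition; the function names are hypothetical.

```python
def common_neighbour_ratio(list_length, in_common, exclude_self=True):
    """Ratio of shared nearest neighbours for fixed-length neighbour lists.

    With exclude_self=True, one entry is dropped from both counts
    (assumption: the compounds A and B are not counted as neighbours).
    """
    if exclude_self:
        return (in_common - 1) / (list_length - 1)
    return in_common / list_length


def would_cluster(neighbours_a, neighbours_b, c):
    """Shared-neighbour test: A and B join if the common ratio reaches c.

    neighbours_a and neighbours_b are sets of neighbour ids.
    """
    smaller = min(len(neighbours_a), len(neighbours_b))
    if smaller == 0:
        return False
    return len(neighbours_a & neighbours_b) / smaller >= c


# Daylight-style defaults quoted above: 16 neighbours, 10 in common
ratio = common_neighbour_ratio(16, 10)                        # 9 / 15
ratio_with_self = common_neighbour_ratio(16, 10, exclude_self=False)  # 10 / 16
```

Note that counting A and B themselves would give 10/16 = 0.625 instead of 0.6, so the two readings differ only slightly.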

What do you think of my conclusions?

Cheers,

Nicolas