User 1d259ba1ce
16-05-2008 14:48:43
Hi all,
I have 2 huge databases. What's the best method to cluster them?
So far I've calculated the chemical fingerprints (CF) of the SDF files with this command:
> generatemd c input.sdf -k CF -c /usr/local/jchem/examples/config/cfp.xml -D -v -o fingerprints.txt
Now I'm trying ward (with HEAP_LIMIT changed to 1024) with this command:
> ward -f 512 -g -K kelley.txt < fingerprints.txt > neighborlists.txt
Is Kelley OK for determining the best level of clustering? Is it a suitable calculation, i.e. not too time-consuming? It is still running after 300 minutes on a 2.8 GHz Xeon.
Is there a better procedure for clustering databases this large?
Many thanks
Andrea