User 55ffa2f197
29-01-2013 23:53:21
Hi, I need to run clustering from Java code by calling the clustering API (JKlustor). I glanced through the API quickly and did not find an example for it. I have a list of molecules as SMILES, which can be up to 20K. I read them in using MolImporter, and I want to pass the SMILES/molecules to the clustering module and get back the cluster ID, and also the centroid of each cluster if possible. Since the number of molecules can go up to 20K, I would like something that runs fast. Jarvis-Patrick or Ward might be a good choice. Since this is going to be part of a larger program, I will not consider doing it on the command line. Can you let me know a code snippet for doing so?
Thanks
Dong
ChemAxon 60613ab728
30-01-2013 16:07:09
Hi Dong,
Thank you for your request. We are going to provide you with a more detailed description of the currently available solutions for your specific request by the beginning of next week.
I also have to mention that the JKlustor API is going to be redesigned in the next couple of months, due to user demands similar to yours. We would like specific user requests to be a major driving force in this process, so we encourage you to tell us what you need from these developments. It would help us a lot if we could have your feedback on our JKlustor API design plans.
Thank you,
Miklós
User 55ffa2f197
30-01-2013 16:41:07
Good to hear I am not alone on this. I think there is well-established theory and practice on clustering. The one that runs fast is Jarvis-Patrick with binary fingerprints, such as the one implemented in Daylight Merlin clustering; this is a really old algorithm. The other one is RNN, which I used in the past through the Tripos toolkit; it also does reasonably well. I am aware of the power of libMcs, but I think it is more for cherry-picking purposes with homogeneous molecules. What I am after is really a crude way to put molecules into clusters defined loosely by fingerprint (ECFP, ChemAxon FP, or even Daylight). The cluster ID and centroid attached to each compound ID are the end result I am expecting. Actually, those are the results Pipeline Pilot will produce. Of course, if you can also provide the Tanimoto distance of the members to the center, that is even better. I guess you do have these as intermediate results in the process of clustering; you just need to define a cluster object and get them out. I think it is a matter of implementation, not a scholarly discussion. Just get something out quickly.
Thanks
Dong
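For reference, the workflow described above (Jarvis-Patrick on binary fingerprints, then a cluster ID, a centroid, and a Tanimoto distance to the centroid per member) can be sketched in plain Java. This is only an illustration and not the JKlustor API: the class and method names (JarvisPatrickSketch, cluster, centroid, fp) are made up here, the fingerprints are assumed to be already available as java.util.BitSet, and the "centroid" is taken to be the medoid, i.e. the member with the highest summed similarity to the rest of its cluster. Neighbor lists exclude the molecule itself, which is one of several Jarvis-Patrick conventions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Illustrative Jarvis-Patrick clustering on binary fingerprints (hypothetical names, not JKlustor). */
class JarvisPatrickSketch {

    /** Builds a toy fingerprint with the given bits set. */
    static BitSet fp(int... bits) {
        BitSet b = new BitSet();
        for (int bit : bits) b.set(bit);
        return b;
    }

    /** Tanimoto similarity: |A and B| / |A or B|. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet and = (BitSet) a.clone();
        and.and(b);
        BitSet or = (BitSet) a.clone();
        or.or(b);
        int union = or.cardinality();
        return union == 0 ? 1.0 : (double) and.cardinality() / union;
    }

    /** For each fingerprint, the indices of its k most similar other fingerprints. */
    static int[][] nearestNeighbors(List<BitSet> fps, int k) {
        int n = fps.size();
        int[][] result = new int[n][];
        for (int i = 0; i < n; i++) {
            double[] sim = new double[n];
            List<Integer> others = new ArrayList<>();
            for (int j = 0; j < n; j++) {
                if (j == i) continue;
                sim[j] = tanimoto(fps.get(i), fps.get(j));
                others.add(j);
            }
            others.sort((x, y) -> Double.compare(sim[y], sim[x])); // descending similarity
            result[i] = others.stream().limit(k).mapToInt(Integer::intValue).toArray();
        }
        return result;
    }

    static boolean contains(int[] arr, int v) {
        for (int x : arr) if (x == v) return true;
        return false;
    }

    static int find(int[] parent, int i) {
        while (parent[i] != i) { parent[i] = parent[parent[i]]; i = parent[i]; }
        return i;
    }

    /** Merges i and j when they are mutual neighbors sharing at least minShared neighbors. */
    static int[] cluster(List<BitSet> fps, int k, int minShared) {
        int n = fps.size();
        int[][] nn = nearestNeighbors(fps, k);
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int i = 0; i < n; i++) {
            for (int j : nn[i]) {
                if (j < i || !contains(nn[j], i)) continue; // mutual pairs only, each handled once
                int shared = 0;
                for (int x : nn[i]) if (contains(nn[j], x)) shared++;
                if (shared >= minShared) parent[find(parent, j)] = find(parent, i);
            }
        }
        int[] ids = new int[n];
        int[] label = new int[n];
        Arrays.fill(label, -1);
        int next = 0;
        for (int i = 0; i < n; i++) { // relabel union-find roots as 0, 1, 2, ...
            int root = find(parent, i);
            if (label[root] < 0) label[root] = next++;
            ids[i] = label[root];
        }
        return ids;
    }

    /** Medoid of a cluster: the member with the highest summed similarity to the other members. */
    static int centroid(List<BitSet> fps, int[] ids, int clusterId) {
        int best = -1;
        double bestSum = -1.0;
        for (int i = 0; i < ids.length; i++) {
            if (ids[i] != clusterId) continue;
            double sum = 0.0;
            for (int j = 0; j < ids.length; j++)
                if (j != i && ids[j] == clusterId) sum += tanimoto(fps.get(i), fps.get(j));
            if (sum > bestSum) { bestSum = sum; best = i; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<BitSet> fps = Arrays.asList(fp(0, 1, 2, 3), fp(0, 1, 2, 4), fp(0, 1, 2, 3, 4),
                fp(10, 11, 12, 13), fp(10, 11, 12, 14), fp(10, 11, 12, 13, 14));
        int[] ids = cluster(fps, 2, 1);
        for (int i = 0; i < ids.length; i++) {
            int c = centroid(fps, ids, ids[i]);
            double dist = 1.0 - tanimoto(fps.get(i), fps.get(c));
            System.out.println("mol " + i + " cluster " + ids[i] + " centroid " + c + " dist " + dist);
        }
    }
}
```

Note that the neighbor search here compares every pair of fingerprints, so the cost grows quadratically with the number of molecules. For a set around 20K that may still be workable, but it is the part to optimize first.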
ChemAxon 60613ab728
31-01-2013 15:09:05
Thank you for the more detailed user stories. We are going to include most of your user stories in our plans and let you know our development plans soon.
Regarding your short-term request, we are going to come back to you with the current capabilities early next week.
Thank you,
Miklós
ChemAxon 60613ab728
07-02-2013 13:23:14