Some basic help? - ChemAxon Forum Archive

User 9df74a15a4

05-11-2014 12:34:59

I'm working under Win7 (with PowerShell), not Unix, so some of the example lines in the documentation don't seem to work for me, plus I am not really a command-line person. Also (or because of the latter) I find the options kind of confusing, especially for Jarvis-Patrick, where there is a mix of jarp and generatemd. In other words, I could use some more basic guidance (not in the form of "check the documentation"; btw, the links in the sticky "JKluster, Screen...." don't work).

Anyway, what I have and would like to do (aside from all GUI....):

An sdf or txt (SMILES) file containing molecules and IDs, not necessarily with any attached data.

Cluster the contents according to a Tanimoto similarity (say 0.6) of Fingerprints; not pharmacophore or single reference compound based.

This means a Jarvis-Patrick method(?). An example from the documentation looks like it could do the trick:

generatemd c input.smi -c CF -k cfp.xml -D | jarp -f 512 -t 0.1 -c 0.3 -g

but: cfp.xml (from the example folder) contains stuff not suitable for my case. Replacing it with "-f 1024" (or 512) gives an error. Also, why is it -D if jarp works with binary FPs?

Maybe I am not even looking at the correct example? Also, once this works, I guess processing the output is the next step. There are several not-so-obvious examples in the documentation, which might become more obvious once you get far enough to play around with them. I would prefer the output as a single file, or in the worst case one file per cluster (which would be painful for a large set that ends up with 20+ clusters and singletons), viewable with Marvin and showing the seed of each cluster and the compounds in each cluster.

Thanks in advance.

ChemAxon d51151248d

12-11-2014 10:51:05

Hi Alexander,

Although you can use generatemd and jarp for clustering, a more convenient way is to use our JKlustor tool, which is available in many ChemAxon tools. Now I will give you a usage example based on the workflow you sent us.

  jklustor -v http://www.chemaxon.com/shared/libMCS/default.sdf  -c sphex:0.6 -d ecfp:tanimoto -o wrclus:sdf:frameworks.sdf -o "wrmols:sdf:cluster_*.sdf"

This example reads default.sdf as the input file from our website and performs a clustering based on the Sphere Exclusion model. The 0.6 is the minimal separation threshold for the clusters, expressed in Tanimoto similarity; it is set with the -c option. The fingerprint (ECFP) and the metric (Tanimoto) used for the clustering are set with the -d option. The -v option turns verbose mode on.
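To make the idea behind the Sphere Exclusion model and the Tanimoto threshold more concrete, here is a minimal Python sketch. It is not JKlustor's implementation: the function names, the toy fingerprints (sets of "on" bit indices) and the rule for picking centers (simply the first unassigned molecule) are assumptions made for illustration, and the sketch assumes that a molecule joins a cluster when its Tanimoto similarity to the center is at least the threshold.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def sphere_exclusion(fingerprints: dict, threshold: float) -> dict:
    """Return {center_id: [member_ids]}; each center is listed among its own members.

    Illustrative rule (an assumption, not JKlustor's actual procedure):
    take the first unassigned molecule as a center, collect every unassigned
    molecule whose similarity to it is >= threshold, then repeat.
    """
    unassigned = list(fingerprints)
    clusters = {}
    while unassigned:
        center = unassigned[0]
        members = [
            m for m in unassigned
            if tanimoto(fingerprints[center], fingerprints[m]) >= threshold
        ]
        clusters[center] = members
        unassigned = [m for m in unassigned if m not in members]
    return clusters


# Hypothetical toy fingerprints, not real ECFP output.
toy = {
    "mol1": frozenset({1, 2, 3, 4}),
    "mol2": frozenset({1, 2, 3, 4, 5}),
    "mol3": frozenset({1, 2, 3, 4, 6}),
    "mol4": frozenset({10, 11, 12}),
    "mol5": frozenset({10, 11, 13}),
}
clusters = sphere_exclusion(toy, 0.6)
print(clusters)
# {'mol1': ['mol1', 'mol2', 'mol3'], 'mol4': ['mol4'], 'mol5': ['mol5']}
```

Here mol2 and mol3 each have a similarity of 0.8 to mol1 and join its cluster, while mol4 and mol5 (similarity 0.5 to each other) stay below the 0.6 threshold and end up as singletons.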

To set the clustering output, I used the wrclus and wrmols options after -o: the first writes the cluster representatives to an output file, while the second writes the cluster members. In this case, the cluster representatives are written to the frameworks.sdf file in SDF format, while the cluster members are written to separate files (cluster_*.sdf).
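Since the question asked for a single output file if possible, one option is to merge the separate member files afterwards. The following Python sketch is only an illustration under some assumptions: the function name merge_cluster_files and the data field name CLUSTER_FILE are made up here, and the files are assumed to be ordinary SDF files in which every record ends with a "$$$$" line. It copies all records into one file and tags each record with the name of the file it came from.

```python
import glob
import os


def merge_cluster_files(pattern: str, merged_path: str, field: str = "CLUSTER_FILE") -> int:
    """Concatenate the SDF files matching `pattern` into `merged_path`.

    Each record gets an extra SDF data field (name given by `field`, a
    hypothetical choice) holding the source file name, so the cluster can
    still be identified in the merged file. Returns the number of records.
    """
    # Collect the inputs first and skip the output file in case it matches the pattern.
    sources = [
        p for p in sorted(glob.glob(pattern))
        if os.path.abspath(p) != os.path.abspath(merged_path)
    ]
    count = 0
    with open(merged_path, "w") as out:
        for path in sources:
            with open(path) as src:
                for line in src:
                    if line.strip() == "$$$$":
                        # Insert the data field just before the record terminator.
                        out.write(f">  <{field}>\n{os.path.basename(path)}\n\n$$$$\n")
                        count += 1
                    else:
                        out.write(line)
    return count
```

Calling merge_cluster_files("cluster_*.sdf", "all_clusters.sdf") in the folder with the output files would produce one SDF file in which every molecule carries its cluster file name as a data field.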

I hope this little example helps you get started with JKlustor. You can invoke jklustor -h to get a list of all options.

If you need further help, please contact us.

Daniel