ranking compounds by similarity - ChemAxon Forum Archive

User e67139a5fc

28-01-2011 05:16:56

Hi,

I need to be able to rank a list of small to medium sized organic compounds in order of their individual similarities to a reference compound - based on some similarity measure such as the Tanimoto coefficient.

I am new to JChem and it would be great if someone could give me a rundown of the steps needed to do this.

I have downloaded JChem Base (as of yesterday) onto my computer (running Windows XP) but that is pretty much as far as I could get.

Any help I could get would be greatly appreciated.

many thanks

Philip

ChemAxon efa1591b5a

01-02-2011 20:53:37

Hi Philip,

Have you tried InstantJChem yet? That's probably the easiest and most convenient tool for this kind of work. You can try it online here: http://www.chemaxon.com/products/online-tryouts/instant-jchem-via-webstart/

Does this help?

Regards,

Miklos

ChemAxon fa971619eb

01-02-2011 21:26:40

Yes, with Instant JChem you could easily create a database of your compounds and then search them with your reference compound(s) using similarity search. Other things are possible too, but I think that should meet your basic need. What you would need to do is:

1. run IJC and create a project with a local database
2. import your structures into that database
3. run a similarity search for your reference compound

For more details see these animations:

http://www.chemaxon.com/products/instant-jchem/instant-jchem-animations/
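
For anyone who wants to see what the ranking in step 3 amounts to, here is a minimal Python sketch. It is not IJC or JChem code: the fingerprints are invented sets of "on" bit positions, and the names tanimoto, rank_by_similarity and cmpd_A/B/C are hypothetical. It only assumes the usual set form of the Tanimoto coefficient (shared bits divided by bits set in either fingerprint).

```python
# Toy fingerprints: each compound is a set of "on" bit positions.
# These bit sets are invented for illustration, not real chemical fingerprints.

def tanimoto(a, b):
    """Tanimoto coefficient of two bit sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def rank_by_similarity(reference, library):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, tanimoto(reference, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

reference = {1, 2, 3, 4, 5, 6}
library = {
    "cmpd_A": {1, 2, 3, 4, 5, 7},  # shares 5 bits with the reference
    "cmpd_B": {1, 2, 9, 10},       # shares 2 bits with the reference
    "cmpd_C": {1, 2, 3, 4, 5, 6},  # identical to the reference
}

ranking = rank_by_similarity(reference, library)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In IJC the fingerprints and the scoring are handled by the database, so none of this has to be written by hand; the sketch only shows the idea behind the ordering.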

Tim

User e67139a5fc

02-02-2011 00:44:40

many thanks for that - IJC does exactly what we want...

User e67139a5fc

02-02-2011 01:07:57

have just been playing around with the similarity query...

After running a "structure - similarity" query I am assuming that the number above the compound structure in the Structure column of the resultant list is the associated Tanimoto coefficient (when using this as the similarity measure)?

However, the resulting list does not strictly rank the compounds by these numbers (though there is a trend to the numbers): compounds that are lower on the list often have slightly higher numbers than compounds ranked above them (again, I'm assuming these numbers are the associated Tanimoto coefficients).

Have I mistaken what this number actually refers to?

ChemAxon fa971619eb

06-02-2011 17:20:13

Yes, you are right. The ordering is from most similar to least similar, and the numbers displayed are the similarity scores.

And yes, the numbers are slightly inconsistent. This is because the displayed scores are calculated in a slightly different way from the scores used to order the search results.

We are going to try to remove this inconsistency.

Tim

ChemAxon a3d59b832c

07-02-2011 08:31:00

Hi all,

That is correct. Currently the visualization routine in IJC uses a constant similarity measure, which is very likely to differ from the table's own similarity measure.

This situation will be solved in the next major version (5.5). The plan is that the displayed similarity score will always use the same settings as used in the table/query.
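
To illustrate how such a mismatch produces out-of-order numbers, here is a small Python sketch. It is only an illustration: this thread does not say which two measures IJC actually uses, so the pairing below (ordering by Tanimoto, displaying an asymmetric Tversky score with alpha=0.9 and beta=0.1) is an assumption, and the fingerprints and compound names are made up.

```python
# Hypothetical bit-set fingerprints; not real IJC data.

def overlap_counts(ref, other):
    """Return (bits in both, bits only in ref, bits only in other)."""
    return len(ref & other), len(ref - other), len(other - ref)

def tanimoto(ref, other):
    common, only_ref, only_other = overlap_counts(ref, other)
    total = common + only_ref + only_other
    return common / total if total else 0.0

def tversky(ref, other, alpha=0.9, beta=0.1):
    # Asymmetric measure; alpha and beta are assumed values for this example.
    common, only_ref, only_other = overlap_counts(ref, other)
    denom = common + alpha * only_ref + beta * only_other
    return common / denom if denom else 0.0

reference = {1, 2, 3, 4, 5, 6}
library = {
    "cmpd_X": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "cmpd_Y": {1, 2, 3, 4, 5},
    "cmpd_Z": {1, 2, 3, 4, 11},
}

# Order the hits with one measure...
ordered = sorted(library,
                 key=lambda name: tanimoto(reference, library[name]),
                 reverse=True)
# ...but show a score computed with another one.
displayed = [tversky(reference, library[name]) for name in ordered]

for name, score in zip(ordered, displayed):
    print(f"{name}: displayed score {score:.3f}")
```

With these toy values cmpd_X is listed below cmpd_Y yet shows a higher number, which is the same pattern Philip describes. Once the displayed score uses the same settings as the query, the numbers decrease monotonically down the list.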

Best regards,

Szabolcs