Tanimoto index in JChem Excel - ChemAxon Forum Archive

User 949a15025b

01-11-2011 10:44:49

This is probably a stupid question, sorry about that, but I'm quite new to the world of chemical software...

I'm interested in calculating Tanimoto indexes ((dis)similarity values based on binary strings) and getting the result as a number between 0 and 1. As far as I know, the classic definition of the Tanimoto index is as follows: T = N_AB / (N_A + N_B - N_AB), and I would like the result to be based on that equation.
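For reference, the formula above can be written as a short Python sketch. It assumes each fingerprint is stored as a Python integer used as a bit set (an illustrative choice, not how JChem for Excel stores fingerprints). N_A and N_B are the numbers of bits set to 1 in fingerprints A and B, and N_AB is the number of bits set in both. A dissimilarity is commonly expressed as 1 - T. The function names and the 0/0 convention are hypothetical.

```python
def popcount(x: int) -> int:
    """Number of bits set to 1 in a non-negative int used as a bit set."""
    return bin(x).count("1")


def tanimoto(fp_a: int, fp_b: int) -> float:
    """T = N_AB / (N_A + N_B - N_AB) for two fingerprints stored as int bit sets."""
    n_a = popcount(fp_a)          # N_A: bits set in fingerprint A
    n_b = popcount(fp_b)          # N_B: bits set in fingerprint B
    n_ab = popcount(fp_a & fp_b)  # N_AB: bits set in both
    denominator = n_a + n_b - n_ab
    if denominator == 0:
        # Both fingerprints are empty; returning 0.0 here is an assumed convention.
        return 0.0
    return n_ab / denominator


def tanimoto_dissimilarity(fp_a: int, fp_b: int) -> float:
    """Dissimilarity expressed as 1 - T, also in the range 0..1."""
    return 1.0 - tanimoto(fp_a, fp_b)


a = 0b11110000  # N_A = 4
b = 0b01110001  # N_B = 4, and N_AB = 3 (bits shared with a)
print(tanimoto(a, b))  # 0.6, i.e. 3 / (4 + 4 - 3)
print(tanimoto_dissimilarity(a, b))
```

Because N_A + N_B - N_AB is exactly the number of bits set in A or B, T always falls between 0 (no shared bits) and 1 (identical fingerprints).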

As I'm working with very large data sets (SMILES in Excel), I would very much like to use your dissimilarity functions in JChem for Excel for simplicity. Is the Chemical Fingerprint function what I'm looking for? If not, what is the chemical fingerprint function based on?

Thank you very much!

ChemAxon 0e37943a96

03-11-2011 09:50:12

Hi,

You can find a more detailed description at the following link:

http://www.chemaxon.com/jchem/doc/user/ScreenMD.html

In JChem for Excel we only use the basic parameters. If you need any extra options or have further questions, please let us know.

Best regards,

Tamas.

User 949a15025b

14-11-2011 15:51:48

Dear Tamas,

Thanks for the reply!

The important thing for me is that the Tanimoto coefficient calculations are based on the similarity between the substructures of each molecular structure (i.e. binary fingerprints). However, I don't need the calculations to be 100% exact.

What do you mean by "In JChem for Excel we just use basic parameters"? Smaller binary fingerprints for each structure?

Thanks again!

ChemAxon 0e37943a96

17-11-2011 08:43:21

Dear Damon,

In API terms, we set up the MDParameters without specifying any additional options.

https://www.chemaxon.com/jchem/doc/dev/java/api/index.html

I will redirect your questions to a different forum area, where our colleagues will be able to provide a more scientific explanation.

Best regards,

Tamas.

ChemAxon a3d59b832c

17-11-2011 11:37:10

Hi,

Yes, we provide the Tanimoto coefficient. It is the default metric in the (dis)similarity calculation.

See more details here: http://www.chemaxon.com/jchem4excel/userguide/dissimilarity.html

http://www.chemaxon.com/jchem/doc/user/fingerprint.html

http://www.chemaxon.com/jchem/doc/dev/search/index.html#simil

Best regards,

Szabolcs

User 949a15025b

10-12-2011 11:42:28

Hi again,

I have now used the CF Fingerprint function to calculate Tanimoto coefficients for many millions of molecule pairs.

However, when comparing some Tanimoto coefficients calculated with JChem for Excel to coefficients calculated with Daylight, I found small differences between the results. The coefficients calculated with JChem for Excel were predominantly lower than those calculated with Daylight.

How can this be explained? Is the Tanimoto coefficient function in JChem for Excel less sophisticated? Are fewer structural patterns considered when calculating the fingerprints?

I would also appreciate a brief explanation of how the coefficient is derived from the SMILES strings in Excel when using the CF Fingerprint Tanimoto function. I can't figure this out from the links above.

Thanks!

ChemAxon a3d59b832c

13-12-2011 14:48:57

Hi,

"However, when comparing some Tanimoto coefficients calculated with JChem for Excel to coefficients calculated with Daylight, I found small differences between the results. The coefficients calculated with JChem for Excel were predominantly lower than those calculated with Daylight.

How can this be explained? Is the Tanimoto coefficient function in JChem for Excel less sophisticated? Are fewer structural patterns considered when calculating the fingerprints?"

The Daylight algorithm for calculating fingerprints is not fully disclosed, so we do not know all the details of their algorithm. It is possible that our chemical hashed fingerprint encodes fewer structural patterns. Please note that the number of patterns encoded also depends on the fingerprint parameters. If you would like to experiment, you can try longer maximum path lengths, for example.
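To illustrate why the path length matters, here is a sketch that counts the distinct linear atom paths in a toy hydrogen-suppressed graph of 1-propanol (C-C-C-O) at increasing maximum lengths. The graph representation, the element-only labels (no bond orders, aromaticity or branch patterns), the choice to count length in bonds, and the function name `enumerate_paths` are hypothetical simplifications; this is not how JChem or Daylight actually enumerate patterns.

```python
def enumerate_paths(atoms, bonds, max_len):
    """Return the set of distinct linear paths (element strings) with up to max_len bonds."""
    adj = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    patterns = set()

    def walk(path):
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        patterns.add(min(forward, backward))  # one canonical direction, so C-O and O-C count once
        if len(path) - 1 == max_len:  # path already has max_len bonds
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # simple paths only: no atom is revisited
                walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return patterns


propanol_atoms = ["C", "C", "C", "O"]
propanol_bonds = [(0, 1), (1, 2), (2, 3)]
for n in range(4):
    print(n, len(enumerate_paths(propanol_atoms, propanol_bonds, n)))
# prints: 0 2, 1 4, 2 6, 3 7 (one pair per line)
```

Each extra bond of allowed path length adds new patterns, so more bits can end up set in the fingerprint; with real molecules that branch and contain rings the growth is much faster than in this linear example.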

"I would also appreciate a brief explanation of how the coefficient is derived from the SMILES strings in Excel when using the CF Fingerprint Tanimoto function. I can't figure this out from the links above."

First, a Molecule object (graph) is created from the SMILES string. Then all sub-graph patterns in this graph that correspond to the fingerprint parameters are exhaustively enumerated. For each such sub-graph, several bit positions are calculated by a hashing function, and these bits are set to 1 in the fingerprint.
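A minimal sketch of the hashing step, plus the final Tanimoto comparison, assuming the sub-graph patterns have already been enumerated (written out by hand here for ethanol and 1-propanol as linear element paths, a hypothetical representation). SHA-256 from `hashlib`, the 1024-bit length and 2 bits per pattern are illustrative assumptions, not ChemAxon's actual hash function or default parameters, and `hashed_fingerprint` is a hypothetical name.

```python
import hashlib


def hashed_fingerprint(patterns, n_bits=1024, bits_per_pattern=2):
    """Set bits_per_pattern hashed bit positions for every pattern; return the fingerprint as an int."""
    fp = 0
    for pattern in patterns:
        for k in range(bits_per_pattern):
            # Deterministic hash (unlike Python's salted hash()) so fingerprints are reproducible.
            digest = hashlib.sha256(f"{k}:{pattern}".encode("utf-8")).digest()
            position = int.from_bytes(digest[:8], "big") % n_bits
            fp |= 1 << position  # different patterns may collide on the same bit
    return fp


def tanimoto(fp_a, fp_b):
    """N_AB / (N_A + N_B - N_AB); the denominator equals the bit count of fp_a | fp_b."""
    union = bin(fp_a | fp_b).count("1")
    return bin(fp_a & fp_b).count("1") / union if union else 0.0


# Hand-enumerated linear atom paths (hypothetical representation)
ethanol = {"C", "O", "C-C", "C-O", "C-C-O"}
propanol = {"C", "O", "C-C", "C-O", "C-C-C", "C-C-O", "C-C-C-O"}

fp_ethanol = hashed_fingerprint(ethanol)
fp_propanol = hashed_fingerprint(propanol)
print(tanimoto(fp_ethanol, fp_propanol))
```

Every ethanol pattern also occurs in 1-propanol, so the ethanol bits are a subset of the propanol bits and T is below 1 only because propanol sets extra bits. Changing the pattern set, `n_bits`, `bits_per_pattern` or the hash function changes the bit counts and therefore T, which is one way two fingerprint implementations can give different coefficients for the same pair of molecules.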

I hope this helps.

Best regards,

Szabolcs