ChemAxon Technical Support Forum

!!About Tanimoto Dissimilarity
dolunay
Posted: Wed Oct 30, 2013 11:24 pm

Dear all, 

I am using your Tanimoto dissimilarity function on the .NET platform to calculate the similarity between two chemicals. Everything was working fine, but I noticed something odd while analyzing my data.

For the SMILES codes given below with their PDB IDs, my code returns 1 as the similarity score for every pair (in other words, 0 as the dissimilarity score).

PGE OCCOCCOCCO

PG4 OCCOCCOCCOCCO

PEG OCCOCCO

 

To my knowledge, the similarity score is 1 only when a chemical is compared to itself. So I checked the similarity scores of the ligands above in an online tool (http://chemmine.ucr.edu/), and the scores there were around 0.5 or 0.6.

This is an important issue for me because this project is part of my academic study. I would appreciate it a lot if you could enlighten me!

 

Thanks in advance,

Dolunay

 

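Here is my code:

    // C# against the JChem .NET API; assumes: using chemaxon.descriptors; using chemaxon.formats;
    CFParameters cfpConfig = new CFParameters();   // no parameters set explicitly
    string smiles1 = lg1.lSMILES;
    string smiles2 = lg2.lSMILES;

    // Both fingerprints share one configuration so that they are comparable.
    ChemicalFingerprint cf1 = new ChemicalFingerprint(cfpConfig);
    ChemicalFingerprint cf2 = new ChemicalFingerprint(cfpConfig);
    cf1.generate(MolImporter.importMol(smiles1));
    cf2.generate(MolImporter.importMol(smiles2));

    // getTanimoto is used here as the dissimilarity, so 1 - value is the similarity score.
    return (1 - cf1.getTanimoto(cf2));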

dolunay
Posted: Thu Oct 31, 2013 1:15 pm

I urgently need help with this issue, please. I would appreciate it a lot if you could spare some time.

Gabor (ChemAxon personnel)
Posted: Thu Oct 31, 2013 2:03 pm

Dear Dolunay,

 

"To my knowledge, similarity score is 1 when a chemical is only compared to itself."

In this case (binary fingerprints compared with Tanimoto), the dissimilarity is 0.0 if and only if the compared descriptors are identical. Different structures can, however, produce the same descriptor.
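
To make the point above concrete, here is a minimal sketch of Tanimoto similarity on binary fingerprints, written in plain Java with java.util.BitSet. It is not ChemAxon's implementation; the class name TanimotoSketch, the helper bits, and the example bit positions are hypothetical and chosen only for illustration. It shows that the score depends only on which bits are set, so two different structures that happen to set the same bits get a dissimilarity of 0.0.

```java
import java.util.BitSet;

/** Toy illustration of Tanimoto on binary fingerprints (not ChemAxon code). */
class TanimotoSketch {

    /** Returns |A AND B| / |A OR B|; two empty fingerprints count as identical. */
    static double similarity(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet union = (BitSet) a.clone();
        union.or(b);
        int unionBits = union.cardinality();
        return unionBits == 0 ? 1.0 : (double) common.cardinality() / unionBits;
    }

    /** Dissimilarity is simply one minus the similarity. */
    static double dissimilarity(BitSet a, BitSet b) {
        return 1.0 - similarity(a, b);
    }

    /** Hypothetical helper: builds a fingerprint from the indices of its set bits. */
    static BitSet bits(int... indices) {
        BitSet fp = new BitSet();
        for (int i : indices) {
            fp.set(i);
        }
        return fp;
    }

    public static void main(String[] args) {
        BitSet x = bits(1, 5, 9);
        BitSet y = bits(1, 5, 9);   // imagine a different structure that sets the same bits
        BitSet z = bits(1, 5, 12);  // shares two of four distinct bits with x
        System.out.println(dissimilarity(x, y)); // prints 0.0
        System.out.println(similarity(x, z));    // prints 0.5
    }
}
```

A dissimilarity of 0.0 therefore says that the fingerprints match, not that the molecules are the same.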

"So I try to check similarity scores of the ligands given above in an online tool (http://chemmine.ucr.edu/) and similarity scores in there was like 0.5 or 0.6."

In general, different descriptors are sensitive to different structural aspects, so similarity scores obtained from different descriptors cannot be compared directly.

I suppose you used the default CFP configuration, which cannot discriminate between structures that differ only in linear paths longer than the configured path length. For further details, please see
http://www.chemaxon.com/jchem/doc/user/fingerprint.html.
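
The effect of the path length limit can be sketched with a toy model in plain Java. This is not ChemAxon's CFP algorithm: it assumes an unbranched chain written as a string of atom symbols, treats every linear path of at most maxBonds bonds as one feature, ignores bond types and hashing, and reads paths independently of direction. The class name LinearPathSketch and its methods are hypothetical. Under these assumptions, the three ligands from the question yield identical feature sets when the limit is short and become distinguishable once the limit is long enough.

```java
import java.util.Set;
import java.util.TreeSet;

/** Toy model of a path-based fingerprint on unbranched chains (not ChemAxon's algorithm). */
class LinearPathSketch {

    /** Collects every linear path with at most maxBonds bonds, independent of reading direction. */
    static Set<String> paths(String chain, int maxBonds) {
        Set<String> features = new TreeSet<>();
        for (int start = 0; start < chain.length(); start++) {
            // A path with maxBonds bonds spans maxBonds + 1 atoms.
            int last = Math.min(chain.length(), start + maxBonds + 1);
            for (int end = start + 1; end <= last; end++) {
                String path = chain.substring(start, end);
                String reversed = new StringBuilder(path).reverse().toString();
                // Keep one canonical spelling so "OCC" and "CCO" count as the same path.
                features.add(path.compareTo(reversed) <= 0 ? path : reversed);
            }
        }
        return features;
    }

    /** Tanimoto similarity on feature sets: size of intersection divided by size of union. */
    static double tanimoto(Set<String> a, Set<String> b) {
        Set<String> common = new TreeSet<>(a);
        common.retainAll(b);
        Set<String> union = new TreeSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 1.0 : (double) common.size() / union.size();
    }

    public static void main(String[] args) {
        String peg = "OCCOCCO";
        String pge = "OCCOCCOCCO";
        String pg4 = "OCCOCCOCCOCCO";
        // With maxBonds = 4 both pairs score 1.00; with maxBonds = 8 both score below 1.
        for (int maxBonds : new int[] {4, 8}) {
            System.out.printf("maxBonds=%d PEG/PGE=%.2f PGE/PG4=%.2f%n",
                    maxBonds,
                    tanimoto(paths(peg, maxBonds), paths(pge, maxBonds)),
                    tanimoto(paths(pge, maxBonds), paths(pg4, maxBonds)));
        }
    }
}
```

In this toy model the repeating OCC unit means a longer chain only adds new features once a path can span more atoms than the shorter chain offers, which is why a short limit cannot tell the oligomers apart.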

If discriminating between these structures is important, I would recommend increasing the path length in the CFP configuration, trying ECFP (with a large enough diameter), or considering structure-based approaches (structure/substructure search).

Regards,

Gabor

dolunay
Posted: Thu Oct 31, 2013 3:08 pm

Dear Gabor,

 

Thanks for the reply. 

"If the discrimination of these structures are important i would recommend to increase path length in CFP config, try using ECFP (with large enough diameter) or consider structure based approaches (structure/substructure search)."

I think that could work. However, I am not sure what a suitable path length would be or how to change it. Could you provide an example, please?

 

Best Regards,

Dolunay

Gabor (ChemAxon personnel)
Posted: Fri Nov 01, 2013 3:06 pm

Dear Dolunay,

"However, I am not sure what  the suitable path length could be and how to modify. Could you provide me an example, please?"

You can set the maximum path length parameter through the API (chemaxon.descriptors.CFParameters#setBondCount(int), see http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/descriptors/CFParameters.html#setBondCount(int)) or in the configuration XML (for an example, see http://www.chemaxon.com/jchem/examples/config/cfp.xml).

The suitable value depends on the application. Could you describe your use cases?

 

Regards,

Gabor

dolunay
Posted: Tue Nov 05, 2013 3:55 pm

Hi Gabor,

 

Thanks for your interest. I think my problem is solved. I am working with a dataset from the PDB, and I realized that the PDB also uses ChemAxon for compound similarity, so there is no consistency problem now!

 

Best regards,

Dolunay
