!!About Tanimoto Dissimilarity

User 3d07a7b484

30-10-2013 22:24:20

Dear all, 


I am using your Tanimoto dissimilarity function on the .NET platform to calculate the similarity between two chemicals. Everything was working fine, but I noticed something odd while analyzing my data.


For the SMILES strings below, listed with their PDB IDs, my code returns a similarity score of 1 (in other words, a dissimilarity score of 0).


PGE OCCOCCOCCO


PG4 OCCOCCOCCOCCO


PEG OCCOCCO


 


To my knowledge, the similarity score is 1 only when a chemical is compared to itself. So I checked the similarity scores of the ligands above in an online tool (http://chemmine.ucr.edu/), and the scores there were around 0.5 or 0.6.


This is an important issue for me, as the project is part of my academic study. I would appreciate it a lot if you could enlighten me!


 


Thanks in advance,


Dolunay


 


Here is my code:


// No parameters are set, so the CFP defaults apply.
CFParameters cfpConfig = new CFParameters();
string smiles1 = lg1.lSMILES;
string smiles2 = lg2.lSMILES;
ChemicalFingerprint cf1 = new ChemicalFingerprint(cfpConfig);
ChemicalFingerprint cf2 = new ChemicalFingerprint(cfpConfig);
cf1.generate(MolImporter.importMol(smiles1));
cf2.generate(MolImporter.importMol(smiles2));
// 1 - dissimilarity = similarity score
return (1 - cf1.getTanimoto(cf2));

User 3d07a7b484

31-10-2013 12:15:34

I urgently need help with this issue, please. I would appreciate it a lot if you could spare some time.

ChemAxon 8b644e6bf4

31-10-2013 13:03:16

Dear Dolunay,


 


"To my knowledge, the similarity score is 1 only when a chemical is compared to itself."


In this case (binary fingerprints compared with Tanimoto), the dissimilarity is 0.0 if and only if the compared descriptors are identical. The same descriptor can, in general, be generated for different structures.
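To make this concrete, here is a minimal Java sketch of the Tanimoto coefficient on binary fingerprints. It uses plain `java.util.BitSet` rather than the ChemAxon classes, and the names (`TanimotoSketch`, `tanimoto`, `dissimilarity`, `bits`) and bit positions are made up for illustration. The only point is that two fingerprints with the same bits set give a dissimilarity of 0.0, whatever molecules they were generated from.

```java
import java.util.BitSet;

final class TanimotoSketch {

    /** Tanimoto coefficient of two binary fingerprints: |A AND B| / |A OR B|. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        // Assumed convention: two empty fingerprints count as identical.
        return union == 0 ? 1.0 : (double) common.cardinality() / union;
    }

    /** Dissimilarity is simply 1 minus the Tanimoto coefficient. */
    static double dissimilarity(BitSet a, BitSet b) {
        return 1.0 - tanimoto(a, b);
    }

    /** Hypothetical helper: builds a fingerprint with the given bit positions set. */
    static BitSet bits(int... positions) {
        BitSet set = new BitSet();
        for (int p : positions) {
            set.set(p);
        }
        return set;
    }

    public static void main(String[] args) {
        BitSet fp1 = bits(1, 4, 7);
        BitSet fp2 = bits(1, 4, 7); // same bits, possibly from a different molecule
        BitSet fp3 = bits(1, 4, 9); // shares two of four distinct bits with fp1

        System.out.println(dissimilarity(fp1, fp2)); // prints 0.0
        System.out.println(dissimilarity(fp1, fp3)); // prints 0.5
    }
}
```

So a score of 0.0 only says that the two bit strings match, not that the two structures do.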


"So I checked the similarity scores of the ligands above in an online tool (http://chemmine.ucr.edu/), and the scores there were around 0.5 or 0.6."


In general, different descriptors are sensitive to different structural aspects, so similarity scores from different descriptors cannot be compared directly.


I suppose you used the default CFP configuration, which cannot discriminate structures that differ only in linear paths longer than its configured path length. For further details please see
http://www.chemaxon.com/jchem/doc/user/fingerprint.html.
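A toy illustration of this effect, as a sketch only: it treats each molecule as an unbranched chain of single-letter atoms and collects every linear path up to a given number of bonds. It ignores everything else a real CFP does (bond types, rings, hashing the patterns into a fixed-length bit string), so it is not ChemAxon's algorithm. The names `LinearPathSketch`, `paths` and `samePaths` are hypothetical, and the limits 7 and 8 are chosen only to show the effect, not taken from the ChemAxon defaults.

```java
import java.util.Set;
import java.util.TreeSet;

final class LinearPathSketch {

    /** Collects all linear paths of at most maxBonds bonds from an unbranched atom chain. */
    static Set<String> paths(String chain, int maxBonds) {
        Set<String> result = new TreeSet<>();
        for (int start = 0; start < chain.length(); start++) {
            for (int bonds = 0; bonds <= maxBonds && start + bonds < chain.length(); bonds++) {
                String path = chain.substring(start, start + bonds + 1);
                String reversed = new StringBuilder(path).reverse().toString();
                // A path read in either direction is the same pattern; keep one canonical form.
                result.add(path.compareTo(reversed) <= 0 ? path : reversed);
            }
        }
        return result;
    }

    /** True when both chains yield exactly the same set of paths up to maxBonds bonds. */
    static boolean samePaths(String a, String b, int maxBonds) {
        return paths(a, maxBonds).equals(paths(b, maxBonds));
    }

    public static void main(String[] args) {
        String pge = "OCCOCCOCCO";    // PGE from the original post
        String pg4 = "OCCOCCOCCOCCO"; // PG4 from the original post

        System.out.println(samePaths(pge, pg4, 7)); // prints true
        System.out.println(samePaths(pge, pg4, 8)); // prints false
    }
}
```

Because both chains repeat the same O-C-C unit, every short path found in the longer chain also occurs in the shorter one. In this toy model the two only become distinguishable once the path limit is long enough to reach a pattern that fits in PG4 but not in PGE.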


If discriminating these structures is important, I would recommend increasing the path length in the CFP config, trying ECFP (with a large enough diameter), or considering structure-based approaches (structure/substructure search).


Regards,


Gabor

User 3d07a7b484

31-10-2013 14:08:46

Dear Gabor,


 


Thanks for the reply. 


"If discriminating these structures is important, I would recommend increasing the path length in the CFP config, trying ECFP (with a large enough diameter), or considering structure-based approaches (structure/substructure search)."

I think that could work. However, I am not sure what a suitable path length would be or how to change it. Could you provide an example, please?


 


Best Regards,


Dolunay

ChemAxon 8b644e6bf4

01-11-2013 14:06:22

Dear Dolunay,


"However, I am not sure what a suitable path length would be or how to change it. Could you provide an example, please?"


You can set the maximum path length parameter from the API (chemaxon.descriptors.CFParameters#setBondCount(int), see http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/descriptors/CFParameters.html#setBondCount(int)) or from the config XML (for an example see http://www.chemaxon.com/jchem/examples/config/cfp.xml).


The suitable value depends on the application. Could you describe your use cases?


 


regards,


Gabor

User 3d07a7b484

05-11-2013 14:55:08

Hi Gabor,


 


Thanks for your interest. I think my problem has been solved. I am working with a dataset from the PDB, and I have realized that the PDB also uses ChemAxon for compound similarity. So there is no consistency problem now!


 


Best regards,


Dolunay