Chemical Fingerprints

User ceb580837f

02-05-2006 11:07:42

Dear Sir(s),


Recently I started to work with your chemical fingerprint in my research project! Unfortunately I was a little shocked, because for sugars in particular I get Tanimoto similarities of one for a lot of structures that are clearly identical neither to each other nor to glucose!


I have added ten structures along with glucose in the attached smi-files!





Also based on this fact I was wondering what the maximum path length of your method is!





Thanks for your answer,


Ruud van Deursen

ChemAxon efa1591b5a

02-05-2006 11:33:28

Hi Ruud,





did you try the default configuration or your own?


Using the default I got this:





Code:
screenmd glucose_cf.smi glucose_cf.smi -k CF -M Tanimoto


        q1_CF_Tan   q2_CF_Tan   q3_CF_Tan   q4_CF_Tan   q5_CF_Tan   q6_CF_Tan   q7_CF_Tan   q8_CF_Tan   q9_CF_Tan   q10_CF_Tan  q11_CF_Tan
        0.00        0.16        0.12        0.15        0.20        0.17        0.15        0.17        0.17        0.16        0.15
        0.16        0.00        0.05        0.09        0.04        0.18        0.08        0.18        0.18        0.17        0.09
        0.12        0.05        0.00        0.04        0.09        0.14        0.06        0.14        0.14        0.13        0.04
        0.15        0.09        0.04        0.00        0.10        0.15        0.07        0.15        0.15        0.14        0.00
        0.20        0.04        0.09        0.10        0.00        0.17        0.07        0.17        0.17        0.16        0.10
        0.17        0.18        0.14        0.15        0.17        0.00        0.13        0.00        0.00        0.01        0.15
        0.15        0.08        0.06        0.07        0.07        0.13        0.00        0.13        0.13        0.12        0.07
        0.17        0.18        0.14        0.15        0.17        0.00        0.13        0.00        0.00        0.01        0.15
        0.17        0.18        0.14        0.15        0.17        0.00        0.13        0.00        0.00        0.01        0.15
        0.16        0.17        0.13        0.14        0.16        0.01        0.12        0.01        0.01        0.00        0.14
        0.15        0.09        0.04        0.00        0.10        0.15        0.07        0.15        0.15        0.14        0.00






which I think is quite reasonable.
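For reference, the Tanimoto coefficient of two bit-string fingerprints is the number of bits set in both divided by the number of bits set in either. The zeros on the diagonal of the table suggest that the values shown are dissimilarities (1 minus the coefficient), so a 0.00 there would correspond to a similarity of 1. Here is a minimal Python sketch of both quantities. The function names and the toy bit sets are made up for illustration and are not part of JChem:

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    return len(a & b) / union


def tanimoto_dissimilarity(a, b):
    """Dissimilarity as 1 minus the Tanimoto coefficient."""
    return 1.0 - tanimoto(a, b)


# Toy fingerprints (hypothetical on-bit positions, not real molecules).
fp_x = {1, 5, 9, 12}
fp_y = {1, 5, 9, 40}

print(tanimoto(fp_x, fp_y))                # 3 shared bits / 5 bits in total = 0.6
print(tanimoto_dissimilarity(fp_x, fp_x))  # identical fingerprints give 0.0
```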


In the default configuration the fingerprint length is 1024 bits, the path length is 7, and 3 bits are set in the fingerprint for each feature detected.
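To illustrate how these three parameters interact, here is a rough Python sketch of a generic hashed path fingerprint. It is not ChemAxon's implementation: the path enumeration, the SHA-256 based hashing, the choice to count path length in atoms, and all function names are assumptions made only for this example.

```python
import hashlib


def linear_paths(atoms, bonds, max_atoms):
    """Collect label strings of all simple paths with at most max_atoms atoms.

    atoms: list of element symbols; bonds: list of (i, j) atom index pairs.
    """
    neighbours = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    paths = set()

    def walk(path):
        label = "".join(atoms[i] for i in path)
        # A path and its reverse are the same feature; keep one canonical form.
        paths.add(min(label, label[::-1]))
        if len(path) == max_atoms:
            return
        for nxt in neighbours[path[-1]]:
            if nxt not in path:
                walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return paths


def fingerprint(paths, n_bits=1024, bits_per_feature=3):
    """Hash every path to bits_per_feature positions in an n_bits-long fingerprint."""
    on_bits = set()
    for p in paths:
        for k in range(bits_per_feature):
            digest = hashlib.sha256(f"{p}#{k}".encode()).digest()
            on_bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return on_bits


# Heavy atoms of ethanol as a toy graph: C-C-O
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

paths = linear_paths(atoms, bonds, max_atoms=7)
fp = fingerprint(paths)
print(sorted(paths))  # ['C', 'CC', 'CCO', 'CO', 'O']
print(len(fp))        # number of distinct bits set (at most 5 paths * 3 bits)
```

In a scheme like this, a shorter fingerprint or fewer distinct paths means more features end up sharing the same bits, which is one way different molecules can produce the same fingerprint.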





Which version of JChem are you using? What parameter setting did you try?





Thanks and regards,


Miklos

User ceb580837f

02-05-2006 11:52:03

I used the following XML, which is based on yours!


Basically it is identical! The only change is that I changed the Tanimoto setting to 1, so that I get the values for all compounds even if they are completely dissimilar!

ChemAxon efa1591b5a

02-05-2006 14:12:43

Hello,


This configuration uses 512-bit fingerprints, a path length of 5, and 2 bits set per feature.


Apparently, this is not sufficient to distinguish between these highly similar structures!


Try to increase the path length, e.g. 7 should work.


The fingerprint length should also be increased.





Could you try 1024 bits, a path length of 7 and 2 bits per feature, for instance? Did that work?
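The effect of the path length can be seen on a toy example. This Python sketch does not use the sugars from this thread and involves no hashing; it simply compares the raw path sets of two unbranched chains, written as strings of element symbols. The helper names are hypothetical, and path length is counted in atoms here, which is an assumption. With a limit of 4 atoms the two different diols give the same path set, so their Tanimoto similarity is 1; with a limit of 5 they can be told apart.

```python
def chain_paths(chain, max_atoms):
    """All contiguous sub-paths (up to max_atoms atoms) of an unbranched chain.

    chain is a string of one-letter element symbols, e.g. "OCCCCO".
    """
    found = set()
    for start in range(len(chain)):
        for end in range(start + 1, min(start + max_atoms, len(chain)) + 1):
            piece = chain[start:end]
            # Direction does not matter, so store one canonical orientation.
            found.add(min(piece, piece[::-1]))
    return found


def tanimoto(a, b):
    """Tanimoto coefficient of two non-empty feature sets."""
    return len(a & b) / len(a | b)


diol_4 = "OCCCCO"   # heavy atoms of butane-1,4-diol
diol_5 = "OCCCCCO"  # heavy atoms of pentane-1,5-diol

for max_atoms in (4, 5):
    sim = tanimoto(chain_paths(diol_4, max_atoms), chain_paths(diol_5, max_atoms))
    print(max_atoms, round(sim, 2))
# prints:
# 4 1.0
# 5 0.9
```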


Thanks,





Miklos

User ceb580837f

02-05-2006 14:17:59

I changed the settings to 1024 bits and a path length of 10 and now there was a clear difference between the molecules! At least only glucose had similarity 1!

ChemAxon efa1591b5a

02-05-2006 14:22:00

Great!


Did you consider using the BCUT descriptor? You don't need to tweak any parameters and you'll get reasonable results.


Regards,


Miklos