q1_CF_Tan info

User 22d10c9ed5

21-10-2010 09:20:49

Hi,


I'd like to understand the output of a chemical similarity search better. According to the documentation quoted below, the threshold is 0.2 for the Tanimoto dissimilarity and 10 for
the Euclidean distance. Why, then, are there no q1_CF_Tan values under 0.2?


The output is written into the terminal window where the command is executed.
Each row contains two numbers, the Tanimoto and the Euclidean dissimilarity
ratios between the query and one target structure. Note that not all
dissimilarity values are displayed, only those that are lower than a predefined
dissimilarity threshold (this is 0.2 for the Tanimoto dissimilarity, and 10 in
the case of the Euclidean distance). If either value is below the threshold,
both are displayed, that is, the corresponding target structure is a virtual
hit.

A portion of the output of the above command is as
follows:




        q1_CF_Tan       q1_CF_Euc
0.66 9.70
0.48 8.43
0.58 9.49
0.52 9.54
0.52 9.75
0.62 9.80
0.56 9.00
0.61 9.95
0.64 9.54
0.62 10.00
0.61 9.75
0.59 9.38
0.51 9.33
0.49 8.66
0.45 8.72
0.38 8.06
0.49 9.64
0.50 9.22
0.50 9.22
0.50 9.22
0.50 9.22
0.50 9.22
0.50 9.22
0.50 9.22
0.35 7.87



One more question: does the chemical similarity search take parameters such as LogP, PSA, or LogSw into account?


Thanks a lot

ChemAxon efa1591b5a

22-10-2010 12:17:22

Hi Giampa,


The acceptance criteria for the two thresholds are in an 'OR' relationship. That is, if either the Tanimoto or the Euclidean value is under its threshold, the structure is accepted (as a similar one).


To introduce an 'AND' relationship, add the -m option flag (or its longer form, --metrics-and) to the command line. In this case both scores must meet their acceptance conditions.
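The difference between the two acceptance modes can be sketched in a few lines of Python. This is only an illustration of the logic, not the tool's actual code: the function name is_hit and its require_all parameter are hypothetical, and the inclusive comparison (<=) is an assumption suggested by the 10.00 row appearing in the quoted output.

```python
# A few (Tanimoto, Euclidean) rows copied from the output quoted in the question.
rows = [(0.66, 9.70), (0.48, 8.43), (0.62, 10.00), (0.38, 8.06), (0.35, 7.87)]

TANIMOTO_MAX = 0.2    # threshold quoted in the documentation
EUCLIDEAN_MAX = 10.0  # threshold quoted in the documentation


def is_hit(tanimoto, euclidean, require_all=False):
    """Hypothetical helper: decide whether a target counts as a virtual hit.

    require_all=False mimics the default 'OR' behaviour,
    require_all=True mimics the 'AND' behaviour described for -m.
    The inclusive comparison is an assumption, not confirmed by the docs.
    """
    checks = (tanimoto <= TANIMOTO_MAX, euclidean <= EUCLIDEAN_MAX)
    return all(checks) if require_all else any(checks)


or_hits = [r for r in rows if is_hit(*r)]
and_hits = [r for r in rows if is_hit(*r, require_all=True)]

print(len(or_hits), "hits with OR;", len(and_hits), "hits with AND")
```

With these rows every structure is accepted in 'OR' mode through the Euclidean criterion alone, while none would pass in 'AND' mode, which matches the absence of Tanimoto values under 0.2 in the output.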


Alternatively, one can specify which of the available metrics to use, e.g. Tanimoto only. To do so, use the -M (or --metric) parameter, for example -M Tanimoto on the command line.


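In your particular case the threshold of 10 for the Euclidean distance appears to be too high in comparison to 0.2 for Tanimoto, so structures are accepted on the Euclidean criterion even when their Tanimoto dissimilarity is far above 0.2.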
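To get a feeling for the different scales of the two metrics, here is a minimal sketch using the textbook definitions on binary fingerprints. It assumes fingerprints are represented as sets of "on" bit positions; the function names and the example bit sets are made up for illustration, and the tool itself may compute or scale these values differently.

```python
import math


def tanimoto_dissimilarity(a, b):
    """1 - |A and B| / |A or B| for two sets of 'on' bit positions."""
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / len(union)


def euclidean_distance(a, b):
    """For binary vectors this is the square root of the number of differing bits."""
    return math.sqrt(len(a ^ b))


# Made-up fingerprints: 100 bits set in each, 50 of them shared.
query = set(range(0, 100))
target = set(range(50, 150))

t = tanimoto_dissimilarity(query, target)
e = euclidean_distance(query, target)
print(f"Tanimoto dissimilarity: {t:.2f}, Euclidean distance: {e:.2f}")
```

Under these assumptions, a Euclidean distance of 10 corresponds to 100 differing bits, which two fairly different structures can still reach, whereas a Tanimoto dissimilarity of 0.2 requires the fingerprints to share most of their bits.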


To consider chemical parameters in similarity searching, one can use multiple descriptors at the same time; however, that is not yet fully supported on the command line (only via the API). An alternative and more sophisticated approach is to use the ECFP/FCFP (extended connectivity) fingerprint, which will be available soon in version 5.4 (a beta version should be built within days). With FCFP one can introduce any parameters (including log P and PSA increments, etc.) in the so-called initial atom assignments; these are properties associated with each atom before the fingerprint is generated.


If you are interested in this option, please let me know. I will then inform you about the availability of pre-releases and also send sample configuration files that show how to incorporate chemical features in the circular fingerprint.


Regards,


Miklos