screenmd ECFP Tanimoto threshold

User ce5f27518b

16-06-2011 13:52:47

Hi all,

I want to use screenmd with the ECFP descriptor, but I retrieve all compounds (including those with a Tanimoto dissimilarity score higher than 0.2) even though a Tanimoto threshold is set in the XML file. On the contrary, the Euclidean threshold seems to be respected (I tried setting it to 20 and I retrieve compounds with scores lower than 20). Moreover, if I delete the line <ParametrizedMetric Name="Euclidean" ActiveFamily="Generic" Metric="Euclidean" Threshold="10"/>, the Euclidean score is still calculated with a threshold of 10 and the Tanimoto threshold is still not respected. What am I doing wrong?

My second question is: how do I use ECFP with counts? I changed Counts="no" to Counts="yes" in the line <Parameters Length="1024" Diameter="4" Counts="no"/>, but it doesn't seem to change anything in the scores.

Here is the command line:

screenmd ../ref_base_myid.sdf ../CDC.sdf -k ECFP -c ecfp.xml -o table hits.txt -o sdf hits.sdf -I MY_ID

with ecfp.xml:

<?xml version="1.0" encoding="UTF-8"?>
<ECFPConfiguration Version="0.1">

    <Parameters Length="1024" Diameter="4" Counts="no"/>

        <!-- Default atom properties (switched on by Value=1) -->
        <Property Name="AtomicNumber" Value="1"/>
        <Property Name="HeavyNeighborCount" Value="1"/>
        <Property Name="HCount" Value="1"/>
        <Property Name="FormalCharge" Value="1"/>
        <Property Name="IsRingAtom" Value="1"/>

        <!-- Other built-in atom properties (switched off by Value=0) -->
        <Property Name="ConnectionCount" Value="0"/>
        <Property Name="Valence" Value="0"/>
        <Property Name="Mass" Value="0"/>
        <Property Name="MassNumber" Value="0"/>
        <Property Name="HasAromaticBond" Value="0"/>
        <Property Name="IsTerminalAtom" Value="0"/>
        <Property Name="IsStereoAtom" Value="0"/>

    <StandardizerConfiguration Version="0.1">
            <Action ID="aromatize" Act="aromatize"/>
            <RemoveExplicitH ID="RemoveExplicitH" Groups="target"/>

            <ParametrizedMetric Name="Tanimoto" ActiveFamily="Generic" Metric="Tanimoto" Threshold="0.2"/>
            <ParametrizedMetric Name="Euclidean" ActiveFamily="Generic" Metric="Euclidean" Threshold="10"/>

Thanks for your help,


ChemAxon 4a2fc68cd1

21-06-2011 07:31:41

Hi Emilie,

First of all, by default screenmd accepts a target structure if the threshold condition is met for at least one metric, i.e. it combines the metric thresholds with "or" semantics. That is why compounds whose Tanimoto dissimilarity exceeds 0.2 are still returned: they pass the Euclidean threshold. However, you can easily switch to "and" semantics with the -m command line option. (For the detailed help message, execute screenmd -x.)
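To make the "or" versus "and" behaviour concrete, here is a toy model in Python. It is only an illustration of the semantics described above, not screenmd's implementation: fingerprints are modeled as sets of "on" bit positions, the Euclidean distance is the plain distance between 0/1 vectors, and a metric is assumed to pass when its score is less than or equal to its threshold. The function names are hypothetical.

```python
from __future__ import annotations

import math


def tanimoto_dissimilarity(a: set[int], b: set[int]) -> float:
    """1 - |A & B| / |A | B| for binary fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def euclidean_distance(a: set[int], b: set[int]) -> float:
    """For 0/1 vectors the squared distance equals the number of differing bits."""
    return math.sqrt(len(a ^ b))


def accept(scores: dict[str, float], thresholds: dict[str, float], use_and: bool) -> bool:
    """Assumed rule: a metric passes when its score <= its threshold.

    use_and=False mimics the default "or" combination, use_and=True the "and" one.
    Only the metrics listed in `thresholds` are evaluated.
    """
    passes = [scores[name] <= limit for name, limit in thresholds.items()]
    return all(passes) if use_and else any(passes)


if __name__ == "__main__":
    query = set(range(0, 20))    # toy fingerprint: bits 0..19 set
    target = set(range(10, 30))  # toy fingerprint: bits 10..29 set
    scores = {
        "Tanimoto": tanimoto_dissimilarity(query, target),  # 1 - 10/30
        "Euclidean": euclidean_distance(query, target),     # sqrt(20)
    }
    thresholds = {"Tanimoto": 0.2, "Euclidean": 10.0}
    print(accept(scores, thresholds, use_and=False))  # True: Euclidean passes
    print(accept(scores, thresholds, use_and=True))   # False: Tanimoto fails
```

In this toy model, keeping only {"Tanimoto": 0.2} in the thresholds dictionary plays roughly the role of restricting screening to a single metric.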

> Moreover, if I delete the line <ParametrizedMetric Name="Euclidean" .../>, Euclidean score is still calculated...

Yes, this is indeed a bug, and we are going to fix it soon. However, there is a simple workaround: pass the list of required metrics with the -M option, e.g.:

screenmd ... -M Tanimoto
screenmd ... -M Tanimoto Euclidean -m

> My second question is: how to use ECFP with counts?

Currently, ECFP counts are not considered in distance calculations. So the ECFP descriptors generated with Counts="no" and Counts="yes" differ, but the dissimilarity values are exactly the same.

We plan to implement count-sensitive metrics in a future release.
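The following Python sketch illustrates why counts make no difference when the metric only looks at which features are present. Fingerprints are modeled as hypothetical feature-identifier to count mappings. The "count-sensitive" variant (sum of minimum counts over sum of maximum counts) is one common generalization of Tanimoto, chosen here as an assumption for illustration; it is not necessarily what ChemAxon implemented or planned.

```python
from __future__ import annotations

from collections import Counter


def binary_tanimoto_dissimilarity(a: Counter, b: Counter) -> float:
    """Uses only which features are present; the counts themselves are ignored."""
    on_a = {f for f, c in a.items() if c > 0}
    on_b = {f for f, c in b.items() if c > 0}
    union = len(on_a | on_b)
    if union == 0:
        return 0.0
    return 1.0 - len(on_a & on_b) / union


def count_tanimoto_dissimilarity(a: Counter, b: Counter) -> float:
    """Illustrative count-sensitive variant: 1 - sum(min counts) / sum(max counts)."""
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)  # missing keys count as 0
    den = sum(max(a[k], b[k]) for k in keys)
    if den == 0:
        return 0.0
    return 1.0 - num / den


if __name__ == "__main__":
    # Same features present, different counts (feature IDs are made up).
    a = Counter({101: 1, 202: 1, 303: 1})
    b = Counter({101: 3, 202: 1, 303: 2})
    print(binary_tanimoto_dissimilarity(a, b))  # 0.0: identical on-sets
    print(count_tanimoto_dissimilarity(a, b))   # 0.5: 1 - 3/6
```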


User ce5f27518b

21-06-2011 13:03:44

Thanks for this answer!