jc_compare vs MDSet.getDissimilarity

User 9b067d2e85

09-04-2013 17:06:59

I'm trying to figure out why I'm not getting the same answers from these two methods (they are close, but not identical). I attached the cfp.xml file I'm using (the fingerprint length, bond count and bit count are the same).


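import java.util.ArrayList;

import chemaxon.descriptors.CFParameters;
import chemaxon.descriptors.ChemicalFingerprint;
import chemaxon.descriptors.MDSet;
import chemaxon.struc.Molecule;

ArrayList<Molecule> mols = ...;   // list of mols from the same DB
String cfpXML = ...;              // cfp.xml property file
CFParameters cfpConfig = new CFParameters(cfpXML);
ChemicalFingerprint cf = new ChemicalFingerprint(cfpConfig);

// Descriptor set that is regenerated for each database molecule
MDSet md = new MDSet();
md.addDescriptor(cf);

// Descriptor set built from md, holding the query fingerprint
MDSet querySet = new MDSet(md);
Molecule queryMol = ...;          // some query molecule
querySet.generate(queryMol);

for (Molecule mol : mols)
{
    md.generate(mol);
    float dis = md.getDissimilarity(querySet);
    if (dis < 0.3)
    {
        // do stuff
    }
}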

vs


SELECT id FROM [table_name_with_jchem_index] WHERE jc_compare(structure, '[queryMol in SMILES]', 't:i dissimilarityThreshold:0.3') = 1

Thanks.

User 9b067d2e85

09-04-2013 17:52:32

To add to this, the results from the API seem to be more stringent than those from jc_compare. If I run the jc_compare hits through MDSet.getDissimilarity(), some of the molecules don't actually pass the given threshold, so I believe the problem is in the cfp.xml file.


How would I set up that config file to get results identical to the cartridge?
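The cross-check described above (re-scoring the jc_compare hits with the Java API) can be made a little more informative by separating hits that sit exactly on the threshold from hits that clearly exceed it, since the two point to different causes. The following is a minimal, JChem-independent sketch; the class `ResultDiff`, the method `classify` and the sample data are hypothetical, and it assumes you have already collected the cartridge IDs and an ID-to-dissimilarity map from the API side.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ResultDiff {
    // Hypothetical helper: sorts the IDs returned by the cartridge query into
    // buckets based on the dissimilarity the Java API computed for them.
    static Map<String, Set<Integer>> classify(Set<Integer> cartridgeIds,
                                              Map<Integer, Float> apiDissimilarity,
                                              float threshold) {
        Set<Integer> onBoundary = new TreeSet<>();     // exactly at the threshold
        Set<Integer> aboveThreshold = new TreeSet<>(); // clearly fail the threshold
        Set<Integer> notScored = new TreeSet<>();      // no API value available
        for (int id : cartridgeIds) {
            Float d = apiDissimilarity.get(id);
            if (d == null) {
                notScored.add(id);
            } else if (d == threshold) {
                onBoundary.add(id);
            } else if (d > threshold) {
                aboveThreshold.add(id);
            }
        }
        Map<String, Set<Integer>> buckets = new TreeMap<>();
        buckets.put("aboveThreshold", aboveThreshold);
        buckets.put("notScored", notScored);
        buckets.put("onBoundary", onBoundary);
        return buckets;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, for illustration only.
        Set<Integer> cartridgeIds = new TreeSet<>(Set.of(1, 2, 3, 4));
        Map<Integer, Float> api = Map.of(1, 0.10f, 2, 0.30f, 3, 0.35f);
        System.out.println(classify(cartridgeIds, api, 0.3f));
        // prints {aboveThreshold=[3], notScored=[4], onBoundary=[2]}
    }
}
```

Hits in `onBoundary` suggest an inclusive versus exclusive comparison difference, while hits in `aboveThreshold` suggest that the fingerprints themselves differ, for example because of a configuration mismatch.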

ChemAxon aa7c50abf8

09-04-2013 19:18:26

Could you tell us which JChem version this is? (I assume you are using the same JChem version for the Java API sample and the JChem Cartridge sample.)


Could you also provide the SQL used to create the JChem index on the table?


Thanks.

User 9b067d2e85

09-04-2013 21:32:40

JChem 5.11.5, and yes, it is the same version for both the API and the cartridge.


CREATE INDEX FOO_STRJCIDX ON FOO(STRUCTURE)
INDEXTYPE IS jchem.jc_idxtype
PARAMETERS('std_config=aromatize:b,TDF=y')

ChemAxon 4a2fc68cd1

10-04-2013 14:43:20

Hi,


Note that in similarity search the dissimilarity threshold is checked with a <= comparison rather than <. That is, molecules with a dissimilarity of exactly 0.3 are retrieved by the Cartridge query, but not by your Java code. Could you check whether changing the comparison in the Java code to <= solves the issue?
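The boundary behavior described above can be illustrated with a small, JChem-independent Java sketch. The class `ThresholdCheck` and its `withinThreshold` method are hypothetical names. The sketch also shows one Java-side detail that is independent of JChem: since `getDissimilarity` returns a `float` in the code above, comparing it against the double literal `0.3` promotes the float to double, and `(double) 0.3f` is slightly larger than `0.3`, so a float threshold literal (`0.3f`) is the safer choice.

```java
public class ThresholdCheck {
    // Inclusive test, mirroring the "<=" comparison the Cartridge applies
    // to dissimilarityThreshold.
    static boolean withinThreshold(float dissimilarity, float threshold) {
        return dissimilarity <= threshold;
    }

    public static void main(String[] args) {
        float dis = 0.3f; // a molecule sitting exactly on the threshold

        boolean exclusive = dis < 0.3f;                  // original Java test: rejects it
        boolean inclusive = withinThreshold(dis, 0.3f);  // Cartridge-style test: accepts it
        boolean mixedTypes = dis <= 0.3;                 // float promoted to double: 0.3f > 0.3, so false

        System.out.println(exclusive + " " + inclusive + " " + mixedTypes);
        // prints "false true false"
    }
}
```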


If you still see differences, they are most likely caused by the CFP configuration. The standardizer configuration and the main fingerprint parameters in the CFP configuration XML file must be identical to the parameters of the table. In particular, you need the following standardizer configuration in the XML file:


<StandardizerConfiguration>
    <Actions>
        <Aromatize ID="Aromatize" Type="basic"/>
    </Actions>
</StandardizerConfiguration>


to reproduce the behavior of the table's 'aromatize:b' standardization option.
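To confirm that a given cfp.xml actually contains this standardizer action, a plain JDK DOM check is enough. This is a minimal sketch, not part of JChem: the class `CfpConfigCheck` and the method `hasBasicAromatize` are hypothetical, it assumes the StandardizerConfiguration block appears as shown above somewhere inside the file, and it does not verify the fingerprint length, bond count or bit count.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class CfpConfigCheck {
    // Returns true if any StandardizerConfiguration element in the XML contains
    // an Aromatize action with Type="basic" (the counterpart of 'aromatize:b').
    static boolean hasBasicAromatize(String cfpXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(cfpXml.getBytes(StandardCharsets.UTF_8)));
        NodeList configs = doc.getElementsByTagName("StandardizerConfiguration");
        for (int i = 0; i < configs.getLength(); i++) {
            // Search all descendants of this standardizer block for Aromatize actions
            NodeList actions = ((Element) configs.item(i)).getElementsByTagName("Aromatize");
            for (int j = 0; j < actions.getLength(); j++) {
                Element action = (Element) actions.item(j);
                if ("basic".equals(action.getAttribute("Type"))) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        String snippet = "<StandardizerConfiguration><Actions>"
                + "<Aromatize ID=\"Aromatize\" Type=\"basic\"/>"
                + "</Actions></StandardizerConfiguration>";
        System.out.println(hasBasicAromatize(snippet)); // prints "true"
    }
}
```

In practice you would pass the full contents of your cfp.xml instead of the snippet.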


Best regards,
Peter

User 9b067d2e85

10-04-2013 16:27:23

Thanks.


It was the basic aromatization setting that fixed it, but it's good to know that the dissimilarity check is also <= threshold.