Issue with CFP descriptors

User 62771462e7

14-07-2011 07:57:42

Hi,


when I run this simple program:


import java.io.*;
import java.util.ArrayList;
import chemaxon.descriptors.*;
import chemaxon.struc.Molecule;
import chemaxon.formats.MolImporter;


public class SimilarityMatrix
{
    public static void main( String[] args )
    {
        try
        {
            // Create a DescriptorGenerator object to generate chemical fingerprints.
            DescriptorGenerator gen = new DescriptorGenerator("CFP");
            // Allocate storage for all fingerprints to be generated.
            ArrayList<int[]> fps = new ArrayList<int[]>();
            // Import all structures from the given input file.
            // Generate and store all fingerprints.
            MolImporter mi = new MolImporter(args[0]);
            Molecule mol = mi.read();
            while ( mol != null )
            {
                gen.generate(mol);
                fps.add( gen.getAsIntArray() );
                mol = mi.read();
            }
            // Allocate the similarity matrix.
            float[][] sim = new float[fps.size()][fps.size()];
            // Create a similarity calculator to work with Tanimoto metric.
            SimilarityCalculator sc = SimilarityCalculatorFactory.create("Tanimoto");
            // Calculate the similarity score for all pairs and store the results
            // in the similarity matrix.
            for ( int row = 0; row < fps.size(); row++ )
            {
                sc.setQueryFingerprint(fps.get(row));
                for ( int col = 0; col < fps.size(); col++ )
                {
                    sim[row][col] = sc.getSimilarity(fps.get(col));
                }
            }
        }
        catch ( Exception e )
        {
            e.printStackTrace( System.err );
        }
    }
}


 


I get the following error message:


java.lang.ClassNotFoundException: CFP
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:169)
    at chemaxon.descriptors.MolecularDescriptor.getClassFromDescriptorName(MolecularDescriptor.java:149)
    at chemaxon.descriptors.MolecularDescriptor.newInstance(MolecularDescriptor.java:116)
    at chemaxon.descriptors.DescriptorGenerator.<init>(DescriptorGenerator.java:108)
    at SimilarityMatrix.main(SimilarityMatrix.java:15)
java.lang.RuntimeException: CFP
Invalid or missing Molecular Descriptor type: CFP
    at chemaxon.descriptors.MolecularDescriptor.getClassFromDescriptorName(MolecularDescriptor.java:153)
    at chemaxon.descriptors.MolecularDescriptor.newInstance(MolecularDescriptor.java:116)
    at chemaxon.descriptors.DescriptorGenerator.<init>(DescriptorGenerator.java:108)
    at SimilarityMatrix.main(SimilarityMatrix.java:15)



Do you know why this problem with CFP descriptors occurs?


Thanks a lot,
Gonzalo

User 62771462e7

14-07-2011 08:10:35

Hi,


I think I found the error: the descriptor name is CF, not CFP.


Thx,


Gonzalo

ChemAxon efa1591b5a

14-07-2011 15:40:53

Hi, yes, that's correct, CF should be used.


Miklos
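
For readers following along, the Tanimoto score that the program above stores in sim[row][col] can be sketched in plain Java, without the ChemAxon classes. This is only an illustration, not ChemAxon's implementation: the class and method names (TanimotoSketch, tanimoto, similarityMatrix) are hypothetical, and it assumes that the fingerprints are bit-packed int arrays of equal length, as the int[] values collected in the program suggest.

```java
import java.util.ArrayList;
import java.util.List;

public class TanimotoSketch {

    // Tanimoto coefficient: number of bits set in both fingerprints
    // divided by the number of bits set in at least one of them.
    public static float tanimoto(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Fingerprints must have the same length");
        }
        int common = 0;
        int union = 0;
        for (int i = 0; i < a.length; i++) {
            common += Integer.bitCount(a[i] & b[i]);
            union += Integer.bitCount(a[i] | b[i]);
        }
        // Convention of this sketch: two all-zero fingerprints count as identical.
        return union == 0 ? 1.0f : (float) common / union;
    }

    // Fills a symmetric matrix with the pairwise Tanimoto scores.
    public static float[][] similarityMatrix(List<int[]> fps) {
        int n = fps.size();
        float[][] sim = new float[n][n];
        for (int row = 0; row < n; row++) {
            for (int col = row; col < n; col++) {
                float s = tanimoto(fps.get(row), fps.get(col));
                sim[row][col] = s;
                sim[col][row] = s;
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        // Two toy 32-bit fingerprints (made-up values, not real CF output).
        List<int[]> fps = new ArrayList<>();
        fps.add(new int[] {0b1100});
        fps.add(new int[] {0b0110});
        float[][] sim = similarityMatrix(fps);
        System.out.println("sim[0][1] = " + sim[0][1]);
    }
}
```

The two toy fingerprints share one set bit out of three set in total, so the off-diagonal score is 1/3 and the diagonal is 1. Because the score is symmetric, the sketch computes only the upper triangle and mirrors it.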

User 91f8768a43

21-07-2011 11:21:57

I also have a problem related to CFP. Maybe some of you can help me with it?



I would like to find out how to resolve a discrepancy between chemical hashed fingerprints, which, as I understand it, were developed by ChemAxon.


Briefly, I use 'sphere exclusion' clustering with cfp (as far as I know, it is 1024 bits long by default):



jklustor -v -c sphex:0.5 -d cfp:tanimoto uniq.sdf -o "wrclus:smiles:uniq05.txt:descs"





After that, I load the cluster centroids into InstantJChem to perform a self-overlap calculation (Tanimoto similarity = 0.45). If I'm right, the cfp is 512 bits long there, so the results are completely different and not comparable. Some coefficients are even above 0.5, which should be impossible for cluster centroids. Can anybody help me resolve this issue? Thank you in advance!
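
To illustrate why scores computed from fingerprints of different lengths are not comparable, here is a minimal plain-Java sketch. It makes several assumptions: a shorter fingerprint is imitated by OR-folding a longer one in half, which is only a stand-in for however the 512-bit and 1024-bit fingerprints are actually generated; the class and method names (FoldingSketch, tanimoto, foldInHalf) are hypothetical; and the bit patterns are made up.

```java
public class FoldingSketch {

    // Tanimoto coefficient of two equal-length, bit-packed fingerprints.
    public static float tanimoto(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Fingerprints must have the same length");
        }
        int common = 0;
        int union = 0;
        for (int i = 0; i < a.length; i++) {
            common += Integer.bitCount(a[i] & b[i]);
            union += Integer.bitCount(a[i] | b[i]);
        }
        // Convention of this sketch: two all-zero fingerprints count as identical.
        return union == 0 ? 1.0f : (float) common / union;
    }

    // Halves the fingerprint length by OR-ing the upper half onto the lower half.
    // Assumption: this folding only imitates a shorter hashed fingerprint.
    public static int[] foldInHalf(int[] fp) {
        if (fp.length % 2 != 0) {
            throw new IllegalArgumentException("Fingerprint needs an even number of words");
        }
        int half = fp.length / 2;
        int[] folded = new int[half];
        for (int i = 0; i < half; i++) {
            folded[i] = fp[i] | fp[i + half];
        }
        return folded;
    }

    public static void main(String[] args) {
        // Two toy 64-bit fingerprints with no bits in common.
        int[] a = {0b0001, 0b0010};
        int[] b = {0b0010, 0b0100};
        System.out.println("full length: " + tanimoto(a, b));
        System.out.println("folded:      " + tanimoto(foldInHalf(a), foldInHalf(b)));
    }
}
```

In this toy case the full-length fingerprints share no bits, so their Tanimoto score is 0, while the folded versions collide on one bit and score 1/3. The same pair of structures can therefore land on different sides of a threshold depending on the fingerprint length used.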

ChemAxon 8b644e6bf4

03-08-2011 17:04:20

Dear Gonzalo,


 


"Some coefficients are even above 0.5, which should be impossible for cluster centroids."


Could you send an example molecule pair (if they are not confidential)?


Currently, further fingerprint parameters cannot be adjusted in jklustor.


 


Regards


Gabor