ECFP/FCFP Fingerprints in JChem

User 2347372188

30-10-2009 17:32:42

Would it be possible to implement the ECFP/FCFP fingerprints used in Pipeline Pilot in JChem? As I understand it, the Pipeline Pilot folks simply implemented the Morgan algorithm to produce their ECFP/FCFP fingerprints. It should be possible, and legal, for ChemAxon to do the same. These fingerprints would be a great addition to JChem!!


-&

ChemAxon efa1591b5a

30-10-2009 17:48:59

Hi Steven,


Indeed, you're absolutely right that it'd be a great addition to JChem! We have already received similar requests, though we are not 100% sure about the legal aspects of this work.


As I understand it, though, ECFP is based on the Morgan algorithm but adds some extra features.
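Below is a toy Java sketch of the Morgan-style iterative identifier refinement that ECFP builds on. It is not ChemAxon's or Pipeline Pilot's implementation. The simplifications are assumptions made for illustration: the atom invariant is the atomic number alone, the hash is `Arrays.hashCode`, and bond orders and duplicate-substructure removal are left out. In each round, an atom's new identifier is a hash of its previous identifier and its neighbours' sorted identifiers. After n rounds, an atom's identifier therefore reflects its environment out to n bonds. The fingerprint is the set of all identifiers produced.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

/**
 * Toy Morgan/ECFP-style fingerprint. Simplifying assumptions (not the real
 * ECFP): atomic number as the only atom invariant, Arrays.hashCode as the
 * hash, no bond orders, no removal of duplicate substructures.
 */
public class MorganSketch {

    static Set<Integer> fingerprint(int[] atomicNumbers, int[][] neighbors, int iterations) {
        int n = atomicNumbers.length;
        int[] ids = new int[n];
        Set<Integer> features = new TreeSet<>();

        // Iteration 0: each identifier depends only on the atom itself.
        for (int a = 0; a < n; a++) {
            ids[a] = Arrays.hashCode(new int[] {atomicNumbers[a]});
            features.add(ids[a]);
        }

        for (int it = 1; it <= iterations; it++) {
            int[] next = new int[n];
            for (int a = 0; a < n; a++) {
                // Collect the neighbours' previous identifiers and sort them,
                // so the result does not depend on atom numbering.
                int[] nb = new int[neighbors[a].length];
                for (int k = 0; k < nb.length; k++) {
                    nb[k] = ids[neighbors[a][k]];
                }
                Arrays.sort(nb);
                // Hash (iteration, own previous id, sorted neighbour ids).
                int[] data = new int[nb.length + 2];
                data[0] = it;
                data[1] = ids[a];
                System.arraycopy(nb, 0, data, 2, nb.length);
                next[a] = Arrays.hashCode(data);
                features.add(next[a]);
            }
            ids = next;
        }
        return features;
    }

    public static void main(String[] args) {
        // Heavy atoms of ethanol: C0-C1-O2
        int[] atoms = {6, 6, 8};
        int[][] nbrs = {{1}, {0, 2}, {1}};
        System.out.println(fingerprint(atoms, nbrs, 2));
    }
}
```

Because the neighbour identifiers are sorted before hashing, renumbering the atoms (for example, writing ethanol as O0-C1-C2) yields the same set of identifiers.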


We will investigate this. In the meantime, if you have any information on either the exact algorithm or the IP issues, it would be great if you could share it with us.


Thank you for your suggestion, we will consider it in our future developments.


 


Best regards,


Miklos


 

ChemAxon efa1591b5a

01-12-2010 12:53:14

Hi All,


ChemAxon's ECFP/FCFP implementation was released as part of JChem 5.4.


Related documentation:




Regards


Miklos

User b7b28a49c5

28-12-2010 16:15:54

Hi! 


I'm new to JChem and have run into the following problem. I'm using the JChem 5.4 libraries to calculate several descriptors. When I try to calculate ECFP and the Shape Descriptor, however, an exception says that I have no license for them and that I should request an academic license for those products. I have already done that. By the way, BCUT, ChemicalFingerprint, PharmacophoreFingerprint and ScalarsDescriptors work fine.


Thanks for any help!!


Marko

ChemAxon 4a2fc68cd1

03-01-2011 15:05:56

Hi Marko!


I think you have an out-of-date license file, which does not contain the new license keys required by ECFP and ShapeDescriptor.


The old descriptors require the "Molecular Descriptors" license key, but the new descriptors introduced in JChem 5.4 require new, separate license keys: ECFP requires the "ECFP/FCFP" key and ShapeDescriptor requires the "3D Screen" key. It seems that your license file does not contain these keys. Could you please check it? When did you receive the license file?


Peter

User b7b28a49c5

03-01-2011 15:24:34

Peter:


I downloaded the license on December 27th. I parsed the values of the software fields in the license and can't see keys for Molecular Descriptors, ECFP/FCFP or ShapeDescriptor. I've posted them below.


Thanks for your answer,


Marko


Marvin Applets, Marvin Beans, Instant JChem, JChem Base, JChem Cartridge, Standardizer, Screen, Reactor, Fragmenter, JKlustor, Metabolizer, Markush Search, Protonation Plugin Group, Partitioning Plugin Group, Charge Plugin Group, Isomers Plugin Group, Conformation Plugin Group, Geometry Plugin Group, Huckel Analysis Plugin, Refractivity Plugin, HBDA Plugin, Markush Enumeration Plugin, Structure to Name Plugin, Name to Structure, JChem for Excel, Structure Search, IUPAC naming plugin, Structural Frameworks Plugin, Web Services Server, Calculations Pack, Structure Checker

ChemAxon 4a2fc68cd1

03-01-2011 20:40:11

Hi Marko,


I contacted the Sales team. It turned out that the academic license was not updated properly for the new release. They are fixing this problem and you will receive an email notification when the correct license file is available.


Sorry for the inconvenience.


Peter

User b7b28a49c5

03-01-2011 20:45:28

Peter:


Thanks for your help!! I'll wait for the notification and let you know how it goes.


Greetings,


Marko

User b7b28a49c5

06-01-2011 13:04:01

Hi Peter:


 


János Fejérvári contacted me and provided a new license; now it works well. Thank you both!!


Greetings,


Marko 


 

ChemAxon 4a2fc68cd1

06-01-2011 13:20:07

Hi Marko,


Good. Thank you for the feedback.


Peter

User bf5ddf6381

20-05-2011 09:07:52

Hi!


Is it possible to use ECFP fingerprints instead of chemical hashed fingerprints for Overlap Analysis in Instant JChem? If so, how can this be done?


 


Thanks in advance!

ChemAxon fa971619eb

23-05-2011 06:56:50

We don't yet support ECFP fingerprints in IJC. We do plan to add support, but for now you must stick with the built-in chemical hashed fingerprints.


Tim

User bf5ddf6381

23-05-2011 08:10:58

Thank you for your reply!

User 2347372188

15-06-2011 17:14:03

I see you've added a getter and setter for "KeepCounts" in ECFPParameters. However, there doesn't seem to be an easy way to get at the feature counts. The only way I can see to get them is via the ECFPFeatureLookup class. If there isn't an easy way, can you implement one? Thanks.


-&

ChemAxon 4a2fc68cd1

16-06-2011 10:14:15

Hi Steven,

Common descriptor handling tools (e.g. the DescriptorGenerator class) provide simple formats for representing fingerprints, such as String or int[]. In the current implementation, these representations repeat each ECFP identifier as many times as it occurs. They do not provide a list of (id, count) pairs.

For example, suppose the ECFP fingerprint with counts is { (1234, 3), (4567, 1), (7890, 2) }. The current implementation then returns the following in int[] or String format:
    1234, 1234, 1234, 4567, 7890, 7890
whereas the standard ECFP fingerprint without counts is
    1234, 4567, 7890

We thought that
    1234, 3, 4567, 1, 7890, 2
in a flat integer array or string representation would be misleading, because the numbers in the list play different roles (id and count). That's why we chose the representation above. It is less structured, but it is unambiguous and compatible with the original version without counts.
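The following plain-Java sketch builds the three flat layouts from the same counted fingerprint. It uses no JChem classes, and the class and method names are hypothetical. It also shows why the repeated-identifier layout stays compatible: removing duplicates from it yields exactly the standard fingerprint without counts. The interleaved layout has no such property, because a reader that does not know the layout would take the counts 3, 1 and 2 for identifiers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the flat layouts; not a JChem API. */
public class EcfpCountForms {

    /** Each identifier repeated once per occurrence (the layout described above). */
    static int[] expandWithMultiplicity(Map<Integer, Integer> counts) {
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                out.add(e.getKey());
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Distinct identifiers only: the standard fingerprint without counts. */
    static int[] identifiersOnly(Map<Integer, Integer> counts) {
        return counts.keySet().stream().mapToInt(Integer::intValue).toArray();
    }

    /** Interleaved (id, count) pairs: the layout considered misleading. */
    static int[] interleavedPairs(Map<Integer, Integer> counts) {
        int[] out = new int[counts.size() * 2];
        int i = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            out[i++] = e.getKey();
            out[i++] = e.getValue();
        }
        return out;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the insertion order of the example.
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        counts.put(1234, 3);
        counts.put(4567, 1);
        counts.put(7890, 2);

        int[] expanded = expandWithMultiplicity(counts);
        System.out.println(Arrays.toString(expanded));                     // [1234, 1234, 1234, 4567, 7890, 7890]
        System.out.println(Arrays.toString(identifiersOnly(counts)));      // [1234, 4567, 7890]
        System.out.println(Arrays.toString(interleavedPairs(counts)));     // [1234, 3, 4567, 1, 7890, 2]

        // Dropping duplicates from the expanded form recovers the count-free fingerprint.
        System.out.println(Arrays.toString(Arrays.stream(expanded).distinct().toArray())); // [1234, 4567, 7890]
    }
}
```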

The current representation (identifier list with multiplicity) can be obtained in the following ways:
    - GenerateMD command line application: use -D option.
    - DescriptorGenerator class API: use getAsIntArray() or getAsString()
    - ECFP class API: use toIntArray() or toString()

In the ECFP class, we could implement a query function that better meets your requirements (e.g. one returning a list of FeatureWithCount objects). However, it would not be available through the common descriptor handling tools (the GenerateMD command line application and the DescriptorGenerator class API), because they only support simpler, flat formats.

If we implemented a well-structured query function that was available only through the ECFP class API, would that be appropriate for you?


Regards,
Peter

User 2347372188

16-06-2011 15:44:19

Thanks for the explanation. This makes sense. I have been using the method ECFP.toIdentifierSet(). Obviously, a Set cannot hold more than one copy of a feature key, so it appears that I will have to use the methods you suggested instead.


I might suggest adding a method like toFeatureMap<Integer,Integer> where the key is the feature key and the value is the corresponding feature count.
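Until such a method exists, a client can build the same map from the identifier list with multiplicity. Earlier in the thread, ECFP.toIntArray() is described as returning that list. The `toFeatureMap` helper below is hypothetical and not part of JChem. It takes a plain int[], so it assumes the array has already been obtained from one of the methods mentioned above.

```java
import java.util.Map;
import java.util.TreeMap;

public class FeatureMapSketch {

    /**
     * Hypothetical helper (not a JChem method): collapses an identifier list
     * with multiplicity into feature key -> feature count.
     */
    static Map<Integer, Integer> toFeatureMap(int[] identifiersWithMultiplicity) {
        Map<Integer, Integer> counts = new TreeMap<>(); // keys kept in ascending order
        for (int id : identifiersWithMultiplicity) {
            counts.merge(id, 1, Integer::sum);           // start at 1, then add 1 per repeat
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] flat = {1234, 1234, 1234, 4567, 7890, 7890};
        System.out.println(toFeatureMap(flat)); // {1234=3, 4567=1, 7890=2}
    }
}
```

A TreeMap gives a deterministic, sorted key order. A HashMap would work just as well if order does not matter.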


-&

ChemAxon 4a2fc68cd1

16-06-2011 16:28:54

> I might suggest adding a method like toFeatureMap<Integer,Integer> where the key is the feature key and the value is the corresponding feature count.


Thank you, that seems like a good idea.


Peter