Technical Support Forum
Access ChemAxon scientists and developers here. For registration and login issues contact website support.

Support Ticket System is replacing forum

This forum was converted into a searchable archive. You cannot add posts here any more. For support please use our new Ticket System.

CF/ECFP Fingerprint representations
This topic is locked: you cannot edit posts or make replies.
Florian

Joined: 28 Jan 2014
Posts: 1

Posted: Sat Jul 12, 2014 5:58 pm    Post subject: CF/ECFP Fingerprint representations

I have a question regarding the different representations of the hashed fingerprints.

My understanding was that option -D in generatemd always outputs all hash codes present in the molecule, while option -2 outputs a folded version of the fingerprint, with the length specified by parameter -f.

For ECFP, generatemd behaves as I expect: with option -D, the same list of hash codes is always produced, independent of the specified fingerprint length, while the bit-string representations differ.
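A minimal Python sketch of the ECFP behaviour described above, assuming the common scheme in which each feature hash sets bit (hash mod length). The `fold` helper and the hash values are hypothetical and are not ChemAxon's implementation. The list of hashes stays fixed, and only the folded bit string depends on the chosen length; the last line also shows how two different hashes can collide on one bit at a short length.

```python
def fold(hashes: list[int], length: int) -> list[int]:
    """Fold integer feature hashes into a bit string of `length` bits.

    Assumption: each hash sets bit (hash mod length). This is a common
    folding scheme, not necessarily the one generatemd uses.
    """
    bits = [0] * length
    for h in hashes:
        bits[h % length] = 1  # Python's % gives a non-negative index for length > 0
    return bits


# Hypothetical feature hashes (signed 32-bit values), not real ECFP output.
ecfp_hashes = [-1937124913, 864662311, -2049583024, 1510328189]

for length in (16, 32, 64):
    bits = fold(ecfp_hashes, length)
    # The hash list is unchanged; only the folded bit string differs.
    print(length, "".join(map(str, bits)))

# Collision example: 3 and 19 share bit 3 at length 16 but not at length 32.
print(sum(fold([3, 19], 16)), sum(fold([3, 19], 32)))  # prints: 1 2
```

Because the hash list is kept separately from the folded bits, any fingerprint length can be derived from it afterwards.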

The CF descriptor behaves differently: the number of integer hash codes produced with option -D varies with the specified fingerprint length, although repeated calls with the same length parameter give the same output. Why does the fingerprint length influence the generated hash codes?

Gabor
ChemAxon personnel
Joined: 29 May 2005
Posts: 317

Posted: Fri Jul 18, 2014 5:25 pm

Dear Florian,


For the chemical fingerprint (CF), the individual feature hashes are not available, unlike in ECFP, because the feature hashes are already folded during CF generation. The decimal representation of CF is therefore the folded binary representation packed into 32-bit integers, so the number of integers in the output depends on the specified fingerprint length.
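A minimal Python sketch of packing a folded bit string into 32-bit integers, illustrating why the number of integers in the decimal CF output follows the fingerprint length rather than the number of features in the molecule. The `pack_to_int32` helper, the bit order within each word, and the signed interpretation are assumptions for illustration, not ChemAxon's documented layout.

```python
def pack_to_int32(bits: list[int]) -> list[int]:
    """Pack a folded bit string into signed 32-bit integers.

    Assumptions (not ChemAxon's documented layout): the length is a
    multiple of 32, bit i goes to bit position i % 32 of word i // 32,
    and each word is reported as a signed two's-complement value.
    """
    if len(bits) % 32:
        raise ValueError("fingerprint length must be a multiple of 32")
    words = []
    for start in range(0, len(bits), 32):
        word = 0
        for offset, bit in enumerate(bits[start:start + 32]):
            if bit:
                word |= 1 << offset
        # Reinterpret the unsigned 32-bit value as a signed integer.
        if word >= 2**31:
            word -= 2**32
        words.append(word)
    return words


# A folded fingerprint of n bits always yields n // 32 integers.
for n in (512, 1024, 2048):
    fp = [0] * n
    fp[0] = fp[n - 1] = 1  # two hypothetical set bits
    print(n, len(pack_to_int32(fp)))  # 16, 32 and 64 integers respectively
```

Under these assumptions the same molecule gives 32 integers at length 1024 and 64 at length 2048, and the values repeat exactly for a fixed length, which matches the behaviour described in the question.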

I would like to note that in the new descriptors API we expose the folded binary representation by default.

Which representation (feature hashes or folded binary string) is relevant to your usage?

Regards,

Gabor
