Technical Support Forum
ECFP Feature Lookup Degeneracy

Joined: 13 Aug 2013
Posts: 1

Posted: Wed May 14, 2014 6:02 am
Post subject: ECFP Feature Lookup Degeneracy

Hi all,

When doing feature lookup (using the API), different identifiers that were generated for the same molecule can map to the exact same SMARTS string.

For example, using generatemd to generate an ECFP_6 fingerprint (default configuration otherwise) for the SMILES string "[H][C@@](N1CCC2=C(C1)C=CS2)(C(=O)OC)C1=CC=CC=C1Cl" yields, among others, the identifiers -1216914296 and -1216914295. When you look those up using the feature lookup API, they both return the SMARTS string *~[#6](~*)~*

This is confusing to me because the little blurb explaining ECFP generation pretty explicitly mentions a duplicate removal step: "the removal of multiple identifier representations of equivalent atom neighborhoods". If these aren't actually duplicates, then why do they both correspond to the same SMARTS string?

I checked whether the problem was in how the MolExporter class translates fragments to SMARTS strings, but that doesn't seem to be the case. Whatever problem/degeneracy there is already exists in the Molecule objects returned by the ECFPFeatureLookup API.

Another thing I've noticed (which I think is a legitimate bug) is that the end atoms of the fragments returned by the feature lookup API are not true wildcards, but instead match any atom except H. As a result, some of the fragments produced by generatemd and a subsequent feature lookup cannot be mapped back onto the molecule that produced them in a substructure search. In the end, I had to go in and manually change the anything-but-H "wildcards" to true wildcards to get that to work.
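To illustrate the kind of mismatch described above, here is a minimal pure-Python sketch. It does not use the ChemAxon API; the matcher, the predicate names, and the assumption that an explicit H in the input was counted as one of the fragment's end atoms are all hypothetical, chosen only to show how an "any atom except H" end atom can block a match that a true wildcard would allow.

```python
from itertools import permutations

def any_atom(element):
    """True wildcard: matches every element, including H."""
    return True

def not_hydrogen(element):
    """'Anything but H' pseudo-wildcard."""
    return element != "H"

def star_matches(center, end_predicates, atom_element, neighbor_elements):
    """Toy substructure check for a star-shaped fragment.

    The fragment is a central atom plus one end atom per predicate.
    It matches if the elements agree at the center and each end atom
    can be assigned to a distinct neighbor that satisfies its predicate.
    """
    if atom_element != center:
        return False
    if len(end_predicates) > len(neighbor_elements):
        return False
    return any(
        all(pred(el) for pred, el in zip(end_predicates, perm))
        for perm in permutations(neighbor_elements, len(end_predicates))
    )

# Hypothetical carbon with an explicit H plus three heavy neighbors.
neighbors = ["H", "N", "C", "C"]

# Fragment with four end atoms, as it might look if the explicit H
# had been counted as a neighbor when the fragment was built.
strict = star_matches("C", [not_hydrogen] * 4, "C", neighbors)
relaxed = star_matches("C", [any_atom] * 4, "C", neighbors)
print(strict, relaxed)
```

In this toy setup the strict fragment cannot be mapped back onto its own source atom, while the relaxed one can, which mirrors the workaround of swapping the end atoms for true wildcards.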

Any help would be much appreciated. Thank you!


Marvin Version: 6.2.1

JChem Version: 6.1.5 (?)

ChemAxon personnel
Joined: 29 May 2005
Posts: 317

Posted: Fri Jul 18, 2014 5:30 pm

Dear Stefano,


Please note that ECFP feature lookup generates SMARTS strings only as an approximate visualization. Two neighborhoods can differ in atom features that the current ECFP configuration takes into account and still produce the same SMARTS string. Please check the central atom of each of the two features and examine its neighborhood together with the ECFP configuration.
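The point can be sketched with a toy model. The code below is not ChemAxon's ECFP algorithm: the property set, the CRC32 hash, and the two example atoms are assumptions made purely for illustration. It shows how an identifier computed from several atom properties can differ for two atoms whose simplified SMARTS rendering, built only from the element and the neighbor count, is identical.

```python
import zlib
from dataclasses import dataclass

@dataclass(frozen=True)
class CenterAtom:
    atomic_number: int
    heavy_neighbors: int
    h_count: int
    charge: int
    in_ring: bool

def toy_identifier(atom: CenterAtom) -> int:
    """Hash all atom properties into one integer (toy stand-in)."""
    key = "|".join(str(v) for v in (
        atom.atomic_number, atom.heavy_neighbors,
        atom.h_count, atom.charge, int(atom.in_ring),
    ))
    return zlib.crc32(key.encode("ascii"))

def toy_smarts(atom: CenterAtom) -> str:
    """Render only the element and neighbor count, with any-bond wildcards."""
    center = f"[#{atom.atomic_number}]"
    n = atom.heavy_neighbors
    if n == 0:
        return center
    if n == 1:
        return center + "~*"
    return "*~" + center + "(~*)" * (n - 2) + "~*"

# Two hypothetical carbons that differ only in H count and ring membership.
ring_carbon = CenterAtom(6, 3, 0, 0, True)
chain_carbon = CenterAtom(6, 3, 1, 0, False)

print(toy_identifier(ring_carbon), toy_smarts(ring_carbon))
print(toy_identifier(chain_carbon), toy_smarts(chain_carbon))
```

Both atoms render as *~[#6](~*)~* in this sketch while their toy identifiers differ, because the rendering drops the properties that distinguish them.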



