"Unfolding" a fixed-length ECFP

User 25f41f0c25

23-01-2012 09:43:33

Hi guys,


I'm using a fixed-length ECFP in binary format created with generatemd. Is it possible to determine from this binary fingerprint which bit is set by which atom type?


Regards

ChemAxon efa1591b5a

25-01-2012 14:16:08

Hi, 


I'm afraid that this is not possible. To get this information you will need to use the unfolded, variable-length decimal representation. Also note that it is not the atom types themselves that set particular bits; the situation is more complex. The local neighbourhood (within a given radius) of each atom is taken into account, and this neighbourhood may include the atom type as well.
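A small illustration of why the folded form loses this information. It assumes that folding works roughly like taking each feature identifier modulo the fingerprint length (the exact scheme generatemd uses is an assumption here, and the identifiers are made up). Several identifiers can land on the same bit, so a set bit cannot be traced back to a single feature.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustration only: a hypothetical modulo folding rule, not generatemd's code. */
final class FoldingDemo {
    /** Hypothetical folding rule: bit index = identifier modulo fingerprint length. */
    static int bitFor(int identifier, int length) {
        return Math.floorMod(identifier, length); // floorMod keeps negative identifiers in range
    }

    /** Groups identifiers by the bit they fold onto: exactly what the folded fingerprint forgets. */
    static Map<Integer, List<Integer>> fold(int[] identifiers, int length) {
        Map<Integer, List<Integer>> sources = new TreeMap<>();
        for (int id : identifiers) {
            sources.computeIfAbsent(bitFor(id, length), k -> new ArrayList<>()).add(id);
        }
        return sources;
    }

    /** The folded fingerprint itself: only the set bits survive. */
    static BitSet toBits(int[] identifiers, int length) {
        BitSet bits = new BitSet(length);
        for (int id : identifiers) {
            bits.set(bitFor(id, length));
        }
        return bits;
    }

    public static void main(String[] args) {
        int[] identifiers = {5, 21, -11, 100, 1000}; // made-up feature identifiers
        System.out.println("folded bits: " + toBits(identifiers, 16)); // folded bits: {4, 5, 8}
        fold(identifiers, 16).forEach((bit, ids) -> System.out.println("bit " + bit + " <- " + ids));
    }
}
```

With a length of 16, bit 5 has three possible sources (5, 21 and -11). Real fingerprints are longer, but the mapping stays many-to-one, which is why the unfolded identifiers are needed.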


You may wish to follow this link to read more about the structure of ECFP.


Regards,


Miklos

User 25f41f0c25

25-01-2012 17:13:59

Hi Miklos,


Thank you for your answer. I read the description that you gave me. According to this information this should be possible by using this class. Is this correct, or have I misunderstood the document?


I have some other questions. Is it possible to set up properties like these in the config files:


<Property Name="Test prop" Value="1"><![CDATA[ valence()*charge()]]></Property>
<Property Name="Test prop" Value="1">valence()*hCount()</Property>

Are the above lines correct, and if not, what is the correct form for such properties?


And what is the difference between these two properties:


<Property Name="Mass" Value="1"/>
<Property Name="MassNumber" Value="0"/>

Regards

ChemAxon efa1591b5a

26-01-2012 13:03:13

 


Hi,


According to this information this should be possible by using this class. Is this correct, or have I misunderstood the document?

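That's correct. Here is some sample code showing its typical usage:


   // mol: the structure whose fingerprint was generated (assumed to be loaded already)
   // key: an integer identifier from the unfolded, variable-length ECFP
   ECFPFeatureLookup lookup = new ECFPFeatureLookup();

   lookup.processMolecule(mol);
   // one identifier may correspond to several atom-centred substructures
   for ( ECFPFeature f: lookup.getFeaturesFromIdentifier(key) ) {
       System.out.println(f.getSubstructure().toFormat("SMARTS"));
   }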
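To see why a lookup like this needs the unfolded identifiers, here is a toy model of the bookkeeping involved; it is not ChemAxon's implementation. Every atom starts with an identifier derived from its atom properties, each iteration combines it with the sorted identifiers of its neighbours, and every (atom, radius) pair is recorded under the identifier it produced. The class, its method names, the use of `Arrays.hashCode` as the hash, and the omission of bond orders and duplicate-feature removal are all simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of ECFP-style feature bookkeeping (hypothetical, not ChemAxon code). */
final class ToyFeatureLookup {
    /** A feature: the central atom and the iteration (radius) that produced it. */
    static final class Feature {
        final int centerAtom;
        final int radius;

        Feature(int centerAtom, int radius) {
            this.centerAtom = centerAtom;
            this.radius = radius;
        }

        @Override
        public String toString() {
            return "atom " + centerAtom + " at radius " + radius;
        }
    }

    private final Map<Integer, List<Feature>> byIdentifier = new HashMap<>();

    /**
     * @param invariants initial identifier of each atom (e.g. derived from its atom type)
     * @param neighbours neighbours[a] lists the indices of the atoms bonded to atom a
     * @param maxRadius  number of neighbourhood-expansion iterations
     */
    void processMolecule(int[] invariants, int[][] neighbours, int maxRadius) {
        int[] current = invariants.clone();
        for (int a = 0; a < current.length; a++) {
            record(current[a], a, 0);
        }
        for (int r = 1; r <= maxRadius; r++) {
            int[] next = new int[current.length];
            for (int a = 0; a < current.length; a++) {
                // key = [iteration, own identifier, sorted neighbour identifiers]
                int[] key = new int[neighbours[a].length + 2];
                key[0] = r;
                key[1] = current[a];
                for (int i = 0; i < neighbours[a].length; i++) {
                    key[i + 2] = current[neighbours[a][i]];
                }
                Arrays.sort(key, 2, key.length); // atom numbering must not matter
                next[a] = Arrays.hashCode(key);  // stand-in hash function
                record(next[a], a, r);
            }
            current = next;
        }
    }

    private void record(int identifier, int atom, int radius) {
        byIdentifier.computeIfAbsent(identifier, k -> new ArrayList<>()).add(new Feature(atom, radius));
    }

    /** All recorded features for an identifier (empty if it never occurred). */
    List<Feature> featuresFor(int identifier) {
        return byIdentifier.getOrDefault(identifier, new ArrayList<>());
    }

    Set<Integer> identifiers() {
        return byIdentifier.keySet();
    }

    public static void main(String[] args) {
        // Ethanol heavy atoms C0-C1-O2; invariants are atomic numbers (a simplification).
        ToyFeatureLookup lookup = new ToyFeatureLookup();
        lookup.processMolecule(new int[] {6, 6, 8}, new int[][] {{1}, {0, 2}, {1}}, 1);
        for (int id : lookup.identifiers()) {
            System.out.println(id + " -> " + lookup.featuresFor(id));
        }
    }
}
```

For ethanol with one iteration, identifier 6 maps back to both carbons at radius 0, while each radius-1 identifier maps to a single atom and its neighbourhood. Once identifiers are folded onto bits, this table can no longer be consulted.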


 



Are the above lines correct, and if not, what is the correct form for such properties?

The CDATA form is the safe choice: a CDATA section hands the expression to the parser verbatim, so operator characters cannot be mistaken for XML markup.



<Property Name="Mass" Value="1"/>
<Property Name="MassNumber" Value="0"/>


The first refers to the atomic weight; the second, as its name says, is the mass number.


 


Does this help?


Miklos

User 25f41f0c25

08-02-2012 14:19:19










Thank you for your answer. It was useful.


I have some other questions about ECFP.


What about H atoms? If I'm using molecules with implicit H atoms and "HCount" is set in the config file, how is this property calculated? From the free valence of the atoms in the molecule, or...? And what is the situation with molecules with explicit hydrogens?


Do I have to use the CDATA form when I'm using -/+ operations in property definitions?


Regards

ChemAxon efa1591b5a

10-02-2012 10:16:19

Hi,


hCount() considers both explicit and implicit hydrogens, and yes, the implicit ones are calculated from the valence.
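A simplified sketch of that rule, for neutral atoms only: implicit hydrogens fill whatever the default valence leaves after all bonds (including bonds to explicit hydrogens) are counted, and the total is explicit plus implicit. The valence table and the names `HCountSketch`, `defaultValence` and `hCount` are assumptions for illustration; ChemAxon's own valence handling also has to deal with charges, radicals and non-default valences.

```java
/**
 * Simplified total hydrogen count for neutral atoms (illustration only):
 * explicit H neighbours plus implicit H filling the default valence.
 */
final class HCountSketch {
    /** Hypothetical default valences for a few common neutral elements. */
    static int defaultValence(String element) {
        switch (element) {
            case "C": return 4;
            case "N": return 3;
            case "O": case "S": return 2;
            case "F": case "Cl": case "Br": case "I": return 1;
            default: throw new IllegalArgumentException("no default valence for " + element);
        }
    }

    /**
     * @param element           element symbol of the atom
     * @param heavyBondOrderSum sum of the orders of its bonds to non-hydrogen atoms
     * @param explicitH         number of explicit hydrogen atoms bonded to it
     * @return explicit plus implicit hydrogens
     */
    static int hCount(String element, int heavyBondOrderSum, int explicitH) {
        int usedValence = heavyBondOrderSum + explicitH; // each explicit H uses one single bond
        int implicitH = Math.max(0, defaultValence(element) - usedValence);
        return explicitH + implicitH;
    }

    public static void main(String[] args) {
        // Ethanol hydroxyl oxygen: one C-O bond, H implicit vs. drawn explicitly.
        System.out.println(hCount("O", 1, 0)); // 1
        System.out.println(hCount("O", 1, 1)); // 1
        // Acetaldehyde carbonyl carbon: C=O (2) + C-C (1), H implicit.
        System.out.println(hCount("C", 3, 0)); // 1
    }
}
```

Both calls for the hydroxyl oxygen return 1, so in this model the property value does not depend on whether the hydrogen is drawn explicitly.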


Always use CDATA with complex expressions, just to be on the safe side.
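For `+` and `-` specifically, the XML parser itself does not object; characters such as `<` and `&` are the ones that break a plain element. The quick check below uses the JDK's standard DOM parser (`javax.xml.parsers`). The `Property` snippets are only strings here: `hCount() < 2` is a made-up expression, and whether ChemAxon's expression evaluator would accept it is not tested. The check also shows that a CDATA section is passed through verbatim, including any leading space.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Checks how a plain XML parser sees a Property element (not ChemAxon code). */
final class CdataCheck {
    /** Returns the text content of the first Property element; throws if the XML is malformed. */
    static String expressionOf(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element property = (Element) doc.getElementsByTagName("Property").item(0);
            return property.getTextContent(); // includes CDATA section text
        } catch (SAXException e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean isWellFormed(String xml) {
        try {
            expressionOf(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] snippets = {
            "<Property Name=\"Test prop\" Value=\"1\">valence()+hCount()</Property>",
            "<Property Name=\"Test prop\" Value=\"1\">valence()-hCount()</Property>",
            "<Property Name=\"Test prop\" Value=\"1\">hCount() < 2</Property>",
            "<Property Name=\"Test prop\" Value=\"1\"><![CDATA[hCount() < 2]]></Property>"
        };
        for (String s : snippets) {
            System.out.println((isWellFormed(s) ? "ok:        " : "malformed: ") + s);
        }
    }
}
```

Only the third snippet is rejected; wrapping the same text in CDATA makes it well-formed, which is why CDATA is the safe default for anything beyond simple arithmetic.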


Regards,


Miklos