Annotated compound library

30-03-2005 21:00:06

Our company buys and sells chemicals for HTS and combichem. Each quarter we prepare a new CD with 500 000 chemicals.





We would like to add some information for each compound that describes its predicted activity for different targets, for example GPCRs, Kinase Inhibitors, Natural Marine chemicals, Protease Inhibitors, Nuclear Receptors, Ion Channel.





I do not know how to start the work with your scripts.

ChemAxon efa1591b5a

30-03-2005 22:28:45

Hi,





Various tools available in the Screen package can be applied for library annotation.





The easiest, recommended for less advanced users of the Screen package, is the screenmd application.


Let's suppose you want to predict the activity of your library compounds against ACE (angiotensin-converting enzyme). You have a set of known ACE inhibitors (let's say 20 molecules) and you want to use these structures as reference compounds.





To score the estimated ACE activity of your compounds you can use the dissimilarity between pairs of compounds. That is, you compare each structure in your library against known ACE actives, calculate the dissimilarity value (e.g. using the Tanimoto metric) and store this value in the SDfile in a custom field.
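
To make the score concrete, here is a minimal Python sketch of Tanimoto dissimilarity on binary fingerprints. It is only an illustration of the metric, not screenmd's own code: representing a fingerprint as a set of on-bit indices, the function name and the example bits are all assumptions made for this example.

```python
def tanimoto_dissimilarity(fp_a: set[int], fp_b: set[int]) -> float:
    """Return 1 - |A and B| / |A or B| for fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 0.0
    return 1.0 - len(fp_a & fp_b) / union


# Hypothetical fingerprints: one library compound and one known active.
library_fp = {1, 4, 7, 9}
active_fp = {1, 4, 8}
score = tanimoto_dissimilarity(library_fp, active_fp)
print(round(score, 3))
```

A value of 0 means the two fingerprints are identical and 1 means they share no bits, so lower scores suggest a compound closer to the actives.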





You will also need to decide which molecular descriptor suits your needs best. To start off with I recommend the chemical fingerprint; later, more advanced descriptors (e.g. BCUT, pharmacophore fingerprint) can also be used.





There is one more decision you need to make: how known actives are considered in the dissimilarity calculation. Comparing against individual structures (that is, all 20 you have) is not practical for many reasons, thus you decide to use a hypothesis fingerprint. The simplest is the minimum, or consensus, fingerprint.
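
As a rough illustration of what a consensus fingerprint means, the Python sketch below keeps only the bits that are set in every active. This is my reading of "minimum" for binary fingerprints and not a statement of how screenmd builds it internally; the set-of-bits representation and the function name are assumptions.

```python
def minimum_hypothesis(active_fps: list[set[int]]) -> set[int]:
    """Bits set in every active fingerprint (the intersection of all actives)."""
    if not active_fps:
        raise ValueError("at least one active fingerprint is required")
    return set.intersection(*active_fps)


# Three hypothetical actives, each given as a set of on-bit indices.
actives = [{1, 2, 3}, {2, 3, 4}, {2, 3, 9}]
consensus = minimum_hypothesis(actives)
print(sorted(consensus))
```

Each library compound is then compared against this single fingerprint instead of against all actives one by one.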





A preparation step is also needed. What screen normally does is keep only the compounds that are similar to the query structures: there is a predefined dissimilarity threshold, and compounds above this limit are rejected (filtered out). Such behaviour is undesired here, as all original compounds have to be kept. Therefore, the dissimilarity threshold has to be adjusted.


To do this, copy the file cfp.xml located in the jchem/examples/config directory to your working directory (you can keep the same name, or rename it as you wish). Then edit the copy (not the original one, it's better to keep it untouched). This is an XML file, so you can use either a text editor or an XML editor, as you prefer. You will find a line





<ParametrizedMetric Name="Tanimoto" ActiveFamily="Generic" Metric="Tanimoto" Threshold="0.2"/>





that has to be modified. Namely, the Threshold value 0.2 has to be changed to 1. So it should look like:





<ParametrizedMetric Name="Tanimoto" ActiveFamily="Generic" Metric="Tanimoto" Threshold="1.0"/>





When it is done, just save the file. This configuration file will be used in screening.
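
If you prefer to script the change, the following Python sketch sets the threshold with the standard xml.etree.ElementTree module. The `<Config>` wrapper in the sample is invented for the demonstration (I am not reproducing the real layout of cfp.xml here), and the function name is hypothetical. Note that ElementTree drops XML comments when it parses, so hand editing is still the safer route if you want the copy to stay otherwise identical.

```python
import xml.etree.ElementTree as ET


def set_tanimoto_threshold(xml_text: str, threshold: float = 1.0) -> str:
    """Set Threshold on every ParametrizedMetric element whose Name is Tanimoto."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Compare the local tag name so a possible namespace prefix does not matter.
        local_name = elem.tag.split("}")[-1]
        if local_name == "ParametrizedMetric" and elem.get("Name") == "Tanimoto":
            elem.set("Threshold", str(threshold))
    return ET.tostring(root, encoding="unicode")


# Hypothetical stand-in for the relevant part of the copied cfp.xml.
sample = (
    "<Config>"
    '<ParametrizedMetric Name="Tanimoto" ActiveFamily="Generic" '
    'Metric="Tanimoto" Threshold="0.2"/>'
    "</Config>"
)
updated = set_tanimoto_threshold(sample)
print(updated)
```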





Bearing all the above considerations in mind you issue the command below:





screenmd inputlibrary.sdf aceactives.sdf -k CF -c cfp.xml -M Tanimoto -H Minimum -o sdf annotatedlibrary.sdf





The output will contain all structures from the input file in the same order, with one extra field added to each compound: Minimum_CF_Tan, that is, Tanimoto dissimilarity against a minimum hypothesis using Chemical Fingerprint. Each such field contains a floating point value, which is the Tanimoto dissimilarity score.
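
To pull the scores back out of the annotated file, something like the Python sketch below may help. It relies only on the general SD file layout (a data header line starting with ">" that carries the field name in angle brackets, followed by the value on the next line). The sample records are made up and heavily shortened, and the function name is an assumption.

```python
def read_scores(sdf_text: str, field: str = "Minimum_CF_Tan") -> list[float]:
    """Collect the value that follows each '> <field>' data header, in file order."""
    scores = []
    lines = sdf_text.splitlines()
    for i, line in enumerate(lines):
        is_header = line.startswith(">") and f"<{field}>" in line
        if is_header and i + 1 < len(lines):
            scores.append(float(lines[i + 1].strip()))
    return scores


# Two shortened, made-up records; real records contain a full molfile block.
sample = "\n".join([
    "mol1", "M  END", ">  <Minimum_CF_Tan>", "0.35", "", "$$$$",
    "mol2", "M  END", ">  <Minimum_CF_Tan>", "0.80", "", "$$$$",
])
scores = read_scores(sample)
print(scores)
```

Since the values are dissimilarities, sorting in ascending order puts the compounds closest to the hypothesis first.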





If you have more than 20 actives, pick a random sample; there is no advantage in using too many compounds to construct a hypothesis. You can also cluster your actives using JKlustor and use centroids, or individual clusters independently, to provide a more scaffold-specific score. Or rank your actives by IC50 value and use the molecules that exhibit the highest activity.





Usually better results can be achieved with the Median hypothesis:





screenmd inputlibrary.sdf aceactives.sdf -k CF -c cfp.xml -M Tanimoto -H Median -o sdf annotatedlibrary.sdf
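
For intuition only, here is a Python sketch of one plausible reading of a median hypothesis on binary fingerprints: a bit is kept when it is set in more than half of the actives. This majority rule is my assumption, not a description of what screenmd actually computes for -H Median; the names and the set-of-bits representation are hypothetical too.

```python
from collections import Counter


def majority_hypothesis(active_fps: list[set[int]]) -> set[int]:
    """Bits that are set in more than half of the active fingerprints."""
    if not active_fps:
        raise ValueError("at least one active fingerprint is required")
    # Count in how many actives each bit is switched on.
    counts = Counter(bit for fp in active_fps for bit in fp)
    return {bit for bit, n in counts.items() if 2 * n > len(active_fps)}


# Three hypothetical actives; bit 2 occurs in two of them, bit 3 in all three.
actives = [{1, 2, 3}, {2, 3, 4}, {3, 9}]
hypothesis = majority_hypothesis(actives)
print(sorted(hypothesis))
```

Compared with a strict consensus, a rule like this tolerates a few actives that lack a given feature, which may be why the median tends to work better in practice.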





Only one active set can be considered in one go, though you may use various descriptors and metrics. Using screenmd can be tedious when multiple active sets are needed; no wonder, as screen is a tool for virtual screening, not for library annotation. Before you run your library through screenmd a second time (e.g. to add a D2 inhibition score) you will need to rename the custom field Minimum_CF_Tan (or Median_CF_Tan) to something like ACE_SCORE. For this job standard Unix tools like sed or awk are recommended (particularly if your SDfile is big, in which case text editors may not be able to load the file to let you use the Search&Replace function), e.g.





sed 's/Minimum_CF_Tan/ACE_SCORE/' < annotatedlibrary.sdf > ACEannotatedlibrary.sdf
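
If sed is not at hand, the same rename can be sketched in Python. This version streams line by line, so a big SDfile is never loaded whole, and it only touches data header lines (those starting with ">"), which is slightly more conservative than a plain search and replace. The function name and the sample record are assumptions for illustration.

```python
from typing import Iterable, Iterator


def rename_sd_field(lines: Iterable[str], old: str, new: str) -> Iterator[str]:
    """Yield the lines unchanged, except data headers where <old> becomes <new>."""
    for line in lines:
        if line.startswith(">") and f"<{old}>" in line:
            line = line.replace(f"<{old}>", f"<{new}>")
        yield line


# Shortened, made-up record used only to show the effect.
record = ["mol1", "M  END", ">  <Minimum_CF_Tan>", "0.35", "", "$$$$"]
renamed = list(rename_sd_field(record, "Minimum_CF_Tan", "ACE_SCORE"))
print(renamed[2])

# With real files the same generator can be fed an open file object:
# with open("annotatedlibrary.sdf") as src, open("ACEannotatedlibrary.sdf", "w") as dst:
#     dst.writelines(rename_sd_field(src, "Minimum_CF_Tan", "ACE_SCORE"))
```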





When this is done, you can add the D2 scores:





screenmd ACEannotatedlibrary.sdf D2actives.sdf -k CF -c cfp.xml -M Tanimoto -H Median -o sdf annotatedlibrary.sdf





and then





sed 's/Median_CF_Tan/D2_SCORE/' < annotatedlibrary.sdf > ACE+D2annotatedlibrary.sdf





and so on.
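
The repeated screen-then-rename cycle can be planned with a small Python helper like the sketch below. It only builds the command lines and rename pairs and does not run anything. The screenmd options are copied from the commands above; the intermediate file names, the function names and the assumption that the score field is always named <hypothesis>_CF_Tan are mine.

```python
def screenmd_command(library: str, actives: str, output: str,
                     hypothesis: str = "Median") -> list[str]:
    """Build the screenmd argument list using the options shown in this thread."""
    return ["screenmd", library, actives, "-k", "CF", "-c", "cfp.xml",
            "-M", "Tanimoto", "-H", hypothesis, "-o", "sdf", output]


def plan_annotation(library: str, active_sets: list[tuple[str, str]],
                    hypothesis: str = "Median") -> list[dict]:
    """Chain one screenmd run plus one field rename per (label, actives file) pair."""
    steps = []
    current = library
    for label, actives in active_sets:
        raw = f"{label}_raw.sdf"            # hypothetical intermediate file name
        renamed = f"{label}_annotated.sdf"  # hypothetical file name after the rename
        steps.append({
            "command": screenmd_command(current, actives, raw, hypothesis),
            "rename": (f"{hypothesis}_CF_Tan", f"{label}_SCORE"),
            "rename_input": raw,
            "rename_output": renamed,
        })
        current = renamed  # the next run starts from the already annotated file
    return steps


steps = plan_annotation("inputlibrary.sdf",
                        [("ACE", "aceactives.sdf"), ("D2", "D2actives.sdf")])
for step in steps:
    print(" ".join(step["command"]), "| rename", step["rename"])
```

Each planned command could then be passed to subprocess.run, followed by the field rename (with sed or otherwise), before moving on to the next active set.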





Perhaps it is a good idea to write an application that uses the screen API to annotate a library for multiple active sets. You can develop such a tool yourself, though we may provide one in one of the next major releases of JChem.








That's all for now; I leave room for experienced users of Screen to give you some bright ideas.





Regards,


Miklos