input file for screenMD - ChemAxon Forum Archive

User 5a88369158

25-05-2010 14:57:38

Hi,

Can I use a text file of SMILES strings as the input to screenMD, or do I have to use generateMD to create a fingerprint first and then use that fingerprint as the input to screenMD?

Thank you

Yash

ChemAxon efa1591b5a

27-05-2010 09:45:31

Hi Yash,

Yes, you can use a SMILES text file as input directly; there is no need to generate fingerprints in advance.

Regards

Miklos

User 5a88369158

27-05-2010 13:28:00

Hi,

I tried the following and it did not work:

screenMD targets.txt queries.txt -g -o output.txt

targets.txt is a text file of SMILES strings, and queries.txt is also a text file of SMILES strings. I get the following error message:

0
java.lang.ArrayIndexOutOfBoundsException: 0
        at chemaxon.descriptors.MDSimilarity.compare(MDSimilarity.java:443)
        at chemaxon.descriptors.MDSimilarity.compare(MDSimilarity.java:503)
        at chemaxon.descriptors.ScreenMD.compare(ScreenMD.java:333)
        at chemaxon.descriptors.ScreenMD.main(ScreenMD.java:222)
C:\Program Files\ChemAxon\JChem\bin>

If I change the command to the following:

screenMD targets.txt queries.txt -g -k CF -o output.txt

This works because I am generating chemical fingerprints first and then comparing the chemical fingerprints to the SMILES strings in the queries file. Is there a way to just compare the structures of the compounds in the targets file to the structures of the compounds in the queries file? Or do I have to generate chemical fingerprints or pharmacophore fingerprints in order to compare them?

Thank you

Yash

ChemAxon efa1591b5a

02-06-2010 09:54:37

Hi Yash,

The problem with the first command is that the descriptor type (the -k option) is missing. I admit that screenmd did not produce a meaningful error message; this will be corrected in the next bug-fix release.

So you need to specify either -k CF or -k PF, etc. The main goal of screenmd is to perform a similarity search among the target structures, so it compares some kind of molecular descriptors (and not the bare structures) against each other to obtain similarity scores.
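To illustrate why a descriptor is needed, here is a minimal Python sketch of a Tanimoto dissimilarity computed on two binary fingerprints. It is not screenmd's code: the function name and the bit positions are hypothetical, and each fingerprint is assumed to be represented as the set of its on-bit positions.

```python
def tanimoto_dissimilarity(a: set, b: set) -> float:
    """Return 1 - |A & B| / |A | B| for two fingerprints given as sets of on-bit positions."""
    union = len(a | b)
    if union == 0:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 0.0
    return 1.0 - len(a & b) / union


# Hypothetical on-bit positions, for illustration only.
target_bits = {1, 4, 7, 9}
query_bits = {1, 4, 8}

# 2 shared bits out of 5 distinct bits, so the dissimilarity is 1 - 2/5.
print(tanimoto_dissimilarity(target_bits, query_bits))
```

The comparison operates only on the bit sets, which is why the structures have to be converted to some descriptor before they can be scored.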

Can you explain how you intend to compare the target and query structures against each other? What is your expectation, and what kind of output do you need from such a comparison?

Regards

Miklos

User 5a88369158

02-06-2010 13:55:12

Hi,

Our goal is to use screenMD to analyze a dataset of structures and determine, using some statistical approach, whether the dataset is diverse. I thought I could use the Tanimoto dissimilarity index as a measure of this. I plan to compare each compound in the dataset to itself and to all other compounds, and use a threshold of 0.3 as the cutoff (below this threshold, the two compounds are considered similar).

I was hoping to get as output just those pairs of compounds that have a Tanimoto index of less than 0.3, but if that is not possible, I can manipulate the output matrix to get what I want.
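The matrix manipulation mentioned above could look roughly like the following Python sketch. It assumes the pairwise dissimilarities have already been loaded into a square list of lists; the compound names and values are made up, and nothing here depends on screenMD's actual output format.

```python
from itertools import combinations


def similar_pairs(names, matrix, threshold=0.3):
    """Return (name_i, name_j, d) for every unordered pair with dissimilarity d < threshold."""
    pairs = []
    # combinations() skips self-comparisons and visits each pair only once.
    for i, j in combinations(range(len(names)), 2):
        d = matrix[i][j]
        if d < threshold:
            pairs.append((names[i], names[j], d))
    return pairs


# Made-up names and dissimilarity values, for illustration only.
names = ["cpd1", "cpd2", "cpd3"]
matrix = [
    [0.00, 0.12, 0.85],
    [0.12, 0.00, 0.40],
    [0.85, 0.40, 0.00],
]

print(similar_pairs(names, matrix))
```

Skipping the diagonal matters here, since every compound has a dissimilarity of 0 to itself and would otherwise always be reported.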

I did have a hard time finding other similarity measures. I only see Tanimoto and Euclidean; others are listed, but I cannot find the keyword to use for them in the screenMD command. I chose the Tanimoto index because it has an upper bound, which makes values easier to compare (for example, 0.7 versus 0.1). With the Euclidean index, I got some results that were 0.5 and others that were 900.

Thank you,

Yash

ChemAxon efa1591b5a

24-06-2010 09:02:26

Hi,

You can set the dissimilarity threshold in the parameter/configuration file of the molecular descriptor/fingerprint used for screening. For instance, if you use the chemical fingerprint (CF) then modify the default configuration cfp.xml, located in the examples/config folder in your JChem installation directory.

Find and modify this line:

<ParametrizedMetric Name="Tanimoto" ActiveFamily="Generic" Metric="Tanimoto"
 Threshold="0.2"/>

replacing the default threshold, 0.2, with 0.3.
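If you prefer to script this change rather than edit the file by hand, a rough Python sketch follows. It works on an in-memory fragment only: the enclosing Config element is a hypothetical stand-in, since the full layout of cfp.xml is not shown in this thread, and only the ParametrizedMetric line is taken from it.

```python
import xml.etree.ElementTree as ET

# "Config" is a hypothetical wrapper; only the ParametrizedMetric element comes from the thread.
fragment = """<Config>
  <ParametrizedMetric Name="Tanimoto" ActiveFamily="Generic" Metric="Tanimoto" Threshold="0.2"/>
</Config>"""

root = ET.fromstring(fragment)

# Update the threshold on the metric named "Tanimoto" and leave any others untouched.
for metric in root.iter("ParametrizedMetric"):
    if metric.get("Name") == "Tanimoto":
        metric.set("Threshold", "0.3")

print(ET.tostring(root, encoding="unicode"))
```

Work on a copy of the configuration file so the default shipped with JChem stays available.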

In order to use metrics other than the default Tanimoto and Euclidean, you may wish to use the OptimizeMetrics program. However, the purpose of that is somewhat different than your goal as it optimises the screening with respect to a set of known active structures.

Btw, you can also use the Tversky metric if you insert a line like the one below into the above-mentioned cfp.xml configuration file:

<ParametrizedMetric Name="Tversky" ActiveFamily="Generic" Metric="Tversky" Threshold="0.5" TverskyAlpha="1" TverskyBeta="1"/>

where you can tweak the two factors, alpha and beta.
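To see what the two factors do, here is a small Python sketch of the textbook Tversky similarity on binary fingerprints. It assumes the standard definition; whether JChem parametrizes it in exactly this way is not confirmed in this thread. The function name and bit positions are hypothetical.

```python
def tversky_similarity(a: set, b: set, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Textbook Tversky index: |A & B| / (|A & B| + alpha*|A - B| + beta*|B - A|)."""
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    if denominator == 0:
        # Convention chosen for this sketch: two empty fingerprints count as identical.
        return 1.0
    return common / denominator


# Hypothetical on-bit positions, for illustration only.
a = {1, 4, 7, 9}
b = {1, 4, 8}

print(tversky_similarity(a, b, alpha=1.0, beta=1.0))  # same value as Tanimoto similarity
print(tversky_similarity(a, b, alpha=0.5, beta=0.5))  # same value as the Dice coefficient
```

With alpha = beta = 1 the index reduces to the Tanimoto similarity, and unequal values weight the bits unique to one structure more heavily than those unique to the other. A dissimilarity would be 1 minus this value.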

Does this help?

Finally, one more thought: an excellent new paper might be relevant to your work. You might have come across it already, but I copy its URL here: http://pubs.acs.org/doi/pdf/10.1021/ci100010v

Regards

Miklos