I am wondering whether it is possible (and how) to use ChemAxon tools to screen a large compound database stored in an SDF file (or another format). Let's say my workflow is as follows: I have a couple of million compounds in an SDF file. Using cxcalc I could calculate e.g. pKa, acceptorcount, donorcount and many other parameters for my database. But I want to keep only the compounds that have the right values for those descriptors. For example, I define that my hits should meet all of these criteria:
- pKa should be higher than 4
- acceptorcount should be lower than 4
- and so on...
So my question is: how can I remove the compounds that do not match my criteria?
I would be very grateful for any help.
Yes, it can be done with Chemical Terms. Chemical Terms Evaluator is a command line application which can do this.
$ evaluate -e 'pKa("1") > 4 && acceptorCount() < 4 && donorCount() < 4' -x smiles nci100.smiles
Expression 'pKa("1") > 4 && acceptorCount() < 4 && donorCount() < 4' means:
- strongest pKa is higher than 4 (Note: it is recommended to use apKa() and bpKa() functions instead of pKa())
- and acceptorcount is lower than 4
- and donorcount is lower than 4
For more functions and details see Chemical Terms Reference Tables.
The "-x" command line option sets the extract mode.
    -x, --extract <format>    extract mode: write exactly those
                              molecules in the specified format that
                              satisfy the input boolean expression
The example writes out only those molecules from the input file that satisfy the expression. Evaluator can handle millions of input structures.
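When there are many criteria, the expression string can be assembled programmatically instead of typed by hand. The following is a minimal Python sketch, not part of Evaluator itself: the helper names build_expression and build_command are hypothetical, and it uses only the -e and -x options and the Chemical Terms functions shown in the command above.

```python
import shlex


def build_expression(criteria):
    """Join (term, operator, threshold) triples with && into one boolean expression."""
    return " && ".join(f"{term} {op} {value}" for term, op, value in criteria)


def build_command(criteria, out_format, input_file):
    """Return the argument list for an extract-mode evaluate call (hypothetical helper)."""
    return ["evaluate", "-e", build_expression(criteria), "-x", out_format, input_file]


# The three criteria from the example above.
criteria = [
    ('pKa("1")', ">", 4),
    ("acceptorCount()", "<", 4),
    ("donorCount()", "<", 4),
]

command = build_command(criteria, "smiles", "nci100.smiles")

# Print a shell-quoted version of the command for inspection.
print(shlex.join(command))
```

The argument list could be handed to subprocess.run(command), assuming the evaluate script is on the PATH; passing a list avoids having to quote the expression for the shell.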
With Instant JChem and Chemical Terms the filtering can also be done directly on databases; see these parts of the IJC documentation:
- Chemical Terms Fields
- Query builder
Thank you very much for such useful hints, this is really what I wanted. In fact, Evaluator saves me time and it works very fast :) Excellent!
By the way, I have another problem. What if I have, e.g., ADMET descriptors calculated in other software and included in the SDF file? Let's say I have more than 1 M molecules in the SDF. It would be very problematic and time-consuming to load it into Instant JChem, use the query builder and remove the compounds with unsuitable ADMET descriptor values. So my question is: is it possible to use one of your tools to screen an SDF file (in batch mode, on the command line, etc.), using given field names and thresholds for them as the query?
I would appreciate any help.
Best regards to all
Yes, it is possible. Evaluator can refer to SDF fields, see the field() Chemical Terms function.
I attached the previously used nci100 file in SDF format; it contains "logP" SDF fields. Here is an example of how you can refer to these fields:
$ evaluate -e 'pKa("1") > 4 && acceptorCount() < 4 && donorCount() < 4 && field("logP") < 4' -x smiles nci100logP.sdf
A new condition is added to the expression: && field("logP") < 4. The logP values are read from the "logP" SDF field, and only those molecules whose logP field value is less than 4 are written to the output.
To output the SMILES together with the logP field values:
$ evaluate -e 'pKa("1") > 4 && acceptorCount() < 4 && donorCount() < 4 && field("logP") < 4' -x smiles:TlogP nci100logP.sdf
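To make clear what a condition like field("logP") < 4 does to an SDF file, here is a minimal plain-Python sketch of the same idea. It is an illustration only and says nothing about how Evaluator is implemented: the function names are hypothetical, the two sample records are empty placeholder structures with made-up logP values, and it only reads stored fields (it cannot calculate pKa or the acceptor and donor counts).

```python
import io


def iter_sdf_records(handle):
    """Yield each SDF record (up to and including its '$$$$' line) as a list of lines."""
    record = []
    for line in handle:
        record.append(line)
        if line.strip() == "$$$$":
            yield record
            record = []


def read_field(record, name):
    """Return the first value line of data field `name`, or None if the field is absent."""
    for i, line in enumerate(record[:-1]):
        if line.startswith(">") and f"<{name}>" in line:
            return record[i + 1].strip()
    return None


def filter_by_field(handle, name, max_value):
    """Yield records whose numeric field `name` is strictly below max_value."""
    for record in iter_sdf_records(handle):
        value = read_field(record, name)
        try:
            if value is not None and float(value) < max_value:
                yield record
        except ValueError:
            continue  # field present but not numeric: treat as a non-hit


def make_record(name, logp):
    """Build a placeholder SDF record with an empty structure and one logP field."""
    return (
        f"{name}\n  sketch\n\n"
        "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
        "M  END\n"
        f"> <logP>\n{logp}\n\n$$$$\n"
    )


sample = make_record("mol_A", 2.5) + make_record("mol_B", 5.1)
hits = list(filter_by_field(io.StringIO(sample), "logP", 4))
hit_names = [record[0].strip() for record in hits]
print(hit_names)  # prints ['mol_A']
```

Because the records are read one at a time from the file handle, the same loop can be pointed at an open SDF file with millions of entries without loading it all into memory.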
I checked it and it works very nicely.
Thanks for the help.