Filtering an SDF file using SMARTS

ChemAxon 43e6884a7a

15-09-2009 13:58:34

A user's question:


Is there a description somewhere
on how to use Instant JChem to filter an sdf file using a Smarts script? I
would like the option to either remove flagged compounds, retrieve only flagged
compounds or just to mark the flagged compounds in an output sdf file.


User ed8790c2d3

15-09-2009 14:15:10

Hi!


I don't know how to do this in Instant JChem, but if you just need the resulting SD file, I would go with Open Babel and its obgrep tool (free, available from openbabel.org). The command


obgrep SMARTS infile.sdf > outfile.sdf 

will keep the molecules matching SMARTS. Adding -v gives the opposite, i.e. it keeps only the molecules that do not match:


obgrep -v SMARTS infile.sdf > outfile.sdf
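
If both output files are wanted in one go, the two obgrep calls above can be wrapped in a small script. What follows is only a sketch: the function names and file names are made up for illustration, and it assumes obgrep is installed and on the PATH. Passing the SMARTS as a separate list element avoids shell quoting problems with characters such as brackets and `$`.

```python
import subprocess
from typing import List


def obgrep_command(smarts: str, infile: str, invert: bool = False) -> List[str]:
    """Build the obgrep argument list; invert=True adds -v (non-matching molecules)."""
    cmd = ["obgrep"]
    if invert:
        cmd.append("-v")
    cmd += [smarts, infile]
    return cmd


def split_by_smarts(smarts: str, infile: str, hits_out: str, rest_out: str) -> None:
    """Write matching molecules to hits_out and the non-matching ones to rest_out.

    Hypothetical helper: it runs obgrep twice and redirects stdout to a file,
    like 'obgrep SMARTS infile.sdf > outfile.sdf'. The exit status is not
    checked, because no assumption is made here about obgrep's exit codes.
    """
    for invert, path in ((False, hits_out), (True, rest_out)):
        with open(path, "w") as handle:
            subprocess.run(obgrep_command(smarts, infile, invert), stdout=handle)


if __name__ == "__main__":
    # Example file names only; replace with your own.
    split_by_smarts("c1ccccc1Cl", "infile.sdf", "hits.sdf", "rest.sdf")
```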

Regards,


Fredrik

ChemAxon fa971619eb

15-09-2009 14:27:53

Depending on what is wanted I think there are a couple of approaches that can be used.


1. Run a normal structure search using the SMARTS expression (just paste the SMARTS into the MarvinSketch editor when specifying the query). A field can be added to the table to record whether the structure is a hit. For instance, create a text field that stores "Y" or "N" to indicate whether the structure matches. Then, once the search has been run and only the hits are showing, select the whole column and paste the value "Y" into it to set all the values. When all records are shown again, only those that were hits will have the value "Y", and this field can be exported to SDF along with the structures.


 


2. Use a chemical terms function to add a field that contains whether the structure is a hit. You would use a function like this:


matchCount('[!#1!#6]C1=CC=CC=C1')


This will determine the number of times the SMARTS is found in the structure (an integer field should be used). A value greater than zero indicates a match. Again, this field can be exported to SDF.


The main benefit of this approach is that it would always be accurate (e.g. if you added new structures or edited existing ones), whereas the first approach would need manual updating.
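
Outside IJC, the same "Y"/"N" marking, along with the keep-only-hits and remove-hits options from the original question, can be sketched in a few lines of Python working directly on the SDF text. This is only an illustration: the function names, the `SMARTS_HIT` field name and the `is_hit` callback are all made up here, and the actual SMARTS matching is left to whatever toolkit you plug in as `is_hit`.

```python
from typing import Callable, List


def split_sdf(text: str) -> List[str]:
    """Split SDF text into records; each record is terminated by a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def tag_record(record: str, field: str, value: str) -> str:
    """Append an SDF data item ('> <field>' header plus value) to one record."""
    body = record.rstrip("\n")
    # Existing data items must stay terminated by a blank line.
    sep = "\n" if body.endswith("M  END") else "\n\n"
    return f"{body}{sep}> <{field}>\n{value}\n"


def filter_sdf(text: str, is_hit: Callable[[str], bool],
               mode: str = "mark", field: str = "SMARTS_HIT") -> str:
    """mode='keep' keeps hits, 'remove' drops hits, 'mark' tags every record Y/N.

    is_hit is a hypothetical callback that receives one record's text and
    returns True when the structure matches the SMARTS.
    """
    if mode not in ("keep", "remove", "mark"):
        raise ValueError(f"unknown mode: {mode}")
    out = []
    for record in split_sdf(text):
        hit = is_hit(record)
        if (mode == "keep" and not hit) or (mode == "remove" and hit):
            continue
        if mode == "mark":
            record = tag_record(record, field, "Y" if hit else "N")
        out.append(record + "\n$$$$\n")
    return "".join(out)
```

With mode="mark" every record is written out with the extra field, so the flag travels with the structures in the output file, much like the exported text field in approach 1.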


 


There are probably lots of other variations on these themes.


Tim


 

ChemAxon d76e6e95eb

15-09-2009 15:14:20

If you are familiar with the command line, perhaps the easiest tool for structure searching and filtering in files is ChemAxon's jcsearch. The -q switch is followed by the query structure in SMARTS or by the query file name. The example below writes the molecules of an SDF dataset that contain a chlorobenzene substructure to a SMILES file.


jcsearch -q "c1ccccc1Cl" in.sdf -o out.smiles

The second example shows how to combine a substructure search with a Chemical Terms filter. The filter follows the -e switch. It finds small molecules (mass at most 500) that contain a carboxyl group and have a logP of at most 5.


jcsearch -q "[H][O:1]C=[O:2]" -e "(mass() <= 500) and (logP() <= 5)" in.smiles 


These are just simple examples; jcsearch supports duplicate, superstructure, similarity and reaction searching as well. It works with many file formats and can even search directly in JChem databases. You can combine multiple queries.


If you prefer a desktop user interface, I would suggest using Instant JChem.

ChemAxon fa971619eb

29-09-2009 12:55:05

I created a movie to illustrate querying in IJC using SMARTS.


http://www.chemaxon.com/shared/tim/ijc_support/movies/smarts.mov


Tim