A user's question:
Is there a description somewhere
on how to use Instant JChem to filter an sdf file using a Smarts script? I
would like the option to either remove flagged compounds, retrieve only flagged
compounds or just to mark the flagged compounds in an output sdf file.
I don't know how to do this in Instant JChem, but if you are interested in getting the resulting sdfile, I would go with OpenBabel and obgrep (free tools available from openbabel.org). The command
obgrep SMARTS infile.sdf > outfile.sdf
will keep the molecules matching SMARTS. Adding -v gives the opposite, i.e. it keeps only the molecules that do not match:
obgrep -v SMARTS infile.sdf > outfile.sdf
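For the third option in the question (marking flagged compounds rather than removing them), one possibility is to post-process the obgrep output. The following is only a rough Python sketch, not an obgrep feature. It assumes that every molecule has a unique title on the first line of its SD record, and the field name SMARTS_HIT is made up for illustration. It reads the original file and the file of hits, then writes every record back with an extra Y/N data field.

```python
def split_records(sdf_text):
    """Split SDF text into records; each record is terminated by a '$$$$' line."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    return records


def title(record):
    """The molecule title is the first line of an SD record."""
    return record[0].strip() if record else ""


def mark_hits(all_sdf, hits_sdf, field="SMARTS_HIT"):
    """Return all_sdf with a Y/N data field appended to every record.

    Assumption: titles are unique, so a title seen in hits_sdf identifies a hit.
    The field name is hypothetical; pick whatever suits your workflow.
    """
    hit_titles = {title(r) for r in split_records(hits_sdf)}
    out = []
    for rec in split_records(all_sdf):
        flag = "Y" if title(rec) in hit_titles else "N"
        out.extend(rec)
        # An SD data item is a header line, the value, and a blank line.
        out.extend([f">  <{field}>", flag, "", "$$$$"])
    return "\n".join(out) + "\n"
```

To use it, read infile.sdf and outfile.sdf into strings, call mark_hits(all_text, hits_text), and write the result to a new file. If the titles are empty or repeated, the matching would have to be done on some other identifier.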
Depending on what is wanted I think there are a couple of approaches that can be used.
1. Run a normal structure search using the SMARTS expression (just paste the SMARTS into the Marvin Sketch editor when specifying the query). A field can be added to the table to record the fact that the structure is a hit. For instance, create a text field that can be used to store "Y" or "N" to distinguish whether the structure matches or not. Then, when the search has been run and only the hits are showing, select the whole column and paste the value "Y" into it to set all the values. When all records are shown again, only those that were hits will have the value "Y", and this field can be exported to SDF along with the structures.
2. Use a chemical terms function to add a field that contains whether the structure is a hit. You would use a function like this:
This will determine the number of times the SMARTS is found in the structure (an integer field should be used). A value greater than zero indicates a match. Again, this field can be exported to SDF.
The main benefit of this approach is that it would always be accurate (e.g. if you added new structures or edited existing ones), whereas the first approach would need manual updating.
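Once the match count has been exported to SDF, the file can be split into hits and non-hits outside Instant JChem. Here is a minimal Python sketch of that step. It is not part of any ChemAxon tool, and the field name SMARTS_COUNT is just a placeholder for whatever name you gave the integer field. Records where the field is missing or not an integer are treated as non-matching.

```python
def split_records(sdf_text):
    """Split SDF text into records; each record is terminated by a '$$$$' line."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    return records


def data_field(record, name):
    """Return the first value line of the data item called name, or None."""
    for i, line in enumerate(record):
        if line.startswith(">") and f"<{name}>" in line:
            return record[i + 1].strip() if i + 1 < len(record) else None
    return None


def partition_by_count(sdf_text, field="SMARTS_COUNT"):
    """Return (matching_sdf, non_matching_sdf) based on an integer count field.

    The field name is a hypothetical placeholder. A count greater than zero
    means the structure matched; missing or non-integer values count as zero.
    """
    matched, unmatched = [], []
    for rec in split_records(sdf_text):
        try:
            count = int(data_field(rec, field))
        except (TypeError, ValueError):
            count = 0
        (matched if count > 0 else unmatched).append(rec)

    def to_text(recs):
        return "".join("\n".join(r) + "\n$$$$\n" for r in recs)

    return to_text(matched), to_text(unmatched)
```

Writing the first string to a file gives only the flagged compounds, and writing the second gives the file with the flagged compounds removed.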
There are probably lots of other variations on these themes.
If you are familiar with the command line, perhaps the easiest tool for structure searching and filtering in files is ChemAxon's jcsearch tool. The -q switch is followed by the query structure in SMARTS or by the query file name. The example below prints the molecules of an SDF dataset that contain a chlorobenzene substructure to a SMILES file.
jcsearch -q "c1ccccc1Cl" in.sdf -o out.smiles
The second example shows how to combine a substructure search with a Chemical Terms filter. The filter follows the -e switch. It finds small molecules that contain a carboxyl group and do not have too high a logP value.
jcsearch -q "[H][O:1]C=[O:2]" -e "(mass() <= 500) and (logP() <= 5)" in.smiles
These are just simple examples; jcsearch supports duplicate, superstructure, similarity and reaction searching as well. It works with plenty of file formats and can even search directly in JChem databases. You can combine multiple queries.
If you prefer a desktop user interface, I would suggest using Instant JChem.