Tag SMARTS query hits with their labels/names

User 55ffa2f197

10-08-2010 13:18:11

For a given input structure file, I would like to tag each molecule using a list of predefined SMARTS queries. The SMARTS queries are stored in a file with the following format:


SMARTS1, name1


SMARTS2,name2


If a given SMARTS matches a target molecule, tag the molecule with the SMARTS name. If the molecule matches multiple SMARTS, concatenate their names to create the tag. If the molecule does not match any SMARTS, write it out without a tag.


I am looking for a simple command-line solution for this. I have looked at jcsearch and evaluate, but could not work out how to use these commands to accomplish the task. A simple Java program would work as well.


Thanks in advance

ChemAxon 9c0afc9aaf

10-08-2010 23:15:08

Hi,


Do you happen to be using Pipeline Pilot?


If yes, there might be a solution without coding, using our components for Pipeline Pilot.


(KNIME would probably work in a similar way.)


If not using either of these then some custom Java code may be needed.


 


Best regards,


Szilard

User 55ffa2f197

11-08-2010 02:04:04

No, I do not use Pipeline Pilot. Why would I need it? Is it possible to do this using evaluate?

ChemAxon 9c0afc9aaf

11-08-2010 03:41:12










dlee wrote:

No, I do not use Pipeline Pilot. Why would I need it? Is it possible to do this using evaluate?



If you are not already familiar with the product, setting up that infrastructure just for this one task is probably not worth the effort; writing a simple Java program can be quicker.


We have prepared a simple Java program that does the job (attached).
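
The attachment is not reproduced in this thread. As a rough illustration of the logic the task calls for (this is not the attached program), here is a minimal Java sketch. The names TagBySmartsSketch, NamedQuery, SubstructureMatcher, parseQueryLine and buildTag are hypothetical, as is the ";" separator. The actual substructure test is left as an interface that a real program would back with a cheminformatics toolkit such as JChem. The sketch assumes the "SMARTS,name" file format from the first post and that names contain no commas.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: every class, interface, and method name here is hypothetical.
class TagBySmartsSketch {

    /** One line of the query file: a SMARTS string and the name used as the tag. */
    static final class NamedQuery {
        final String smarts;
        final String name;

        NamedQuery(String smarts, String name) {
            this.smarts = smarts;
            this.name = name;
        }
    }

    /** Placeholder for a real substructure search supplied by a toolkit. */
    interface SubstructureMatcher {
        boolean matches(String smarts, String molecule);
    }

    /**
     * Parses "SMARTS,name". Splits on the LAST comma because SMARTS atom lists
     * such as [C,N] contain commas themselves (assumes names contain no comma).
     */
    static NamedQuery parseQueryLine(String line) {
        int comma = line.lastIndexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("Expected 'SMARTS,name' but got: " + line);
        }
        return new NamedQuery(line.substring(0, comma).trim(),
                              line.substring(comma + 1).trim());
    }

    /** Joins the names of all matching queries; returns null when nothing matches. */
    static String buildTag(String molecule, List<NamedQuery> queries,
                           SubstructureMatcher matcher, String separator) {
        List<String> names = new ArrayList<>();
        for (NamedQuery q : queries) {
            if (matcher.matches(q.smarts, molecule)) {
                names.add(q.name);
            }
        }
        return names.isEmpty() ? null : String.join(separator, names);
    }

    public static void main(String[] args) {
        List<NamedQuery> queries = new ArrayList<>();
        queries.add(parseQueryLine("[C,N]=O, carbonyl_like"));
        queries.add(parseQueryLine("c1ccccc1,benzene"));

        // Toy stand-in, NOT chemistry: plain text containment instead of SMARTS matching.
        SubstructureMatcher toy = (smarts, mol) -> mol.contains(smarts);

        String tag = buildTag("c1ccccc1O", queries, toy, ";");
        System.out.println(tag == null ? "(no tag)" : tag);
    }
}
```

Returning null for "no match" lets the caller write the molecule out untagged, as the original request asks.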


Best regards,


Szilard





User 55ffa2f197

11-08-2010 12:45:48

Thanks, that is all I need: simple and efficient.

User a18e201107

23-08-2010 18:33:33

Szilard:


Sorry to revive an old post, but I noticed you mentioned using Pipeline Pilot to search a set of structures using a list of SMARTS. Could you provide any insight on this?


Many thanks



Dennis

User 55ffa2f197

24-08-2010 17:26:21

In PLP there is a component called 'Substructure Map from Tag'. It reads in a stream of molecules that can mix the molecules to be searched with the query molecules, which can be SMARTS. You need to convert the SMARTS into molecules using a convert component and label them as 'IsQuery' so that the map component knows which molecules are targets and which are queries. The map component has a whole bunch of parameters to control its behavior. For example, if you wish to tag all matches, you can tell it to keep going rather than stop after the first match. The names of the matching substructures (SMARTS) are appended to an array variable called QueriesMapped.

User a18e201107

24-08-2010 17:29:43

Thank you very much, that is very helpful


 


Dennis

User 55ffa2f197

31-08-2010 13:57:40

















Hi Szilard,


I downloaded your code to my laptop and tried to compile it, but the compilation failed. I am fairly sure the problem is related to the classpath. I am using the following line to compile your code:


./javac -classpath "/cygdrive/c/Program Files/ChemAxon/JChem/lib/;../lib/;../jre/lib" ./TagByQuery.java


I am using cygwin, and my current working directory is:


/cygdrive/c/Program Files/Java/jdk1.6.0_21/bin


The main error is:


.\TagByQuery.java:1: package chemaxon.formats does not exist
import chemaxon.formats.MolImporter;
                       ^


Thanks in advance


Dong


 


 

ChemAxon aa7c50abf8

07-09-2010 14:25:29

Hi Dong,


Yes, the problem is in the classpath.



  1. Your classpath exhibits a more general cygwin usage problem: programs that do not come with cygwin will not understand

    1. either Unix-style paths in general (slashes and colons; I realize now that you are mixing slashes [Unix directory separators] with semicolons [Windows path separators])

    2. or cygwin's representation of absolute paths in particular (the /cygdrive/c prefix).

    "Native" Windows programs expect paths in native Windows format: C:\Program Files\ChemAxon. (Don't forget to use single quotes in that case, so that the shell leaves the backslashes and spaces alone.) Alternatively, you can use the cygpath program to convert Unix-style paths to Windows-style paths. (If you have little experience with cygwin, you would probably be much better off using the plain old Command Prompt.)

  2. There is also a Java-specific problem with your classpath. If a classpath element points to a directory, that directory must be the root of a class file hierarchy, that is, the directory containing the top-level package directories. JAR files are different: each JAR file must appear as its own classpath element. A JAR file is not picked up if only its parent directory appears in the classpath.


Peter
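
To illustrate the second point, the following is a minimal sketch in Java that builds a classpath string listing every JAR in a directory as its own element. The class name ClasspathSketch and the method jarClasspath are made up for this example. The directory to scan is passed as an argument, since the exact location of the JChem JARs depends on the installation. File.pathSeparator supplies the separator that the running platform expects (";" on Windows, ":" on Unix).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: ClasspathSketch and jarClasspath are hypothetical names.
class ClasspathSketch {

    /**
     * Returns a classpath string with one element per JAR file found directly
     * in libDir (subdirectories are not scanned). Returns an empty string if
     * the directory is missing or contains no JAR files.
     */
    static String jarClasspath(File libDir) {
        List<String> jars = new ArrayList<>();
        File[] files = libDir.listFiles();
        if (files != null) {
            Arrays.sort(files); // stable, predictable ordering
            for (File f : files) {
                if (f.isFile() && f.getName().toLowerCase().endsWith(".jar")) {
                    jars.add(f.getAbsolutePath());
                }
            }
        }
        // Use the platform's own separator rather than hard-coding ';' or ':'.
        return String.join(File.pathSeparator, jars);
    }

    public static void main(String[] args) {
        File libDir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(jarClasspath(libDir));
    }
}
```

The printed string can be passed to javac or java via -classpath. Since Java 6, a classpath element that ends in a wildcard (a directory path followed by /*) is also expanded to all JAR files in that directory, which may be simpler than listing them by hand.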