jcsearch query molfile

User e34a92cce5

10-07-2006 18:12:30

Hi,

I have been trying to use jcsearch to run substructure queries to check whether a few identified structures are present in my database. I use:
Quote:
jcsearch -q identifiers.smarts DB:Compounds -f smiles
However, it looks like jcsearch is only using the first structure in my identifiers.smarts to run the query and neglecting the rest of the structures in the file. Also, I have about 150K compounds in the database table; however, the final report says that only about 3K compounds were screened.

Could you let me know if I am using the wrong syntax?

Thanks!

ChemAxon 9c0afc9aaf

11-07-2006 07:45:38

Hi,

You can find the description of the jcsearch utility, with examples, on the following page:

http://www.chemaxon.com/jchem/doc/user/Jcsearch.html

Quote:
However, it looks like jcsearch is only using the first structure in my identifiers.smarts to run the query and neglecting the rest of the structures in the file.
By default, jcsearch should only return hits that match all of the query structures:

Code:
--and      If two or more queries are present, all are required to match.
           (Default)

With the following option, you can get structures that match any of the queries:

Code:
--or       If more than one queries are present, at least one is required to
           match.
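To make the difference between the two modes concrete, here is a small Python sketch that treats each query's hits as a set of compound IDs: "--and" corresponds to the intersection of the sets and "--or" to their union. The function name and the IDs are made up for illustration; this is not how jcsearch is implemented internally.

```python
# Hypothetical sketch: combine per-query hit sets the way --and / --or are
# described in the jcsearch help text. The compound IDs are made up.
from functools import reduce


def combine_hits(per_query_hits: list[set[int]], mode: str = "and") -> set[int]:
    """Return the IDs that match all queries ("and") or at least one query ("or")."""
    if not per_query_hits:
        return set()
    if mode == "and":
        # --and (default): a compound must match every query
        return reduce(set.intersection, per_query_hits)
    if mode == "or":
        # --or: a compound must match at least one query
        return reduce(set.union, per_query_hits)
    raise ValueError(f"unknown mode: {mode!r}")


hits = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
print(sorted(combine_hits(hits, "and")))  # [3]
print(sorted(combine_hits(hits, "or")))   # [1, 2, 3, 4, 5]
```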

Unfortunately, it seems that multiple queries only work properly in file mode; in DB mode, only the first structure is indeed taken into account.

We will fix this soon. Thank you for pointing this out.

Until then, here is a possible workaround that achieves the "--or" functionality:

1. Run the queries one by one, always appending the result to the same file:

Code:
jcsearch -q identifier1.smarts DB:Compounds -f smiles >> results.smi
jcsearch -q identifier2.smarts DB:Compounds -f smiles >> results.smi
...

Now you have all the hits, but there may be some duplicates.
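If you have many query files, the loop can be scripted. The following is a minimal Python sketch, assuming jcsearch is on the PATH and accepts the same arguments as in the commands above; the query file names in the example call are hypothetical. The command runner is passed in as a parameter so the loop can be tried out without a database.

```python
# Sketch of step 1: run one jcsearch call per query file and append all
# output to a single results file (the equivalent of repeated ">> results.smi").
import subprocess
from pathlib import Path
from typing import Callable, Sequence


def default_runner(argv: Sequence[str]) -> str:
    """Run a command, fail on a non-zero exit code, and return its stdout."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def run_queries_one_by_one(
    query_files: Sequence[str],
    table: str,
    results_path: str,
    runner: Callable[[Sequence[str]], str] = default_runner,
) -> None:
    """Append the SMILES hits of each query file to results_path."""
    with Path(results_path).open("a") as out:
        for query_file in query_files:
            # Same command line as in the workaround above, one query at a time
            argv = ["jcsearch", "-q", query_file, f"DB:{table}", "-f", "smiles"]
            out.write(runner(argv))


# Example call (requires jcsearch and a configured database connection):
# run_queries_one_by_one(["identifier1.smarts", "identifier2.smarts"],
#                        "Compounds", "results.smi")
```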

2. Import the file into an empty database table with duplicate filtering, then export it to get a result file without duplicates.

Code:
jcman c temp
jcman a temp results.smi --nodup
jcman x temp final_results.smi
jcman d temp
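If a temporary table is not convenient, a rough alternative is to drop repeated lines from results.smi directly. This Python sketch assumes that a compound returned by several queries is written out as exactly the same SMILES line each time (the same database record exported in the same format). It only compares text, so unlike the duplicate filtering applied during the import, it will not recognize two different SMILES strings of the same molecule.

```python
# Sketch: order-preserving removal of identical lines from a SMILES file.
# Assumption: the same hit is always written as the same line; this is a
# plain text comparison, not a chemical duplicate check.
from typing import Iterable


def dedupe_lines(lines: Iterable[str]) -> list[str]:
    """Return the non-empty lines in first-seen order, without repeats."""
    seen = set()
    unique = []
    for line in lines:
        key = line.strip()
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def dedupe_file(src: str, dst: str) -> int:
    """Write the unique lines of src to dst and return how many were kept."""
    with open(src) as f:
        unique = dedupe_lines(f)
    with open(dst, "w") as f:
        f.writelines(line + "\n" for line in unique)
    return len(unique)


# Example: dedupe_file("results.smi", "final_results.smi")
```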
Quote:
Also, I have about 150K compounds in the database table; however, the final report says that only about 3K compounds were screened.
I guess you are referring to output similar to the following, which can be obtained with the "-vv" (very verbose) flag:

Code:
Tue Jul 11 09:12:39 CEST 2006
Search mode: SUBSTRUCTURE
Structure table: nnn
Query: CCC
Screened: 621
Hits: 499
Cache loading: 313 ms
Cache size (this table / total): 0.01 / 0.01 MBytes
Total time: 750 ms  Screening: 0 ms
Current / peak / maximum searches per minute: 1 / 1 / 1

"Screening" is the first, very quick phase of the search, which reduces the number of potential hits by using fingerprints.
This way, the more CPU-intensive graph search has to process fewer structures (in your case, only about 3K), which reduces the overall execution time.
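The idea behind the screening step can be illustrated with bit sets. For path-based fingerprints, every feature of a substructure is also present in any structure that contains it, so a target whose fingerprint lacks even one bit of the query fingerprint can be discarded without a graph search. The following Python sketch models fingerprints as integers with made-up values; it does not reproduce JChem's actual fingerprint generation.

```python
# Illustrative sketch of fingerprint screening. Fingerprints are modeled as
# Python ints used as bit sets; all values here are made up.


def passes_screen(query_fp: int, target_fp: int) -> bool:
    """A target can contain the query only if every query bit is set in it."""
    return (query_fp & target_fp) == query_fp


def screen(query_fp: int, targets: dict[str, int]) -> list[str]:
    """Return the IDs that survive screening.

    Survivors are only *candidates*: each one still has to go through the
    full graph (substructure) search, which may reject it.
    """
    return [tid for tid, fp in targets.items() if passes_screen(query_fp, fp)]


targets = {"mol1": 0b1011_0110, "mol2": 0b0001_0010, "mol3": 0b1111_0111}
print(screen(0b0011_0010, targets))  # ['mol1', 'mol3']
```

Here mol2 is rejected cheaply because it lacks one of the query bits, while mol1 and mol3 move on to the graph search. This mirrors the report above, where "Screened" (621) is larger than "Hits" (499).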

Best regards,

Szilard