User e34a92cce5
10-07-2006 18:12:30
Hi,
I have been trying to use jcsearch to run substructure queries, to check whether a few identified structures are present in my database. I use:
Quote: |
jcsearch -q identifiers.smarts DB:Compounds -f smiles |
However, it looks like jcsearch uses only the first structure in my identifiers.smarts to run the query and ignores the rest of the structures in the file. Also, I have about 150K compounds in the database table, but the final report says that only about 3K compounds were screened.
Could you let me know whether I am using the wrong syntax?
Thanks!
ChemAxon 9c0afc9aaf
11-07-2006 07:45:38
Hi,
You can find the description of the jcsearch utility with examples on the following page:
http://www.chemaxon.com/jchem/doc/user/Jcsearch.html
Quote: |
However, it looks like jcsearch uses only the first structure in my identifiers.smarts to run the query and ignores the rest of the structures in the file |
By default, jcsearch returns only the hits that match all of the query structures:
Code: |
--and If two or more queries are present, all are required to match.
(Default)
|
With the following option, you get the structures that match any of the queries:
Code: |
--or If more than one queries are present, at least one is required to
match.
|
Unfortunately, it seems that multiple queries work properly only in file mode; in DB mode, only the first query structure is indeed taken into account.
We will fix this soon; thank you for pointing it out.
Until then, here is a possible workaround to achieve the "--or" functionality:
1. Run the queries one by one, appending each result to the same file:
Code: |
jcsearch -q identifier1.smarts DB:Compounds -f smiles >> results.smi
jcsearch -q identifier2.smarts DB:Compounds -f smiles >> results.smi
...
|
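If the queries are all in one file, step 1 can be scripted. The following is a minimal sketch, not a JChem tool: the function name run_queries_or is hypothetical, and it assumes identifiers.smarts holds one SMARTS query per line (blank lines are skipped). Each line is written to a temporary single-query file with a .smarts extension and searched separately, using the same jcsearch options as above.

```shell
# Hypothetical helper (not part of JChem): emulate "--or" in DB mode by running
# each query line of a multi-query SMARTS file as a separate jcsearch call and
# appending all hits to one output file.
# Assumes one SMARTS query per line; blank lines are skipped.
run_queries_or() {
  queries=$1   # multi-query file, e.g. identifiers.smarts
  target=$2    # search target, e.g. DB:Compounds
  out=$3       # combined result file, e.g. results.smi
  : > "$out"   # start from an empty result file
  qdir=$(mktemp -d) || return 1
  qfile="$qdir/query.smarts"   # keep the .smarts extension for the single-query file
  while IFS= read -r q || [ -n "$q" ]; do
    [ -z "$q" ] && continue
    printf '%s\n' "$q" > "$qfile"
    # </dev/null keeps jcsearch from consuming the loop's input
    jcsearch -q "$qfile" "$target" -f smiles < /dev/null >> "$out"
  done < "$queries"
  rm -rf "$qdir"
}
```

Usage: `run_queries_or identifiers.smarts DB:Compounds results.smi`. Unlike the `>>` commands above, the function empties results.smi first, so rerunning it does not pile up hits from earlier runs.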
Now you have all the hits, but there may be duplicates: a compound that matches several queries is written once for each query it matches.
2. Import the file into an empty database table with duplicate filtering, then export it to get a result file without duplicates.
Code: |
jcman c temp
jcman a temp results.smi --nodup
jcman x temp final_results.smi
jcman d temp |
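If a temporary table is inconvenient, a plain-text filter may be enough. This is a minimal sketch with a hypothetical function name. It assumes every hit is written as exactly one line and that the same database row is always exported as the same text. It compares strings, not structures, so unlike `--nodup` it will not recognize the same compound written in two different ways.

```shell
# Hypothetical plain-text alternative to the jcman --nodup round trip:
# keep the first occurrence of each line and preserve the original order.
# Compares lines as strings, not as chemical structures.
dedupe_smiles() {
  awk '!seen[$0]++' "$1" > "$2"
}
```

Usage: `dedupe_smiles results.smi final_results.smi`.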
Quote: |
Also, I have about 150K compounds in the database table, but the final report says that only about 3K compounds were screened. |
I guess you are referring to output like the following, which is printed when the "-vv" (very verbose) flag is used:
Code: |
Tue Jul 11 09:12:39 CEST 2006
Search mode: SUBSTRUCTURE
Structure table: nnn
Query: CCC
Screened: 621
Hits: 499
Cache loading: 313 ms
Cache size (this table / total): 0.01 / 0.01 MBytes
Total time: 750 ms Screening: 0 ms
Current / peak / maximum searches per minute: 1 / 1 / 1 |
"Screening" is the first, very fast phase of the search: it uses fingerprints to reduce the number of potential hits.
This way, the more CPU-intensive graph search has to process fewer structures (in your case, only the roughly 3K that passed the screen rather than all 150K), which reduces the overall execution time.
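The idea behind the screen can be illustrated with a toy model. This is not JChem's implementation: the integers below are made-up stand-ins for real fingerprints, which are much longer bit strings, and the function names are hypothetical. A target can contain the query only if every bit set in the query's fingerprint is also set in the target's. Passing the screen does not guarantee a match, which is why "Screened" can be larger than "Hits" and why the graph search still has to run on the survivors.

```shell
# Toy model of substructure screening (an illustration, not JChem's implementation):
# fingerprints are small integers used as bit sets. A target can contain the
# query only if every bit set in the query fingerprint is also set in the target's.
passes_screen() {
  _q=$1 _t=$2
  [ $(( _q & _t )) -eq "$_q" ]
}

# Count the toy targets that survive the screen; only these would go on to the
# slower graph search.
count_screened() {
  query=$1; shift
  n=0
  for target in "$@"; do
    if passes_screen "$query" "$target"; then n=$((n + 1)); fi
  done
  echo "$n"
}
```

For example, `count_screened 5 7 4 13 15` prints 3: with query bits 0101, only the target 4 (0100) lacks one of them and is discarded.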
Best regards,
Szilard