ChemAxon e274e1bada
10-11-2008 12:52:23
A user wrote:
Hi
I am reviewing your JChem package as a possible chemical substructure search library, to search a library of molecules for a given query molecule specified as SMILES. Looking at your library, I came across two distinct ways of doing this: using a database or using a flat file. Can you highlight the differences between the two approaches, including the differences in performance and accuracy?
Thanks
ChemAxon e274e1bada
10-11-2008 12:53:32
Hi,
JChem offers two different ways to search:
1. Atom-by-atom search: you can search structures in various chemical file formats and in memory. This API provides a graph search engine with many search options and query features.
2. Database search: here we provide chemical database intelligence and a search engine; atom-by-atom search is also used to find the hits.
The accuracy is the same, and the database technology is more efficient if you have more than about 100K molecules.
See these documents for more information:
JChem database concepts:
http://www.chemaxon.com/jchem/doc/guide/dbconcepts/index.html
Guide for structure searching in database and in flat files:
http://www.chemaxon.com/jchem/doc/guide/search/index.html
You can try all of the search modes with the jcsearch application, which is part of the JChem package.
Documentation of the jcsearch application:
http://www.chemaxon.com/jchem/doc/user/Jcsearch.html
Regards, Edvard
ChemAxon a3d59b832c
12-11-2008 08:02:30
Hi,
No, we do not have any benchmark data, but here is a small example using jcsearch (substructure hit count):
Quote: |
$ wc NCI_aug00.smiles
236617 236617 11050988 NCI_aug00.smiles
$ time jcsearch.bat -t:c -q "OCc1cccc(Cl)c1" NCI_aug00.smiles
791
real 1m17.318s
user 0m0.015s
sys 0m0.000s
|
(2.2 GHz Intel Core2 processor)
We do not use an index file when searching in files, but it is very easy to create local databases with JChem Manager or Instant JChem.
A database search not only uses the efficient fingerprint pre-screening but also exploits multiple processors.
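To illustrate why pre-screening helps, here is a minimal sketch of the general screen-then-verify idea. It is not JChem code and does not use JChem's chemical hashed fingerprints: the function names (`fingerprint`, `may_contain`, `search`) are hypothetical, the "fingerprint" is a toy hash of short character fragments, and plain string containment stands in for the real atom-by-atom graph match (string containment is not chemically valid substructure matching). The point is only that a cheap bitwise test can discard most records before the expensive exact test runs.

```python
import zlib

FP_BITS = 64  # toy fingerprint width (assumption; real fingerprints are much longer)


def fingerprint(text: str, max_len: int = 3) -> int:
    """Set one bit for every fragment of length 1..max_len found in text."""
    bits = 0
    for size in range(1, max_len + 1):
        for start in range(len(text) - size + 1):
            fragment = text[start:start + size]
            bits |= 1 << (zlib.crc32(fragment.encode()) % FP_BITS)
    return bits


def may_contain(target_fp: int, query_fp: int) -> bool:
    """Cheap screen: every bit set in the query must also be set in the target."""
    return (target_fp & query_fp) == query_fp


def search(targets, query):
    """Return (hits, number of records that passed the screen)."""
    query_fp = fingerprint(query)
    # A database would compute and store these fingerprints once, at import time.
    index = [(t, fingerprint(t)) for t in targets]
    candidates = [t for t, fp in index if may_contain(fp, query_fp)]
    # Expensive exact test (stand-in for atom-by-atom search) runs on candidates only.
    hits = [t for t in candidates if query in t]
    return hits, len(candidates)


if __name__ == "__main__":
    records = ["CCO", "OCc1cccc(Cl)c1", "c1ccccc1", "CCN", "ClCCl"]
    hits, screened = search(records, "Cl")
    print(f"{len(hits)} hits, {screened} of {len(records)} records verified")
```

Because every fragment of the query also occurs in any record that contains the query, the screen never rejects a true hit; it can only let through false candidates, which the exact test then removes. This is why the accuracy is the same with or without the screen.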
I hope it helps.
Szabolcs.