Question substructure search

ChemAxon e274e1bada

10-11-2008 12:52:23

A user wrote:





Hi





I am reviewing your JChem package as a possible chemical substructure search library, to search a library of molecules for a given source molecule specified as SMILES. Looking at your library, I came across two distinct ways of doing this: using a database, or using a flat file. Can you highlight the differences between the two approaches, including the differences in performance and accuracy?





Thanks

ChemAxon e274e1bada

10-11-2008 12:53:32

Hi,





JChem offers two different ways to search:

1. Atom-by-atom search: you can search structures held in memory or stored in files of various chemical structure formats. This API provides a graph search engine with many search options and query features.

2. Database search: JChem provides the chemical database logic and search engine, and atom-by-atom search is still used to find the hits.

The accuracy is the same, and the database approach is more efficient if you have more than about 100K molecules.


See these documents for more information:


JChem database concepts: http://www.chemaxon.com/jchem/doc/guide/dbconcepts/index.html


Guide for structure searching in database and in flat files: http://www.chemaxon.com/jchem/doc/guide/search/index.html





You can try all of the search modes with the jcsearch application, which is part of the JChem package.

Documentation of the jcsearch application: http://www.chemaxon.com/jchem/doc/user/Jcsearch.html





Regards, Edvard

User 6686efeff7

11-11-2008 14:22:58

Hi


Do you have any performance statistics for file-based search? Does your file-based search use an indexed file format for fast substructure matching, similar to Babel's .fs format?


Thanks

ChemAxon a3d59b832c

12-11-2008 08:02:30

Hi,





No, we do not have any benchmark data, but here is a small example using jcsearch (Substructure hit count):
Quote:
$ wc NCI_aug00.smiles
236617 236617 11050988 NCI_aug00.smiles

$ time jcsearch.bat -t:c -q "OCc1cccc(Cl)c1" NCI_aug00.smiles
791

real 1m17.318s
user 0m0.015s
sys 0m0.000s

(2.2 GHz Intel Core2 processor)
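
As a rough reading of the run above, the wall-clock ("real") time is the figure that reflects the whole search. The short calculation below uses only the numbers from that transcript to estimate throughput and hit rate for this one query on this one machine. It is a back-of-the-envelope sketch, not a benchmark.

```python
# Figures copied from the jcsearch transcript above.
molecules = 236_617               # line count reported by wc for NCI_aug00.smiles
hits = 791                        # substructure hit count printed by jcsearch
wall_seconds = 1 * 60 + 17.318    # "real 1m17.318s"

throughput = molecules / wall_seconds   # molecules checked per second
hit_rate = hits / molecules             # fraction of the file that matched

print(f"{throughput:.0f} molecules/s, hit rate {hit_rate:.2%}")
```

Other queries and other hardware will give different figures.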





We do not use an index file for searching in files, but it is very easy to create local databases with JChem Manager or Instant JChem.

Searching in the database will not only use the efficient fingerprint pre-screening but will also exploit multiple processors.
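
To illustrate the general idea of fingerprint pre-screening, here is a minimal toy sketch. It does not use the JChem API and is not JChem's implementation: the feature strings, the 64-bit fingerprint, the `LIBRARY` contents, and the `exact_match` stand-in are all assumptions made up for this example. Each structure gets a precomputed bit mask, a cheap bitwise test rejects targets that cannot contain the query, and only the survivors go to the expensive exact matcher.

```python
import zlib
from typing import Dict, FrozenSet, List, Tuple

N_BITS = 64  # toy size chosen for this sketch


def fingerprint(features: FrozenSet[str]) -> int:
    """Hash each feature string to one bit of an N_BITS-wide mask."""
    bits = 0
    for feature in features:
        bits |= 1 << (zlib.crc32(feature.encode()) % N_BITS)
    return bits


def exact_match(query: FrozenSet[str], target: FrozenSet[str]) -> bool:
    """Hypothetical stand-in for atom-by-atom search (here just a subset test)."""
    return query <= target


def search(query: FrozenSet[str],
           fingerprints: Dict[str, int],
           library: Dict[str, FrozenSet[str]]) -> Tuple[List[str], int]:
    """Return (hits, number of targets that needed the exact matcher)."""
    q_fp = fingerprint(query)
    hits, verified = [], 0
    for name, t_fp in fingerprints.items():
        # Cheap screen: every query bit must also be set in the target.
        if (q_fp & t_fp) != q_fp:
            continue
        verified += 1  # survivor: may still be a false positive (bit collision)
        if exact_match(query, library[name]):
            hits.append(name)
    return hits, verified


# Made-up bond-type features standing in for real structural keys.
LIBRARY: Dict[str, FrozenSet[str]] = {
    "3-chlorobenzyl alcohol": frozenset({"O-C", "C-c", "c:c", "c-Cl"}),
    "benzyl alcohol": frozenset({"O-C", "C-c", "c:c"}),
    "chlorobenzene": frozenset({"c:c", "c-Cl"}),
    "ethanol": frozenset({"O-C", "C-C"}),
}
# Computed once and stored, as a database would do at import time.
FINGERPRINTS = {name: fingerprint(f) for name, f in LIBRARY.items()}

QUERY = frozenset({"O-C", "C-c", "c:c", "c-Cl"})
hits, verified = search(QUERY, FINGERPRINTS, LIBRARY)
print(hits, f"exact matcher ran on {verified} of {len(LIBRARY)} targets")
```

The screen can let through false positives but never drops a true hit, because every feature of the query also sets its bit in any target that contains it. That is why the accuracy matches a plain atom-by-atom scan while less work is done per query.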





I hope it helps.


Szabolcs.

User 6686efeff7

28-01-2009 10:14:02

Does this involve the use of a RDBMS?





Can this be done programmatically?





Is file based search licensed separately?














Thanks

ChemAxon a3d59b832c

28-01-2009 10:30:20

Quote:
Does this involve the use of a RDBMS?


Yes, by a local database I meant an RDBMS.


Quote:
Can this be done programmatically?


Yes, see the JChem Developers Guide:





http://www.chemaxon.com/jchem/doc/guide/


Quote:
Is file based search licensed separately?


Currently not. It is included in the JChem Base or JChem Cartridge license.





However, from version 5.2 we will introduce a new license for searching in memory and files only, without the use of a database.





Best regards,


Szabolcs