Molfile queries with JChem Cartridge

ChemAxon aa7c50abf8

04-10-2006 10:33:13

Quote:
We are running some test searches and comparing against our existing software. We can currently search using a molfile, but for the JChem search we are converting the mols to SMILES strings and searching that way. Is it possible to search using a molfile, providing a directory location in the search term? My samples contain chiral flags, and converting to a SMILES string strips the flags out, so the search is not truly comparable. Could you provide an example search term where we can read a molfile?
You can use molfiles directly as queries.





I recommend using the JChem Cartridge operators with BLOB structure arguments to accommodate the larger size of molfiles (larger than SMILES and potentially larger than the 32,767-byte (~2^15) limit of VARCHAR2 in PL/SQL code):





Code:
SELECT count(*) FROM cpd WHERE jc_compareb(cd_structure, :query_structure)  = 1;



where :query_structure refers to your molfile.





In a production environment, you will want to replace :query_structure with a language-specific construct that best suits the Oracle connectivity tool provided by your application environment (.NET, PHP, whatever). Oracle connectivity tools for applications (like ODBC drivers for Oracle) typically allow you to bind BLOB values dynamically into SQL statements like the SELECT statement above.
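
As a minimal sketch of such a language-specific construct, the Python function below passes the raw molfile bytes as the :query_structure bind variable. It assumes a DB-API 2.0 style cursor whose execute() accepts a dict of named bind values, as Oracle drivers for Python typically provide; the function name and the file path argument are hypothetical. Depending on the driver, you may have to declare the bind explicitly as a BLOB for large molfiles.

```python
from pathlib import Path

# Same statement as above; :query_structure is bound at execution time.
COUNT_SQL = (
    "SELECT count(*) FROM cpd "
    "WHERE jc_compareb(cd_structure, :query_structure) = 1"
)


def count_matches(cursor, molfile_path):
    """Count rows of cpd that match the molfile stored at molfile_path.

    `cursor` is assumed to be a DB-API 2.0 cursor that accepts a dict
    of named bind values.
    """
    # Read the file as bytes so that nothing in it (for example the
    # chiral flag) is altered before it reaches the cartridge.
    molfile = Path(molfile_path).read_bytes()
    cursor.execute(COUNT_SQL, {"query_structure": molfile})
    (count,) = cursor.fetchone()
    return count
```

Reading the file in binary mode avoids any character set or line ending conversion on the client side.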





For "quick-and-dirty" testing purposes, I recommend storing the query molfiles in the BLOB column of an arbitrary temporary table and using a sub-select to refer to your query:





Code:
SELECT count(*) FROM cpd WHERE jc_compareb(cd_structure, (SELECT molfile FROM tmp_query_table WHERE query_id = 'myqueryid'))  = 1;
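
If you prefer to fill such a table from a script, a minimal Python sketch could look like the following. The table layout in the comment is an assumption that merely matches the column names used in the sub-select above, the function name is hypothetical, and the cursor is assumed to be a DB-API 2.0 cursor that accepts named bind values. Committing the transaction is left to the caller.

```python
from pathlib import Path

# Assumed layout of the temporary table (names follow the query above):
#   CREATE TABLE tmp_query_table (query_id VARCHAR2(64), molfile BLOB)
INSERT_SQL = (
    "INSERT INTO tmp_query_table (query_id, molfile) "
    "VALUES (:query_id, :molfile)"
)


def store_query(cursor, query_id, molfile_path):
    """Insert one molfile, unaltered, under the given query_id.

    Returns the number of bytes stored.
    """
    # Bytes, not text: the molfile goes into the BLOB column as it is on disk.
    molfile = Path(molfile_path).read_bytes()
    cursor.execute(INSERT_SQL, {"query_id": query_id, "molfile": molfile})
    return len(molfile)
```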






The "trick" I normally use to store my molfile queries in a database table is simply JChemManager's import function. I import the query molfiles into a JChem table, and the sub-select above then refers to the cd_structure column of that JChem table (the molfile is stored unaltered in the cd_structure column, and the default type of this column is BLOB):





Code:
SELECT count(*) FROM cpd WHERE jc_compareb(cd_structure, (SELECT cd_structure FROM tmp_query_table WHERE cd_id = mycdid))  = 1;






If you


(a) add an SDF tag (say, "query_id") to each of your query molfiles with a value descriptive of the particular query molfile and


(b) create the query table with an extra column, say, of type VARCHAR2 and called, say, "query_id" and


(c) map the SDF tag to the extra column in the JChem table during import of the query molfile,


you can refer to any particular query molfile in the table more conveniently than by using cd_id numbers. You can then phrase your query like:





Code:
SELECT count(*) FROM cpd WHERE jc_compareb(cd_structure, (SELECT cd_structure FROM tmp_query_table WHERE query_id = 'pyrrole with [...]')) = 1;
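
Step (a) can be scripted. The following Python sketch appends a "query_id" data item to a molfile and terminates the record with "$$$$", which turns it into a one-record SDF file ready for import. The function name is hypothetical, and the sketch assumes a single-line tag value and a molfile that ends with the usual "M  END" line.

```python
def molfile_to_sdf_record(molfile_text, query_id):
    """Return an SDF record: the molfile followed by a query_id data item."""
    # Normalise line endings and drop trailing whitespace after "M  END".
    # Leading blank header lines are part of the molfile and are kept.
    body = molfile_text.replace("\r\n", "\n").rstrip()
    if not body.endswith("M  END"):
        raise ValueError("molfile does not end with 'M  END'")
    # An SDF data item is a "> <tag>" header line, the value, and a blank
    # line; "$$$$" on its own line terminates the record.
    return f"{body}\n> <query_id>\n{query_id}\n\n$$$$\n"
```

Concatenating the records returned for several molfiles gives a single SDF file with one descriptive query_id per query.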

User 48fe63575a

04-10-2006 14:58:04

Hi Peter,


Many thanks for the help on this. I followed your "trick" method, created a temporary table and used JChemManager to load the molfiles. I can get hits from our target database using your search method.





One problem: in probably 1 search attempt in 3, as soon as I run the search query I get a disconnect error from Oracle, e.g.:





Code:
SQL> SELECT count(ID) FROM mol2.cpd WHERE jc_containsb(cd_structure, (select cd_structure from temp_mol where cd_id='7')) = 1;
SELECT count(TS_CPD_ID) FROM mol2.cpd WHERE jc_containsb(cd_structure, (select cd_structure from temp_mol where cd_id='7')) = 1
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel






If I re-connect and run the same search it normally works. It does not seem to matter how many results are expected.





Many thanks.

ChemAxon aa7c50abf8

09-10-2006 13:25:16

Hi Wyn,





Sorry for my late reply.





Do you have any related error message in the corresponding session trace file (typically in $ORACLE_BASE/admin/<dbname>/udump)?
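
If it helps, a small Python sketch like the one below can pull the ORA- lines out of the most recently written trace files in that directory. It assumes the trace files carry the .trc extension; the function name and the max_files parameter are hypothetical.

```python
from pathlib import Path


def recent_trace_errors(udump_dir, max_files=5):
    """Return (file name, line) pairs for ORA- lines in the newest .trc files."""
    # Newest files first, judged by modification time.
    traces = sorted(
        Path(udump_dir).glob("*.trc"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )[:max_files]
    hits = []
    for trace in traces:
        # Trace files may contain odd bytes; replace them rather than fail.
        with open(trace, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                if "ORA-" in line:
                    hits.append((trace.name, line.rstrip()))
    return hits
```

Call it with the udump path of your database right after a disconnect, so that the newest trace file is likely the one written by the failed session.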





Peter