structure search across multiple databases

User e34a92cce5

25-05-2005 16:39:42

Hello,


Does the most recent version of JChem support searching across multiple databases? If not, are there plans to implement it? We have about 2 million virtual compounds spread across multiple databases that we don't plan to merge, but we would like to run every similarity search across all of these databases and get the similarity value for each hit in each database. Is there an easy way to achieve this, given that JChem can search only one structure table at a time?


Thanks!


Renju

ChemAxon 9c0afc9aaf

26-05-2005 04:18:04

Hi Renju,





JChemSearch can search only one structure table; otherwise the result would be ambiguous:


The results consist of cd_id values, so you would not know which table (and therefore which structure) they refer to.





I suggest that you search the tables sequentially with the same JChemSearch object and store the hits separately for each table.
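A minimal sketch of keeping the hits separate per table, so that a cd_id is never read without its table name. The `TableSearch` interface, the `Hit` class and the table names are hypothetical stand-ins for illustration, not JChem types; in practice the single-table search would be done with the JChemSearch object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PerTableHits {

    /** One hit, qualified by its source table because cd_id alone is ambiguous. */
    static final class Hit {
        final String table;
        final int cdId;
        final double similarity;

        Hit(String table, int cdId, double similarity) {
            this.table = table;
            this.cdId = cdId;
            this.similarity = similarity;
        }
    }

    /** Hypothetical stand-in for one single-table search: cd_id -> similarity. */
    interface TableSearch {
        Map<Integer, Double> search(String table);
    }

    /** Searches the tables one after another and stores each table's hits separately. */
    static Map<String, List<Hit>> searchAll(List<String> tables, TableSearch search) {
        Map<String, List<Hit>> hitsByTable = new LinkedHashMap<>();
        for (String table : tables) {
            List<Hit> hits = new ArrayList<>();
            for (Map.Entry<Integer, Double> e : search.search(table).entrySet()) {
                hits.add(new Hit(table, e.getKey(), e.getValue()));
            }
            hitsByTable.put(table, hits);
        }
        return hitsByTable;
    }

    public static void main(String[] args) {
        // Fake results: the same cd_id 7 occurs in both (made-up) tables.
        Map<String, List<Hit>> all = searchAll(Arrays.asList("nci", "vendor"), table -> {
            Map<Integer, Double> fake = new LinkedHashMap<>();
            fake.put(7, table.equals("nci") ? 0.91 : 0.42);
            return fake;
        });
        for (List<Hit> hits : all.values()) {
            for (Hit h : hits) {
                System.out.println(h.table + " cd_id=" + h.cdId + " similarity=" + h.similarity);
            }
        }
    }
}
```

Because every hit carries its table name, the same cd_id value in two tables stays distinguishable.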





If you want the maximum hit count and/or maximum search time to apply to the whole sequence of searches, subtract the hit count / search time already used from the limit before each search, so that the next search can use only the remaining hit count / search time (or stop if the maximum has already been reached).
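The bookkeeping described above could look roughly like this. It is only a sketch: `LimitedSearch` and `TableResult` are hypothetical names, not part of JChem, and passing the remaining limits on to the actual JChemSearch object is left out.

```java
import java.util.ArrayList;
import java.util.List;

final class SearchBudget {

    /** Hypothetical stand-in for one table search that honours the given limits. */
    interface LimitedSearch {
        List<Integer> search(String table, int maxHits, long maxMillis);
    }

    /** The cd_id hits of one table, stored together with the table name. */
    static final class TableResult {
        final String table;
        final List<Integer> cdIds;

        TableResult(String table, List<Integer> cdIds) {
            this.table = table;
            this.cdIds = cdIds;
        }
    }

    /** Searches the tables in order, giving each search only what is left of the limits. */
    static List<TableResult> searchSequentially(List<String> tables, int maxHits,
                                                long maxMillis, LimitedSearch search) {
        List<TableResult> results = new ArrayList<>();
        int remainingHits = maxHits;
        long remainingMillis = maxMillis;
        for (String table : tables) {
            if (remainingHits <= 0 || remainingMillis <= 0) {
                break; // the overall maximum has already been reached
            }
            long start = System.nanoTime();
            List<Integer> hits = search.search(table, remainingHits, remainingMillis);
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            results.add(new TableResult(table, hits));
            remainingHits -= hits.size();     // subtract the hits already used
            remainingMillis -= elapsedMillis; // subtract the time already used
        }
        return results;
    }
}
```

With an overall limit of 5 hits and tables that can return 3, 4 and 5 hits, the first search gets a limit of 5, the second a limit of 2, and the third is never started.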





Best regards,





Szilard

User e34a92cce5

01-06-2005 21:00:06

Thanks for the reply, Szilard. The better alternative, then, seems to be to merge all the virtual databases. I was wondering whether JChemSearch can handle class hierarchies and inheritance in databases. Each library in our virtual collection has data for the cd_fp columns, but there are also attributes specific to each collection. For example, the CAS number is specific to NCI and so would be stored in a separate 'sub-class' table for NCI. I believe JChem won't be able to handle information spread across super- and sub-class tables, will it?


Thanks!


Renju

ChemAxon 9c0afc9aaf

02-06-2005 10:02:31

Hi Renju,





JChemSearch does not address relations between database tables, as it always searches one specified structure table.


You can specify a filter query for JChemSearch to narrow down the set of molecules searched in the structure table:





http://www.jchem.com/doc/api/chemaxon/jchem/db/JChemSearch.html#setFilterQuery(java.lang.String)





To use the structure you have described, you would have to take care of inserting and retrieving the additional data located in the "sub-class" tables yourself.





A simple suggestion: why don't you add all the columns to a single structure table, and set the value to NULL where it is not applicable?





Best regards,





Szilard

User e34a92cce5

02-06-2005 15:43:35

Hi Szilard,


Adding columns and setting them to NULL for libraries that have no values for them would give a table that gains a few columns for every library added to it. For a table that holds a few million entries from 20-30 libraries, I don't believe that is a wise thing to do.


Also, what is your opinion on using Oracle's table partitioning, which essentially breaks a single large table into smaller tables keyed on some column? Would I be able to use partitioning to separate my libraries and ask JChem to search only a specific partition, based on a key in, say, a libraryid column in the structure table?





Thanks!


Renju

ChemAxon 9c0afc9aaf

03-06-2005 12:10:47

Hi Renju,








With table partitioning you would also have all the columns in one table.


(Personally, I don't think that would be a problem.)


The only difference is that parts of the table can be stored with different physical parameters, and SQL queries that use the partition index can run faster.


Otherwise the table looks the same to all applications.








You can use the filter query in the very same way:


Code:



JChemSearch jcs = ...
jcs.setFilterQuery("SELECT cd_id FROM structures WHERE my_partition_key = 'NCI'");
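If the partition key value comes from user input or configuration, the filter query string can be built by a small helper instead of being typed by hand. The sketch below is plain string handling and assumes nothing about JChem beyond setFilterQuery(String) taking an SQL string; `FilterQueries` and `forPartition` are hypothetical names, and the table and column names are the ones from the example above.

```java
final class FilterQueries {

    /**
     * Builds "SELECT cd_id FROM <table> WHERE <keyColumn> = '<keyValue>'".
     * Single quotes in the value are doubled, and the identifiers are checked,
     * so that the generated SQL stays well-formed.
     */
    static String forPartition(String table, String keyColumn, String keyValue) {
        requireIdentifier(table);
        requireIdentifier(keyColumn);
        String escaped = keyValue.replace("'", "''");
        return "SELECT cd_id FROM " + table + " WHERE " + keyColumn + " = '" + escaped + "'";
    }

    /** Accepts only simple unquoted identifiers (letters, digits, underscore). */
    private static void requireIdentifier(String name) {
        if (!name.matches("[A-Za-z][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Not a simple SQL identifier: " + name);
        }
    }

    public static void main(String[] args) {
        System.out.println(forPartition("structures", "my_partition_key", "NCI"));
    }
}
```

The result would then be passed on as `jcs.setFilterQuery(FilterQueries.forPartition("structures", "my_partition_key", "NCI"));`.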









You must also be aware that you cannot partition a table that contains a LONG or LONG RAW column. You must change the type of the cd_structure column at table creation to BLOB or CLOB (these are already supported by JChem, and BLOB will be the default type from version 3.1).





Best regards,





Szilard

User e34a92cce5

03-06-2005 13:26:02

Thanks, Szilard.

User e34a92cce5

10-08-2005 18:56:03

Hi Szilard,


I have noticed that when I apply a filter query to a selected partition, the search time increases drastically. I believe it is more an issue with using the setFilterQuery() method than with the partitioning. Is there any way to make the searcher look into a partition directly, rather than using the filter query to restrict the search?


Thanks!


Renju

ChemAxon 9c0afc9aaf

11-08-2005 17:20:02

Hi Renju,





Please make sure that you are referring to an indexed column in the filterQuery.





It is worth testing the execution of the filter query on its own from a simple SQL console.





Let me know what you find.





Best regards,





Szilard