Problems with Chemistry Index

User 0908c5ccdd

26-07-2007 09:13:16

Hi





I'm still not entirely sure about the Index on my Chemistry table. Can the index only be on one column in the table? If I am routinely searching using CD_SMILES strings should I index that column? Or should I index on the CD_STRUCTURE column?





I have an error message currently of:





Code:
SQL> create index jc_index on compounds(cd_smiles) indextype is jc_idxtype;


create index jc_index on compounds(cd_smiles) indextype is jc_idxtype


*


ERROR at line 1:


ORA-29855: error occurred in the execution of ODCIINDEXCREATE routine


ORA-29532: Java call terminated by uncaught Java exception:


java.lang.RuntimeException: The index on table 'JCHEM.COMPOUNDS' contains


obsolete data. (Current index version: 34, index version for 'JCHEM.COMPOUNDS':


-1.) Please recreate the index with the 'regenerateTable' option.


ORA-06512: at "JCHEM.JCHEM_CORE_PKG", line 47


ORA-06512: at "JCHEM.JC_IDXTYPE_IM", line 17






This may be due to the fact that I had previously created an index on the table (CD_SMILES column) but then renamed the table from JCHEMISTRY to COMPOUNDS.





Can anybody help please? I am now unable to index my COMPOUNDS table.

ChemAxon aa7c50abf8

26-07-2007 10:09:24

Quote:
If I am routinely searching using CD_SMILES strings should I index that column?
Yes.
Quote:



This may be due to the fact that I had previously created an index on the table (CD_SMILES column) but then renamed the table from JCHEMISTRY to COMPOUNDS.





Can anybody help please? I am now unable to index my COMPOUNDS table.
Did you do it with ALTER TABLE jchemistry RENAME TO compounds?





We don't support renaming a table. Supporting it would involve updating references to the structure table in the JChem meta data records associated with the table to use the new name. There should be a JChem API call designed to do this, which doesn't (yet) exist.





You can try to rename the table back to JCHEMISTRY, create --using a JChem tool-- a new COMPOUNDS table with the same properties as JCHEMISTRY, do an INSERT INTO compounds (SELECT * FROM jchemistry) and drop JCHEMISTRY, if you no longer need it. DON'T DO a CREATE TABLE compounds AS SELECT..., because then again COMPOUNDS would not be a proper JChem-table (which you would like it to be, I presume).

User 0908c5ccdd

26-07-2007 10:38:41

Many thanks for the reply. As the table was empty I took your advice and just created a new one named COMPOUNDS. The index now works fine.





I am still unclear about the indexes however. Am I correct that only one structure index is permitted on this table? If I have applications searching both CD_SMILES and CD_STRUCTURE then which should I index?





At present the PL/SQL I am using to search uses the SMILES string. However in my web application I will be using Marvin and this could search the CD_STRUCTURE column, right?

ChemAxon aa7c50abf8

26-07-2007 11:57:45

An index on the CD_STRUCTURE column doesn't semantically differ from an index on the CD_SMILES column. Conceptually, however, the index is always on the CD_STRUCTURE column -- where your original compounds are stored. The CD_SMILES column should be regarded as internal to the JChem Base/Cartridge implementation -- meaning that it should not even have been exposed to users at all. If you look at the way regular structure tables are used with JChem Cartridge, you see no CD_SMILES column: it is buried (hopefully deep) in the index data, which is opaque to the user.





The ability to index the CD_SMILES columns of JChem structure tables is to some extent incidental and has "historical reasons". (To some extent, the ability to use JChem structure tables with JChem Cartridge at all is also incidental, for that matter.) The "historical reasons" can be summarized as follows.





Since many users wanted to store Molfiles and SDFiles, a type for the CD_STRUCTURE column needed to be found long enough to hold largish Mol- or SDFiles. The VARCHAR2 type was not appropriate, due to its length being limited to 4000 characters. Because in the early days of JChem Cartridge, the Oracle LOB types were not mature enough, JChem Base started to use the LONG RAW type for CD_STRUCTURE. It took some time before JChem Base (on top of which the JChem Cartridge implementation is also built) came to support LOB types. On the other hand, the Oracle Data Cartridge framework itself doesn't support LONG parameters for operators. At the very start, the compromise was to have users pass the CD_SMILES column as column parameter to the JChem Cartridge operators. (Interestingly, from the perspective of the pure "mechanics" of JChem Cartridge, any VARCHAR2 type column (like CD_FORMULA) would have done the job, but what made the CD_SMILES column the prime candidate was that it holds the information which JChem Base/Cartridge typically uses internally for implementing structure search.)





This all is really only about cartridge operators' signature and column type support. Currently both CLOB, BLOB and VARCHAR2 columns are supported both "internally" by the JChem search engine and in terms of JChem Cartridge operator signatures.





So my advice is to index only the CD_STRUCTURE column. (Note that with JChemManager you also have the possibility to create your JChem table with CD_STRUCTURE being defined as CLOB instead of the default BLOB. In my view, CLOB is a more straightforward type for character-based structure formats than BLOB and is also somewhat easier to use with SQL*Plus and similar "poor-man's" database tools.)





For completeness: nothing prevents you from creating an index on both (CD_STRUCTURE and CD_SMILES) columns. But having an extra index means more disk space requirement and more time to generate the extra index information during INSERTs.