Structure import too fast

ChemAxon aa7c50abf8

04-10-2006 10:28:48

Quote:
We loaded our 2 million+ sample set into a new database. Each record contains a structure and ID only.


The load was very quick: several hours instead of the usual 2 days. But when I tried to index the database, the indexes appeared to be created in seconds rather than the hours I expected for such a large data set. I wondered if I was doing anything wrong. I tried to create an index on the SMILES and structure fields only. The two statements I used were:





create index cpd_smiles_idx01 on cpd(cd_smiles) INDEXTYPE IS jchem.jc_idxtype;





create index cpd_structure_idx02 on cpd(cd_structure) INDEXTYPE IS jchem.jc_idxtype;





Have I done something wrong? Would you recommend an index on the ID field? I thought the database create step already created one.


A load time of a few hours for 2 million+ structures seems perfectly reasonable.





In the case of JChem tables, the index information is generated as the structures are loaded into the table. Consequently, not much happens during index creation apart from writing a few administrative metadata entries, which is why the indexes appear to be created in seconds.





(In the case of regular structure tables (non-JChem tables), the indexing information is generated during indexing. Consequently, loading the structures into a regular structure table without any jc_idxtype index will, of course, be much faster than loading them into a JChem table. Once the jc_idxtype index is created on a regular table, inserts will take longer than without the index, as with any other type of index. Indexing 12 x NCI (more than 3 million structures) in a regular table takes 5,801 seconds (~97 minutes) on a 3GHz dual Xeon with 2GB RAM.)
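As a rough sanity check on that benchmark, the arithmetic can be sketched in Python. This is only a back-of-the-envelope estimate. It assumes exactly 3,000,000 structures (the post only says "more than 3 million", so the real throughput is somewhat higher), a constant indexing rate, and comparable hardware. The function names are made up for illustration.

```python
# Figures quoted in the post. 3,000,000 is a lower bound
# ("more than 3 million structures"), so the rate below is a lower bound too.
STRUCTURES_INDEXED = 3_000_000
SECONDS_TAKEN = 5_801


def structures_per_second(count: int, seconds: float) -> float:
    """Average indexing throughput over the whole run."""
    return count / seconds


def estimated_seconds(count: int, rate: float) -> float:
    """Time needed to index `count` structures at a constant `rate`."""
    return count / rate


rate = structures_per_second(STRUCTURES_INDEXED, SECONDS_TAKEN)

# Hypothetical case: 2 million structures held in a regular (non-JChem) table.
estimate = estimated_seconds(2_000_000, rate)

print(f"throughput: at least {rate:.0f} structures/s")
print(f"2 million structures: about {estimate / 60:.0f} minutes")
```

At that rate, indexing a 2 million structure regular table would take on the order of an hour, not seconds. That fits the explanation that for a JChem table the expensive work has already been done during the load.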