table re-standardization performance / questions

ChemAxon 60ee1f1328

06-02-2007 17:38:48

Hello,





I have kicked off a command-line job to re-standardize an Oracle table of ~3 million structures. It has been running for approximately 2 weeks and is about 2/3 of the way through. However, performance is degrading slowly but surely, and I'm not sure the remaining 1/3 will complete in another week (no network is involved as such).
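
For a rough feel for the numbers above, here is a small sketch of the arithmetic. It only uses the figures from this post (3 million structures, 2/3 done after 2 weeks); the `slowdown` factor is a hypothetical parameter for illustration, not something measured on the actual job.

```python
def eta_days(total: int, done: int, elapsed_days: float, slowdown: float = 1.0) -> float:
    """Estimate the days left for a batch job.

    Assumes the remaining records are processed at the average rate seen
    so far, divided by `slowdown` (hypothetical: 1.0 means no further
    degradation, 2.0 means half the average speed).
    """
    if done <= 0:
        raise ValueError("need at least one processed record to estimate a rate")
    remaining = total - done
    # remaining / (done / elapsed_days), rearranged to avoid a needless division
    return remaining * elapsed_days / done * slowdown


total = 3_000_000          # structures in the table (from the post)
done = 2_000_000           # roughly 2/3 processed (from the post)
elapsed = 14.0             # roughly 2 weeks (from the post)

at_average_rate = eta_days(total, done, elapsed)
at_half_rate = eta_days(total, done, elapsed, slowdown=2.0)  # assumed degradation
print(f"at the average rate: {at_average_rate:.1f} days")
print(f"at half the average rate: {at_half_rate:.1f} days")
```

At the average rate so far the last third would take about another week, so any further slowdown pushes it beyond that.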





I'm wondering whether there is any way to restart the job and pick up where I left off, i.e. keep everything already re-standardized and restart only from a point (cd_id) that has not yet been standardized. Is this possible, or would I lose the re-standardized records as soon as I killed the job (in which case I have to live with it)? If it is not possible, might I be so bold as to put this in as a request?





Thanks for your comments,





Daniel.

ChemAxon 60ee1f1328

06-02-2007 17:56:00

Further to this: presuming the re-standardization completes, could you advise on the most efficient route to re-standardizing a subset of a given table? That is, how might I best re-standardize a JChem table using, say, a substructure search (SSS), so that only structures containing a particular substructure are re-standardized?

ChemAxon 60ee1f1328

07-02-2007 11:36:48

I think the way round this is to:





1. Cut out the first 2/3 and keep as table a.


2. Cut out the last 1/3 and keep as table b and then regenerate table b.


3. Combine tables a and b with a CREATE TABLE AS SELECT over a union of a and b.





That should do the trick I think.
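
The three steps above can be sketched roughly as follows. This is only a toy illustration using Python's built-in `sqlite3` rather than Oracle, with a simplified two-column table (`cd_id`, `cd_structure`). The `standardize` function, the table names and the `raw:`/`std:` markers are hypothetical stand-ins; on a real JChem table step 2 would be the regeneration job itself. It also assumes the job processed rows in `cd_id` order, so a single boundary `cd_id` separates finished rows from unfinished ones.

```python
import sqlite3


def standardize(structure: str) -> str:
    # Hypothetical placeholder for the real standardization step:
    # it just flips a "raw:" marker to "std:".
    return structure.replace("raw:", "std:", 1)


def split_regenerate_combine(conn: sqlite3.Connection, last_done_id: int) -> None:
    boundary = int(last_done_id)  # force an integer before formatting into SQL
    cur = conn.cursor()
    # 1. Keep the part that is already re-standardized as table_a.
    cur.execute(
        f"CREATE TABLE table_a AS SELECT * FROM structures WHERE cd_id <= {boundary}"
    )
    # 2. Keep the rest as table_b, then regenerate only table_b.
    cur.execute(
        f"CREATE TABLE table_b AS SELECT * FROM structures WHERE cd_id > {boundary}"
    )
    rows = cur.execute("SELECT cd_id, cd_structure FROM table_b").fetchall()
    cur.executemany(
        "UPDATE table_b SET cd_structure = ? WHERE cd_id = ?",
        [(standardize(structure), cd_id) for cd_id, structure in rows],
    )
    # 3. Combine both parts. The two parts are disjoint, so UNION ALL
    #    is enough and skips the duplicate check a plain UNION would do.
    cur.execute(
        "CREATE TABLE combined AS "
        "SELECT * FROM table_a UNION ALL SELECT * FROM table_b"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE structures (cd_id INTEGER PRIMARY KEY, cd_structure TEXT)")
conn.executemany(
    "INSERT INTO structures VALUES (?, ?)",
    [(1, "std:CCO"), (2, "std:CCN"), (3, "raw:CCC"), (4, "raw:CO")],
)
split_regenerate_combine(conn, last_done_id=2)
result = conn.execute(
    "SELECT cd_id, cd_structure FROM combined ORDER BY cd_id"
).fetchall()
print(result)
```
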





Cheers,


Daniel.

ChemAxon aa7c50abf8

07-02-2007 12:57:22

Hi Daniel,
Quote:
I have kicked off a cmd line job to re-standardize an Oracle table...
Do you mean: you want to regenerate a table with the command line JChemManager? Like:





Code:
jcman r ...  --stconfig ...






The time required for the standardization appears to be unusually long. Is this long duration consistent with your expectations based on the standardization configuration or the nature of the structures in the table?





Would it be possible to share with us the new standardization configuration?





What kind of structures are in the table? Are they reactions? (Some reactions may take a very long time to regenerate. We are working on ways to enable users to put a limit on processing reactions during import/regeneration.)





Thanks


Peter

ChemAxon 60ee1f1328

07-02-2007 15:40:37

Hi Peter,





Yes, it is a command-line standardization of an entire table of 3M molecules, all in a single table. I would have used the jcman GUI; however, it was not clear to me exactly which standardizer.xml file I was actually using (I don't think the option to specify a file exists?).





It does seem that the job has degraded significantly over the last two weeks: it started off 3 times faster than it is currently running. If that continues, it may not actually complete. Anyway, I think it is possible to extract both the standardized and non-standardized parts, continue by standardizing the latter, and then concatenate both results. In future I would not try to standardize a table bigger than 1M. Is it the JRE / memory that is degrading? I don't think that Tomcat is involved, as the job is not being run in the cartridge.





The table does not contain reactions, just molecules from commercial SDF files. We would be OK with it if the performance did not degrade like this, and a workaround is possible.





I take it that if I stop the job, anything that has been completed will be lost and that I cannot restart from a particular point. Being able to do so would be useful.





The standardization transformations are not too obscure, just the normal stuff like nitro groups etc. If you would like me to post the XML, no problem.





I don't think that I need anything further; it is just useful to know you recognise this as a problem.





Cheers,


Daniel.

ChemAxon 9c0afc9aaf

07-02-2007 15:53:18

Hi,
Quote:



I would have used the jcman GUI however it was not clear to me exactly which standardizer.xml file I was actually using (I don't think the option to specify a file exists?).
Actually a file open dialog pops up when you check the checkbox.


We know it's not very intuitive; maybe we will change this later.





I see no particular reason why the regeneration should get slower. Its speed should always be comparable to the import of the same number of structures, and should not degrade.





Maybe the structures currently being processed are much more complex?





Unfortunately you cannot resume regeneration.


(Until now it wasn't a problem, even for users with bigger databases.)





Best regards,





Szilard

ChemAxon 60ee1f1328

07-02-2007 17:46:41

Well I hope my workaround proposed below will work - it should I think.


Sometimes I have problems manipulating JChem tables due to the BLOB column; however, CREATE TABLE AS SELECT seems to be OK. I am also aware of the need to make sure any "new" JChem tables are referenced in jchemproperties. I'll be doing it soon, so let's see!