ChemAxon 60ee1f1328
06-02-2007 17:38:48
Hello,
I have kicked off a command-line job to re-standardize an Oracle table of ~3 million structures. It has been running for approximately two weeks and is about two-thirds of the way through, but performance is degrading slowly and steadily, and I'm not sure the remaining third will complete in another week (no network is involved as such).
Is there any way to restart the job and pick up where it left off, i.e. keep everything that has already been re-standardized and resume only from a point (cd_id) that has not yet been standardized? Or would I lose the re-standardized records as soon as I kill the job (in which case I'll have to live with it)? If this is not possible, might I be so bold as to put it in as a feature request?
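One way such a restart could work, sketched here under loud assumptions: process rows in ascending cd_id order, commit after each batch, and store the last committed cd_id in a checkpoint table, so that killing the job loses at most one batch. Python's built-in sqlite3 stands in for Oracle; the table names `structures` and `checkpoint` and the `standardize` function are hypothetical placeholders and do not call JChem's Standardizer.

```python
import sqlite3


def standardize(structure: str) -> str:
    # Hypothetical placeholder: a real job would apply the rules in
    # standardizer.xml via JChem; this just trims and upper-cases the text.
    return structure.strip().upper()


def make_demo_table(n: int) -> sqlite3.Connection:
    # In-memory stand-in for the Oracle table (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE structures (cd_id INTEGER PRIMARY KEY, cd_structure TEXT)")
    conn.execute("CREATE TABLE checkpoint (last_cd_id INTEGER)")
    conn.execute("INSERT INTO checkpoint VALUES (0)")
    conn.executemany(
        "INSERT INTO structures VALUES (?, ?)",
        [(i, f" c{i}o ") for i in range(1, n + 1)],
    )
    conn.commit()
    return conn


def restandardize_from(conn: sqlite3.Connection, start_after: int, batch_size: int = 1000) -> int:
    """Re-standardize rows with cd_id > start_after in committed batches.

    Returns the last cd_id processed. Each batch and its checkpoint are
    committed together, so an interrupted run can resume from the stored value.
    """
    last_id = start_after
    while True:
        rows = conn.execute(
            "SELECT cd_id, cd_structure FROM structures "
            "WHERE cd_id > ? ORDER BY cd_id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return last_id
        conn.executemany(
            "UPDATE structures SET cd_structure = ? WHERE cd_id = ?",
            [(standardize(s), cd_id) for cd_id, s in rows],
        )
        last_id = rows[-1][0]
        conn.execute("UPDATE checkpoint SET last_cd_id = ?", (last_id,))
        conn.commit()  # batch and checkpoint become durable together


def resume(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    # Read the last committed cd_id and carry on from there.
    (start_after,) = conn.execute("SELECT last_cd_id FROM checkpoint").fetchone()
    return restandardize_from(conn, start_after, batch_size)
```

After a kill, calling `resume(conn)` skips every row up to the stored cd_id. The batch size trades commit overhead against how much work a kill can throw away.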
Thanks for your comments,
Daniel.
ChemAxon 60ee1f1328
06-02-2007 17:56:00
Further to this, assuming a re-standardization has completed, could you advise on the most efficient way to re-standardize a subset of a given table? That is, how might I best standardize a JChem table using, say, a substructure search (SSS), so that only structures containing a particular substructure are re-standardized?
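A two-pass pattern for re-standardizing only a subset might look like the sketch below: first collect the cd_ids of matching structures, then rewrite only those rows. This is a hypothetical illustration, not JChem's API. sqlite3 stands in for Oracle, and the match is a plain substring test on SMILES for the nitro pattern `N(=O)=O`, which is NOT a real substructure search; in JChem that pass would be a proper SSS query, and on a multi-million-row table the filtering should happen in the database rather than by loading every row into memory as this toy does. The placeholder `standardize_nitro` rewrites that pattern into the charge-separated form `[N+](=O)[O-]`.

```python
import sqlite3
from typing import Callable, List


def make_demo_table(smiles: List[str]) -> sqlite3.Connection:
    # In-memory stand-in for a JChem structure table (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE structures (cd_id INTEGER PRIMARY KEY, cd_structure TEXT)")
    conn.executemany(
        "INSERT INTO structures VALUES (?, ?)",
        list(enumerate(smiles, start=1)),
    )
    conn.commit()
    return conn


def contains_nitro(smiles: str) -> bool:
    # Crude placeholder for a substructure search: substring match only.
    return "N(=O)=O" in smiles


def standardize_nitro(smiles: str) -> str:
    # Placeholder transform: pentavalent nitro -> charge-separated nitro.
    return smiles.replace("N(=O)=O", "[N+](=O)[O-]")


def restandardize_subset(
    conn: sqlite3.Connection,
    matches: Callable[[str], bool],
    standardize: Callable[[str], str],
) -> List[int]:
    # Pass 1: find the rows that match the query.
    rows = conn.execute(
        "SELECT cd_id, cd_structure FROM structures ORDER BY cd_id"
    ).fetchall()
    hits = [(cd_id, s) for cd_id, s in rows if matches(s)]
    # Pass 2: re-standardize only those rows; all others are left untouched.
    conn.executemany(
        "UPDATE structures SET cd_structure = ? WHERE cd_id = ?",
        [(standardize(s), cd_id) for cd_id, s in hits],
    )
    conn.commit()
    return [cd_id for cd_id, _ in hits]
```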
ChemAxon 60ee1f1328
07-02-2007 11:36:48
I think the way round this is to:
1. Cut out the first two-thirds (already re-standardized) and keep it as table a.
2. Cut out the last third, keep it as table b, and then regenerate table b.
3. Combine tables a and b with CREATE TABLE ... AS SELECT ... FROM a UNION SELECT ... FROM b.
That should do the trick, I think.
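The three steps above could be sketched as follows, again with Python's sqlite3 standing in for Oracle and with hypothetical names (`structures`, `part_a`, `part_b`, `structures_new`, and a caller-supplied `standardize` that does not call JChem). Rows with cd_id up to `boundary` are assumed to be already re-standardized. The combine step uses UNION ALL rather than UNION: the two halves cannot share a cd_id, so duplicate elimination is wasted work, and Oracle may reject a plain UNION over LOB columns such as the structure BLOB.

```python
import sqlite3
from typing import Callable


def make_demo_table(n: int) -> sqlite3.Connection:
    # Rows 1..n with unstandardized placeholder text (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE structures (cd_id INTEGER PRIMARY KEY, cd_structure TEXT)")
    conn.executemany(
        "INSERT INTO structures VALUES (?, ?)",
        [(i, f" c{i}o ") for i in range(1, n + 1)],
    )
    conn.commit()
    return conn


def split_restandardize_combine(
    conn: sqlite3.Connection, boundary: int, standardize: Callable[[str], str]
) -> None:
    b = int(boundary)  # int() keeps the interpolated SQL safe
    # 1. Keep the already re-standardized part as table a.
    conn.execute(f"CREATE TABLE part_a AS SELECT cd_id, cd_structure FROM structures WHERE cd_id <= {b}")
    # 2. Copy the remainder to table b and re-standardize only that copy.
    conn.execute(f"CREATE TABLE part_b AS SELECT cd_id, cd_structure FROM structures WHERE cd_id > {b}")
    rows = conn.execute("SELECT cd_id, cd_structure FROM part_b").fetchall()
    conn.executemany(
        "UPDATE part_b SET cd_structure = ? WHERE cd_id = ?",
        [(standardize(s), cd_id) for cd_id, s in rows],
    )
    # 3. Recombine the halves into a new table.
    conn.execute(
        "CREATE TABLE structures_new AS "
        "SELECT cd_id, cd_structure FROM part_a "
        "UNION ALL SELECT cd_id, cd_structure FROM part_b"
    )
    conn.commit()
```

In the real JChem setting the combined table would still need to be registered in JChemProperties, which this sketch does not attempt.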
Cheers,
Daniel.
ChemAxon 60ee1f1328
07-02-2007 15:40:37
Hi Peter,
Yes, it is a command-line standardization of an entire table of 3M molecules. I would have used the jcman GUI, but it was not clear to me exactly which standardizer.xml file it would be using (I don't think there is an option to specify one?).
The job does seem to have slowed down significantly over the last two weeks: it started off three times faster than its current rate, and if that trend continues it may never actually complete. Anyway, I think it is possible to extract both the standardized and non-standardized parts, finish standardizing the latter, and then concatenate the two results. In future I would not try to standardize a table bigger than 1M rows. Could it be the JRE or memory usage that is degrading? I don't think Tomcat is involved, as the job is not running in the cartridge.
The table does not contain reactions, just molecules from commercial SDF files. We would be fine with the run time if performance did not degrade, and a workaround is possible.
I take it that if I stop the job, anything already completed will be lost and I cannot restart from a particular point; being able to do so would be useful.
The standardization transformations are not too obscure, just the usual ones such as nitro groups. If you would like me to post the XML, that's no problem.
I don't think I need anything further; it is just useful to know that you recognise this as a problem.
Cheers,
Daniel.
ChemAxon 60ee1f1328
07-02-2007 17:46:41
Well, I hope the workaround I proposed above will work; I think it should.
Sometimes I have problems manipulating JChem tables because of the BLOB column, but CREATE TABLE ... AS SELECT seems to be OK. I am also aware of the need to make sure any "new" JChem tables are referenced in JChemProperties. I'll be doing it soon, so let's see!