I've been importing data associated with around half a million compounds and calculating property values. Using my laptop (Core i5-430, 3MB DDR3 RAM), the import takes up to 2 hours and the calculations can take a similar time.
The monitor tells me that the processor is at full stretch but the RAM is never more than 60% utilised. The processor is 2.2GHz, which boosts to 2.5GHz.
Am I getting close to the limits of my laptop? Does the import/calculation use a single core or multiple cores, and are there any suggestions for improving performance that don't rely on an external server?
We have tried to ensure that multiple CPUs and/or multiple cores are used to the maximum extent, so CPU usage at 100% suggests that things are probably going as fast as they can. Memory utilisation during import or property calculations is likely to be pretty low.
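As a rough way to check the reasoning about cores, here is a minimal Python sketch. It assumes the monitor reports usage averaged across all logical cores (as an overall "CPU %" figure typically does); the function name `expected_total_usage` is made up for illustration and is not part of IJC.

```python
import os


def expected_total_usage(busy_cores: int, logical_cores: int) -> float:
    """Overall CPU percentage shown if `busy_cores` cores are fully busy.

    Assumes the monitor averages usage over all logical cores.
    """
    if logical_cores <= 0 or not 0 <= busy_cores <= logical_cores:
        raise ValueError("busy_cores must lie between 0 and logical_cores")
    return 100.0 * busy_cores / logical_cores


# os.cpu_count() can return None, so fall back to 1 in that case.
logical = os.cpu_count() or 1
print(f"Logical cores on this machine: {logical}")
print(f"One busy core would show as about {expected_total_usage(1, logical):.0f}%")
print(f"All cores busy would show as {expected_total_usage(logical, logical):.0f}%")
```

Under that assumption, a single-threaded job on a machine with four logical cores would sit near 25%, so a steady 100% is consistent with all cores being used.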
From what you say, if I need things to work faster, I should look at a faster machine. Are there other straightforward options?
Yes, it would seem that the speed of the machine will be the primary factor.
It is possible that doing the calculations outside IJC (e.g. using command-line apps or your own program) might make small improvements, but I would not expect this to be very dramatic.
Also, you could look at the calculations and exclude any that are particularly expensive and may not be so important. There is a vast difference in the speed of different calculations (e.g. the atom/bond/ring counters are very fast, while logD is pretty slow). So possibly you could speed things up a lot by excluding the slow calculations initially, and then adding them later (perhaps overnight) once you have imported all your data and calculated the fast ones.
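The two-pass idea can be sketched in plain Python as follows. This is only an illustration of the scheduling, not how IJC does it: the calculators are placeholder lambdas, the `relative_cost` numbers are illustrative guesses rather than measurements, and `split_by_cost` and `run_pass` are hypothetical helper names.

```python
from typing import Callable, Dict, List, Tuple

Calculator = Callable[[str], float]


def split_by_cost(
    calculators: Dict[str, Calculator],
    relative_cost: Dict[str, float],
    threshold: float,
) -> Tuple[Dict[str, Calculator], Dict[str, Calculator]]:
    """Partition calculators into (fast, slow) by an estimated relative cost."""
    fast: Dict[str, Calculator] = {}
    slow: Dict[str, Calculator] = {}
    for name, fn in calculators.items():
        (fast if relative_cost[name] <= threshold else slow)[name] = fn
    return fast, slow


def run_pass(records: List[dict], calculators: Dict[str, Calculator]) -> None:
    """Apply each calculator to each record, storing results on the record."""
    for record in records:
        props = record.setdefault("props", {})
        for name, fn in calculators.items():
            props[name] = fn(record["structure"])


# Placeholder calculators standing in for real property calculations.
calculators: Dict[str, Calculator] = {
    "char_count": lambda s: float(len(s)),
    "expensive_score": lambda s: float(sum(ord(c) for c in s)),
}
# Illustrative guesses only; real costs would need to be timed.
relative_cost = {"char_count": 1, "expensive_score": 100}

records = [{"structure": "CCO"}, {"structure": "c1ccccc1"}]
fast, slow = split_by_cost(calculators, relative_cost, threshold=10)

run_pass(records, fast)  # first pass: cheap properties, data usable early
after_first_pass = [sorted(r["props"]) for r in records]

run_pass(records, slow)  # second pass: expensive properties, e.g. overnight
```

The point of the split is that the data becomes usable after the first pass, and the expensive pass can run unattended.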
A couple of follow-on questions. Firstly, does the size of the processor cache markedly affect the rate of property calculation? Secondly, are the calculation times proportionate to the size of the molecule, e.g. does clogP for MWt 900 take three times as long as for MWt 300? My impression is that the speed slows disproportionately for the higher MWt compounds. Thanks
I don't have much of an idea whether the processor cache size would have much impact on calculation performance. My guess would be that it generally wouldn't, but really that's just an uninformed guess. If anyone else has some information here then please post it.
As for the effect of molecule size, generally yes, the calculation time would depend on the size of the structures. It would be somewhat dependent on the exact calculation, but generally I think it would be fair to say that the bigger the structure, the longer it would take to calculate.
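If you want to test whether the slowdown is proportionate, one option is to time a calculation on a few molecules of different sizes and fit a power law. The sketch below is a minimal example of that fit in Python; `scaling_exponent` is a made-up name, and the timings are synthetic values generated from known formulas purely to show the method, not measurements of clogP or any other calculation.

```python
import math
from typing import Sequence


def scaling_exponent(sizes: Sequence[float], times: Sequence[float]) -> float:
    """Least-squares slope of log(time) against log(size).

    A slope near 1 means time grows in proportion to size;
    a slope above 1 means it grows disproportionately.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic timings (not measurements), built from known formulas.
sizes = [300, 600, 900]
proportional_times = [0.001 * s for s in sizes]     # time grows like size
quadratic_times = [1e-6 * s ** 2 for s in sizes]    # time grows like size squared

print(f"proportional case: {scaling_exponent(sizes, proportional_times):.2f}")
print(f"quadratic case:    {scaling_exponent(sizes, quadratic_times):.2f}")
```

With an exponent of 1, MWt 900 would take three times as long as MWt 300; with an exponent of 2 it would take nine times as long. In practice you would replace the synthetic lists with your own timings, ideally using atom count rather than MWt as the size measure.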