User 6d24b35814
11-07-2011 13:13:11
Hi
UTILITY: molconvert (commandline/toolkit)
VERSION: 5_5_0_1
OS: Linux (64 bit) Red Hat Enterprise Linux AS release 4 (Nahant Update 9)
I'm trying to use molconvert to generate InChIKeys for a large number of SMILES (from a .smi file).
The processing starts quite quickly but then seems to get gradually slower. I have a few million SMILES to process, and at this rate that doesn't seem feasible. Is this a bug?
Many thanks
Ceara
ChemAxon 0a9e2a55e1
12-07-2011 09:34:25
Dear Ceara,
I tested the speed with a few hundred thousand structures when the new InChI generation was added, and I found that the average processing time per structure got faster as the number of structures grew. I will check this with millions of structures.
Could you send a SMILES for which you experience this problem?
Best Regards,
Peter
User 6d24b35814
12-07-2011 10:46:07
Hi Peter
I can't send you the data so I have no idea if this is related to specific chemical structures.
Processing a few hundred smiles is no problem at all - it runs very quickly.
The file that I tried to run the process on contains 4.6 million SMILES strings. The process started on Saturday and I killed it this morning; during this time it had written 54K rows of InChIKeys to the output file. The process was still running this morning, and the only error being generated, on some SMILES strings, was "Omitted undefined stereo".
I've downloaded the NCI data set from this page http://cactus.nci.nih.gov/download/nci/ (file NCI_aug00_SMI). It has 250K structures and I'm running molconvert inchikey on it now (it is generating quite a lot of errors). As I write this it has output ~7K InChIKeys, but I think it has noticeably slowed down already, so it might be a good set to test?
Kind regards
Ceara
User 6d24b35814
14-07-2011 10:19:35
Hi Peter
It's ~48 hrs since I set the molconvert process off on the NCI test data set, and for me it has written ~55K rows of output. The process has been picked up by our IT monitoring and I've been asked to kill it.
I assume you're running the same test? Are you seeing the same performance?
Kind regards
Ceara
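For scale, the figures quoted in this post can be turned into an average rate. This is only a rough sketch: it assumes a constant rate, which the reported slowdown contradicts, so the real time to finish the set would be longer than this estimate.

```python
# Figures as quoted in the thread (all approximate).
rows_written = 55_000        # ~55K rows of output
hours_elapsed = 48           # ~48 hours of run time
total_structures = 250_000   # size of the NCI test set

# Average throughput so far, and the time the full set would take
# if that average held (an optimistic assumption, since the rate is falling).
rows_per_hour = rows_written / hours_elapsed
hours_for_all = total_structures / rows_per_hour

print(f"{rows_per_hour:.0f} rows/hour, about {hours_for_all / 24:.1f} days for the full set")
```

That works out to roughly 1,146 rows per hour, or about 9 days for 250K structures even before any further slowdown.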
ChemAxon 0a9e2a55e1
14-07-2011 13:57:06
Dear Ceara,
I have tested it again, and I think I have reproduced this bug. I tried this conversion with parts of the 250K SMILES you mentioned. When using molconvert with inchi and inchi:key, the average speed got faster as the number of molecules grew, but when I called it with inchikey it became much slower after about 10,000-20,000 structures.
Is the conversion to inchi or inchi:key also slow on your machine?
(If you have found this to be the case too, it seems really strange, because inchi:key concatenates the inchi and inchikey outputs.)
It seems to be a bug in our code rather than in the native code. I will check and fix this.
Thanks for the bug report.
Best Regards,
Peter
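Until a fix is available, one possible workaround is to feed molconvert the input in small batches, keeping each run below the 10,000-20,000 structures after which the slowdown was reproduced. The sketch below is not an official ChemAxon recommendation. It assumes, without confirmation from this thread, that the slowdown builds up inside a single molconvert process, and that `molconvert inchikey <file>` writes one result per input line to standard output. The names `chunked` and `convert_in_chunks` are hypothetical helpers.

```python
import subprocess
import tempfile
import time
from itertools import islice
from pathlib import Path


def chunked(lines, size):
    """Yield successive lists of at most `size` lines from an iterable."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def convert_in_chunks(smi_path, out_path, size=10_000,
                      cmd=("molconvert", "inchikey")):
    """Run `cmd` once per chunk and append each chunk's stdout to out_path.

    ASSUMPTION: `cmd` followed by a file name prints its results to stdout;
    adjust `cmd` to the invocation you actually use.
    """
    with open(smi_path) as src, open(out_path, "w") as out:
        for n, batch in enumerate(chunked(src, size), start=1):
            # Write the chunk to a temporary .smi file for this run only.
            with tempfile.NamedTemporaryFile("w", suffix=".smi",
                                             delete=False) as tmp:
                tmp.writelines(batch)
            start = time.perf_counter()
            try:
                # stderr is not captured, so warnings stay visible.
                result = subprocess.run([*cmd, tmp.name],
                                        stdout=subprocess.PIPE,
                                        text=True, check=False)
            finally:
                Path(tmp.name).unlink()
            out.write(result.stdout)
            elapsed = time.perf_counter() - start
            # Per-chunk timing shows whether the rate stays flat.
            print(f"chunk {n}: {len(batch)} lines in {elapsed:.1f}s")
```

Calling `convert_in_chunks("input.smi", "inchikeys.txt")` (both file names are placeholders) prints the time taken per chunk. If the per-chunk times stay roughly constant, that would support the assumption that the slowdown is tied to a single long-running process.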
ChemAxon 0a9e2a55e1
13-12-2011 09:02:49
Hi Ceara,
This bug is fixed in Marvin 5.7.
Best Regards,
Peter