molconvert with inchikey option

User 6d24b35814

11-07-2011 13:13:11

 


Hi


UTILITY: molconvert (commandline/toolkit)


VERSION: 5_5_0_1


OS: Linux (64 bit) Red Hat Enterprise Linux AS release 4 (Nahant Update 9)


I'm trying to use molconvert to generate InChIKeys for a large number of SMILES (from a .smi file).


The processing starts quite quickly but then seems to get gradually slower. I have a few million SMILES to process, and at this rate that doesn't seem feasible. Is this a bug?


Many thanks


Ceara

ChemAxon 0a9e2a55e1

12-07-2011 09:34:25

Dear Ceara,


When the new InChI generation was added, I tested the speed with a few hundred thousand structures, and I found that the average processing time per structure got faster as the number of structures grew. I will check this with millions of structures.


Could you send some SMILES where you experience this problem?


Best Regards,


Peter

User 6d24b35814

12-07-2011 10:46:07

 


Hi Peter


I can't send you the data so I have no idea if this is related to specific chemical structures.


Processing a few hundred SMILES is no problem at all; it runs very quickly.


The file that I tried to run the process on contains 4.6 million SMILES strings. The process started on Saturday and I killed it this morning; in that time it had written 54K rows of InChIKeys to the output file. It was still running when I killed it, and the only error generated, on some SMILES strings, was "Omitted undefined stereo".


I've downloaded the NCI data set from this page: http://cactus.nci.nih.gov/download/nci/ (file NCI_aug00_SMI). It has 250K structures, and I'm running molconvert inchikey on it now (it is generating quite a lot of errors). As I write this it has output ~7K InChIKeys, but I think it has already slowed down noticeably, so it might be a good set to test with?


Kind regards


Ceara

User 6d24b35814

14-07-2011 10:19:35

 


Hi Peter


It's ~48 hrs since I set the molconvert process off on the NCI test data set, and for me it has written ~55K rows of output. The process has been picked up by our IT monitoring and I've been asked to kill it.


I assume you're running the same test? Are you seeing the same performance?


Kind regards


Ceara
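
To put numbers on a slowdown like the one described above, one option is to time how long each batch of output rows takes to arrive. The following is a minimal sketch and not something from the thread: `batch_timings` and `time_molconvert` are hypothetical helper names, and `time_molconvert` assumes that `molconvert inchikey <file>` is on the PATH and prints one key per line to standard output.

```python
import subprocess
import sys
import time
from typing import Callable, Iterable, Iterator, Tuple


def batch_timings(lines: Iterable[str], batch_size: int = 1000,
                  clock: Callable[[], float] = time.perf_counter
                  ) -> Iterator[Tuple[int, float]]:
    """Yield (rows_so_far, seconds_taken_by_last_batch) every batch_size rows."""
    start = clock()
    count = 0
    for count, _ in enumerate(lines, start=1):
        if count % batch_size == 0:
            now = clock()
            yield count, now - start
            start = now
    # Report a trailing partial batch, if there is one.
    if count % batch_size != 0:
        yield count, clock() - start


def time_molconvert(smi_path: str, batch_size: int = 1000) -> None:
    """Assumption: 'molconvert inchikey <file>' writes one key per line to stdout."""
    proc = subprocess.Popen(["molconvert", "inchikey", smi_path],
                            stdout=subprocess.PIPE, text=True)
    for rows, secs in batch_timings(proc.stdout, batch_size):
        print(f"{rows} rows written, last batch took {secs:.1f} s", file=sys.stderr)
    proc.wait()
```

If the per-batch time keeps growing while the batch size stays fixed, that points to a cost that increases with the number of structures already processed rather than to a few unusually hard structures.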

ChemAxon 0a9e2a55e1

14-07-2011 13:57:06

Dear Ceara,


I have tested it again, and I think I have reproduced this bug. I tried the conversion with parts of the 250K SMILES set you mentioned. When using molconvert with inchi and inchi:key, the average speed was faster with more molecules, but when I called it with inchikey it became much slower after about 10,000-20,000 structures.


Is the conversion to inchi or inchi:key also slow on your machine?


(If you have found this to be the case too, it seems really strange, because inchi:key concatenates the inchi and inchikey outputs.)


It seems to be a bug in our code, not in the native code. I will check and fix this.


Thanks for the bug report.


Best Regards,


Peter
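
Since the slowdown reported above only appears after roughly 10,000-20,000 structures in a single run, one possible stopgap is to split the input into smaller files and convert each one in a separate process. This is only a sketch under assumptions that the thread does not confirm: that a fresh molconvert process starts at full speed again, and that `molconvert inchikey <file>` prints its results to standard output. The names `split_smiles_file` and `convert_chunks` are hypothetical.

```python
import subprocess
from itertools import islice
from pathlib import Path
from typing import List


def split_smiles_file(src: str, out_dir: str, chunk_size: int = 10_000) -> List[Path]:
    """Split a one-SMILES-per-line file into numbered chunk files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks: List[Path] = []
    with open(src, "r", encoding="utf-8") as fh:
        while True:
            # Read at most chunk_size lines without loading the whole file.
            block = list(islice(fh, chunk_size))
            if not block:
                break
            path = out / f"chunk_{len(chunks):05d}.smi"
            path.write_text("".join(block), encoding="utf-8")
            chunks.append(path)
    return chunks


def convert_chunks(chunks: List[Path], result_path: str) -> None:
    """Assumption: 'molconvert inchikey <file>' prints its output to stdout."""
    with open(result_path, "a", encoding="utf-8") as result:
        for chunk in chunks:
            # One process per chunk, so no single run sees more than chunk_size rows.
            subprocess.run(["molconvert", "inchikey", str(chunk)],
                           stdout=result, check=True)
```

The chunk size of 10,000 is taken from the lower end of the range mentioned above; it is a guess at a safe value, not a measured optimum.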

ChemAxon 0a9e2a55e1

13-12-2011 09:02:49

Hi Ceara,


This bug is fixed in Marvin 5.7.


Best Regards,


Peter