Import performance with duplicate filtering (SDF format)

ChemAxon aa7c50abf8

20-06-2005 11:05:28

A common question is:
Quote:
How long does it take to import a sizable number of structures (on the order of several hundred thousand) in SDF format, with duplicate filtering, into a table that already contains several million structures?
The following measurement gives a good basis for answering this question:

Importing 300 thousand structures in SDF format into a table containing 2 million structures (the NCI dataset imported multiple times: http://cactus.nci.nih.gov/ncidb2/download.html) took 49 minutes 9 seconds using the jcman command line utility with duplicate filtering. The test machine was a dual 3 GHz Xeon running Fedora Core 3, Oracle 10g and Java 1.4.2. 699 duplicates were found and discarded.
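
To turn these figures into a rough estimate for your own import, the small Python sketch below derives the throughput from the numbers above and scales it to a different batch size. The helper names and the 500 thousand example are only illustrative. The linear scaling is an assumption, not something measured here: the real time will depend on your hardware, database setup and the size of the target table.

```python
# Figures taken from the measurement described in this post.
IMPORTED_STRUCTURES = 300_000
ELAPSED_SECONDS = 49 * 60 + 9   # 49 minutes 9 seconds
DUPLICATES = 699


def throughput(structures: int, seconds: float) -> float:
    """Average import rate in structures per second."""
    return structures / seconds


def estimate_seconds(structures: int, rate: float) -> float:
    """Estimated import time for a batch of the given size.

    Assumption: the time grows linearly with the number of imported
    structures. This is a simplification for back-of-the-envelope use only.
    """
    return structures / rate


rate = throughput(IMPORTED_STRUCTURES, ELAPSED_SECONDS)
ms_per_structure = 1000 * ELAPSED_SECONDS / IMPORTED_STRUCTURES
duplicate_share = DUPLICATES / IMPORTED_STRUCTURES

print(f"Rate: {rate:.1f} structures/s")
print(f"Cost: {ms_per_structure:.2f} ms per structure")
print(f"Duplicates: {duplicate_share:.3%} of the imported file")

# Hypothetical batch size, chosen only as an example.
batch = 500_000
minutes, seconds = divmod(round(estimate_seconds(batch, rate)), 60)
print(f"Estimate for {batch} structures: {minutes} min {seconds} s")
```

In other words, the measured run averages roughly 100 structures per second, or a little under 10 ms per structure including the duplicate check.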