Importing data from .sdf file is very slow

User 73ad691ca3

01-02-2013 13:00:50

Dear Chemaxon Team,


We are trying to import the structure and data from a .sdf file to the Jchem structure table using 'jchem .net API'. 


Our each .sdf file consists of 100,000 records.  We need to import the structure and other columns data to the database table in a single shot. 


While importing the .sdf file, the Jchem has taken around 2 hours and 15 minutes to import 30,000 records in the .sdf file, Still there are 70,000 compounds need to be imported from the sdf file ..., we do not know how much time the jchem is going to take to import the remaining compounds. 


The Jchem is taking too much of time to import the .sdf file using importer class.  We need to find the way to import the .sdf file in less time.


How can we achieve the import of structures from .sdf file in a very less time?  This is very urgent work to implement.  Just we need to push the structure data from .sdf file to a table in quick time...


Here is the sample code we use to import the .sdf file:


--------------------------------------------------------


  private static string databaseImport(string inputFile, ConnectionHandler connectionHandler, string structTableName, bool setHaltOnError, int linesToCheck, string connections)
        {
            try
            {


                Importer importer = new Importer();


                importer.setInput(inputFile);
                importer.setConnectionHandler(connectionHandler);
                importer.setTableName(structTableName);
                importer.setLinesToCheck(linesToCheck);
                importer.setHaltOnError(setHaltOnError);
                importer.setDuplicateImportAllowed(UpdateHandler.DUPLICATE_FILTERING_OFF);


                importer.setFieldConnections(connections);


                // Gather information about file
                importer.init();


                // Import molecules into database table
                int imported = importer.importMols();


                return imported.ToString();


            }
            catch
            {
                throw;
            }
        }


--------------------------------------------------------------------


Please look into this issue and reply us ASAP.


Please let me know if you require any other information regarding this issue.


 


--


Thanks & Regards,


D. Senthil kumar vijai.

User 73ad691ca3

01-02-2013 15:43:56

Dear Chemaxon,


I have imported the .sdf file with structures around 1,00,000 using  the Jchem Manager tool, 


The Jchem manager tool is importing the sdf file data very fast to the structure table (i.e., it took around 15 minutes to import the sdf file to jchem structure table).


But why the Jchem .Net API - importer takes more than 5 hours to import the sdf file data to the structure table.


Please look into this issue ASAP, and suggest us to achieve the import sdf file using Jchem .Net API in very less time similar to the Jchem Manager tool can be a worth full.


--


Thanks & Regards,


D. Senthil kumar vijai.


 

ChemAxon bd13b5bd77

01-02-2013 19:18:21

Dear JChemBase team, Tamas,


I checked the code and it seems it is pure JChem API, .NET team does not have experience with the direct import to table, we mostly use the Core modules to import SDF into the memory. Could you please have a look at the  Senthil kumar vijai's code what could be slow in it comparing to the code and performance that JChem manager produces?


Thanks in advance.


If it turns out that Java->NET IKVM  conversion causes the issue (I do not believe it I suppose it is a kind of settings or usage of API) please move this task back again to us and we will check the problem on ikvm side.


Viktor


 

User 73ad691ca3

04-02-2013 05:25:10

Dear Chemaxon Team,


Please identify the problem which causes the import structures very slow, and let us know what changes we need to do (settings or usage of API or others) ASAP.


 


Thanks & Regards,


D. Senthil kumar vijai.


 

ChemAxon bd13b5bd77

04-02-2013 08:42:00

We are checking the issue ...

ChemAxon a9ded07333

04-02-2013 14:01:57

Dear Senthil,


Using a randomly selected sdf file containing 25.000 molecules we couldn't reproduce the slowdown. In order to find the cause we need some more information.


Please write us
    - your JChem version number
    - type and version of your database
    - type of your databse table (any, molecule, ...)
    - parameters of your code (especially linesToCheck and connections)


Please also send the input file. If it is confidential you can send it directly to my email address (it's tcsizmazia with the usual chemaxon.com extension).


Best regards,
Tamás

User 73ad691ca3

04-02-2013 15:24:33

 


Dear Tamás, 


Thanks for your reply,


Please find below the details which you had asked.


Details:


    - JChem version number: JChem .NET API 5.10.3.437
    - type and version of your database:  type=Oracle, version= Oracle Database 11g Release 11.2.0.3.0 - 64bit Production
    - type of the database table (any, molecule, ...) : Molecules
    - parameters of the code (especially linesToCheck and connections): linesToCheck=100,


-       Jchem manager tool is installed in windows 2008 server. I have imported a sd file with 99,999 records, and it took less than 15 minutes to import the structures to the table.


-       Jchem .Net API is installed in windows 7 machine, and it took more than 4 hours  to import the same sd file.


 


For connections please refer the below code:


Sample Code:


--------------------sample code---------------------------------------


public static ConnectionHandler GetConnectionHandler()


        {


            try


            {


                ConnectionHandler conh = new ConnectionHandler();


                Connection conn = GetOracleThinConnection(); //use this for Oracle databases


                conh.setConnection(conn);


                return conh;


            }


            catch


            {


                throw;


            }


        }


 


        private static Connection GetOracleThinConnection()


        {


            try


            {


 


                string server = ConfigurationManager.AppSettings["server"]; //server hostname or IP address


                string port = ConfigurationManager.AppSettings["port"];


                string databaseName = ConfigurationManager.AppSettings["databaseName"];


                string user = ConfigurationManager.AppSettings["user"];


                string password = ConfigurationManager.AppSettings["password"];


 


                string jdbcUrl = "jdbc:oracle:thin:@{0}:{1}:{2}";


                jdbcUrl = string.Format(jdbcUrl, server, port, databaseName);


                oracle.jdbc.driver.OracleDriver drv = new oracle.jdbc.driver.OracleDriver(); //do not delete this otherwise it will try


                //to locate the driver in CLASSPATH environment variable  (Java style).


                Connection conn = java.sql.DriverManager.getConnection(jdbcUrl, user, password);


                return conn;


            }


            catch


            {


                throw;


            }


        }


 


 


 


        private static string databaseImport(string inputFile, ConnectionHandler connectionHandler, string structTableName, bool setHaltOnError, int linesToCheck, string connections)


        {


            try


            {


 


                Importer importer = new Importer();


 


                importer.setInput(inputFile);


                importer.setConnectionHandler(connectionHandler);


                importer.setTableName(structTableName);


                importer.setLinesToCheck(linesToCheck);


                importer.setHaltOnError(setHaltOnError);


                // Checking duplicates may slow down import!


                importer.setDuplicateImportAllowed(UpdateHandler.DUPLICATE_FILTERING_OFF);


 


                importer.setFieldConnections(connections);


 


                // Gather information about file


                importer.init();


 


                // Import molecules into database table


                int imported = importer.importMols();


 


                return imported.ToString();


 


            }


            catch


            {


                throw;


            }


        }


 


        public static string Import(string inputFile, string structTableName, string connections)


        {


            try


            {


                return databaseImport(inputFile, GetConnectionHandler(), structTableName, true, 100, connections); //100-mentioned by jchem


            }


            catch


            {


                throw;


            }


        }


 


Config file:


       <?xml version="1.0"?>


<configuration>


  <appSettings>


 


 


    <!-- jchem connection handler -->


    <add key="server" value="db"/>


    <!--//server hostname or IP address-->


    <add key="port" value="1521" />


    <add key="databaseName" value="dbname" />


    <add key="user" value="username"/>


    <add key="password" value="password"/>


-------------------------------------------------------------------------------


 I am really sorry, as i should not share the actual sd file.  But i have attached a sample sd file with one row.  Our sd file has 1,00,000 rows.


 


Please check the import sd file in your end, and let us know how to import the sd file very fast using the Jchem .Net API (changes in Jchem .Net API settings or others). 


My expectation is we want to achieve the time which jchem manager takes to upload the sd file.


--


Thanks & Regards,


D. Senthil kumar Vijai.


 

ChemAxon bd13b5bd77

04-02-2013 16:36:11

So you mean that you installed JChem Manager to a different machine from where you installed the dotnet API.


Why do you think that this is a correct proof of concept (JChem .NET API is slower than JChem manager) when the test environments are not the same?


Where is the Windows2008 server is sitting and where is the Windows7 sitting? In the same network? Does the JChem Manager produce the same performance from that Windows7 machine.


Our test was ~ 30000 molecules were imported within 8 mins with JChem .NET API, which menas if we extrapolate 25 mins. This is very far from hours.


 

User 73ad691ca3

05-02-2013 05:48:18

Dear Chemaxon Team,


We have done two different exercises. 


The first one is importing the sd file using Jchem .Net API, which takes more than 4 hours to import the sd file data(1,00,000 records) to jchem structure table. The Jchem .Net API is installed in a machine with windows7 OS, and the jchem connects the same oracle database.


The second excercise is we have used Jchem manager tool to import the same sd file, and it takes only 15 minutes to import the same.  This Jchem manager is installed in another windows 2008 server machine; this is also connecting the same oracle database.


All these machines are in the same network.


Please look in to the issue of importing the structures from sd file using Jchem .Net API.


Our sd file has 100000 records with each record has around 15 data fields (i.e., columns). 


Can you please send your code which you have tested with 30000 records?  I will also have a look on this and i will test with our sd file. 


You can send us the sample program to import sd file using Jchem .Net API.  I will only change the database connection details and will try to import our sd file.


 


Thanks & Regards,


D. Senthil kumar Vijai.

ChemAxon 4375c6431d

05-02-2013 10:33:51


Dear Senthil,


you can try our sample.


Regards,
Miklos


ChemAxon bd13b5bd77

05-02-2013 10:51:19

Please measure the performance on your Wndows7 with our built-in SDF.


Please install JChem Manager on your Windows7 and test your 100 000 sdf.


JChem Manager has the same code as recommneded in the API.


Please run our SDF with the JChem Manager from Windows7.


 

User 73ad691ca3

05-02-2013 16:35:35

Dear Chemaxon Team,


Thanks for your reply and for sharing the sample program.


As you mentioned, we measured the performance on our Wndows7 with your built-in SDF, and it took around 33 minutes(i.e., 1964.9848704 sec) to import the sd file into the table.  As a result, the Jchem structure table has 18042 rows.


We will try to do the other steps, and will update to you later.


 


Thanks & Regards,


D. Senthil kumar vijai.

ChemAxon eb65a25631

07-02-2013 10:59:59

I did some measurements with the SDF file and the sample code in the Speed_test project:

- Java /w MS SQL 2012 ~48 sec

- .NET /w MS SQL 2012 ~57 sec

- Java /w Oracle, JChem Cartridge: ~3min 22 sec

- .NET /w Oracle, JChem Cartridge: ~3.43 min

It means, there are no performance problems with the JChem.NET package.

Apparently, the problem causing the slowdown must be on your side, in your environment.

Regards,
Andras