JChemManager freezes when importing ChEMBL

User 6181a27950

16-08-2011 07:51:41

Hi, we've been trying to import a few SDF files from different databases into JChem using JChemManager. DrugBank worked 100%, but when trying to import ChEMBL (version 11), it freezes at molecule count 2910 and says there are 21 days remaining for the process. It doesn't throw an exception or anything; it just stops doing anything without telling you (when looking at the MySQL table you can see no new structures are being imported, so it's not just a Java Swing problem). We've left it for 3 days now, but it stays frozen. Is there some kind of log file to see what's happening, or what would you suggest?

ChemAxon 9c0afc9aaf

16-08-2011 16:17:53

Hi,


Please let us know your exact JChem version (Help -> About in the GUI)


Please also confirm that you are trying to import the following file, and whether you unzipped it before import:


ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/chembl_11/chembl_11.sdf.gz


Best,


Szilard


 

ChemAxon 9c0afc9aaf

16-08-2011 16:23:14

PS:


Please also paste the output of :


jcman t <table_name>

User 6181a27950

17-08-2011 06:35:58

Using JChem Manager version 5.5.0.1, and yes unzipping exactly that file.


jcman results in this:



Table type: Molecules

Table version: 5050000

Uses tautomers for duplicate search: No

Filters out the duplicate structures: No

Fingerprint settings:

Length (bits): 512
Pattern length: 6
Bits per pattern: 2

Table uses default standardization.

Column name Type name
1 CD_ID INT
2 CD_STRUCTURE MEDIUMBLOB
3 CD_SMILES VARCHAR
4 CD_FORMULA VARCHAR
5 CD_SORTABLE_FOR VARCHAR
6 CD_MOLWEIGHT DOUBLE
7 CD_HASH INT
8 CD_FLAGS VARCHAR
9 CD_TIMESTAMP DATETIME
10 CD_PRE_CALCULAT TINYINT
11 CD_FP1 INT
12 CD_FP2 INT
13 CD_FP3 INT
14 CD_FP4 INT
15 CD_FP5 INT
16 CD_FP6 INT
17 CD_FP7 INT
18 CD_FP8 INT
19 CD_FP9 INT
20 CD_FP10 INT
21 CD_FP11 INT
22 CD_FP12 INT
23 CD_FP13 INT
24 CD_FP14 INT
25 CD_FP15 INT
26 CD_FP16 INT
27 LOG_P FLOAT
28 DONOR_COUNT INT
29 ACCEPTOR_COUNT INT
30 ROTATABLE_BOND_ INT
31 LIPINSKI TINYINT
32 LEAD_LIKENESS TINYINT
33 BIOAVAILABILITY INT
34 LOG_D FLOAT
35 RING_COUNT BIGINT
36 NAME VARCHAR
37 DB_ID VARCHAR


 

ChemAxon a3d59b832c

17-08-2011 07:02:34

Hi Jeanré,


This is a known issue in 5.5.0.1 with tautomer tables. Please upgrade to JChem 5.5.1 where this bug has been fixed.


 


Let us know if it helped.


 


Best regards,


Szabolcs

ChemAxon a3d59b832c

17-08-2011 16:00:31

Hi,


 


I think I mixed up both the version numbers and the settings here.


This issue is not related to tautomerization, it seems.


 


We will check the issue in more detail and get back here soon.


I am sorry for the confusion.


 


Best regards,


Szabolcs

ChemAxon 9c0afc9aaf

17-08-2011 20:40:38

Hi,


We couldn't reproduce the problem so far.


1. What is the CPU load when the import "freezes" ? Is it close to 100% or to 0%  ?


2. On Windows, the JChemManager GUI opens a console. The console can be blocked, e.g. by selecting a rectangular area with the mouse. The next time the process tries to write anything to the console, it will be blocked indefinitely, with 0% CPU utilization, until the selection is removed.


3. I see quite a few additional columns, some or most of them might be Chemical Terms columns. 


We have tried to guess the possible expressions and create similar columns. The import slowed down due to the extra calculation (expected), but never came to a long stall.


Please send us the content of the property table (JChemProperties) so we can take a more detailed look at your settings.


 


Best,


 


Szilard

User 6181a27950

19-08-2011 09:19:07

Updating JChemManager actually did help -- it now got to 530770 molecules, but it still freezes exactly the same way (i.e. mySQL reports no new structures after a whole 24 hours), only this time the CPU load is 100 (usually about 5000 because of multiple processors on our server) for that process, and not 0 as it was before we updated.


Here is the properties table:


 


option.structureCompressionDisabled true [BLOB - 0B]
option.commitInterval 50 [BLOB - 0B]
propertytable.identifier PT_ID_b9e69d646bde49fc90a5ec55c803dade [BLOB - 0B]
cache.registration_table JChemProperties_CR [BLOB - 0B]
table.structures.chemTermColumn.LEAD_LIKENESS (mass() <= 450) &&
(logD("7.4") >= -4) && (logD("7.4") <= 4) &&
(ringCount() <= 4) &&
(rotatableBondCount() <= 10) &&
(donorCount() <= 5) &&
(acceptorCount() <= 8) [BLOB - 0B]
table.structures.chemTermColumn.ROTATABLE_BOND_COUNT rotatableBondCount() [BLOB - 0B]
table.structures.chemTermColumn.ACCEPTOR_COUNT acceptorCount() [BLOB - 0B]
table.structures.chemTermColumn.DONOR_COUNT donorCount() [BLOB - 0B]
table.structures.chemTermColumn.LOG_D logD("7.0") [BLOB - 0B]
table.structures.chemTermColumn.LIPINSKI (mass() <= 500) &&
(logP() <= 5) &&
(donorCount() <= 5) &&
(acceptorCount() <= 10) [BLOB - 0B]
table.structures.creationTime 2011-06-28 14:57:09.334 [BLOB - 0B]
table.structures.validityTimestamp 2011-08-17 09:56:11.578 [BLOB - 0B]
table.structures.absoluteStereo true [BLOB - 0B]
table.structures.tableType 0 [BLOB - 0B]
table.structures.tautomerDuplicateFiltering false [BLOB - 0B]
table.structures.JChemVersion 5.5.1.0 [BLOB - 0B]
table.structures.duplicateFiltering false [BLOB - 0B]
table.structures.chemTermColumn.RING_COUNT ringCount() [BLOB - 0B]
table.structures.chemTermColumn.LOG_P logP() [BLOB - 0B]
table.structures.fingerprint.numberOfBits 512 [BLOB - 0B]
table.structures.fingerprint.numberOfOnes 2 [BLOB - 0B]
table.structures.fingerprint.numberOfEdges 6 [BLOB - 0B]
table.structures.fingerprint.numberOfStrucFPCols 0 [BLOB - 0B]
table.structures.chemTermColumn.BIOAVAILABILITY (mass() <= 500) +
(logP() <= 5) +
(donorCount() <= 5) +
(acceptorCount() <= 10) +
(rotatableBondCount() <= 10) +
(PSA() <= 200) +
(fusedAromaticRingCount() <= 5) >= 6 [BLOB - 0B]
table.structures.switchOffAllProtections false [BLOB - 0B]
table.structures.version 5050100 [BLOB - 0B]
table.structures.ctVersion 5050000 [BLOB - 0B]
table.structures.mdVersion 5050000 [BLOB - 0B]
table.structures.chemTermColumn.NAME name() [BLOB - 0B]
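
For readers unfamiliar with Chemical Terms, the LIPINSKI and BIOAVAILABILITY expressions in the table above can be read as plain boolean logic evaluated once per imported structure. The sketch below mirrors them in Python purely as an illustration. It assumes the usual operator precedence (the sum of the seven comparisons is compared with 6); the function names and the sample property values are hypothetical, and the real values come from JChem's own calculators (mass(), logP(), and so on).

```python
def lipinski(mass, logp, donors, acceptors):
    """Mirror of the LIPINSKI column: all four conditions must hold."""
    return mass <= 500 and logp <= 5 and donors <= 5 and acceptors <= 10


def bioavailability(mass, logp, donors, acceptors,
                    rotatable_bonds, psa, fused_aromatic_rings):
    """Mirror of the BIOAVAILABILITY column: at least 6 of 7 conditions hold."""
    satisfied = sum([
        mass <= 500,
        logp <= 5,
        donors <= 5,
        acceptors <= 10,
        rotatable_bonds <= 10,
        psa <= 200,
        fused_aromatic_rings <= 5,
    ])
    return satisfied >= 6


# Made-up property values, for illustration only (not taken from ChEMBL).
props = dict(mass=320.4, logp=2.1, donors=2, acceptors=5)
print(lipinski(**props))
print(bioavailability(**props, rotatable_bonds=4, psa=75.0,
                      fused_aromatic_rings=1))
```

Each of these columns triggers several property calculations per structure, which is consistent with the import running slower when they are present.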

ChemAxon 9c0afc9aaf

21-08-2011 17:40:30

Hi,


We still could not reproduce the problem, the structures are importing fine for us.


- Please confirm the version you have upgraded to is 5.5.1.0 (we have tried with this version)


- Please let us know your Java version by pasting the output of 


java -version


Also in the JChem Manager GUI: Help -> About


- The value of the CLASSPATH environment variable (if any)


- Do you get the same problem at the same place if you try again ? (Is the error deterministic ?)


- Do you get the same problem if importing from command-line ?


jcman a <table> <file>


- You may also try to import the structures without Chemical Terms columns first, and then add these columns one-by-one (File -> Modify). This will prompt you to calculate the values for the column. 

User 6181a27950

24-08-2011 09:29:52

We are using version 5.5.1.0


Java version:


java version "1.6.0"
Java(TM) SE Runtime Environment (build pxa6460sr7-20091215_02(SR7))
IBM J9 VM (build 2.4, JRE 1.6.0 IBM J9 2.4 Linux amd64-64 jvmxa6460sr7-20091214_49398 (JIT enabled, AOT enabled)
J9VM - 20091214_049398
JIT - r9_20091123_13891
GC - 20091111_AA)
JCL - 20091202_01


There is no CLASSPATH variable in the about box.


I have tried using the command line interface, and it does exactly the same thing, at around the same number of structures in the table.


When I tried to import it without any special columns, it stopped much sooner, at a molecule count of around 3000 in the GUI, but the table is empty.

ChemAxon 9c0afc9aaf

24-08-2011 13:51:06


Hi,


You are using a non-supported Java version.


It is often a cause of strange behavior and problems.


Please obtain the latest JRE or SDK from the link below, and set your PATH environment variable so that the new Java is the default:


http://www.oracle.com/technetwork/java/javase/downloads/index.html


Your "java" command should point to a standard Java implementation.


The output of java -version should look similar to this:


 


$ java -version 
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.0-b11, mixed mode)

I hope this helps.


Best,


Szilard


 


 


User 6181a27950

31-08-2011 11:38:47

We changed our Java version to the one you specified (i.e. not the latest one), and it gives exactly the same problem: at some point (around molecule 530703) the CPU usage drops to 100% (instead of the usual 1300% when importing) and no new structures are added after that.

ChemAxon 9c0afc9aaf

31-08-2011 16:13:48













- please provide the output of "java -version" (just in case).


- also check if the JChemManager GUI is using the same version (Help -> About)


- please let us know if you were using the JChem Manager GUI, or cmdline, etc


- Where did you get the number 530703 from? (e.g. the JChemManager GUI or by other means)


- provide the output of "echo $CLASSPATH"


- If the process is still alive (or next time it gets stuck)


> please confirm which process is using 100% (java or MySQL)


> please obtain the thread dump by pressing CTRL+Break in the console where you have started JChemManager with the "jcman" command
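
CTRL+Break is the Windows console shortcut. On Linux, a JVM can usually be asked for a thread dump by sending it SIGQUIT (the equivalent of `kill -3 <pid>`). The sketch below does that from Python. It assumes you already know the PID of the Java process (for example from `top` or `ps`), and the function name is hypothetical. A HotSpot JVM normally prints the dump to its standard output and keeps running, while an IBM J9 JVM normally writes a javacore file instead; treat both as behaviour to verify on your system rather than a guarantee.

```python
import os
import signal


def request_thread_dump(pid):
    """Send SIGQUIT to process `pid` (same effect as `kill -3 pid`).

    A JVM handles this signal by producing a thread dump. A non-Java
    process is normally terminated by it, so double-check the PID first.
    """
    os.kill(pid, signal.SIGQUIT)


# Usage, with a PID taken from `top` or `ps` (12345 is a placeholder):
#   request_thread_dump(12345)
```

The dump shows what each Java thread is doing at that moment, which helps tell a busy loop (100% CPU) from a thread waiting on the database.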

ChemAxon 9c0afc9aaf

31-08-2011 17:38:54

PS:


- Did you have the Chemical Terms columns in the table? Do the structures import into a new database table without empty columns?


- Please let me know the size in bytes of the input file, to make sure we are not trying with different versions of the file.


- Please see if the attached smaller file imports fine for you (after unzipping). It would help a lot to reduce the size of the test set.


I know it's a lot of questions, but since everything is working fine for us we must look for all possible differences.

User 6181a27950

02-09-2011 08:17:33

java -version output:
Java(TM) SE Runtime Environment (build pxa6460sr7-20091215_02(SR7))
IBM J9 VM (build 2.5, JRE 1.6.0 IBM J9 2.5 Linux amd64-64 jvmxa6460sr7-20091214_49398 (JIT enabled, AOT enabled)
J9VM - 20091214_049398
JIT - r9_20091123_13891
GC - 20091111_AA)
JCL - 20091202_01

JChemManager is using the same version.

I've tried both JChemManager and cmdline, as you instructed.

530703 is the number of molecules in the table I'm importing structures into when the process stops, i.e. it has imported 530703 successfully from chembl_11.sdf.

$CLASSPATH is not defined, but JChemManager works fine.

It is the java process that uses 100% CPU.

I assume you mean "CTRL-C" to break the process? Pressing that ends the process, no other output is given. CTRL-Pause/Break does nothing.

What do you mean by "Chemical Terms columns"? The structures are imported into an empty table with no empty columns, i.e. all of them get filled in by JChem.

The input filesize is 2698596183 (chembl_11.sdf).

When using the smaller file you attached, it does exactly the same thing, but only after 10772 molecules have been successfully imported.
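
One way to narrow down a stall like this is to cut out the records around the count where the import stops and try importing only those. The sketch below is a minimal Python illustration, assuming a standard SDF file in which every record ends with a `$$$$` line. The function names and the output file name are hypothetical and not part of JChem, and the record number only matches the table row count if records are imported in file order with none skipped.

```python
from itertools import islice


def iter_sdf_records(lines):
    """Yield each SDF record as a list of lines, including its '$$$$' line."""
    record = []
    for line in lines:
        record.append(line)
        if line.rstrip("\r\n") == "$$$$":
            yield record
            record = []
    if any(l.strip() for l in record):
        yield record  # trailing record without a terminator line


def extract_records(src_path, dst_path, start, count):
    """Copy `count` records starting at 1-based record `start`.

    Returns the number of records actually written.
    """
    written = 0
    # latin-1 maps every byte to a character, so odd bytes pass through as-is.
    with open(src_path, "r", encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        wanted = islice(iter_sdf_records(src), start - 1, start - 1 + count)
        for record in wanted:
            dst.writelines(record)
            written += 1
    return written


# Example: pull 20 records around the point where the import stopped.
#   extract_records("chembl_11.sdf", "suspect.sdf", 530695, 20)
```

If the small extract stalls as well, it can be halved repeatedly until a single offending structure is left.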

ChemAxon 9c0afc9aaf

02-09-2011 15:44:07

 







Hi,

You are using a non-supported Java version.

It is often a cause of strange behavior and problems.

Please obtain the latest JRE or SDK from the link below, and set your PATH environment variable so that the new Java is the default:

http://www.oracle.com/technetwork/java/javase/downloads/index.html

Your "java" command should point to a standard Java implementation.

The output of java -version should look similar to this:

 

$ java -version 
java version "1.6.0_25"
Java(TM) SE Runtime Environment (build 1.6.0_25-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.0-b11, mixed mode)
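
To check quickly which JVM the `java` command on the PATH resolves to, the version banner can be inspected for the VM name. The sketch below is a rough illustration in Python. The function names and the three labels it returns are hypothetical, and the matching is a simple substring check based only on the two banners shown in this thread. Note that `java -version` writes its banner to standard error.

```python
import subprocess


def jvm_kind(version_output):
    """Classify a `java -version` banner as 'hotspot', 'ibm-j9' or 'unknown'."""
    text = version_output.lower()
    if "hotspot" in text:
        return "hotspot"
    if "ibm j9" in text or "j9vm" in text:
        return "ibm-j9"
    return "unknown"


def read_java_version():
    """Run `java -version` and return its banner (normally sent to stderr)."""
    result = subprocess.run(["java", "-version"],
                            capture_output=True, text=True)
    return result.stderr + result.stdout


# Usage on the machine that runs JChemManager:
#   print(jvm_kind(read_java_version()))
```

If this still reports the IBM J9 VM after installing a new Java, the PATH change has not taken effect for the shell that starts jcman.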