Technical Support Forum
"Wall time limit reached" error while indexing
This topic is locked: you cannot edit posts or make replies.
xinbo

Joined: 08 Aug 2016
Posts: 22

Posted: Tue Sep 13, 2016 9:45 am

Hello, 

I ran into the following error while indexing a large table (about 100 million records):

 

ERROR:  ChemIndex.cpp:61 OperationAborted:

*** class com.chemaxon.zetor.api.exceptions.OperationAbortedException

        Wall time limit reached

 

I also noticed that at first the index calculation was spread across several CPUs (each at 20% - 30%), but after some time only 1 CPU was busy (near 100%) while all the others were idle, and then the error above appeared.

Does anyone have any idea where this error comes from and which configuration settings can be changed to avoid it?

Also, since indexing such a big table is very time consuming, I would like to know if there are any tips for speeding up the indexing.

 

Thanks,

William

Krisztina
ChemAxon personnel
Joined: 27 May 2011
Posts: 375

Posted: Tue Sep 13, 2016 2:56 pm

Hi William,

Unfortunately, 100 million records is beyond the current scope of our development and testing. We are aware that indexing time grows considerably for tables of this size, and we are working on a good solution for these performance issues.

At the moment, we can recommend the following workaround:

  1. Drop chemical indexes.
  2. sudo service jchem-psql stop
  3. Edit the file /etc/chemaxon/jchem-psql.conf .
    Change the setting 'mapdb' to 'rocksdb' in the line:
    com.chemaxon.jchem.psql.env.scheme=mapdb
  4. sudo service jchem-psql init
  5. sudo service jchem-psql start
  6. Create chemical indexes.
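
Step 3 of the workaround above can be scripted if the change has to be repeated on several machines. The following is only a minimal sketch: the key name is the one quoted in this thread, while the helper `set_env_scheme` is hypothetical and assumes the file uses plain `key=value` lines with `#` for comments.

```python
# Key name as quoted in the thread; the helper itself is a hypothetical illustration.
SCHEME_KEY = "com.chemaxon.jchem.psql.env.scheme"


def set_env_scheme(conf_text, backend="rocksdb"):
    """Return conf_text with every uncommented env.scheme line set to `backend`."""
    out = []
    changed = False
    for line in conf_text.splitlines(keepends=True):
        stripped = line.strip()
        is_comment = stripped.startswith("#")
        key = stripped.split("=", 1)[0].strip()
        if not is_comment and key == SCHEME_KEY:
            # Preserve the original line ending ("\n", "\r\n" or none).
            ending = line[len(line.rstrip("\r\n")):]
            line = f"{SCHEME_KEY}={backend}{ending}"
            changed = True
        out.append(line)
    if not changed:
        raise ValueError(f"no uncommented {SCHEME_KEY} line found")
    return "".join(out)
```

The function works on text only, so reading /etc/chemaxon/jchem-psql.conf and writing it back (with suitable permissions, while the jchem-psql service is stopped) is left to the caller. The remaining steps (init, start, re-create indexes) still have to be run as listed.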

Please let us know your findings.

Best regards,

Krisztina

xinbo

Posted: Wed Sep 14, 2016 12:29 am

Hello,

 

Thanks for your reply, but unfortunately the same error still occurred after I set "com.chemaxon.jchem.psql.env.scheme=rocksdb" and re-indexed the table.

 

Regards,

William

Roland
ChemAxon personnel
Joined: 13 May 2016
Posts: 2

Posted: Wed Sep 14, 2016 9:45 am

Hello William,

The JVM may run out of memory while indexing this large table.

You can increase the maximum memory size of the JVM in the file /etc/default/jchem-psql.

The example in the file (-Xmx14g) sets the maximum memory to 14 GB.

It is advisable to use rocksdb as backend in case of such a large table.

After setting the backend you should initialize the jchem-psql service. (sudo service jchem-psql init)

After setting the maximal memory size you should restart the jchem-psql service.

 

Please let us know your findings.

Best regards,

Roland

xinbo

Posted: Thu Sep 15, 2016 1:06 am

Hello, 

Yes, I set Xmx to 20g and restarted the jchem-psql service.

In /etc/chemaxon/jchem-psql.conf I set "com.chemaxon.jchem.psql.env.scheme=rocksdb". Should I also change "com.chemaxon.jchem.psql.main.scheme" and "com.chemaxon.jchem.psql.idx.scheme"?

Are there any rocksdb-specific settings, like those for the other backends (mapdb, mvstore, hashed, cassandra)? I couldn't find any in the configuration file.

Also, I noticed that indexing is usually multithreaded, but at some point jchem gets stuck in a single-threaded task for a long time (only 1 processor at nearly 100% while the others are all idle). Is that normal?

 

Thanks,

William

Roland
ChemAxon personnel
Posted: Thu Sep 15, 2016 12:21 pm

Hi William,

Setting 'com.chemaxon.jchem.psql.env.scheme' to 'rocksdb' in /etc/chemaxon/jchem-psql.conf is sufficient as long as the other two options you mentioned are commented out (with a '#' sign).
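
A quick way to confirm that condition before re-initializing the service is to list the scheme options that are actually active in the file. This is a rough sketch under the assumption that the file consists of simple `key=value` lines with `#` comments; the option names come from this thread, and the helpers `active_schemes` and `rocksdb_only` are hypothetical.

```python
# Common prefix of the three scheme options named in the thread.
PREFIX = "com.chemaxon.jchem.psql."
SUFFIX = ".scheme"


def active_schemes(conf_text):
    """Map each uncommented '<PREFIX><name>.scheme' option to its value."""
    schemes = {}
    for line in conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, comments and anything that is not key=value.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue
        key, value = (part.strip() for part in stripped.split("=", 1))
        if key.startswith(PREFIX) and key.endswith(SUFFIX):
            name = key[len(PREFIX):-len(SUFFIX)]
            schemes[name] = value
    return schemes


def rocksdb_only(conf_text):
    """True if env.scheme is rocksdb and no other scheme option is active."""
    return active_schemes(conf_text) == {"env": "rocksdb"}
```

For example, a file in which main.scheme is still uncommented would make `rocksdb_only` return False, and `active_schemes` would show which option needs to be commented out.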

Unfortunately there are no specific settings for rocksdb in the configuration file at the moment.

Regarding indexing running in a single thread for a long time at some points: this is a known behavior that we have also observed. It does not indicate a bug or failure.

Please let us know your findings.

Best regards,

Roland


xinbo

Posted: Thu Sep 15, 2016 1:43 pm

Hello, Roland

 

Thanks for your reply, and here's another question, is there any significant performance difference between JChem PostgreSQL Cartridge and JChem Oracle Cartridge ? 

 

Thanks,

William

Volfi
ChemAxon personnel
Joined: 07 Jun 2004
Posts: 996

Posted: Thu Sep 15, 2016 3:38 pm

xwang_01 wrote:

Thanks for your reply, and here's another question, is there any significant performance difference between JChem PostgreSQL Cartridge and JChem Oracle Cartridge ?

Yes, definitely. JChem PostgreSQL Cartridge (JPC) is faster for queries returning only a small number of hits (say, under a few thousand). But there is another major difference: JPC has a higher memory footprint than JChem Oracle Cartridge (JOC); however, if the memory needed to cache all the structures is not available, JOC simply cannot work, while JPC still can. So, as you can see, there are multiple factors to consider.

best

xinbo

Posted: Thu Sep 15, 2016 4:01 pm

Thanks. Here is what I need, basically:

  1. Approximately 100 million SMILES (in a single table, or in multiple tables with sharding, depending on performance)
  2. Exact search, substructure search and Tanimoto similarity search (the number of hits can vary a lot depending on the query SMILES)

Since my previous tests all focused on JPC, I would like to know whether JOC might have a significant advantage in my use case.

 

Thanks a lot,

William

xinbo

Posted: Sun Sep 18, 2016 4:41 am

Hello, 

I reproduced the previous "wall time limit reached" error, this time while copying 10 million SMILES into an indexed table. I tried many times, and the error always occurs at the same point (for me, when copying the batch that starts at 5535000). Could it be caused by some invalid SMILES? I didn't get any further error information.

 

ERROR:  ChemIndex.cpp:61 OperationAborted:

*** class com.chemaxon.zetor.api.exceptions.OperationAbortedException

        Wall time limit reached

CONTEXT:  COPY jchem_10m_mol, line 5535000

 

Thanks,

William

Krisztina
ChemAxon personnel
Posted: Mon Sep 19, 2016 10:35 am

Hi William,

Yes, unfortunately a single erroneous / invalid SMILES can produce this error. Could you identify this molecule and send it to us as SMILES? If the molecule is confidential, you can send it to jpc-support _at_ chemaxon.com.

Best regards,

Krisztina

xinbo

Posted: Tue Sep 20, 2016 1:25 am

Hello, Krisztina

 

My whole dataset has 90,878,834 SMILES, and the COPY executes in batches, so I'm not sure exactly which SMILES are invalid.

I performed the following operations, as suggested in the manual:

  • CREATE TABLE jchem_mol(inchi_key text, smiles text);
  • COPY jchem_mol FROM 'xxx.csv' (FORMAT CSV);
  • CREATE TABLE invalid_mol AS SELECT * FROM jchem_mol WHERE NOT is_valid_molecule(smiles);

 

However, 0 SMILES were found to be invalid. Could you please share any other methods for locating the invalid SMILES?

 

Thanks,

William

Krisztina
ChemAxon personnel
Posted: Tue Sep 20, 2016 2:54 pm

Hi William,

We think that the erroneous molecule is between lines 5530000 and 5535000, because indexing runs in batches of 5000 molecules by default.

Could you copy these 5000 lines (5000 SMILES) into a new text file and try to import and index them separately? Before starting the create index process, please run

set chemaxon.index_creation_batch_size to 1; 

This changes the batch size to 1 at the session level.
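
Copying the suspect lines into a separate file can be done with a few lines of Python. This is just a sketch: the function `extract_lines` and the file names in the usage note are hypothetical, and it assumes one record per line (no quoted multi-line CSV fields).

```python
from itertools import islice


def extract_lines(src_path, dst_path, first, last):
    """Copy lines first..last (1-based, inclusive) from src_path to dst_path.

    Returns the number of lines written. Assumes one record per line.
    """
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    count = 0
    # newline="" keeps the original line endings untouched.
    with open(src_path, "r", encoding="utf-8", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        # islice uses 0-based, end-exclusive indices.
        for line in islice(src, first - 1, last):
            dst.write(line)
            count += 1
    return count
```

A call such as `extract_lines("xxx.csv", "suspect.csv", 5530000, 5535000)` would cover the range named above. Since it is not certain where exactly the batch boundary falls, widening the window by a few thousand lines on each side is a cheap precaution. The resulting file can then be loaded with COPY and indexed with the batch size set to 1.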

 

Another, independent idea is to increase the wall time limit with

set chemaxon.search_wall_time_limit to 1200000; 

The default is 600000 ms (= 10 min).

See the documentation.

All of these settings can also be modified in the /etc/postgresql/9.5/main/postgresql.conf file, but in that case the postgresql service must be restarted after the modification.

Best regards,

Krisztina

xinbo

Posted: Wed Sep 21, 2016 9:16 am

Hello, Krisztina

 

Thanks for your help. I successfully located the SMILES that caused the problem:

 

InchiKey : "TYAGLVAIEVGVDE-UHFFFAOYSA-N"

Smiles : "CC(=O)CC1C2C13C24C35C46C57C68C79C81C92C11C22C11C22C11C22C11C22C11C22C11C22C11C22C11C22C11C22C1C2"

 

I hope this helps. Is there any pattern to these "invalid" SMILES?

 

Regards,

William

Krisztina
ChemAxon personnel
Posted: Thu Sep 22, 2016 2:30 pm

Hi William,

Thank you for the molecule. Unfortunately, it does indeed freeze the indexing in PostgreSQL Cartridge. This molecule freezes the indexing in JChem Oracle Cartridge as well. We are now starting to investigate what causes this behavior and will let you know when the issue is fixed.

Until then, as a workaround, we can only recommend deleting this molecule from the dataset.

Best regards,

Krisztina

 
