Problems with java, substructure query and blank dialogs

User e54ac3b52b

24-03-2009 07:06:34

Hi,





I'm trying to run some queries on a table containing ~9 million compounds. The problem is that when I try to build a SMARTS query, the sketch window often pops up blank. Not always, but most of the time. This behaviour also appears with other dialogs: annoyingly often they pop up at a huge size but blank. I suspect this is a bug, because I don't see how this is supposed to help the user.





Another problem with the same query is that the first SMARTS pattern is processed fine, but the second one hangs. The first one took about 10-15 minutes; on the second one I lost patience after two hours. I'm using version 2.4.3.1 of IJC, but I encountered the same problems on 2.4.3 before updating.





I tried to overcome these problems by pointing IJC to my system Java instead of the bundled one, but judging by the output on startup it still uses its own. I think I'm missing something; how can this be done?





Does anybody know how to make IJC work?





kaliif

ChemAxon fa971619eb

24-03-2009 07:58:13

With such a large database you will definitely need to allocate additional memory to IJC. As a rough guide, for 9 million structures you will need to allocate at least 1 GB of memory, and maybe more. With a database of this size you will see some performance degradation, but it should still be possible to run searches.





For instructions on setting and monitoring memory usage look here:


http://www.chemaxon.com/instantjchem/ijc_latest/docs/user/help/htmlfiles/tips_and_tricks/memory_usage.html


http://www.chemaxon.com/instantjchem/ijc_latest/docs/user/help/htmlfiles/changing_user_settings.html


http://www.chemaxon.com/instantjchem/ijc_latest/docs/user/help/htmlfiles/tips_and_tricks/performance_tips.html





The first search you run will take some time, but subsequent searches should be faster.





Also, when you see the dialog that shows the progress of the hit list retrieval (when you first open a view or run a query), you can click the "Stop" button at any stage to halt the results retrieval at that point. This avoids the need to retrieve all 9 million structures. It does not affect the results of any subsequent searches.





To change the version of Java being used, edit the configuration file:


<ijc_install_dir>/etc/instantjchem.conf


(memory settings can also be set manually in that file).
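After editing the configuration file, it can help to confirm what a given Java installation actually reports. The following is only a minimal sketch, not part of IJC: a standalone class (the name JvmInfo is made up for this example) that prints the Java home directory, version, architecture and maximum heap of the JVM it is run with. It does not inspect a running IJC instance; the idea is to run it with the same java binary you pointed IJC to and compare the output with what IJC prints on startup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of IJC: reports which JVM is running this code.
final class JvmInfo {

    /** Collects the properties that identify the running JVM. */
    static Map<String, String> describe() {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("java.home", System.getProperty("java.home"));
        info.put("java.version", System.getProperty("java.version"));
        info.put("os.arch", System.getProperty("os.arch"));
        // maxMemory() reflects the effective heap ceiling (the -Xmx value, if one was given).
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        info.put("max.heap.mb", Long.toString(maxMb));
        return info;
    }

    public static void main(String[] args) {
        describe().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running the class once with the bundled Java and once with the system Java should show different java.home values, which makes it easy to tell the two apart.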





Tim

User e54ac3b52b

24-03-2009 10:37:33

I have allocated 2 GB of memory and the performance is OK. The first query takes 10-15 minutes. I'd assume the next one would not take longer, but it can run for hours and not produce anything useful. I also noticed the same thing when trying to load a list: after a couple of hours nothing had happened.





Queries on the other parameters I have calculated seem fine, and the times are reasonable.





About stopping the current action by clicking the "Stop" button: I would, but the dialogs are blank and the whole IJC window is unresponsive. The cancel button in the bottom right corner doesn't work either. I don't think this is the desired behaviour.





I'm using IJC on 64-bit Ubuntu Linux, but since IJC uses its own Java that should not be a problem, should it? My co-worker is also using IJC on a 64-bit system (Windows, though) and he's not having these problems.

ChemAxon fa971619eb

24-03-2009 10:57:41

I'll try to reproduce this and see what happens.


Are you using a local database?





Can you try these things:





1. Use the monitor toolbar to monitor memory usage. If it reaches the 2 GB limit you have allocated, then this is not enough.


See: http://www.chemaxon.com/instantjchem/ijc_latest/docs/user/help/htmlfiles/tips_and_tricks/memory_usage.html





2. Look in the log file (View -> Instant JChem log file) for any errors or warnings.


Send us the log file if it contains anything of interest.
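For the memory check in step 1, the underlying idea is to compare the heap in use with the configured maximum. The sketch below is not how the IJC monitor toolbar works internally (that is not described here); it is just a small standalone illustration using the standard java.lang.Runtime methods, and the class name HeapCheck and the 90% threshold are assumptions made for the example.

```java
// Hypothetical illustration, not part of IJC: compares heap usage with the heap limit.
final class HeapCheck {

    /** Fraction of the maximum heap that is in use, given both values in bytes. */
    static double usedFraction(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        return (double) usedBytes / maxBytes;
    }

    /** True when usage is at or above the given threshold (for example 0.9 for 90%). */
    static boolean nearLimit(long usedBytes, long maxBytes, double threshold) {
        return usedFraction(usedBytes, maxBytes) >= threshold;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // heap currently occupied
        long max = rt.maxMemory();                      // ceiling set by -Xmx, if given
        System.out.printf("heap used: %d MB of %d MB (%.0f%%)%n",
                used >> 20, max >> 20, 100 * usedFraction(used, max));
        if (nearLimit(used, max, 0.9)) {
            System.out.println("Heap is close to the limit; a larger -Xmx may be needed.");
        }
    }
}
```

If usage stays close to the limit while a search is running, that points to the allocation being too small rather than to a problem with the query itself.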





Thanks


Tim

ChemAxon fa971619eb

25-03-2009 08:38:48

I've tried something similar on a 64-bit Linux system (Fedora Core 8 in my case) and found that things work OK as long as you allocate enough memory to IJC.


I did this:





1. Imported 4 million structures into a local database.


2. Set max memory (the Xmx setting) to 800MB (500MB was not sufficient).


3. Ran searches.





The first structure search took about 30 seconds to complete because the structure cache was being loaded (this is what needs the large amount of memory).


Subsequent searches completed in about a second.





If you also have search terms for non-structure fields, then the searches are much slower, as you might expect, because those fields are not indexed. However, adding indexes seems to make matters worse, not better. This is probably a tuning problem with the Derby database. We will investigate this.





I also compared the Java version that is installed with IJC (32-bit) with a 64-bit version of Java and did not see any significant differences.





I hope this information helps you sort out the problems.





Tim

ChemAxon fa971619eb

26-03-2009 18:34:40

Here are the results of some further research.


Searches on non-structure fields can be sped up by adding appropriate indexes to the database. The following is based on the assumption that you are using a local database (please say if not), but similar things probably apply to Oracle and MySQL.


This is quite a complex subject, and adding an index can make things much worse in some cases, so this should be tested carefully.


But by judicious use of indexes I was able to execute combined structure and data queries on a database of 4 million structures in just a couple of seconds on a 64-bit Linux system. Of course each query and database will be different, and some might be much worse.





Rather than describe this in detail here, I have added extra information to the online documentation. Look here for starters:


http://www.chemaxon.com/instantjchem/ijc_latest/docs/user/help/htmlfiles/tips_and_tricks/performance_tips.html#indexes
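To give a rough idea of what adding an index involves at the database level, here is a minimal sketch using plain JDBC and standard CREATE INDEX syntax. It is an illustration only, not the procedure from the documentation page above: the class name IndexSketch and the table, column and index names in the usage note are made up, and the real names in an IJC database will differ. Back up the database before trying anything like this, and test whether the index actually helps.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical illustration, not part of IJC: builds and runs a CREATE INDEX statement.
final class IndexSketch {

    /** Builds a CREATE INDEX statement, accepting only simple identifiers. */
    static String createIndexSql(String indexName, String table, String column) {
        for (String id : new String[] {indexName, table, column}) {
            if (id == null || !id.matches("[A-Za-z_][A-Za-z0-9_]*")) {
                throw new IllegalArgumentException("not a simple identifier: " + id);
            }
        }
        return "CREATE INDEX " + indexName + " ON " + table + " (" + column + ")";
    }

    /** Executes the statement on an already open JDBC connection. */
    static void createIndex(Connection conn, String indexName, String table, String column)
            throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.executeUpdate(createIndexSql(indexName, table, column));
        }
    }
}
```

For example, createIndexSql("IDX_MOLWEIGHT", "STRUCTURES", "MOL_WEIGHT") builds an index statement for a made-up MOL_WEIGHT column in a made-up STRUCTURES table. Restricting the names to simple identifiers keeps the string concatenation safe, since identifiers cannot be passed as JDBC parameters.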





Let me know if this helps (but make sure you solve the memory issue first, since without sufficient memory allocated you will never see good performance).





Tim