Experience with a large local database

User 773d472e7f

17-05-2013 17:15:51

I will collect under this thread my experience building a large local database.


Computer: 64bit, 8 GB RAM, Win8 64 bit (with software to look and feel like Win7).


I started with clean SDFiles with PASS (Pa>0.4) predictions. This means there is a lot of data in each record.


It takes about 2-3 hrs to load 1 million compounds.


It takes ca. 3 hrs to load the structure cache.


I needed to create an index on the AKOS number. This failed because the field was defined as VARCHAR(1000). I copied the data into a new field defined as VARCHAR(13); this ran overnight and took more than 12 hrs. Creating an index on the AKOS number in the VARCHAR(13) field then took about an hour.


Don't clear the query and then go to browse: IJC then tries to reload all structures. (The option "Load as needed" was in use.)



Cancelling queries takes too long for me to wait; exiting IJC is faster.


I think the overall performance is very good.


Alex

ChemAxon 2bdd02d1e5

17-05-2013 17:24:26

Alex, thank you for sharing this information. It is also valuable feedback for us.


Filip

User 773d472e7f

10-06-2013 14:55:07

Situation: fast desktop computer (Win8, 64-bit, 8 GB RAM), database of ca. 13 million structures. The load time for the cache (i.e. the time to do the first SSS) is ca. 45 minutes. The cache is about 3.3 GB.


No changes are made to the database.


Question: Why can't one load the cache from a file into RAM? One should be able to do this in seconds instead of 45 minutes.


Workaround: I do an SSS every 24 hrs. IJC sometimes crashes while doing something else, and a reboot is required. In that case I restart IJC and do a search, so I am ready when I really need it.


Alex

ChemAxon aa7c50abf8

12-06-2013 08:01:39

Hi Alex,


How "local" is your database? Is it Derby? Are you the sole user of the database?


Which JChem version is this?


Thanks


Peter

User 773d472e7f

12-06-2013 08:05:12

I use Derby, am the sole user, and it is presently IJC 5.12.3.1. I have not updated to 6, which I will do soon.


Alex

ChemAxon aa7c50abf8

13-06-2013 11:41:34

Does your Java VM have enough memory? If you're close to full utilization, increased garbage collection (GC) may slow the process significantly. Could you please check the values returned by the Runtime.freeMemory(), Runtime.maxMemory() and Runtime.totalMemory() methods after the structure cache has been fully loaded?
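
For reference, a minimal sketch of how these three figures can be read in Java. The methods are instance methods, reached through Runtime.getRuntime(). The class name HeapReport and its helper methods are hypothetical and are not part of IJC or JChem. Run standalone, the program reports on its own JVM only. To obtain the figures for a running IJC instance, the same calls would have to execute inside IJC's JVM, or a JDK monitoring tool such as jconsole would have to be attached to it.

```java
import java.util.Locale;

public class HeapReport {
    private static final long MB = 1024L * 1024L;

    /** Heap bytes currently occupied: allocated heap minus the free part of it. */
    static long usedBytes(long totalBytes, long freeBytes) {
        return totalBytes - freeBytes;
    }

    /** Formats the three Runtime figures plus the derived "used" value, in whole MB. */
    static String format(long freeBytes, long totalBytes, long maxBytes) {
        return String.format(Locale.ROOT,
                "used=%d MB, free=%d MB, total=%d MB, max=%d MB",
                usedBytes(totalBytes, freeBytes) / MB,
                freeBytes / MB,
                totalBytes / MB,
                maxBytes / MB);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // freeMemory(): unused part of the currently allocated heap
        // totalMemory(): heap currently allocated by the JVM
        // maxMemory(): upper limit the heap may grow to (set by -Xmx)
        System.out.println(format(rt.freeMemory(), rt.totalMemory(), rt.maxMemory()));
    }
}
```

If "used" stays close to "max" after the cache has loaded, the JVM has little headroom and is likely to spend much of its time in garbage collection.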


I have not updated to 6, which I will do soon.

I am interested in your findings with 6.0. I started to get OutOfMemoryErrors while doing search benchmarks with 38 million PubChem structures using 6.0, even though I used twice as much memory as I did with 5.12.1.


Question: Why can't one load the cache from a file into RAM? One should be able to do this in seconds instead of 45 minutes.

Well, we ultimately do this now, since the cache data is stored in database files. :-)


Peter

User 773d472e7f

16-06-2013 10:35:29

IJC 6.0.0 ran out of memory.


After changing the heap size to 5020 MB, IJC 6.0.0 crashed after using all of the computer's memory (8 GB), showing an empty error message window.


IJC 6.0.0 does not work for my setup. It seems to need more memory than IJC 5.


Please fix:


When one changes the heap size, one has to manually add the "\"s again in D:\Program Files\ChemAxon\InstantJChem600\etc\instantjchem.conf.


Alex

User 773d472e7f

16-06-2013 15:56:26

Finally, I managed to increase the heap size to 4462 MB.


# This is the Instant JChem configuration file.

# TODO RELEASE
# ${HOME} will be replaced by JVM user.home system property
default_userdir="${HOME}/.${APPNAME}/6.0"
default_mac_userdir="${HOME}/Library/Application Support/${APPNAME}/6.0"

# options used by the launcher by default, can be overridden by explicit
# command line switches, more details on
# http://www.chemaxon.com/instantjchem/ijc_latest/docs/admin/startup_options.html
#
# To increase maximum memory allocation change the -J-Xmx value e.g. from -J-Xmx512m to -J-Xmx1024m
#
default_options="--branding instantjchem -J-Xms128m -J-XX:MaxPermSize=4096m -J-Dderby.system.home=derby -J-Dnetbeans.logger.console=true -J-ea -J-Dorg.netbeans.ProxyClassLoader.level=1000 -J-DuseGtk=false -J-Dorg.netbeans.core.TimeableEventQueue.report=86400000 -J-Xmx5020m"
default_mac_options="${default_options} -J-Xmx4096m"

# default location of JDK/JRE. Uncomment and edit as appropriate
# can be overridden by using --jdkhome <dir> switch
#jdkhome="/path/to/java6jre"

jdkhome="D:\Program Files\ChemAxon\InstantJChem600\jre"
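
To double-check which maximum heap size a given options line actually requests, the -J-Xmx value can be extracted programmatically. The following is only a sketch: the class LauncherHeapOption and its method are hypothetical and not part of IJC. It returns the last -J-Xmx occurrence, on the assumption that a later flag overrides an earlier one. That matters for default_mac_options above, which appends -J-Xmx4096m to a line that already contains -J-Xmx5020m.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LauncherHeapOption {
    // Matches launcher-style heap flags such as -J-Xmx5020m or -J-Xmx4g.
    private static final Pattern XMX = Pattern.compile("-J-Xmx(\\d+)([kKmMgG]?)");

    /** Returns the last -J-Xmx value found in the options string, in MB, or -1 if absent. */
    static long maxHeapMb(String options) {
        Matcher m = XMX.matcher(options);
        long mb = -1;
        while (m.find()) {
            long value = Long.parseLong(m.group(1));
            switch (m.group(2).toLowerCase()) {
                case "g": mb = value * 1024; break;
                case "m": mb = value; break;
                case "k": mb = value / 1024; break;
                default:  mb = value / (1024 * 1024); break; // no suffix: plain bytes
            }
        }
        return mb;
    }

    public static void main(String[] args) {
        // Shortened copy of the default_options line from the configuration file above.
        String options = "--branding instantjchem -J-Xms128m -J-XX:MaxPermSize=4096m -J-Xmx5020m";
        System.out.println(maxHeapMb(options) + " MB"); // prints "5020 MB"
    }
}
```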


The memory was shown as about 3500/4462 MB; after a few searches it was 2499/4462 MB. Do I assume correctly that the cache was at first 3500 MB? Why would the cache get smaller?


However, I realized that IJC became very sluggish. The SSS that took 1 sec with IJC 5 now takes ca. 10-15 sec. Sometimes IJC 6 freezes; it wakes up after about 20-30 seconds.


I also have to re-index the fields that were indexed in IJC 5. I assume this is normal, but it will still cost a lot of time.


Alex

User 773d472e7f

16-06-2013 16:05:12

After the cache was successfully formed in the previous run, I closed IJC 6 and restarted with an SSS. It is disappointing: it again takes an estimated 1-3 hrs to build the cache.


The cache seems NOT to be read from a file, contrary to what you wrote in the previous section.


I assume the performance would not differ if I used MySQL instead of Derby.


Alex

ChemAxon aa7c50abf8

16-06-2013 16:23:13

One of your remarks is easy to comment on; we will get back to you on your other findings soon:


The cache seems NOT to be read from a file, contrary to what you wrote in the previous section.

I wrote: "we ultimately do this now, since the cache data is stored in database files."


I am pretty sure that both Derby and MySQL store the data in files. (Well, where else would they?)


Peter

User 773d472e7f

16-06-2013 16:45:04

I have the correct memory data, as displayed in IJC after the cache was formed.


IJC 6.0.0: 4125/4638


IJC 5.12.3.1: 3104/3875


Sorry, I misinterpreted the English, "ultimately do this now".


Concerning MySQL and Derby: I assume that building the cache takes a similar time in both systems.


Alex

ChemAxon aa7c50abf8

17-06-2013 13:49:30

Seeing the 3.5 GB utilization of 4.5 GB total heap space, I'd say that heap space is unlikely to be a bottleneck for cache loading. The decrease in heap utilization is likely to be due to garbage collection.
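
As a rough illustration of the utilization figures under discussion, the "used/total" readings quoted in this thread can be turned into percentages. This is only a sketch: the class HeapUtilization is hypothetical, and it assumes that the IJC memory indicator shows used heap over total heap, both in MB.

```java
import java.util.Locale;

public class HeapUtilization {
    /** Parses a "used/total" reading and returns the utilization as a fraction between 0 and 1. */
    static double utilization(String reading) {
        String[] parts = reading.trim().split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected used/total, got: " + reading);
        }
        double used = Double.parseDouble(parts[0].trim());
        double total = Double.parseDouble(parts[1].trim());
        if (total <= 0) {
            throw new IllegalArgumentException("Total must be positive: " + reading);
        }
        return used / total;
    }

    public static void main(String[] args) {
        // Readings reported earlier in this thread (MB).
        String[] readings = {"3500/4462", "2499/4462", "4125/4638", "3104/3875"};
        for (String r : readings) {
            System.out.printf(Locale.ROOT, "%s MB -> %.1f%% used%n", r, 100 * utilization(r));
        }
    }
}
```

The drop from 3500 MB to 2499 MB against the same 4462 MB total is what one would expect when temporary objects created during cache loading are reclaimed by the garbage collector, while the cache itself stays in memory.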


Did you observe the general activity of the computer? Did the CPU or the disk seem to be the bottleneck?


We don't currently have very sophisticated tools to diagnose performance problems during cache loading, but we will try to come up with something to help with this problem.


Peter