Do cached cpds resulting from a JChemSearch ever expire?

19-08-2005 15:03:32

We have a roughly similar number of structures to search against, and my test PC is probably an average computer.

19-08-2005 15:50:33

Could you tell me how you use Tomcat?

Are you writing a servlet, then?

Did you reload the context of your application in Tomcat, or did you perform any administrative operations on Tomcat between the searches?

Best regards,

Szilard

19-08-2005 16:35:30

has already been set from the beginning. And the following are the pertinent logs:

19-08-2005 20:43:20

Most likely, I restarted Tomcat right before search #3, so the re-caching that happened in search #3 is nothing to worry about.

As for searches #7 and #10, I did not restart Tomcat.

20-08-2005 12:28:02

Yes, it does.

If the content of the table has changed, the structure cache must be updated accordingly so that searches return correct results.

The problem in your case is the unusually slow update of the cache.

Typically, updating the cache after the import of a small number of "unidentified structures" (see method 2 of UpdateHandler usage below) should be around 7-8 times faster than loading the cache. In your case it is even slower, for an unknown reason.

I have performed tests in a similar scenario, and the updates were always much faster for me:

For 1 million structures the cache loading was around 150 seconds, while the cache update only took about 20 seconds.
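The 7-8x rule of thumb quoted above is simply the ratio of these two timings:

```latex
\frac{t_{\mathrm{load}}}{t_{\mathrm{update}}} \approx \frac{150\ \mathrm{s}}{20\ \mathrm{s}} = 7.5
```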

Some supplementary information about UpdateHandler (not directly related to the slow cache update in your system):

You may argue that a 20-second update seems long after inserting only a handful of structures.

Currently there are 2 ways of inserting structures with UpdateHandler.

(I must confess our documentation is scarce on this issue.)

1. Inserting individual structures. (A typical scenario when the chemist draws a structure in a GUI and inserts it.)

In this case only a single structure is inserted by an UpdateHandler object with UpdateHandler.execute(true), and the UpdateHandler is closed without inserting any more structures.

Here execute(true) ensures that we determine the cd_id of the inserted structure, and because there is only one structure, we store its cd_id in the property table.

(Of course, this single insert can be repeated many times.)

When the next search begins, the cache can be updated rapidly, since we know which structures are new.

2. Several structures are inserted with one UpdateHandler (e.g. during the import of structure files). In this case we must fetch all the cd_id values from Oracle and determine which ones are new. This can take some time (but it should still be almost 10x faster than on your system).
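The cost difference between the two methods can be illustrated with a small Python model. This is only a sketch under stated assumptions: `StructureCache`, `update_from_known_ids` and `update_by_diff` are hypothetical names, not part of JChem (whose UpdateHandler is a Java class), and the return values just count how many rows each path has to touch or scan.

```python
from dataclasses import dataclass, field


@dataclass
class StructureCache:
    """Hypothetical in-memory cache keyed by cd_id (not the real JChem cache)."""
    cd_ids: set[int] = field(default_factory=set)

    def update_from_known_ids(self, new_ids: list[int]) -> int:
        """Method 1: the new cd_id was recorded (e.g. in the property table),
        so only that row has to be loaded. Returns the number of rows touched."""
        self.cd_ids.update(new_ids)
        return len(new_ids)

    def update_by_diff(self, all_ids_in_table: list[int]) -> int:
        """Method 2: the new ids are unknown, so every cd_id must be read
        from the database and compared with the cache. Returns rows scanned."""
        new_ids = set(all_ids_in_table) - self.cd_ids
        self.cd_ids |= new_ids
        return len(all_ids_in_table)


if __name__ == "__main__":
    table = list(range(1, 1001))          # 1000 structures, all cached
    cache = StructureCache(set(table))

    table.append(1001)                    # single insert, cd_id known
    print(cache.update_from_known_ids([1001]))   # 1 row touched

    table.extend([1002, 1003])            # batch insert, ids not recorded
    print(cache.update_by_diff(table))           # 1003 rows scanned
```

In this model method 1 touches only the new row, while method 2 has to scan every cd_id in the table, so its cost grows with the table size even when only a few structures were added.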

Please confirm with tests whether inserting the 4 (or fewer) structures you mentioned is enough to produce such an unreasonably slow cache update.

If it is, please illustrate the API usage for the inserts with code snippets.

Please also tell me your Oracle and JDBC driver versions.

Best regards,

Szilard

22-08-2005 21:08:03

Hope the code snippets will be helpful,

Donald

23-08-2005 12:47:07

BTW, why did you downgrade to 3.0.2?

3.0.2 is a pretty old version (Dec 7 2004); there have been several improvements and bugfixes since then.

Do you have a particular reason for downgrading?

23-08-2005 13:54:49

I was surprised too :) Actually it was on a test table, which I accessed in some unconventional ways (trying to investigate the problem), so the cause of that unnecessary cache reload might even be different.

(I could not investigate that case further, because afterwards it always worked OK)

I will let you know if I find out something.

Best regards,

Szilard

23-08-2005 14:06:38

Thanks for your understanding. The fact that nobody except us has reported this bug does not necessarily mean that it is rare in general; it is possible that we are the only ones seriously banging on JChemSearch.

Do you think the JDK version could play a role in this issue?

As Ben mentioned, please share your findings with us, whether or not they are conclusive, either by posting here or by email.

Thanks,

Donald

23-08-2005 16:17:59

I don't think so :)

Maybe not many other users use 3.0.2, though.

23-08-2005 17:22:05

I hope upgrading to 3.0.14 will magically bury the issue discussed here.

I am not sure when we will do the upgrade.

Don

24-08-2005 15:09:59

Then I ran the script from the command line and got the following output:

24-08-2005 15:41:15

This is because you have called UpdateHandler.close() after your search.

This method stores information about the updates, so only searches after the close() call will notice that the cache has to be refreshed.
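A minimal sketch of why only searches issued after close() notice the change, assuming (as described above) that the update record is written at close() time. All names here are hypothetical, and this is a Python model of the behaviour, not JChem's Java API.

```python
class PropertyTable:
    """Hypothetical stand-in for the property table holding update records."""

    def __init__(self) -> None:
        self.pending_new_ids: list[int] = []


class ToyUpdateHandler:
    """Hypothetical handler: buffers inserts and publishes them on close()."""

    def __init__(self, props: PropertyTable) -> None:
        self.props = props
        self._inserted: list[int] = []

    def insert(self, cd_id: int) -> None:
        # In this model the insert is only remembered locally;
        # searches cannot see any update record yet.
        self._inserted.append(cd_id)

    def close(self) -> None:
        # The update information is stored only here.
        self.props.pending_new_ids.extend(self._inserted)
        self._inserted.clear()


def search_sees_update(props: PropertyTable) -> bool:
    """A search refreshes its cache only if an update record exists."""
    return bool(props.pending_new_ids)


if __name__ == "__main__":
    props = PropertyTable()
    handler = ToyUpdateHandler(props)
    handler.insert(42)
    print(search_sees_update(props))  # False: close() not called yet
    handler.close()
    print(search_sees_update(props))  # True
```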

Best regards,

Szilard

24-08-2005 18:04:57

This one spent 6281 ms on the update (there was an insertion right before this search).

24-08-2005 22:35:16

The re-caching was due to the Tomcat restart, so there is no concern here. Then we inserted a structure and ran another search immediately after the insertion.

25-08-2005 14:11:43

Even though the log shows [Cache Update]? If what is reported as a lengthy cache update is actually a cache reload, then the [Cache Update] label in the log is a bit misleading, isn't it?

25-08-2005 15:44:48

Currently the cache uses a very compact storage method to minimize the memory footprint. This storage method doesn't allow freeing the space previously allocated for deleted or updated structures, so when these reach a certain percentage (after a large number of deletions or updates), the cache is reloaded.

So far our customers like the low memory footprint, and the reload is not a problem, since modifications and deletions of structures are very rare.
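The reload policy can be modelled roughly as follows. This is a hedged Python sketch: `CompactCache` and the 10% `RELOAD_THRESHOLD` are invented for illustration, since the thread only says "a certain percentage". Deleted and updated entries keep their slots, and once the share of dead slots exceeds the threshold the whole cache is rebuilt.

```python
RELOAD_THRESHOLD = 0.10  # hypothetical; the real percentage is not given here


class CompactCache:
    """Toy packed cache whose freed slots cannot be reused in place."""

    def __init__(self, cd_ids) -> None:
        self.reloads = 0
        self._load(cd_ids)

    def _load(self, cd_ids) -> None:
        # A (re)load packs only the live structures, so no slot is wasted.
        self.live = set(cd_ids)
        self.allocated = len(self.live)

    def dead_fraction(self) -> float:
        if self.allocated == 0:
            return 0.0
        return (self.allocated - len(self.live)) / self.allocated

    def insert(self, cd_id: int) -> None:
        self.live.add(cd_id)
        self.allocated += 1

    def delete(self, cd_id: int) -> None:
        self.live.discard(cd_id)      # its slot stays allocated
        self._maybe_reload()

    def update(self, cd_id: int) -> None:
        self.allocated += 1           # new version appended, old slot is dead
        self._maybe_reload()

    def _maybe_reload(self) -> None:
        if self.dead_fraction() > RELOAD_THRESHOLD:
            self._load(self.live)     # full reload instead of an incremental update
            self.reloads += 1


if __name__ == "__main__":
    cache = CompactCache(range(100))
    for cd_id in range(11):
        cache.delete(cd_id)
    print(cache.reloads, cache.allocated)  # 1 89
```

In this model inserts alone never trigger a reload; only accumulated deletions or updates do, which is consistent with the explanation above.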

25-08-2005 16:42:28

I don't think I made my question clear. What I was trying to ask about is the following scenario (in chronological order):

1) An insertion happens --> an update log is put into the property table.

2) A search is issued from T2 --> this triggers a cache update on T2 and the removal of the "update log" from the property table.

3) A search is issued from T1 --> since it cannot find any update log in the property table, it does a full re-caching.

4) Here is my question --> a search is issued from T2 again. Will T2 do a re-caching, a cache update, or neither?

25-08-2005 20:10:33

Neither, as the cache in T2 is already up-to-date.
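Szilard's answer can be checked against a small model of the scenario. The mechanism here is an assumption, not documented JChem behaviour: the property table is modelled as a modification counter plus an update log that is consumed by the first cache that applies it (as in step 2 of the question), and each cache (T1, T2) remembers the counter value it last synchronized with. `SharedProps` and `InstanceCache` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SharedProps:
    """Hypothetical shared property table: counter plus short-lived update log."""
    counter: int = 0
    log: dict[int, list[int]] = field(default_factory=dict)  # counter -> new cd_ids

    def record_insert(self, cd_ids: list[int]) -> None:
        self.counter += 1
        self.log[self.counter] = list(cd_ids)


@dataclass
class InstanceCache:
    """Hypothetical per-instance cache (e.g. one per Tomcat instance)."""
    name: str
    seen: int = 0  # counter value this cache last synchronized with

    def before_search(self, props: SharedProps) -> str:
        if self.seen == props.counter:
            return "neither"            # already up to date
        missing = range(self.seen + 1, props.counter + 1)
        if all(c in props.log for c in missing):
            for c in missing:
                props.log.pop(c)        # consume the log, as in step 2
            action = "update"
        else:
            action = "reload"           # log gone: full re-caching
        self.seen = props.counter
        return action


if __name__ == "__main__":
    props = SharedProps()
    t1, t2 = InstanceCache("T1"), InstanceCache("T2")
    props.record_insert([101])          # step 1
    print(t2.before_search(props))      # step 2: update
    print(t1.before_search(props))      # step 3: reload
    print(t2.before_search(props))      # step 4: neither
```

Under these assumptions step 4 returns "neither", because T2 already synchronized with the latest table state in step 2.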

26-08-2005 16:42:08

To tell the truth, I was originally thinking about changing one constant to a more reasonable value.

By our policy, changing the API requires a new release, so I think we should put this change into JChem 3.0.15 instead.

(Changing previous releases can cause confusion anyway.)

We will release 3.0.15 early next week.

26-08-2005 17:29:27

We look forward to 3.0.15, then. Thanks.

26-08-2005 18:17:58

Sorry, I forgot this part.

No, it has none of these so far.

The same improvements (1-3 in your list) will appear in JChem 3.1.1 as in JChem 3.0.15.

We also expect to release JChem 3.1.1 in two weeks' time or sooner.

Szilard

26-08-2005 19:19:47

Szilard, we will be expecting 3.1.1 as well as 3.0.15.

Thanks,

Don

31-08-2005 17:34:40

Included.

31-08-2005 17:45:00

Yes.

Szilard

05-10-2005 16:03:26

02-11-2005 08:15:45

1000