Jserver threads used when creating an index

User 9f6f294e9f

17-02-2010 10:37:40

Hi Peter

I hit a problem today when I created several Jchem indexes in different schemas and then hit 'maximum sessions exceeded' for the database in question. On investigating it appears that the jserver is taking out one session per cpu during the index creation, and not then closing the sessions once the index is built.

This particular database is located on a 32 cpu linux server.

I can't find anything in the doco. that tells me how to limit threads used for an index build, but I'm sure that there must be a way. Please will you advise me.

Other than by bouncing the Jserver, is there any way to terminate the Jserver connections on completion of an index build ?

Regards

Ant

ChemAxon aa7c50abf8

17-02-2010 15:35:54

Hi Ant,

One thing which might be obvious, yet worth mentioning: the database sessions remaining open in JChem Server are held in a pool and will be reused, once required again -- eventually by the same number of concurrent operations.

The indexingThreadsPerCall property can be set in the jcart.properties file to limit the number of threads used for any given indexing operation. Reducing the number of indexing threads will reduce the performance of indexing roughly linearly.

We currently provide no way of closing the connections open in JChem Server, but it wouldn't be difficult to provide one.

Apart from that, we potentially have other (better) options to deal with this problem:

We could provide an option for the indexing threads to share one single database session per indexing operation. This would have much less negative impact on indexing performance than limiting the indexing threads themselves. The exact impact still needs to be investigated.

A connection pool other than the default (Oracle DataSource) could be used, such as Apache's DBCP which accepts fine-tuning options like the maximum number of idle database connections in the pool. This could result in degradation in responsiveness under circumstances, as the whole point of keeping open database connections in a pool is to minimize response time when they are needed again. Also, this alone wouldn't prevent many connections from being opened again, when they are eventually required by (many) concurrent operations.

The question begging for being asked: Are you unhappy with setting the number of maximum allowed open Oracle sessions to a higher value? And if so, why? I am asking this to better understand your problem and be able to more judiciously assign priorities.

Thanks

Peter

ChemAxon aa7c50abf8

18-02-2010 11:20:09

One refinement on my previous post.

Reducing the number of indexing threads will reduce the performance of indexing roughly linearly.

The above statement is not true in its general form, if multiple indexing operations are executed concurrently. If, for example, you start two indexing operations on a 32 core machine, you can reduce the number of threads per operation to 16 without significantly increasing the time it takes for the two indexing to complete.

Arguably, to better support such scenarios, another option (the fifth) could be to allow setting the number of threads as an indexing parameter.

Peter

User 9f6f294e9f

18-02-2010 14:23:50

Hi Peter

Thanks for your responses. I like your idea of an option for the indexing threads to share one single database session per indexing operation - if you're able to take that idea forward I think it would be a valuable enhancement.

The reason that we aren't keen to just raise the maximum number of sessions is that each connected session consumes resource in terms of pga memory and also imposes a small overhead as oracle polls a session periodically to check that it is still alive.

As you say connection pools do have their uses, but I often see pools that are far too large - many of the connections being hardly used at all. Of course in practice many pools drop unused connections fairly quickly. although not in this case.

My ideal for the Jchem index creations would be for the Jserver to take out new connections (limited by indexingThreadsPerCall) and to use these connections solely for the requisite index creation, terminating them once the work is complete. Having said that, if the single database session approach is likely to perform almost as well, this is probably the path to follow.

Regards

Ant

User 9f6f294e9f

18-02-2010 16:58:45

Hi again

You may be interested in knowing what the impact on our 32 cpu linux system of setting indexingThreadsPerCall to 4 was. To date I've dropped and created two indexes.

For the first (86,000) rows creation time rose from 1 min 43 secs to 2 mins 02 secs

For the second (7 million rows) creation time rose from 29 minutes to 1 hr 43 minutes

The key point is that the performance degradation isn't anything like as linear as might have been expected. Good news really - for me 1 hr 43 mins on a 7 million row table is still acceptable.

Regards

Ant

ChemAxon aa7c50abf8

18-02-2010 17:11:12

Thank you, Ant, for this info. It is very interesting indeed. There must be a factor (at least one) I didn't take into account when predicting linear increase. I'll try to find it out.

Regards,

Peter

ChemAxon aa7c50abf8

22-02-2010 10:14:57

Hi Ant,

The following index parameters will added in JChem 5.3.1:

threadCnt=<number-of-indexing-threads> to specify the number of threads used for indexing. This setting overrides the value of the indexingThreadsPerCall property in the jchem/cartridge/conf/jcart.properties file. If neither is set, as many indexing threads will be used as the number of cores available on the JChem Server host.

insSessCnt=1 to specify that only one database session (connection) is used to insert the generated index data into the database. If this parameter is not specified or is set to 0 (zero is the default value), each indexing thread will use a separate database session to insert the index data they generate.

Starting with JChem 5.3.1, you'll also have the ability to set Oracle cache properties in JChem Server:

If JChem Server is using the default data source implementation (`chemaxon.jchem.cartridge.util.cpool.Oracle10gDataSource` with 10g or 11g JDBC drivers), the connection cache properties can be specified in the `jchem/jchemsite/cartridge/conf/jcart.properties` file, by prefixing their names with the `oracle.connection.cache.` character string. For example, in order to close idle connections after 10 seconds you have to specify in the`jchem/jchemsite/cartridge/conf/jcart.properties` file, the following:

    oracle.connection.cache.InactivityTimeout = 10

    oracle.connection.cache.PropertyCheckInterval = 10

(The name of the PropertyCheckInterval property is somewhat misleading: in addition to the interval between checks on property values, it also specifies the interval for enforcing the cache properties' settings. Its default value is 900 seconds, so it should be adjusted for any Oracle-connection-cache-related time-out settings specifying a time-out less then 900 seconds.)

Regarding indexing speeds:

In my test environment (4 indexing threads on a quad-core Phenom with the Oracle data files stored on a single consumer-grade SATA hard disk), using the insSessCnt=1 index parameter actually made the indexing about 5-6% faster than it is with default settings. I expect the mileage to vary depending on (a) the speed of the storage subsystem and (b) on the number of indexing threads the single inserter database session has to serve. I assume that for any given hardware configuration with a sufficiently large number of processing cores, there will be a threshold of indexing threads above which the storage part of the indexing will be a bottle-neck. I am unsure, though, of whether increasing the number of inserter database sessions will make the storage part less or more of a bottle-neck. Namely, in the case of multiple concurrent inserter database sessions, the Oracle server has to synchronize the inserts between the sessions -- or (if all indexes on the index table will be created after the inserts have completed) at least maintain the private versions of each session as part of multi-versioning. (This is in addition to physical disk write latency.)

It is not impossible that it is the decreasing need for database-related concurrency handling which makes the slow-down less than linear with decreasing numbers of the indexing threads (when all threads use their own database sessions for inserts). I have measured indexing with 4 threads and 1 thread (and otherwise default settings) and I found that slow-down is indeed less than linear (even though the deviation from linear is much less pronounced than in your test):

4 index threads: 00:29:18.85

1 indexing thread: 00:01:48.55

Regards,

Peter