Does HSQLDB support the binary AND operator?

User f5e6ccf034

17-12-2005 00:20:09

I can't find out. It would be useful if you would collect the answer for all supported dbs in a table and put it in your FAQ or manual.





Thanks,





-- O.L.

ChemAxon 9c0afc9aaf

17-12-2005 19:21:03

Hi,





Yes, HSQLDB has a function called BITAND:





http://hsqldb.sourceforge.net/doc/guide/ch09.html#N1224D





Similar functions are only utilized in the screening phase when the structure caching is turned off.


We do not recommend running searches without structure caching in general.





This is especially true for HSQLDB, where a severe shortcoming of the database system results in unreasonable slowdown in the graph search phase.





For example the following returns instantly:


Code:



select cd_id from mytable where cd_id=1;






This one however can take ages on a big table (it seems the index of the primary key is not used in this case by HSQLDB):


Code:



select cd_id from mytable where cd_id in (1);






We fetch the source of the screened structures from the database in batches, also using a "where cd_id in (...)" clause.


We could use prepared statements as a workaround just for this 1 RDBMS type, but I'm not sure that will happen: we are planning to abolish the non-cached search mode anyway.





Best regards,





Szilard

ChemAxon fa971619eb

17-12-2005 19:56:52

You might want to look at HyperXtremeSQL™ (http://www.hxsql.com/). This version uses a different indexing architeture, and fixes many of the problems that HSQL has with operations that logically should use an index, but in practice does not.





For most simple purposes this makes no diference at all, but for some types of operations that can be improved by smart use of indexes the performance improvement is vast. For instance I used a SELECT DISTINCT ... type of statement on an indexed column that should be able to just look up the values from the index. In hxsql the result was instananeous, but with hsqldb it did a full table scan, which for a large table took several seconds.





HyperXtremeSQL™ is the commercial version, but the cost is minimal. In my view the minimal cost is well worth paying.


However I'm not sure where ChemAxon stand on support for this variant. the API is identical, but AFAIK only the HSQQLB variant is formally supported.





Tim





p.s. I'm not connected with HyperXtremeSQL™ in any way!

User f5e6ccf034

18-12-2005 03:49:37

Yes, we were planning to use hxsql but I don't know if it speeds up the particular query Szilard mentioned; I will check. In any case, I just switched to a pre-release of hxsql that is in synch with hsqldb 1.8 and I got "Unknown or unsupported database: HXSQL DATABASE ENGINE". Since the API is supposedly the same, I hope ChemAxon can release a minor version of JChem that removes this restriction: that should be completely trivial.





On the other hand I'm a bit worried about cached mode causing the app to have an unreasonable memory footprint... Furthermore, in cached mode, what does the db bring to the table? Why not simply pull the data from flat files?





Regards,





-- O.L.

ChemAxon 9c0afc9aaf

19-12-2005 20:07:12

Quote:
Since the API is supposedly the same, I hope ChemAxon can release a minor version of JChem that removes this restriction: that should be completely trivial.
As a temporary solution we can treat HXSQL as HSQLDB, we will make that change for the next JChem release.


On the long run however we will probably have to distinguish the two engines in some parts of the code for optimal performance.


(e.g. workarounds for poor performance or other problems)
Quote:
On the other hand I'm a bit worried about cached mode causing the app to have an unreasonable memory footprint...
JChem uses a highly optimized method of caching data that is relevant for the search process.


It has a much smaller memory footprint than caching the same data in the database.


For typical drug-like structures less than 100MB is needed for 1 million compound.


Is that acceptable for you ?


We believe that this caching is necessary for acceptable performance on large tables, and for small tables the memory footprint is also small.
Quote:



Furthermore, in cached mode, what does the db bring to the table? Why not simply pull the data from flat files?
Usually JChem is only a part of a company database, where a lot of other data is stored. We only cache data that is relevant for substructure search (the structure as compressed cxsmiles and the fingerprints). Also the DB means that a central repository, provides means of authentication, transaction management, etc.





Regards,





Szilard

User f5e6ccf034

19-12-2005 20:56:47

Quote:
As a temporary solution we can treat HXSQL as HSQLDB, we will make that change for the next JChem release.


Thanks. That is very appreciated. How will I know that it is out? I still don't get email whenever a new version is released.
Quote:
JChem uses a highly optimized method of caching data that is relevant for the search process. It has a much smaller memory footprint than caching the same data in the database.


Ths issue is specific to HSQLDB/HXSQL. I don't know what the people who requested HSQL support before me use it for but the way I see it, HSQL is only interesting because of its in-process modes, which allow you to offer a db-less experience to the user (although sadly not to the developer): in server mode there are so many more performant and fully-featured options, why bother? Now, in in-process mode one already has a full copy of the data in one's process and then you want to put in a second one in there. Unless that second copy is a really small fraction of the first one, the waste is problematic: speed is in the eye of the beholder but, even allowing for virtual memory, memory is much more of a hard limit and most user PCs have as little as 256M of it.
Quote:
Usually JChem is only a part of a company database, where a lot of other data is stored.


Yes but in my scenario (embedded db) none of that plays any role and if JChem could populate its cache from, say, one big cssdf file, that would really have enormously simplified the development of this particular application. Maybe a feature for a future version?





Best regards,





-- O.L.

ChemAxon 9c0afc9aaf

20-12-2005 10:06:15

Quote:
How will I know that it is out? I still don't get email whenever a new version is released.
You can get notification about major JChem versions by subscribing to the following topic:





http://www.chemaxon.com/forum/ftopic287.html





Unfortunately this only applies to major versions (e.g. 3.0 or 3.1).


I will discuss this with my colleagues if it's possible to use this for every version.


Until then I can promise you that I will notify in e-mail you when our next version (3.1.5) comes out.
Quote:
Now, in in-process mode one already has a full copy of the data in one's process
This is only true for MEMORY and TEMP tables, but not for CACHED tables.


Please see the following link:





http://hsqldb.sourceforge.net/doc/guide/ch01.html#N1023C





It states:
Quote:
CACHED tables are created with the CREATE CACHED TABLE command. Only part of their data or indexes is held in memory, allowing large tables that would otherwise take up to several hundred megabytes of memory.
You can also influence the size of the cache with the hsqldb.cache_scale and hsqldb.cache_size_scale properties.


(By default it is around 50MB)


See this page for further information:


http://hsqldb.sourceforge.net/doc/guide/ch04.html
width="90%" cellspacing="0" cellpadding="3" border="0" align="center"> Quote:


Yes but in my scenario (embedded db) none of that plays any role and if JChem could populate its cache from, say, one big cssdf file, that would really have enormously simplified the development of this particular application. Maybe a feature for a future version? It's very unlikely.


This would require rewriting most of JChem's code (huge work), and it is a very special request (nobody else has asked for it).


Embedded HSQLDB / HXSQL should work just fine for this purpose.





Best regards,





Szilard

User f5e6ccf034

20-12-2005 13:22:45

Re. CACHED spec, I see now. Somehow I had gotten the idea that CACHED simply meant persistent and that all the data would eventualy end up in memory as you read it. So when using the cached feature of JChem, one should set the HSQLDB cache size to a very small value, right? Do you have a rule of thumb for what's the minimum cache size that will work?





Regards,





-- O.L.

ChemAxon 9c0afc9aaf

20-12-2005 14:56:33

Quote:
So when using the cached feature of JChem, one should set the HSQLDB cache size to a very small value, right? Do you have a rule of thumb for what's the minimum cache size that will work?
I'm afraid there is no general advice for this setting.


In cached mode JChem is not really affected by database performance.


I suggest to try with a very low limit (e.g. 8 MB), and increase if necessary.





Szilard