Obscure substructure search error

User 21b7e0228c

16-10-2015 09:46:14

I was not expecting this, since I had not made any changes to this old piece of software we use to query our JChem/MySQL database. The only thing that has changed is the database size: we now have some 10M compounds in the struct_stereo table.


The attached sssDB.java accesses a previously created temporary MySQL table holding a list of queryable cd_id values, since the search is run on a (small) subset of struct_stereo. We therefore first pass the cd_id values listed there to setFilterIDList as an array of integers, as required. However, the search machinery insists on cannibalizing a lot of memory (6G; Xmx and Xms were set a posteriori, but they make no difference) even though the searchable subset contains a meager 841 candidates, not the entire 10M! Then I get that bizarre error (attached in sss.err), which a Google search has never seen, for a CD_ID value NOT in the list of 841, but present in struct_stereo, where it corresponds to a harmless polyaromatic dicarboxylic acid!
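
A minimal sketch of the filter-list step described above, to make it easy to confirm that a reported CD_ID really is outside the candidate list. The class FilterIdList and its methods are hypothetical helpers (not part of JChem or of sssDB.java), and the sample IDs are made up; only the resulting int[] is what would be handed to setFilterIDList.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of JChem: prepares and inspects the ID filter.
final class FilterIdList {

    // Convert the cd_id values read from the temporary table into a sorted int[].
    // Sorting is only done here so that contains() can use binary search.
    static int[] toSortedArray(List<Integer> cdIds) {
        int[] ids = new int[cdIds.size()];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = cdIds.get(i);
        }
        Arrays.sort(ids);
        return ids;
    }

    // True if cdId is one of the candidates; the array must be sorted.
    static boolean contains(int[] sortedIds, int cdId) {
        return Arrays.binarySearch(sortedIds, cdId) >= 0;
    }

    public static void main(String[] args) {
        // Made-up values standing in for the 841 rows of the temporary table.
        int[] filter = toSortedArray(Arrays.asList(42, 7, 1999));
        // This array is what would then be passed to setFilterIDList.
        System.out.println("candidates: " + filter.length);
        System.out.println("10880395 in filter: " + contains(filter, 10880395));
    }
}
```

Logging the candidate count and the membership check just before the search makes it clear whether the error is raised for a structure that the filter should have excluded.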


Why "10" should be an unexpected value for a byte (??), and what the SMILES compressor has to do with all this, I cannot tell. The Java configuration is also listed in sss.err, and JChem 6.1.7 is the incriminated release!


 


Cheers,


Dragos

ChemAxon 9991eff751

16-10-2015 14:37:04

Hi,


Before you hit the problem, JChem did something like this:


When you start a search against your table, JChem loads the structure index for that table into memory (this is what eats up your memory) and then executes the search. JChem prepares for many consecutive searches; that is why it loads the cache into live memory.


The odd behaviour you are experiencing could be caused by some kind of bug or by corrupted data. Can you check the content of your DB with the following query:



select cd_smiles,cd_smarts,cd_flags,cd_structure from struct_stereo where CD_ID=10880395;
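
When inspecting the values that query returns, one thing that may be worth ruling out: byte value 10 is the ASCII line feed. It is only a guess that a stray line break in one of the stored columns is what the error refers to. The sketch below, with a hypothetical class name and a made-up sample string, lists the offsets of control bytes in a value fetched from the table.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of JChem.
final class ControlByteScan {

    // Return the offsets of all bytes below 32 (control characters such as LF = 10).
    static List<Integer> controlByteOffsets(byte[] data) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            int b = data[i] & 0xFF; // treat the byte as unsigned
            if (b < 32) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Made-up sample; in practice this would be the cd_smiles value of the row.
        byte[] value = "c1ccccc1\nO".getBytes(StandardCharsets.US_ASCII);
        System.out.println("control bytes at offsets: " + controlByteOffsets(value));
    }
}
```

An empty result for every column would point away from this explanation and towards a bug or a different kind of corruption.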



You were curious about how a SmilesCompressor comes into play: it encodes the index data with a lossless encoder to save memory.
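
To illustrate what "lossless" means here, the sketch below compresses a string and restores it byte for byte. It uses the standard java.util.zip classes purely as a stand-in; it is not the SmilesCompressor implementation, whose internals are not described in this thread. The class name and the sample string are made up.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustration only: a generic lossless codec, not JChem's SmilesCompressor.
final class LosslessRoundTrip {

    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Rejects corrupt or truncated input with an unchecked exception.
    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()) {
                    throw new IllegalArgumentException("truncated or corrupt input");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Arbitrary sample string standing in for an index entry.
        byte[] original = "OC(=O)c1ccc2cc(ccc2c1)C(O)=O".getBytes(StandardCharsets.US_ASCII);
        byte[] restored = decompress(compress(original));
        System.out.println("identical after round trip: " + Arrays.equals(original, restored));
    }
}
```

A decoder of this kind can only restore what a matching encoder produced, which is why unexpected input bytes surface as an error rather than as a silently wrong structure.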


I would recommend trying a more recent version of JChem. If I recall correctly, there were improvements in memory consumption around 6.2/6.3, and we have fixed a lot of bugs since then.


I have checked whether your program is API-compatible with the latest version; just one minor change would be needed:


      JChemSearchOptions searchOptions = new JChemSearchOptions(searchtype);


However, I haven't tried it.


Note: I've run through the code, and I must say: I like your error messages ;)


Regards,
kirk