Well, it all started with what should have been a harmless demo for chemoinformatics students: a similarity search using hashed fingerprints (default setup, no tinkering at all) in InstantJChem.
I have to admit I had never been curious enough to run similarity searches in InstantJChem on my own... but, for the occasion, I upgraded to v6.1.
The query is the top molecule in the table shown in the SCREEN1.png screenshot. I asked for a Tanimoto search with a threshold of 0.3. The molecule identical to the query comes up with a score of 1.0, as shown: great, I would not have expected anything else. But then many other hits show up at 1.0 as well, and they are not identical, let alone similar (see SCREEN2.png). In fact, if I raise the similarity threshold to 0.5, I only get the self-hit.
Since raising the threshold filters the hits as expected, I suppose the calculation behind the scenes is correct, but the score displayed in the table is weird. Alexandre Varnek also tested this: he sometimes gets "hits" labelled with a score of 0.341... while his cutoff was 0.5 (!?)
By the way, is there a way to capture that score in the table and sort by it?
It seems to work fine for me on Windows with a local Derby database...
I tried to reproduce the issue, but it worked OK for me. Did you use a local Derby database or an Oracle/MySQL database?
Regarding sorting: the query result is already sorted by score automatically. I will ask our developers whether there is a way to get the score values for you (probably via a Groovy script?).
All the best
Huh, this is positively weird! I tried again (same project, same query) and now it does indeed work properly with all metrics... but NOT with TVERSKY at the default 0.5:0.5 weights. That should give the same result as Dice, but it returns 1.000 for the whole database instead!
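For reference, here is a minimal Java sketch of Tversky similarity on bit-vector fingerprints (the class and method names are hypothetical; this is not InstantJChem's code). With c the number of bits set in both fingerprints, a the bits set only in the query and b the bits set only in the target, S = c / (c + alpha*a + beta*b). With alpha = beta = 1 this is Tanimoto, and with alpha = beta = 0.5 it equals Dice, 2c / (|A| + |B|), which is why 0.5:0.5 should reproduce the Dice scores. Note also that alpha = beta = 0 gives 1.0 for every target sharing at least one bit with the query, which would look exactly like "1.000 for the whole database"; whether IJC actually ends up with zero weights is only a guess here.

```java
import java.util.BitSet;
import java.util.Locale;

// Hypothetical helper class: a minimal sketch, not InstantJChem's implementation.
final class Tversky {

    // Builds a fingerprint with the given bit positions set.
    static BitSet bits(int... on) {
        BitSet fp = new BitSet();
        for (int i : on) {
            fp.set(i);
        }
        return fp;
    }

    // Tversky similarity: S = c / (c + alpha*a + beta*b), where
    // c = bits in both, a = bits only in the query, b = bits only in the target.
    static double tversky(BitSet query, BitSet target, double alpha, double beta) {
        BitSet both = (BitSet) query.clone();
        both.and(target);
        BitSet onlyQuery = (BitSet) query.clone();
        onlyQuery.andNot(target);
        BitSet onlyTarget = (BitSet) target.clone();
        onlyTarget.andNot(query);

        int c = both.cardinality();
        double denom = c + alpha * onlyQuery.cardinality() + beta * onlyTarget.cardinality();
        // Convention chosen for this sketch: a zero denominator scores 0.
        return denom == 0 ? 0.0 : c / denom;
    }

    // Tanimoto = Tversky with alpha = beta = 1, i.e. c / |A union B|.
    static double tanimoto(BitSet query, BitSet target) {
        return tversky(query, target, 1.0, 1.0);
    }

    // Dice = 2c / (|A| + |B|), computed directly for comparison.
    static double dice(BitSet query, BitSet target) {
        BitSet both = (BitSet) query.clone();
        both.and(target);
        int total = query.cardinality() + target.cardinality();
        return total == 0 ? 0.0 : 2.0 * both.cardinality() / total;
    }

    public static void main(String[] args) {
        BitSet query = bits(0, 1, 2, 3, 4, 5);           // 6 bits set
        BitSet target = bits(4, 5, 6, 7, 8, 9, 10, 11);  // 8 bits set, 2 shared with query
        // Locale.ROOT keeps a dot as the decimal separator whatever the default locale is.
        System.out.printf(Locale.ROOT, "Tanimoto        %.3f%n", tanimoto(query, target));
        System.out.printf(Locale.ROOT, "Dice            %.3f%n", dice(query, target));
        System.out.printf(Locale.ROOT, "Tversky 0.5/0.5 %.3f%n", tversky(query, target, 0.5, 0.5));
        System.out.printf(Locale.ROOT, "Tversky 0/0     %.3f%n", tversky(query, target, 0.0, 0.0));
    }
}
```

Running main prints 0.167 for Tanimoto, 0.286 for both Dice and Tversky 0.5/0.5, and 1.000 for Tversky 0/0, even though the two fingerprints share only 2 of 12 bits.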
Yesterday I had fiddled with Tversky as well, and when I came back to Tanimoto it kept behaving weirdly... so maybe the devil is in Tversky after all!
As for the database, it is indeed Derby (the default on Windows 8; I never touched anything there).
You're right, TVERSKY gives wrong results. We'll look into fixing this.
Sorry for the late response, and thanks a lot for your report.
The problem is in the "Screening options" field: with your locale, instead of "0.5;0.5" it should use the syntax "0,5;0,5".
We have improved the similarity search options dialog a bit for the upcoming IJC 6.3, but you still have to watch carefully whether the dot or the comma is used as the decimal separator. Basically, if you see a dot there, use a dot, and vice versa.
To set the locale you want, just add the startup option "--locale cs" (you would probably use fr instead of cs ;) to the instantjchem.conf file (or in JWS, or simply pass it on the command line). With a non-English locale you will probably need a comma as the decimal separator in that dialog. Note that changing the locale also affects other parts of the application: for example, import will use the locale's decimal separator when reading decimal numbers, and the grid view will use it as well.
So better not to touch it if everything works.
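The dot-versus-comma pitfall can be illustrated with java.text.NumberFormat. This is a hypothetical sketch: it assumes (the thread does not say so) that the options string is split on ";" and each weight is parsed leniently with NumberFormat for the current locale; parseWeights and parseStrict are illustrative names, not IJC methods. Lenient parsing stops at the first character the locale does not recognise, so under a French locale "0.5" is read as 0 without any error, and both Tversky weights would silently become 0.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical sketch of locale-dependent parsing of an "alpha;beta" option string.
// Not InstantJChem's actual parser.
final class LocaleParse {

    // Lenient parse: NumberFormat reads as much of the text as it can
    // and silently ignores whatever follows.
    static double parse(String text, Locale locale) {
        ParsePosition pos = new ParsePosition(0);
        Number n = NumberFormat.getInstance(locale).parse(text, pos);
        if (n == null) {
            throw new IllegalArgumentException("not a number: '" + text + "'");
        }
        return n.doubleValue();
    }

    // Strict parse: rejects the input unless the whole text was consumed.
    static double parseStrict(String text, Locale locale) {
        ParsePosition pos = new ParsePosition(0);
        Number n = NumberFormat.getInstance(locale).parse(text, pos);
        if (n == null || pos.getIndex() != text.length()) {
            throw new IllegalArgumentException(
                "cannot parse '" + text + "' as a number in locale " + locale);
        }
        return n.doubleValue();
    }

    // Splits "alpha;beta" and parses both weights leniently.
    static double[] parseWeights(String option, Locale locale) {
        String[] parts = option.split(";");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'alpha;beta', got '" + option + "'");
        }
        return new double[] {parse(parts[0].trim(), locale), parse(parts[1].trim(), locale)};
    }

    public static void main(String[] args) {
        String option = "0.5;0.5"; // what was typed in the dialog
        for (Locale loc : new Locale[] {Locale.US, Locale.FRANCE}) {
            double[] w = parseWeights(option, loc);
            System.out.println(loc + ": alpha=" + w[0] + ", beta=" + w[1]);
        }
    }
}
```

Running main prints "en_US: alpha=0.5, beta=0.5" and "fr_FR: alpha=0.0, beta=0.0". The strict variant, which checks that the ParsePosition reached the end of the text, would turn the same input into an error instead of a silent 0.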