Help! Someone's mad - either Tanimoto or myself...

User 21b7e0228c

15-10-2013 12:55:18

Well, it all started with what should have been a harmless demo for chemoinformatics students: a similarity search using hashed fingerprints (default setup, no tinkering at all) in InstantJChem.


I have to admit I had never been curious enough to run similarity searches in InstantJChem on my own... but, for the occasion, I upgraded to v6.1.


Now, the query is the top molecule in the SCREEN1.png screenshot table. I asked for a Tanimoto search with a threshold of 0.3. The molecule identical to the query comes up with a score of 1.0, as shown. Great, I would not have expected anything better. But... there are many other hits showing up at 1.0 as well... and they're not identical, let alone similar (see SCREEN2.png). Actually, if I raise my similarity threshold to 0.5, I only get the self-hit.


Therefore, I suppose the calculation behind the scenes is correct, but the score displayed in the table is weird. Actually, Alexandre Varnek also tested this: he sometimes gets "hits" labelled with a score of 0.341... while his cutoff was 0.5 (!?)
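For comparison, here is a minimal Python sketch of what a Tanimoto threshold search is generally expected to do. It is not InstantJChem's implementation: fingerprints are modelled as plain integers used as bit sets, and the names `tanimoto` and `similarity_search`, like the bit patterns, are invented for illustration. Two properties bear on the symptoms above: only identical bit sets score 1.0, and every reported hit scores at or above the threshold. Hits are also returned sorted by decreasing score.

```python
"""Sketch of a Tanimoto threshold search on binary fingerprints.

Assumption: fingerprints are Python ints whose set bits stand in for
hashed-fingerprint bits; all data here is invented for illustration.
"""


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto = c / (a + b - c), with a, b, c counted in set bits."""
    a = bin(fp_a).count("1")
    b = bin(fp_b).count("1")
    c = bin(fp_a & fp_b).count("1")  # bits shared by both fingerprints
    union = a + b - c
    # Two empty fingerprints: returning 0.0 is a convention chosen here.
    return c / union if union else 0.0


def similarity_search(query: int, database: dict, threshold: float) -> list:
    """Return (name, score) pairs with score >= threshold, best first."""
    hits = []
    for name, fp in database.items():
        score = tanimoto(query, fp)
        if score >= threshold:  # nothing below the cutoff should survive
            hits.append((name, score))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


if __name__ == "__main__":
    query = 0b1011_0110
    database = {
        "self": 0b1011_0110,
        "close analogue": 0b1011_0111,
        "distant": 0b0100_1001,
    }
    for name, score in similarity_search(query, database, threshold=0.3):
        print(f"{name}: {score:.3f}")
```

Run as a script, it should list the self-hit at 1.000 and the analogue at 0.833 (5 shared bits out of 6 set in either fingerprint); the distant fingerprint shares no bits with the query and is dropped.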


 


By the way, is there a way to capture that wild score in the table and sort by it?


Cheers!


Dragos

User 8a7878ec6d

16-10-2013 12:25:56

Seems to work fine for me on Windows with local Derby databases...


Evert

ChemAxon 26d92e5dcd

16-10-2013 15:24:48

Dear Dragos,


I tried to reproduce the issue, but it worked OK for me. Did you use a local Derby DB or an Oracle/MySQL database?


Regarding the sorting of the hits, the query result is automatically sorted by score. I will ask our developers whether there is a way to get the values for you (probably via a Groovy script?).


All the best


 


David

User 21b7e0228c

16-10-2013 17:54:03

Huh, this is positively weird! I tried again (same project, same query) and now it indeed works properly with all metrics... but NOT with TVERSKY at the default 0.5:0.5, which should be the same as Dice but instead returns 1.000 for the whole database!
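The expectation that Tversky at 0.5:0.5 should match Dice follows from the usual definitions. With a and b the numbers of bits set in the query and the target and c the number of shared bits, Tversky is c / (α(a − c) + β(b − c) + c); setting α = β = 0.5 gives c / (0.5(a + b)) = 2c / (a + b), which is Dice. The Python sketch below checks this on one made-up pair of fingerprints; the function names and bit patterns are hypothetical, not IJC code.

```python
def bit_count(fp: int) -> int:
    """Number of set bits in a fingerprint modelled as an int."""
    return bin(fp).count("1")


def tversky(fp_q: int, fp_t: int, alpha: float, beta: float) -> float:
    """Tversky = c / (alpha*(a - c) + beta*(b - c) + c)."""
    a, b = bit_count(fp_q), bit_count(fp_t)
    c = bit_count(fp_q & fp_t)
    denom = alpha * (a - c) + beta * (b - c) + c
    return c / denom if denom else 0.0


def dice(fp_q: int, fp_t: int) -> float:
    """Dice = 2c / (a + b)."""
    a, b = bit_count(fp_q), bit_count(fp_t)
    c = bit_count(fp_q & fp_t)
    return 2 * c / (a + b) if a + b else 0.0


if __name__ == "__main__":
    query, target = 0b1011_0110, 0b0000_0011  # invented fingerprints
    print(f"Tversky(0.5, 0.5) = {tversky(query, target, 0.5, 0.5):.4f}")
    print(f"Dice              = {dice(query, target):.4f}")
```

Both lines should print 0.2857: the query has 5 bits set, the target 2, and they share 1, so both measures reduce to 2/7.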


Yesterday I fumbled around with Tversky as well, except that when I came back to Tanimoto, it kept behaving weirdly... maybe the devil's in Tversky after all!


As for the database, it is indeed Derby (the default, on Windows 8; I never touched anything in there).

ChemAxon 2bdd02d1e5

05-12-2013 09:48:38

Dragos,


You're right. TVERSKY gives wrong results. We'll look into fixing this.


Sorry for the late response, and thanks a lot for your report.


Filip

ChemAxon 2bdd02d1e5

05-12-2013 15:07:44

The problem is in the "Screening options:" field: instead of "0.5;0.5", it should use a syntax like "0,5;0,5".
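How IJC actually parses the "Screening options" field is not described in this thread, so the following Python sketch is only an assumption about one way the symptom could arise. It uses a hypothetical lenient parser that reads the longest leading number written with the locale's decimal separator and silently ignores the rest. Under a comma locale, "0.5;0.5" would then become α = β = 0, and Tversky collapses to c / c = 1.0 for every target sharing at least one bit with the query, which would match the "1.000 for the whole database" report. All function names and fingerprints are invented for illustration.

```python
import re
from typing import Tuple


def parse_weight(text: str, decimal_sep: str) -> float:
    """Hypothetical lenient parser: keep the longest leading number written
    with `decimal_sep` and silently ignore whatever follows it."""
    pattern = r"\d+(?:" + re.escape(decimal_sep) + r"\d+)?"
    match = re.match(pattern, text.strip())
    if match is None:
        raise ValueError(f"not a number: {text!r}")
    return float(match.group(0).replace(decimal_sep, "."))


def parse_tversky_options(text: str, decimal_sep: str) -> Tuple[float, float]:
    """Split an 'alpha;beta' string and parse both weights."""
    alpha_text, beta_text = text.split(";")
    return parse_weight(alpha_text, decimal_sep), parse_weight(beta_text, decimal_sep)


def tversky(fp_q: int, fp_t: int, alpha: float, beta: float) -> float:
    """Tversky = c / (alpha*(a - c) + beta*(b - c) + c) on bit counts."""
    a, b = bin(fp_q).count("1"), bin(fp_t).count("1")
    c = bin(fp_q & fp_t).count("1")
    denom = alpha * (a - c) + beta * (b - c) + c
    return c / denom if denom else 0.0


if __name__ == "__main__":
    query = 0b1011_0110
    targets = [0b1011_0110, 0b0000_0011, 0b1000_0000]  # all share >= 1 bit
    for options in ("0.5;0.5", "0,5;0,5"):
        # Simulate a comma-decimal locale (e.g. cs or fr).
        alpha, beta = parse_tversky_options(options, decimal_sep=",")
        scores = [round(tversky(query, t, alpha, beta), 3) for t in targets]
        print(options, "->", (alpha, beta), scores)
```

Run as a script, the dot-separated input should come out as weights (0.0, 0.0) with a score of 1.0 for all three targets, while the comma-separated "0,5;0,5" gives (0.5, 0.5) and scores of 1.0, 0.286 and 0.333.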

ChemAxon 2bdd02d1e5

06-03-2014 15:08:57

We have slightly improved the similarity search options dialog in the upcoming IJC 6.3. However, one still has to watch carefully whether a dot or a comma is used as the decimal separator. Basically, if you see a dot there, use a dot, and vice versa.


To set the locale you want, you can simply add the startup option "--locale cs" (you would probably use fr instead of cs ;) to the instantjchem.conf file (or to the JWS configuration, or simply pass it on the command line at startup). If you have a non-English locale, you probably need to use a comma as the decimal separator in that dialog. Note that setting a different locale also affects other parts of the application. E.g. importing will also use the locale's decimal separator when importing decimal numbers, and the grid view will use it as well.
So better not to touch it if everything works...


Regards...