Similarity search calculation

User e34a92cce5

10-11-2010 17:47:30

Hi,


I have attached two structures that our chemist believes should be reported as 85% or more similar in a similarity search. In fact, Benchware Dataminer, which uses Unity fingerprints, does report that. However, all the ChemAxon tools I tried (jcsearch, compr, Instant JChem, JChem Web Services) report that they are less than 20% similar. Any ideas on why this is happening?


Thanks

ChemAxon 42004978e8

15-11-2010 10:31:42

Hi,


With the general aromatization method, the 39048.mol structure is treated as aliphatic while the other one is treated as aromatic.


If the aromaticity of the aliphatic one is changed (see the attached structure), a high similarity value is achieved.


With basic aromatization, all the structures are considered aliphatic and the two original structures have a similarity of 78%. Which aromatization method are you using?


Bye,


Robert

User e34a92cce5

15-11-2010 18:07:34

I do not specify any aromaticity definitions when using JChem (during import, structure searches etc.), so it uses whatever comes with it as the default. I would have thought that the default standardizer used by jcman when importing structures into the database would aromatize this structure. I know this has caused issues for us in the past, when a JChem upgrade changed the default aromaticity. So, the question is: how do I overcome this problem? Should I use standardize to specify an aromaticity definition? What are the drawbacks of doing that? Any pointers would be helpful.


Thanks

ChemAxon 42004978e8

16-11-2010 08:52:32

Hi,


If you do not specify an aromatization method, the default one, general aromatization, is used. This is generally suitable; however, in your case it assigns different aromaticity to your two structures.


You can read about aromatization here:


http://www.chemaxon.com/jchem/marvin/help/sci/aromatization-doc.html


With basic aromatization, the two structures have the same aromaticity and thus a higher similarity.


You (or your chemists) can read the documentation above, which explains the differences. Although basic aromatization yields a better result for you in this case, I strongly suggest reading the docs to find out which method is generally preferable for you.


Ways for specifying the aromatization method are listed here:


http://www.chemaxon.com/jchem/doc/user/query_standard.html#aromatization


Bye,


Robert

User e34a92cce5

16-11-2010 21:39:37

Thanks for your response, Robert. I think JChem changed their default aromatization from 'basic' to 'general' after the 3.2 release. So, I am assuming that since I currently work with v 5.3, all my structures (even those that had been in the compounds table prior to 3.2 release) are standardized to the general method. Am I right?


Now, the documentation also states that:


"All transformation methods work only in structures which are in non-aromatic representation. If the molecules are in partially aromatic form (containing any aromatic bond) the transformation method may fail."


Since we have structures from many different vendors with different representations of aromaticity, I wonder if we first need to run Standardizer with the de-aromatize option and then run it again with one of the aromatization options to ensure consistent representation of aromatic rings. Please advise.

ChemAxon a3d59b832c

17-11-2010 07:49:18

Hi Renju,


Yes, the general way to overcome this problem is to first dearomatize and then aromatize the molecules. If some of your sources may already be aromatized by an unknown aromaticity model, it may make sense to de-aromatize indeed.


But I think that the documentation is only referring to very complex cases where fused ring systems are partially aromatized.


 


You do not need to run Standardizer twice for this. It is possible to specify both actions in the same standardizer configuration. ("dearomatize..aromatize" using the short configuration.)


 


Furthermore, you can assign the standardization directly to the JChem table or JChem index.


See more details here:


https://www.chemaxon.com/jchem/doc/user/query_standard.html#standardizationDB


 


Another note is that descriptor tables provided by JChem Screen offer many different similarity calculations. However, they all start from the standardized form, so the desired aromatization method needs to be found first.


You can find more information about the other available similarity descriptors and metrics here:


https://www.chemaxon.com/products/screen/


 


Best regards,


Szabolcs

User e34a92cce5

18-11-2010 04:15:02

I ran the dearomatize..aromatize:b option with standardize as suggested and it looks like both the structures have been converted into their aliphatic forms. However, in their aliphatic forms the similarity of course does not match the 78%. How did you manage to convert the above structure (39048.mol) into the aromatic form? I have attached the structures after I ran the standardizer.

ChemAxon 42004978e8

19-11-2010 19:51:44

Hi,


 


For me, the two attached structures yield a similarity of ~78% (dissimilarity 21.68%) with basic aromatization and 21% similarity (79% dissimilarity) with general aromatization.


How did you calculate the similarity value? If you do the similarity search in memory and the calculation performs standardization, then you need to specify the aromatization method; otherwise general aromatization is performed.


For example, with jcsearch:


jcsearch -t:i:0.9 -q 36440_STD.sdf  39048_STD.sdf -f mrv


shows 79% dissimilarity


jcsearch -t:i:0.9 -q 36440_STD.sdf  39048_STD.sdf -f mrv -S "aromatize:b"


shows 21% dissimilarity.
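Because jcsearch reports dissimilarity here, the two outputs above can be converted to the similarity values quoted in this thread. A minimal Python sketch, assuming the reported value is a dissimilarity expressed as a fraction between 0 and 1 and that similarity is its complement (the function name is illustrative and is not part of any ChemAxon API):

```python
def similarity_from_dissimilarity(dissimilarity: float) -> float:
    """Convert a dissimilarity in [0, 1] to the complementary similarity."""
    if not 0.0 <= dissimilarity <= 1.0:
        raise ValueError("dissimilarity must be between 0 and 1")
    return 1.0 - dissimilarity


# Dissimilarity values quoted in this thread for the two structures.
reported = {
    "general aromatization (default)": 0.79,
    'basic aromatization (-S "aromatize:b")': 0.2168,
}

for label, d in reported.items():
    s = similarity_from_dissimilarity(d)
    print(f"{label}: dissimilarity {d:.2%}, similarity {s:.2%}")
```

The same conversion applies when comparing these numbers with tools that report similarity directly.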


Does this help you?


Bye,


Robert

ChemAxon efa1591b5a

21-11-2010 00:45:33

Hi, 


Do you know which similarity metric Benchware Dataminer uses? That can also make a difference.


I checked the similarity score of these two using ECFP (soon to be released) and while the ChemAxon fingerprint resulted in 0.88 dissimilarity, ECFP showed 0.79 (so not much different). I used Tanimoto.


Regards


Miklos

User e34a92cce5

21-11-2010 00:56:57

Benchware uses UNITY fingerprints with Tanimoto comparison. So, in response to Robert's reply: when you compare the aliphatic forms of the 2 compounds you get approx. 20% similarity, and when you compare their aromatic forms you get 80% similarity. My question is: why is that the case? Shouldn't the two values be within 5-10% of each other?


Now, the 2 structures I supplied with _STD in their names were first dearomatized and then aromatized using the basic option. But if you look at the structures, they both appear to be in their aliphatic forms. So my next question is: why did they not convert into their aromatic forms when I ran them through the standardizer?


Also, Robert specified the basic aromatize option when he ran jcsearch. Is there a way to specify that when using the web services? I use the beginsearch (https://www.chemaxon.com/webservices/soap/JChemSearchWS.html#beginsearch) SOAP service to run my searches. Can the 'vaguebond' option in beginsearch be used to specify the aromaticity type?


Thanks a lot for helping me get a handle on this.

ChemAxon 42004978e8

22-11-2010 08:19:56

Hi


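High similarity is achieved when the rings in the two molecules have the same aromaticity. If one structure is aromatized and the other is not, as happens here with general aromatization, the similarity drops.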
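To see why the aromaticity representation matters so much, consider a toy path-based fingerprint. The sketch below is not ChemAxon's fingerprint; it is an illustration only, which assumes single-character atom and bond symbols and enumerates just the paths of one and two bonds. The same substituted six-membered ring is encoded once with alternating double/single bonds and once with aromatic bonds:

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def path_features(atoms, bonds):
    """Collect one- and two-bond paths as strings such as 'C=C' or 'C:C-N'.

    atoms: list of single-character element symbols.
    bonds: dict mapping (i, j) atom index pairs to a single-character bond symbol.
    """
    adjacency = {i: [] for i in range(len(atoms))}
    for (i, j), symbol in bonds.items():
        adjacency[i].append((j, symbol))
        adjacency[j].append((i, symbol))

    features = set()
    for i in adjacency:
        for j, s1 in adjacency[i]:
            path = atoms[i] + s1 + atoms[j]
            # A path and its reverse are the same feature; keep one spelling.
            features.add(min(path, path[::-1]))
            for k, s2 in adjacency[j]:
                if k != i:
                    path = atoms[i] + s1 + atoms[j] + s2 + atoms[k]
                    features.add(min(path, path[::-1]))
    return features


def ring_with_substituent(ring_bond_symbols):
    """Six-membered carbon ring with one nitrogen attached to atom 0."""
    atoms = ["C"] * 6 + ["N"]
    bonds = {(i, (i + 1) % 6): s for i, s in enumerate(ring_bond_symbols)}
    bonds[(0, 6)] = "-"
    return atoms, bonds


kekule = path_features(*ring_with_substituent("=-=-=-"))    # alternating bonds
aromatic = path_features(*ring_with_substituent("::::::"))  # aromatic bonds

print("same representation:     ", tanimoto(kekule, kekule))
print("different representation:", tanimoto(kekule, aromatic))
```

In this toy model the two encodings of the same ring share almost no features, which mirrors the drop in similarity seen when only one of the two structures is aromatized.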


Regarding your second question: dearomatization followed by basic aromatization returned molecules with single/double bonds because, under basic aromatization, both molecules are aliphatic.


Bye,


Robert

User c1ce6b3d19

22-11-2010 10:22:44

Renju,


You can use the standardize web service to run standardization rules on a molecule before running the search.


https://www.chemaxon.com/webservices/soap/StandardizerWS.html#standardize


 


Jonathan Lee


 

User e34a92cce5

22-11-2010 14:56:26

Jonathan and Robert,


Thanks for responding. So, if I understand correctly, the similarity between two structures in their aliphatic forms can be completely different from, and unrelated to, the similarity between their aromatic forms. My earlier understanding was that if the similarity between 2 structures in their aliphatic forms is X and between their aromatic forms is Y, then X and Y should be within 10% of each other. It looks like that is not the case.


Jonathan, how will your solution to standardize work? If you follow the thread here, I actually used standardize to dearomatize and then aromatize the structures in question. That does not help me achieve the desired similarity. What does help is the aromatization you specify when running the search. If you look at Robert's earlier post, he specifies the basic aromatization option here:
jcsearch -t:i:0.9 -q 36440_STD.sdf  39048_STD.sdf -f mrv -S "aromatize:b"


That's when the standardized structures achieve 78% similarity. So, I don't think I am looking for a solution on how to standardize the structures before running the search. And anyway, if I had to do that I would obviously run standardizer on my compounds table rather than at search time. My question is whether the vaguebond option is analogous to the -S "aromatize:b" option in jcsearch.


Thanks again..

ChemAxon 25dcd765a3

22-11-2010 16:17:17

The discussion has skipped over the important question:


"Are 36440.mol and 39048.mol aromatic?"


No, they are not aromatic.


But ChemAxon has two aromatization methods: basic and general (both have advantages and disadvantages). The basic aromatization method correctly leaves the two molecules unchanged. The general method converts the rings in 36440.mol to aromatic form (which is not correct). What happens in this case is that neither of the two rings can be aromatic on its own, but the ring system as a whole can be converted to aromatic form according to general aromatization (see attached picture).


I hope this helps.


Andras

User e34a92cce5

22-11-2010 20:49:56

Thanks for responding. I think I am now confused by your suggestion to go back to general aromatization. So, bottom line: what must I do to get high similarity between these two structures? I use the JChem Web Services to run all my structure searches. From your suggestion, it is clear that the standardizer is not going to help me. If I standardize using basic aromatization, it converts them both to aliphatic and that does not yield good similarity. If I use general, it considers 36440 aromatic and 39048 aliphatic, which still does not solve the problem.

ChemAxon a3d59b832c

23-11-2010 08:03:52

Hi Renju,


Volfi did not suggest going back to general aromatization; he only explained why this method works differently.


From the above discussion, it seems that you need to stick with basic aromatization, and maybe explore other descriptors and/or similarity metrics. (See more information here: https://www.chemaxon.com/products/screen/ )


Regarding your question about the vague bond option: it is only available for substructure, full structure, full fragment and superstructure searches. It does not work for similarity search.

ChemAxon 42004978e8

24-11-2010 11:58:35

Hi,


 


Some points are not clear yet.


With general aromatization, the two structures are aromatized differently and therefore have low similarity. This is clear to both of us.


To avoid this, the two structures were dearomatized and then aromatized again using the basic method, leading to the "_STD" structures. These are aliphatic because basic aromatization doesn't turn them aromatic.


What is unclear is why you received a low similarity value for these. The similarity should be above 70%.


Please check:


jcsearch -t:i:0.9 -q 36440_STD.sdf  39048_STD.sdf -f mrv -S "aromatize:b"


Please note that this command prints dissimilarity by default, which is around 21% in this case, meaning 78% similarity (JChem 5.3.8). Do you obtain these values?


If the tool you use to determine the similarity between the two "_STD" structures performs standardization, then you have to ensure that it uses basic aromatization; otherwise they are aromatized again with general aromatization. This will result in low similarity again (as explained earlier), e.g.:


jcsearch -t:i:0.9 -q 36440_STD.sdf  39048_STD.sdf -f mrv


which yields 79% dissimilarity, meaning 21% similarity.


 


Hence, using the basic-aromatized versions for similarity search, in a way that does not apply general aromatization again, leads to the high similarity values your chemists desire.


Please consider the differences between general and basic aromatization. It may be true that basic aromatization is more suitable in this special case, but otherwise general aromatization is better.


 


Bye,


Robert

User e34a92cce5

24-11-2010 17:54:27

Hi Robert,


Thanks for thinking ahead and clarifying the issue that was looming in my mind. So, clearly running the standardizer to dearomatize and aromatize does not by itself help me achieve the high similarity, because jcsearch uses general aromatization when doing the comparison. What does help is specifying basic aromatization WHEN RUNNING THE SEARCH. Now I see that I get a high similarity too when doing that. I use 5.3.1 and I also get 78% similarity when running:


jcsearch -t:i:0.9 -q 36440_STD.sdf  39048_STD.sdf -f mrv -S "aromatize:b"


So now, my only question is how to do this using Web Services. Jonathan Lee, from Web Services, suggested running the standardizer. As you clearly noted above, running the standardizer before running the search does not help us achieve this high similarity. We need a way within the Web Services search program to specify the basic aromatization option AT RUN-TIME to get this result. Is there a way to do this? I asked about the vaguebond option since it is a run-time option within JChemSearch; but unfortunately that does not work with similarity searching, as Szabolcs noted above. So, is there any way at all to get this result using Web Services? The software program that I use is entirely web-based and I would rather not run command-line programs on the server to get this result.


Renju



User c1ce6b3d19

25-11-2010 15:57:20

Renju,


If you want to use the JChem Search Web Service, you must create a database table and import the molecules to search through. Then pass the query molecule to the JChem Search Web Service.


Upon creating the database table (using JChem Manager, for example) you should select a standardization configuration (e.g. aromatize:b). When using the JChem Search Web Service, the query molecule will be standardized using that configuration before being compared with the targets in the table (which are standardized upon import). During the comparison, if both query and target are aliphatic, then a high similarity score will be achieved.


Please remember to be careful about the standardizations that your molecules will go through. A molecule that has general aromatization and undergoes a basic aromatization will not be similar to a dearomatized molecule that undergoes a basic aromatization. This is because the first molecule is already aromatized when it is aromatized again. So as a precaution, it might be beneficial to include both a dearomatization and a basic aromatization in the standardization configuration you specify when creating the database table.


Jon

User e34a92cce5

10-12-2010 19:26:54

Hi Jon,


To test your suggestion, I created a table within Instant JChem with the dearomatize + aromatize:b option in the standardizer. I then imported the two structures in question and was unable to achieve the high similarity. If you refer to this thread, you can see that in their aliphatic forms the two structures do not report high similarity. That's why Robert had to specify basic aromatization during the search even after these structures ran through the custom standardizer with the dearom + arom:b option. For the structures to achieve high similarity, they need to be searched in this way:


jcsearch -t:i:0.9 -q 36440_STD.sdf  39048_STD.sdf -f mrv -S "aromatize:b"


And that is my question: is there a way to do this within web services? Running them through a standardizer during import is not helping because, as I said earlier, that only converts them into their aliphatic forms, which does not give high similarity.


Thanks for your help.

ChemAxon 42004978e8

13-12-2010 19:53:12

Hi,


 


The search on a table created with dearomatization-basic aromatization option should work.


To test this try:


jcman c simtest --stconfig dearomArom.xml   (attached here)


jcman a simtest 36440.mol    (your structure attached earlier in this topic)


jcman a simtest 39048.mol


jcsearch -q 36440.mol DB:simtest -t:i:0.9 -f mrv


This last command should dump an mrv with the two structures, one with zero and the other with 21.6% dissimilarity, which means a high similarity. When searching a DB table you don't need to specify a standardization configuration, because the standardization configuration of the DB table will be used. Do you obtain the same results when entering these commands?


We are still investigating why Instant JChem doesn't yield the desired results.


Robert

User e34a92cce5

17-01-2011 20:49:36

Yes, I ran this using the JChem Base UI and I am able to get the high similarity. Thanks for your patience. So, I guess Instant JChem does not use the custom aromatization settings when running the search.

ChemAxon 8407015329

18-01-2011 14:13:39

Hi All,


The problem has been found. It is not in the search mechanism, but rather in the tool that displays the results.


This tool recalculates the similarity value, but it uses different fingerprint and standardization settings. A fix will be available in version 5.5.


 


Kind regards,


Vencel