problems with substructure search

User 70c125d390

17-06-2009 07:56:28

 



We encountered a problem when doing a substructure search in IJC on a MySQL database with 2000 compounds. When searching for all compounds with a certain scaffold, none of the compounds was returned as a hit, although there are quite a few examples in the database.


To figure out what was happening, I tried several simpler queries and encountered some rather strange behavior. A search with 'O=S=O' returns 412 hits, but 'NS(=O)=O' returns 0 hits, although more than 90% of the 412 compounds contain this motif. Further exploration reveals that even patterns as simple as S-N or S-C do not return any hits. Strangely enough, C-N returns 1 hit, in which the other patterns also exist.


 


I have already regenerated the structure table using JChemManager, which ran without any problems.


 


The system settings are:


Instant JChem 2.5.1; Java HotSpot(TM) Client VM 1.5.0_16-133; Mac OS X version 10.5.7 running on i386; JChem 5.2.2 


Many thanks for your time and help.


Gert Thijs



 

ChemAxon a3d59b832c

17-06-2009 08:51:40

Hi Gert,


Do the missed motifs relate to aromatic rings?


If so, the aromatic or single/aromatic bond types could be used in the queries.


See: http://www.chemaxon.com/jchem/doc/user/query_features.html#genbond


and: http://www.chemaxon.com/jchem/doc/user/query_standard.html


See also the vague bond search options: http://www.chemaxon.com/jchem/doc/user/query_searchoptions.html#vaguebond


 


Let us know if these help.



Best regards,


Szabolcs

User 70c125d390

17-06-2009 10:23:03

I do not think it is related to aromaticity, since even 'c1ccccc1' and 'C1=CC=CC=C1' as queries do not return any results.

ChemAxon a3d59b832c

17-06-2009 10:31:41

OK. Do you have a custom standardizer configuration on the table?

User 70c125d390

17-06-2009 11:28:19

We do indeed use a custom standardizer configuration. I have attached the standardizer file. 


 

User 70c125d390

17-06-2009 14:33:27

I have done some further testing on this matter.


I have exported the structure table into an SD file using the standardized compounds. Then I created a new table within the same project by importing the SD file. When I now run the substructure search with the same scaffold as before it returns all 125 compounds containing the substructure.


So my guess is that the original data table has somehow become corrupted. Is there a way to check this and fix it?

ChemAxon a3d59b832c

17-06-2009 15:43:41

I think that this action is causing the trouble:


<AddExplicitH ID="AddExplicitH" />


 


Explanation: the action is also executed on the query molecules, which affects the search behaviour.


(See: http://www.chemaxon.com/jchem/doc/user/query_features.html#explH )
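
To make the explanation above concrete, here is a toy sketch in plain Python. It is not JChem code and not JChem's search algorithm. It assumes a simplified model in which a hydrogen drawn on the query must be matched by a hydrogen on the target, and it ignores bond orders. All names (`with_explicit_h`, `matches`) and the example molecule (N-methyl methanesulfonamide) are made up for illustration.

```python
from itertools import permutations

def with_explicit_h(atoms, bonds, h_counts):
    """Toy stand-in for an 'add explicit H' step: append H atoms as real graph nodes."""
    atoms, bonds = list(atoms), list(bonds)
    for idx, n in enumerate(h_counts):
        for _ in range(n):
            atoms.append("H")
            bonds.append((idx, len(atoms) - 1))
    return atoms, bonds

def matches(query, target):
    """Brute-force substructure test: element labels and connectivity only."""
    q_atoms, q_bonds = query
    t_atoms, t_bonds = target
    t_adj = {frozenset(b) for b in t_bonds}
    for perm in permutations(range(len(t_atoms)), len(q_atoms)):
        same_elements = all(q_atoms[i] == t_atoms[perm[i]] for i in range(len(q_atoms)))
        if same_elements and all(frozenset((perm[a], perm[b])) in t_adj for a, b in q_bonds):
            return True
    return False

# Hypothetical target: CH3-SO2-NH-CH3, stored with explicit hydrogens.
target = with_explicit_h(
    ["C", "S", "O", "O", "N", "C"],
    [(0, 1), (1, 2), (1, 3), (1, 4), (4, 5)],
    [3, 0, 0, 0, 1, 3],
)

# Query S-N as drawn by the user (heavy atoms only).
query_plain = (["S", "N"], [(0, 1)])

# The same query after hydrogens are added to fill the free valences: H-S-NH2.
query_with_h = with_explicit_h(["S", "N"], [(0, 1)], [1, 2])

print(matches(query_plain, target))   # True
print(matches(query_with_h, target))  # False: the target's S carries no H
```

In this model the hydrogens added to the query block every position where the target carries a substituent, so the query only hits structures that are hydrogen-substituted at all of those positions. That would also be consistent with 'O=S=O' still returning hits, since that query has no free valences to fill, although this reading is an assumption and not confirmed in the thread.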


 


I suggest removing it from the configuration.


 


Furthermore, the clean2D action has no effect at all (other than slowing down import), since we store standardized structures in cxsmiles format, without coordinates.


I suggest removing that action, too.


 


Best regards,


Szabolcs

User 70c125d390

18-06-2009 08:26:57

I solved the issue. Using JChemManager, I reset the standardization to the default configuration and regenerated the table. After that, we were able to perform the substructure searches.


Tnx,


Gert


 

ChemAxon a3d59b832c

18-06-2009 09:32:56

Good. I am glad it is working now.


 


We can probably improve the system by ignoring the H addition action on the query side. (H removal is already ignored there.)


 



Best regards,


Szabolcs