Duplication filter and the custom standardization problem

User e911fe03b4

26-07-2011 08:35:34

Hi,


I updated and recalculated a JChem table with custom standardization options (e.g. 2D clean) using JChem 5.5.0 and the latest version (5.5.1), and found that the structure duplication filter did not work. The process inserted a second molecule, creating a duplicate in the molecule table. The two structures differ in the positions of the double bonds in the benzene ring; please see the attached files (I attached two examples).
If I recalculate the JChem table with the default standardization, the duplication filter works well.


Has anyone experienced anything similar?


Best wishes,


Norbi

ChemAxon a3d59b832c

26-07-2011 15:10:14

Hi Norbi,


Most probably your Standardizer configuration does not contain aromatization. Please note that it is required both for proper searching and duplication control. See:


http://www.chemaxon.com/jchem/doc/user/query_standard.html


Special care has to be taken when you are assembling a custom standardization
configuration. In this case the "aromatize" action should be present in the
configuration, and it is safest to put it first.

 


Furthermore, the main purpose of standardization in the database is to assist searching, so cleaning is not needed in the standardization of a database table. It also has no effect in the end, because the standardized structures are stored in cxsmiles format, which does not contain coordinates. I therefore recommend removing cleaning from the standardization configuration.


 


Best regards,


Szabolcs

User e911fe03b4

27-07-2011 11:03:12

Hi Szabolcs,


Thanks for your reply.


Actually, I didn't want to clean in the database table; that was just an example. In fact, I need to standardize some structures that are stored in different forms in the database, so I applied the molecule transform functions (transform nitro, sulfon, diazide group, etc.). You wrote that aromatization is required for proper searching and duplication control; the problem is that I would not like to aromatize the compounds.
Interestingly, if my standardizer configuration contains both aromatization and dearomatization, the duplication filter works after the recalculation. Does that make sense?


Best regards,


Norbi

ChemAxon a3d59b832c

27-07-2011 11:37:22

Hi Norbi,


"You wrote that aromatization is required for proper searching and duplication control; the problem is that I would not like to aromatize the compounds."


The standardization only affects the internal (searching) representation that is stored in the cd_smiles column.


You can still use the original (probably non-aromatized) compound from the cd_structure column for visualization, etc. Export functions also use cd_structure.


If aromatization is not included, then aromatic bonds will not give any hits, and different resonance forms (e.g. of a multisubstituted benzene) will not be recognized as equivalent.
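To illustrate the point about resonance forms, here is a minimal Python sketch. It is a toy model and not JChem code: the ring is just a tuple of six bond orders, and `aromatize` and `duplicate_key` are hypothetical helpers invented for this example. It only shows why two Kekulé drawings of the same 1,2-disubstituted benzene compare as different unless they are first brought to a common aromatic form.

```python
# Toy model (an assumption for illustration, not JChem's representation):
# a six-membered ring is a tuple of six bond orders, where bond i joins
# ring atom i and ring atom (i + 1) % 6. Substituents sit on atoms 0 and 1.

KEKULE_A = (2, 1, 2, 1, 2, 1)  # double bond between the substituted carbons
KEKULE_B = (1, 2, 1, 2, 1, 2)  # single bond between the substituted carbons

AROMATIC = "a"


def is_alternating_ring(bonds):
    """True if single and double bonds strictly alternate around the ring."""
    n = len(bonds)
    return n % 2 == 0 and all(
        {bonds[i], bonds[(i + 1) % n]} == {1, 2} for i in range(n)
    )


def aromatize(bonds):
    """Toy rule: replace an alternating ring by uniform aromatic bonds."""
    if is_alternating_ring(bonds):
        return (AROMATIC,) * len(bonds)
    return tuple(bonds)


def duplicate_key(bonds, use_aromatize):
    """Hypothetical key that a duplicate filter could compare."""
    return aromatize(bonds) if use_aromatize else tuple(bonds)


# Without aromatization the two Kekulé forms give different keys.
without = duplicate_key(KEKULE_A, False) == duplicate_key(KEKULE_B, False)
# With aromatization both forms collapse to the same key.
with_arom = duplicate_key(KEKULE_A, True) == duplicate_key(KEKULE_B, True)

print(without, with_arom)
```

In this sketch the comparison fails without the aromatization step and succeeds with it, which mirrors the duplicate that was inserted into the table.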


 


"Interestingly, if my standardizer configuration contains both aromatization and dearomatization, the duplication filter works after the recalculation. Does that make sense?"

No, it is still not a good practice.


By chance it is possible that dearomatization gives similar resonance forms for the two different resonant structures. However, it is not guaranteed. It may depend on the ordering of atoms in the molfile, and other things. For this reason with such a standardization configuration it is not even guaranteed that the same Kekulé resonance form is found as duplicate of a re-ordered version, even if the drawing looks the same!


(When there is no aromatization and dearomatization at all, then it is at least guaranteed that the same resonant form is recognized.)
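The dependence on atom ordering can be sketched with another toy example in Python. The greedy rule below is an assumption made up for illustration and is not how JChem dearomatizes; `greedy_kekulize` and `substituted_bond_is_double` are hypothetical names. The sketch only shows that a procedure which assigns double bonds in atom-index order can produce different Kekulé forms for the same molecule when the atoms are numbered differently.

```python
def greedy_kekulize(n_atoms=6):
    """Toy rule: walk the ring in atom-index order and make bond (i, i+1)
    double whenever neither atom already carries a double bond.
    Returns the set of ring bonds that received order 2."""
    used, double = set(), set()
    for i in range(n_atoms):
        j = (i + 1) % n_atoms
        if i not in used and j not in used:
            double.add(frozenset((i, j)))
            used.update((i, j))
    return double


def substituted_bond_is_double(substituted_atoms):
    """Does the bond joining the two substituted ring atoms end up double?"""
    return frozenset(substituted_atoms) in greedy_kekulize()


# The same 1,2-disubstituted benzene under two different atom numberings:
# the substituents sit on ring atoms (0, 1) in one file and (1, 2) in the other.
result_1 = substituted_bond_is_double((0, 1))
result_2 = substituted_bond_is_double((1, 2))

print(result_1, result_2)
```

Here the two numberings end up with different Kekulé forms, so a filter that compares the dearomatized structures directly would miss the duplicate.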


Best regards,


Szabolcs

User e911fe03b4

27-07-2011 12:06:18

"No, it is still not a good practice."

I agree with you. :-)


"By chance it is possible that dearomatization gives similar resonance forms for the two different resonant structures. However, it is not guaranteed. It may depend on the ordering of atoms in the molfile, and other things. For this reason with such a standardization configuration it is not even guaranteed that the same Kekulé resonance form is found as duplicate of a re-ordered version, even if the drawing looks the same!

(When there is no aromatization and dearomatization at all, then it is at least guaranteed that the same resonant form is recognized.)"


Ok, I understand and accept your justification absolutely.


I really appreciate your help.



Best regards,


Norbi