Tip: aromatic compounds, Hueckel, aromaticity during import

User 677b9c22ff

04-07-2007 08:48:58

Hi,


Just some comments for people who import data from different molecular databases and vendor sources such as Daylight, MDL ISIS, PubChem, ZINC, Accelrys, Tripos and so on. [deprecated - see my comments below]





I just saw a very interesting PPT showing that MDL and JChem handle aromatic systems differently. I did not know that. Basically it says that:


* MDL® treats 5-membered heterocycles as non-aromatic (Kekulé structure)


* ChemAxon aromatizes any 4n+2 ring system





So it is good practice (GLP) to use the Standardizer when importing SDF or SMILES files into Instant-JChem and JChem Base. For my taste the Standardizer is currently hidden too deeply within Instant-JChem. I think people should be forced to use it, even if that means a lack of usability and annoyance. I only found out that I needed the Standardizer after importing several 100k structures and running some expensive calculations on them.





I thought (simple-minded as I am) that the "remove duplicate structures" feature always magically works. And it does not (of course). If the substances are not canonicalized via InChI, FICUS, uuuuu or any other canonicalizer, the system cannot detect that the compounds are actually the same.
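To illustrate the general point, here is a minimal Python sketch. It is not how JChem works internally. The names `remove_duplicates`, `toy_key` and `TOY_CANONICAL` are made up for this example, and the lookup table is only a hand-written stand-in for a real canonicalizer such as InChI generation.

```python
from typing import Callable, Hashable, Iterable, List


def remove_duplicates(structures: Iterable[str],
                      canonical_key: Callable[[str], Hashable]) -> List[str]:
    """Keep the first structure seen for each canonical key."""
    seen = set()
    kept = []
    for structure in structures:
        key = canonical_key(structure)
        if key not in seen:
            seen.add(key)
            kept.append(structure)
    return kept


# Assumption: a hand-written lookup standing in for a real canonicalizer.
# Each pair is the same compound written in aromatic and in Kekule form.
TOY_CANONICAL = {
    "c1ccccc1": "benzene",
    "C1=CC=CC=C1": "benzene",
    "c1cc[nH]c1": "pyrrole",
    "C1=CNC=C1": "pyrrole",
}


def toy_key(smiles: str) -> str:
    """Return the toy canonical name, or the input itself if unknown."""
    return TOY_CANONICAL.get(smiles, smiles)


records = ["c1ccccc1", "C1=CC=CC=C1", "C1=CNC=C1", "c1cc[nH]c1", "CCO"]

# Comparing the raw strings finds no duplicates at all.
naive = remove_duplicates(records, lambda s: s)

# Comparing canonical keys merges the two spellings of each compound.
canonical = remove_duplicates(records, toy_key)

print(len(naive), len(canonical))
```

The only design point is that the comparison happens on the canonical key and never on the input string, so the quality of the duplicate filter is exactly the quality of the canonicalizer behind it.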





So it is very important to right-click, choose "New JChem database table", define a Standardizer and then tick the remove-duplicates box. It is *not* good practice to just import the SDF or SMILES file if you want to use the duplicate or overlap filter. (Here I am not quite sure whether the duplicate filter performs an internal canonicalization including aromatization and mesomerization, but this is not important if the Standardizer is used anyway.)





For aromatic systems this would include:


* Clean


* Dearomatize


* Aromatize


* Mesomerize





Kind regards


Tobias Kind

ChemAxon a3d59b832c

04-07-2007 14:17:50

Hi Tobias,
TobiasKind wrote:
Hi,


Just some comments for people who import data from different molecular databases and vendor sources such as Daylight, MDL ISIS, PubChem, ZINC, Accelrys, Tripos and so on.





I just saw a very interesting PPT showing that MDL and JChem handle aromatic systems differently. I did not know that. Basically it says that:


* MDL® treats 5-membered heterocycles as non-aromatic (Kekulé structure)


* ChemAxon aromatizes any 4n+2 ring system
In other words, a pyrrole drawn with aromatic bonds, for example, is found by JChem but not by MDL. (I remember the discussions with the authors of that presentation.)
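As a rough illustration of the 4n+2 rule, here is a minimal Python sketch. The pi-electron counts are hand-assigned assumptions (pyrrole gets 6: four from the two C=C bonds plus the nitrogen lone pair). Neither JChem nor MDL is claimed to work this way; real toolkits derive the counts per atom with their own rules.

```python
def satisfies_huckel(pi_electrons: int) -> bool:
    """Return True if the count equals 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# Assumption: ring pi-electron counts assigned by hand, not computed
# from a structure.
RING_PI_ELECTRONS = {
    "benzene": 6,
    "pyrrole": 6,            # 4 from two C=C bonds + 2 from the N lone pair
    "cyclobutadiene": 4,
    "cyclooctatetraene": 8,
}

for name, count in RING_PI_ELECTRONS.items():
    verdict = "4n+2" if satisfies_huckel(count) else "not 4n+2"
    print(f"{name}: {count} pi electrons, {verdict}")
```

With these counts pyrrole passes the electron-count test just like benzene, which is why a purely 4n+2-based aromatizer marks it as aromatic while a convention that keeps 5-membered heterocycles in Kekulé form does not.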
TobiasKind wrote:
So it is good practice (GLP) to use the Standardizer when importing SDF or SMILES files into Instant-JChem and JChem Base. For my taste the Standardizer is currently hidden too deeply within Instant-JChem. I think people should be forced to use it, even if that means a lack of usability and annoyance.
Certainly Standardizer is a useful tool, but please note that JChem Base (and therefore Instant JChem as well) applies a default standardization when no custom standardization configuration is specified. It consists of aromatization (needed for proper searching) and explicit hydrogen removal for the database structures (primarily to enhance speed).





See documentation:


http://www.chemaxon.com/jchem/doc/user/Query.html#standardization





We would be happy to hear your comments about the "lack of usability and annoyance" of Standardizer, or perhaps some suggestions to make it better.
TobiasKind wrote:
I only found out that I needed the Standardizer after importing several 100k structures and running some expensive calculations on them.
That should not be a problem if only aromatization is concerned, see above. Furthermore, I am not aware that aromatization changes the results of calculations. (I mean calculations that are available in Chemical Terms. Please correct me if you were referring to third-party calculations.)
TobiasKind wrote:
I thought (simple-minded as I am) that the "remove duplicate structures" feature always magically works. And it does not (of course). If the substances are not canonicalized via InChI, FICUS, uuuuu or any other canonicalizer, the system cannot detect that the compounds are actually the same.
Again, from the above it follows that aromatization is handled correctly during duplicate removal. Certainly there may be other issues like nitro representation where Standardizer can help, but these usually only matter if the molecules come from multiple different sources.
TobiasKind wrote:
So it is very important to right-click, choose "New JChem database table", define a Standardizer and then tick the remove-duplicates box. It is *not* good practice to just import the SDF or SMILES file if you want to use the duplicate or overlap filter.
Yes, you have to turn duplicate filtering on if you want it to happen during import.
TobiasKind wrote:
(Here I am not quite sure whether the duplicate filter performs an internal canonicalization including aromatization and mesomerization, but this is not important if the Standardizer is used anyway.)
Yes, it uses the same default standardization described above.
TobiasKind wrote:
For aromatic systems this would include:


* Clean


* Dearomatize


* Aromatize


* Mesomerize
Cleaning is unnecessary in the database, because the internal representation does not use coordinates. I am also not sure about the purpose of dearomatization in the above list. Finally, aromatization is necessary for proper searching and duplicate removal, and it is safest to put it after mesomerization, which changes aromatic rings to a resonant form. (You may also consider using the canonical tautomer rule, tautomerize, somewhere before aromatization.)
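The ordering advice can be sketched as a small check in Python. This is only an illustration under stated assumptions: the step names are plain labels taken from the list above and not actual Standardizer configuration syntax, and `MUST_PRECEDE` and `order_violations` are hypothetical names.

```python
from typing import List, Sequence, Tuple

# Assumption: ordering constraints as read from the advice above.
# Each pair means "the first step should run before the second".
MUST_PRECEDE: List[Tuple[str, str]] = [
    ("mesomerize", "aromatize"),
    ("tautomerize", "aromatize"),
]


def order_violations(steps: Sequence[str]) -> List[str]:
    """List every constraint that the given step order breaks.

    Constraints that mention a step missing from `steps` are skipped.
    If a step occurs more than once, its last position is used.
    """
    position = {name: index for index, name in enumerate(steps)}
    problems = []
    for earlier, later in MUST_PRECEDE:
        if earlier in position and later in position:
            if position[earlier] > position[later]:
                problems.append(f"'{earlier}' should come before '{later}'")
    return problems


proposed = ["clean", "dearomatize", "aromatize", "mesomerize"]
suggested = ["tautomerize", "mesomerize", "aromatize"]

print(order_violations(proposed))
print(order_violations(suggested))
```

The proposed list is flagged because mesomerization runs after aromatization, whereas the suggested order ends with aromatization and passes.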





Best regards,


Szabolcs

User 677b9c22ff

05-07-2007 04:33:31

Hi Szabolcs,


OK, I repeated my experiments and JChem and Instant-JChem correctly remove duplicates, so my tip is now obsolete. After thinking carefully, I also realized that it would be a great mess if there were no correct duplicate remover.





However I still found it interesting that ISIS and JChem handle aromatic structures in a different way.





My confusion came out of a simple process (besides, it was 2 am). During the import of a mol file with 202 structures, Instant-JChem shows some messages, which scroll by quickly, saying that some structures were removed. At the end, however, it reports "202 rows imported", which is actually true. But it does not report how many structures are now truly in the database, and it does not show any structure (see my earlier forum post). So I assumed that it did not do any duplicate filtering, because I could not open any structure by refreshing the grid view. Only later did I find out that I need to press "Show all" (molecules), which is not the case if I just import an SDF file without duplicate removal. In the meantime I had read all the PPTs from the last ChemAxon UGM in June, and this culminated in my obsolete tip.





BTW, I like the Standardizer GUI, but in Instant-JChem it is hidden under "Create new JChem database table", so it costs several "annoying" mouse clicks to create one. This was not a comment on the Standardizer itself!





Kind regards


Tobias

ChemAxon a3d59b832c

05-07-2007 08:10:04

Hi Tobias,





Thanks for the clarifications. We will check the post-import report in Instant JChem.
TobiasKind wrote:
However I still found it interesting that ISIS and JChem handle aromatic structures in a different way.
Aromaticity is a complex phenomenon, and chemists themselves debate this issue a lot. (A crystallographer may define aromaticity differently than an organic chemist doing synthesis.) We have two different aromatization methods to suit different needs:


http://www.chemaxon.com/marvin/doc/user/aromatization-doc.html





However, both methods treat pyrrole as aromatic. ;)





Regards,


Szabolcs

ChemAxon fa971619eb

05-07-2007 08:23:31

Quote:
We will check the post-import report in Instant JChem.
See:


http://www.chemaxon.com/forum/ftopic2950.html





Tim