imidazole dearomatization??

User 21b7e0228c

09-11-2010 13:11:32

We are developing a standardization procedure that pushes molecules through their Kekulé forms before submitting them to tautomer enumeration/analysis, which has the merit of clearly showing which double bonds and H atoms are migrating. However, we quickly ran into trouble with... imidazole, which will NOT be kekulized whatever you do - unless you add an explicit H on a nitrogen:

echo "c1ncnc1" | standardizer -f smiles:u-a -c "dearomatize" returns "c1ncnc1" ??

echo "c1n([H])cnc1" | standardizer -f smiles:u-a -c "dearomatize" returns "N1C=CN=C1" - which is what we want!

However, it's a pity that one needs to add this H explicitly, which should not be necessary (it's not as if there were some net negative charge in c1ncnc1, so the box should know that an H is missing!!). I temporarily fixed the issue by forcing a standardizer "hydrogenation" of an N atom prior to the "dearomatize" command and the eventual "dehydrogenize"...


ChemAxon 25dcd765a3

09-11-2010 19:20:06

The problem with the dearomatization process originates from the fact that an implicit hydrogen is missing from one of the nitrogen atoms. In imidazole, one nitrogen atom carries a hydrogen while the other does not. Generally, SMILES import is not able to decide which nitrogen atom needs the implicit hydrogen. In this case it does not matter which nitrogen atom gets the implicit hydrogen, as the molecule is symmetric, but in general it is not obvious how to work this out.

Once the hydrogen atom is assigned to either of the two nitrogen atoms (implicitly or explicitly), the dearomatization process can be completed.
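A toy sketch of why the ring cannot be kekulized without that hydrogen. This is an illustration only, not how Standardizer works internally; the function name kekulize_ring and the boolean ring encoding are made up for this example. An aromatic CH or a bare ring nitrogen has to take part in exactly one ring double bond, whereas an N-H nitrogen takes part in none. Five atoms that all need a double bond can never be paired up, but four can.

```python
from itertools import product


def kekulize_ring(needs_double):
    """Brute-force Kekule assignment for a single ring (toy model).

    needs_double[i] is True if ring atom i must carry exactly one ring
    double bond (aromatic CH, pyridine-type N) and False if it must carry
    none (pyrrole-type N-H). Bond i joins atom i and atom (i + 1) % n.
    Returns a list of bond orders (1 or 2), or None if no assignment exists.
    """
    n = len(needs_double)
    for orders in product((1, 2), repeat=n):
        valid = True
        for i in range(n):
            # Atom i touches bond i - 1 (previous atom) and bond i (next atom).
            doubles = (orders[i - 1] == 2) + (orders[i] == 2)
            if doubles != (1 if needs_double[i] else 0):
                valid = False
                break
        if valid:
            return list(orders)
    return None


if __name__ == "__main__":
    # c1ncnc1 read literally: every ring atom wants one double bond.
    print(kekulize_ring([True, True, True, True, True]))   # None
    # c1n([H])cnc1: the second atom is N-H and wants no double bond.
    print(kekulize_ring([True, False, True, True, True]))  # [1, 1, 2, 1, 2]
```

The second result reads as c-N(H)-c=n-c=c around the ring, which is the same Kekulé structure as the N1C=CN=C1 output quoted in the first post.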

User 21b7e0228c

10-11-2010 11:11:12

Well, in a 5-membered ring with an arbitrary number of N atoms, all the rest being C (having O or S lifts the ambiguity), one of the N atoms must have an exocyclic substituent (heavy OR H). If this is not the case, an H should be added to one of the Ns (on which? - we don't care FOR NOW: that's the job of tautomer management, later on in the standardization process). So I found the following SMARTS to be a useful H appender for 5-membered heterocycles:


Compare, for example,

echo "c1ncnn1" | $CHEMINFO_BIN/standardize -c "[c,nX2]-,:1-,:[c,nX2]:[n;X2:1]:[c,nX2]:[c,nX2]-,:1>>[H][n:1]:1:[c,nX2]-,:[c,nX2]-,:[c,nX2]:[c,nX2]:1..dearomatize" | mview -


echo "c1ncnn1" | $CHEMINFO_BIN/standardize -c "dearomatize" | mview -

So, the dearomatization can be achieved if a prior standardization step is performed... and if this ain't the absolutely correct SMARTS, I'm sure there are SMARTS geeks out there who will find it! Dearomatization is needed prior to tautomer analysis, so it would be cool to have it work all by itself, without the user having to learn SMARTS (a neuron-killing process ;-)

User 21b7e0228c

10-11-2010 11:20:04

Actually, my above-posted SMARTS does not work with negatively ionized rings, such as tetrazoles - one will have to enforce that the formal charge on the ring atoms is zero!!
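The rule from the two posts above, including the zero-charge condition, could be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the SMARTS transform itself: the RingAtom class and nitrogen_needing_h function are hypothetical names, the ring is assumed to be a single isolated aromatic 5-membered ring, and fused systems or exocyclic double bonds on carbon are not considered.

```python
from dataclasses import dataclass


@dataclass
class RingAtom:
    element: str        # "C", "N", "O" or "S"
    exocyclic: int = 0  # number of neighbours outside the ring, counting H
    charge: int = 0     # formal charge


def nitrogen_needing_h(ring):
    """Return the index of a ring N that should receive an H, or None.

    Hypothetical helper: applies the forum rule to one aromatic
    5-membered ring given as a list of RingAtom in ring order.
    """
    if len(ring) != 5:
        return None
    if any(atom.element not in ("C", "N") for atom in ring):
        return None  # O or S in the ring lifts the ambiguity
    if any(atom.charge != 0 for atom in ring):
        return None  # charged rings (e.g. tetrazolate) are left alone
    nitrogens = [i for i, atom in enumerate(ring) if atom.element == "N"]
    if not nitrogens:
        return None
    if any(ring[i].exocyclic > 0 for i in nitrogens):
        return None  # some N already carries an H or a heavy substituent
    # Which N gets the H is arbitrary here; tautomer handling decides later.
    return nitrogens[0]


if __name__ == "__main__":
    # c1ncnc1 written without any N-H: ring order c, n, c, n, c.
    imidazole = [RingAtom("C", 1), RingAtom("N"), RingAtom("C", 1),
                 RingAtom("N"), RingAtom("C", 1)]
    print(nitrogen_needing_h(imidazole))  # 1
```

Picking the first nitrogen is a deliberate simplification; it mirrors the "we don't care FOR NOW" stance of the post, since the choice of tautomer is deferred to a later step.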

User c31567e5e3

13-11-2010 03:14:34

Looks like you're working on structure standardization.  I've recently gone through this painful process myself and would love to compare notes with you:

ChemAxon 25dcd765a3

16-11-2010 09:46:06

This is very nice!

How fast is the whole standardization process (import, standardization, export)?

Do I understand correctly that this standardization generates unique SMILES without conversion to aromatic form?

User c31567e5e3

17-11-2010 03:37:04

"This is very nice!"

Thanks.  It still needs a lot of work.  I'd love to get feedback on cases where it's not working correctly.  I've done my best to compare with InChI and PubChem's standardizer, but there are just too many nuances for me to have any hope.

"How fast is the whole standardization process (import, standardization, export)?"

It's not very fast, given that I had to reimplement some of the basic capabilities that are only available as plugins (e.g., tautomer, mesomer, Reactor/SMIRKS, etc.).  You can try the command line version by downloading the following jar files: (this is version 3.2.12)

and invoking it as follows:

java -cp jchem.jar:standardizer.jar gov.nih.ncgc.util.MolStandardizer FILES...

The output format is the canonical SMILES with the corresponding hashkey (similar to InChI's) encoding the molecule in three different resolutions.

"Do I understand correctly that this standardization generates unique SMILES without conversion to aromatic form?"

This would certainly be the right/InChI way of doing it, but since I'm not that smart, I had to settle for various combinations of smiles:u, smiles:q, aromatize, and dearomatize.  I think it's a testament to JChem that there are only a handful of classes that one really needs to do just about anything (though I can't speak for 3D, since I don't know anything about it):

Molecule, MolBond, MolAtom, MolHandler, MolSearch, MSketchPane, MViewPane, MolPrinter, MolImporter

ChemAxon 25dcd765a3

17-11-2010 08:36:38

Right now I don't have time to test the correctness, but I will definitely play with it in January.

It is still not clear to me whether it generates unique SMILES.

User 21b7e0228c

17-11-2010 09:17:19

Definitely, Java does persecute me - I can't make it work in command-line mode either, although I downloaded both jars, added them to the $CLASSPATH and then entered the specified commands - only to get the frustrating (but, I agree, quite STANDARD ;-) class-not-found errors. Where are the good old times when Fortran was a programming language? (sigh)!

User c31567e5e3

17-11-2010 15:08:54

I've produced a (minimal) self-contained jar file that should work.  I've tried it with various JVMs (gij, OpenJDK, etc.) and it works fine.  Please grab this file:

Now use your favorite JVM, e.g.,

java -jar standardizer-v7.jar FILES...

gij -jar standardizer-v7.jar FILES... (very slow)

Please let me know (either via email or this forum) if you're still having problems.