ChemAxon fa971619eb
16-11-2005 12:41:44
What is the best way to generate structures with the "correct" display of explicit hydrogens? By correct I mean the standard representation with hydrogens on terminal atoms and non-carbon atoms...
The -H option in Molecule.toFormat() removes explicit hydrogens. The +H option adds all hydrogens back, but neither option adds them only where they are needed (i.e. adds only the hydrogens that would typically be displayed).
See, for instance, the following code:
Code: |
String s = "N1C=CC2=CC=CC=C12";
Molecule mol = MolImporter.importMol(s);
int i = 0;
System.out.println(i++ + " " + s + "\n"); // 0
System.out.println(i++ + " " + mol.toFormat("smiles")); // 1
System.out.println(i++ + " " + mol.toFormat("smiles:-H")); // 2
System.out.println(i++ + " " + mol.toFormat("smiles:+H")); // 3
System.out.println("");
mol.hydrogenize(true);
System.out.println(i++ + " " + mol.toFormat("smiles")); // 4
System.out.println(i++ + " " + mol.toFormat("smiles:-H")); // 5
System.out.println(i++ + " " + mol.toFormat("smiles:+H")); // 6
System.out.println(""); |
which outputs the following:
Code: |
0 N1C=CC2=CC=CC=C12
1 N1C=CC2=CC=CC=C12
2 N1C=CC2=CC=CC=C12
3 [H]N1C([H])=C([H])C2=C([H])C([H])=C([H])C([H])=C12
4 [H]N1C([H])=C([H])C2=C([H])C([H])=C([H])C([H])=C12
5 N1C=CC2=CC=CC=C12
6 [H]N1C([H])=C([H])C2=C([H])C([H])=C([H])C([H])=C12 |
Why does #5 not have a hydrogen displayed on the nitrogen atom?
On a similar topic, there are standardizer actions to manage the display of hydrogens, but, again, I can't see a way to add only those that should be present and remove those that shouldn't.
Tim
ChemAxon 25dcd765a3
16-11-2005 18:40:34
Hi!
Actually, there is no export option to selectively convert implicit hydrogens to explicit ones.
Is this what you would like to do, or am I misunderstanding something?
I don't know if there is any standardizer action which can do this...
Quote: |
Why does #5 not have a hydrogen displayed on the nitrogen atom? |
The implicit H is not written to the SMILES because it follows from the normal valence assumptions, so it is not necessary to write it explicitly.
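To illustrate the valence rule, here is a minimal sketch in plain Java. It does not use the Marvin API; the class and method names are made up for this example. It assumes the usual SMILES "organic subset" convention: an unbracketed, non-aromatic atom gets as many implicit hydrogens as are needed to reach its lowest normal valence that is not below its bond order sum.

```java
import java.util.Map;

public class ImplicitHydrogens {
    // Normal valences of the SMILES organic subset (assumed convention).
    private static final Map<String, int[]> NORMAL_VALENCES = Map.of(
        "B", new int[] {3},
        "C", new int[] {4},
        "N", new int[] {3, 5},
        "O", new int[] {2},
        "P", new int[] {3, 5},
        "S", new int[] {2, 4, 6},
        "F", new int[] {1},
        "Cl", new int[] {1},
        "Br", new int[] {1},
        "I", new int[] {1});

    /** Implicit H count of an unbracketed, non-aromatic atom. */
    public static int implicitHCount(String element, int bondOrderSum) {
        int[] valences = NORMAL_VALENCES.get(element);
        if (valences == null) {
            throw new IllegalArgumentException("Not in the organic subset: " + element);
        }
        // Pick the lowest normal valence that can accommodate the bonds.
        for (int valence : valences) {
            if (valence >= bondOrderSum) {
                return valence - bondOrderSum;
            }
        }
        return 0; // bond order sum exceeds every normal valence
    }

    public static void main(String[] args) {
        // Atoms of N1C=CC2=CC=CC=C12 (indole, Kekule form):
        System.out.println(implicitHCount("N", 2)); // ring N, two single bonds
        System.out.println(implicitHCount("C", 3)); // CH carbon, one double + one single bond
        System.out.println(implicitHCount("C", 4)); // ring-fusion carbon
    }
}
```

Under these assumptions the ring nitrogen in the Kekule form gets one hydrogen without any bracket, which is why export #5 can omit it.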
All the best
Andras
ChemAxon fb166edcbd
17-11-2005 01:30:03
About Standardizer:
the ImplH action selectively makes H atoms implicit, but the selection is based on the type of H atom: by default, only bound, non-isotope, neutral, non-radical, non-mapped hydrogen atoms are removed. This can be altered by setting certain attributes, see
http://www.chemaxon.com/jchem/doc/user/StandardizerConfiguration.html#implhsec
http://www.chemaxon.com/jchem/doc/user/Standardizer_files/examples/Examples.html#06
but it is not possible to make H atoms implicit or explicit based on their position in the molecule graph.
But it seems to me that your question is not really about the explicit and implicit representation of H atoms. You would like to write H atoms (implicit or explicit) only at terminal atoms and hide them at intermediate atoms. This is not possible, since the SMILES format does not include the implicit H atoms. In the Marvin GUI (msketch/mview) this is available as a display option, which is not saved in any molecule format.
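For reference, the display rule in question (hydrogens shown on non-carbon atoms and on terminal atoms only) can be sketched in a few lines of plain Java. This is only an illustration of the rule as described in this thread, not Marvin's implementation and not part of its API; the Atom class and its fields are hypothetical.

```java
public class HydrogenLabels {
    /** Hypothetical minimal atom description; not a ChemAxon class. */
    public static final class Atom {
        final String element;
        final int heavyNeighbors; // number of bonded non-hydrogen atoms
        final int hCount;         // total number of attached hydrogens

        public Atom(String element, int heavyNeighbors, int hCount) {
            this.element = element;
            this.heavyNeighbors = heavyNeighbors;
            this.hCount = hCount;
        }
    }

    /** Hydrogens are shown on heteroatoms and on terminal atoms only. */
    public static boolean showsHydrogens(Atom a) {
        boolean hetero = !a.element.equals("C");
        boolean terminal = a.heavyNeighbors <= 1;
        return a.hCount > 0 && (hetero || terminal);
    }

    /** Builds a display label such as "NH", "CH3" or plain "C". */
    public static String label(Atom a) {
        if (!showsHydrogens(a)) {
            return a.element;
        }
        String count = a.hCount > 1 ? String.valueOf(a.hCount) : "";
        return a.element + "H" + count;
    }

    public static void main(String[] args) {
        // Ethylamine, CCN: terminal CH3, internal CH2, terminal NH2
        Atom[] atoms = {
            new Atom("C", 1, 3),
            new Atom("C", 2, 2),
            new Atom("N", 1, 2)
        };
        for (Atom a : atoms) {
            System.out.println(label(a));
        }
    }
}
```

Because the decision depends only on the element and the number of heavy neighbours, it is a rendering choice and needs no change to the stored structure.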
ChemAxon fa971619eb
17-11-2005 17:13:48
I've tracked the problem down a bit more. Consider the following code:
Code: |
package foo;
import java.io.*;
import chemaxon.struc.Molecule;
import chemaxon.formats.MolImporter;
public class ExportImport {
public static void main(String[] args) throws Exception {
String orig = "N1C=CC2=CC=CC=C12";
String s;
Molecule mol0 = MolImporter.importMol(orig);
int i = 0;
System.out.println(i++ + " " + orig + "\n"); // 0
s = mol0.toFormat("mol");
Molecule mol1 = MolImporter.importMol(s);
System.out.println(i++ + " " + mol1.toFormat("smiles") + "\n"); // 1
s = mol1.toFormat("mol:-H");
Molecule mol2 = MolImporter.importMol(s);
System.out.println(i++ + " " + mol2.toFormat("smiles") + "\n"); // 2
s = mol1.toFormat("mol:a-H");
Molecule mol3 = MolImporter.importMol(s);
System.out.println(i++ + " " + mol3.toFormat("smiles") + "\n"); // 3
}
}
|
mol3 seems to have lost the hydrogen attached to the nitrogen atom. Interestingly, if you export as smiles this doesn't happen.
ChemAxon fb166edcbd
19-11-2005 12:14:28
I do not understand this: it seems to me that mol3 has an implicit H count of 1 both before and after the SMILES export. See the output of the attached test code (your code, extended to print the implicit H count on the N atom):
Code: |
java ExportImport
0 N1C=CC2=CC=CC=C12
1 N1C=CC2=CC=CC=C12
2 N1C=CC2=CC=CC=C12
H count on N atom: 1
3 c1ccc2[nH]ccc2c1
H count on N atom: 1
|
Could you explain the problem in more detail?
ChemAxon a3d59b832c
21-11-2005 14:48:24
Tim,
We discussed this in the car; I am just posting here too, to finish this thread:
Yes, the mol format is not able to store the number of implicit hydrogens. That is fine for structures in Kekule form, but pyrrole-type aromatic nitrogens lose the implicit H when aromatic bonds are connected to them.
SMILES is able to represent implicit hydrogens, and it does so when a nonstandard number of hydrogens is present, like: c1cc[nH]c1
To fix this loss of information for aromatized molfiles, we introduced an extension which can store the number of implicit hydrogens in this ambiguous case. For details, see:
The "Implicit hydrogens on aromatic nitrogen" section in:
http://www.chemaxon.com/marvin/doc/user/mol-csmol-doc.html
and
http://www.chemaxon.com/forum/ftopic814.html
Best regards,
Szabolcs
User 870ab5b546
22-11-2005 14:39:13
Hi,
I found this discussion quite interesting, so I drew a compound containing two pyrroles, aromatized them, and looked at the MOL file:
Code: |
Marvin 11220509292D
11 12 0 0 0 0 999 V2000
-2.0625 -0.2225 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.7769 1.0150 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.3480 1.0150 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-1.3480 0.1900 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-3.3603 1.5984 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-2.7769 0.1900 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
-0.7647 1.5984 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.0502 1.1859 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
-0.7647 2.4234 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.6642 2.4234 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.6642 1.5984 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 6 4 0 0 0 0
1 4 4 0 0 0 0
6 2 4 0 0 0 0
2 3 4 0 0 0 0
3 4 4 0 0 0 0
2 5 1 0 0 0 0
3 7 1 0 0 0 0
8 7 4 0 0 0 0
7 9 4 0 0 0 0
8 11 4 0 0 0 0
9 10 4 0 0 0 0
10 11 4 0 0 0 0
M STY 2 1 DAT 2 DAT
M SAL 1 1 6
M SDT 1 MRV_IMPLICIT_H
M SDD 1 0.0000 0.0000 DR ALL 0 0
M SED 1 IMPL_H1
M SAL 2 1 8
M SDT 2 MRV_IMPLICIT_H
M SDD 2 0.0000 0.0000 DR ALL 0 0
M SED 2 IMPL_H1
M END
|
Can you parse this information for me? Specifically, what do the entries in the STY and SDD lines mean?
-- Bob
ChemAxon a3d59b832c
22-11-2005 16:15:10
Hi Bob,
We store this extra information as attached data (a data Sgroup).
Please see the molfile specification document for the exact details of the STY, SDD, etc. tags:
http://www.mdli.com/downloads/public/ctfile/ctfile.jsp
(Pages 18-23 for V2000)
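In short, as far as I read the V2000 specification: STY declares the Sgroups and their types (DAT means data Sgroup), SAL lists the atoms belonging to an Sgroup, SDT gives the data field name, SDD holds display information for the data, and SED holds the data itself. Below is a rough sketch in plain Java that pulls the implicit H counts out of such a property block. It is not Marvin's own reader; the class name is made up, it splits lines on whitespace instead of using the fixed-width columns of the format, and the "IMPL_H" + number layout of the data is assumed from the example above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ImplicitHSgroupReader {
    /** Maps 1-based atom index to implicit H count for MRV_IMPLICIT_H data Sgroups. */
    public static Map<Integer, Integer> read(String molfile) {
        Map<Integer, int[]> atomsBySgroup = new LinkedHashMap<>();
        Map<Integer, String> fieldBySgroup = new LinkedHashMap<>();
        Map<Integer, String> dataBySgroup = new LinkedHashMap<>();

        for (String line : molfile.split("\\R")) {
            String[] t = line.trim().split("\\s+");
            if (t.length < 4 || !t[0].equals("M")) {
                continue; // not an Sgroup property line of interest
            }
            switch (t[1]) {
                case "SAL": { // M SAL <sgroup> <atom count> <atom> ...
                    int count = Integer.parseInt(t[3]);
                    int[] atoms = new int[count];
                    for (int i = 0; i < count; i++) {
                        atoms[i] = Integer.parseInt(t[4 + i]);
                    }
                    atomsBySgroup.put(Integer.parseInt(t[2]), atoms);
                    break;
                }
                case "SDT": // M SDT <sgroup> <field name>
                    fieldBySgroup.put(Integer.parseInt(t[2]), t[3]);
                    break;
                case "SED": // M SED <sgroup> <data>
                    dataBySgroup.put(Integer.parseInt(t[2]), t[3]);
                    break;
                default:
                    break; // STY, SDD and everything else are ignored here
            }
        }

        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<Integer, int[]> e : atomsBySgroup.entrySet()) {
            String field = fieldBySgroup.get(e.getKey());
            String data = dataBySgroup.get(e.getKey());
            if (!"MRV_IMPLICIT_H".equals(field) || data == null || !data.startsWith("IMPL_H")) {
                continue;
            }
            // Assumed layout: "IMPL_H" followed by the hydrogen count.
            int hCount = Integer.parseInt(data.substring("IMPL_H".length()));
            for (int atom : e.getValue()) {
                result.put(atom, hCount);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
            "M STY 2 1 DAT 2 DAT",
            "M SAL 1 1 6",
            "M SDT 1 MRV_IMPLICIT_H",
            "M SDD 1 0.0000 0.0000 DR ALL 0 0",
            "M SED 1 IMPL_H1",
            "M SAL 2 1 8",
            "M SDT 2 MRV_IMPLICIT_H",
            "M SDD 2 0.0000 0.0000 DR ALL 0 0",
            "M SED 2 IMPL_H1",
            "M END");
        System.out.println(read(sample)); // prints {6=1, 8=1}
    }
}
```

For Bob's file this gives one implicit hydrogen on atoms 6 and 8, which are the two nitrogens in the atom block.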
Best Regards,
Szabolcs