User d68ef9d5a9
18-08-2005 19:05:10
Hi,
I am inserting this molecule into my database. We need to store the Daylight-aromatized structure as a mol file in the cd_structure column of the structure table. The import format is an SD file containing the structure in Kekulé form. This is the basic code that performs the insertion:
MolHandler mh=new MolHandler(structureString); //sdf file attached
// mh.addHydrogensToAromaticHeteroAtoms();
// this would not help at all.
Molecule mole=mh.getMolecule();
mole.dearomatize();
System.out.print(mole.toFormat("smiles")+"; ");
//correctly “N1C=CC2=CC=CC=C12”
mole.aromatize(MoleculeGraph.AROM_DAYLIGHT);
System.out.println(mole.toFormat("smiles"));
// correctly “c1ccc2[nH]ccc2c1”
// this.uh.setInputMolecule(mole); //See my comments
this.uh.setValuesForFixColumns(101, mole.toFormat("mol"));
this.uh.setValueForAdditionalColumn(1,new Integer(6001), Types.INTEGER);
this.uh.execute();
Now if you look at the cd_smiles field in the structure table, it has become “c1ccc2nccc2c1”. This SMILES is incorrect according to our discussion in one of my previous posts. The difference between “c1ccc2[nH]ccc2c1” and “c1ccc2nccc2c1” is one hydrogen atom, so the stored structure is lighter by the weight of one H.
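To put a number on that lost hydrogen, here is a minimal sketch in plain Java (no JChem calls). It assumes the two SMILES correspond to the formulas C8H7N and C8H6N and uses standard atomic weights rounded to three decimals; the class and method names are hypothetical, made up only for this illustration.

```java
// Illustration only: compares the molecular weight of indole (C8H7N,
// "c1ccc2[nH]ccc2c1") with the same skeleton missing the N-H hydrogen
// (C8H6N, which is what "c1ccc2nccc2c1" would imply).
final class HydrogenLossCheck {
    // Standard atomic weights, rounded to three decimals (assumed values).
    static final double C = 12.011;
    static final double H = 1.008;
    static final double N = 14.007;

    // Molecular weight for a formula given as carbon, hydrogen and nitrogen counts.
    static double weight(int carbons, int hydrogens, int nitrogens) {
        return carbons * C + hydrogens * H + nitrogens * N;
    }

    public static void main(String[] args) {
        double withH = weight(8, 7, 1);     // c1ccc2[nH]ccc2c1
        double withoutH = weight(8, 6, 1);  // c1ccc2nccc2c1
        System.out.printf("with [nH]: %.3f%n", withH);
        System.out.printf("without H: %.3f%n", withoutH);
        System.out.printf("difference: %.3f%n", withH - withoutH);
    }
}
```

The difference is simply the weight of one hydrogen atom, which is why a weight-based check on the stored record would be off by about one mass unit.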
Yes, this problem can be resolved by passing the Molecule object to the UpdateHandler (see the commented-out code). That way the SMILES (“c1ccc2[nH]ccc2c1”) is generated correctly. But cd_structure then contains a mol file that is not consistent with cd_smiles. It looks as though the fingerprints in the table are created from cd_smiles, not cd_structure, when the molecule is set on the UpdateHandler (I can tell this from the results of certain searches).
The real problem is the inconsistency between cd_structure and cd_smiles. Although this situation is better than losing one H, it still causes a problem in structure display. To preserve the structure orientation our chemists require, we display structures in Marvin or MarvinSketch from the mol file instead of from the SMILES. Therefore, if a user copies a Daylight-aromatized structure from Marvin, pastes it into another interface, and tries to search for it, the search will not hit the right structure unless the user deliberately adds an H to the aromatic N, because the direct interpretation of the Daylight-aromatized structure as SMILES is “c1ccc2nccc2c1”, which does not match cd_smiles.
I think the underlying problem is Daylight aromatization at the mol file level. What is needed is a function that restores the H on Daylight-aromatic heteroatoms at the mol file level whenever aromatization is called, and this information has to be retained regardless of how the user wants the structure to be displayed.
I am using JChem Base 3.0.14 for this test. Let me know if I have not explained the problem clearly.
Ben Li
Neurogen Corporation