User 6ef33138f9
29-06-2005 16:54:44
Hello, I have a simple question about SMILES export using the -H option to remove explicit hydrogens. It does not seem to work for any atoms that have atom map numbers. For example, if I import [C:1]C and export it using -H, I get [CH3:1]C, not [C:1]C as expected. A code sample is below.
Thanks,
Chris
Code: |
String smiles1 = "CC";
Molecule marvinMolecule = MolImporter.importMol(smiles1);
ByteArrayOutputStream os = new ByteArrayOutputStream();
MolExporter exp = new MolExporter(os, "smiles:-H");
exp.write(marvinMolecule);
os.close();
String smiles2 = os.toString().trim();
assertEquals(smiles1, smiles2); // OK
smiles1 = "[C:1][C]";
marvinMolecule = MolImporter.importMol(smiles1);
os = new ByteArrayOutputStream();
exp = new MolExporter(os, "smiles:-H");
exp.write(marvinMolecule);
os.close();
smiles2 = os.toString().trim();
assertEquals(smiles1, smiles2); // fails
|
ChemAxon 25dcd765a3
30-06-2005 16:39:10
Hi Chris!
In the SMILES string [CH3:1]C the 'H' means implicit H.
The molecule with explicit H would be:
[H][C:1]([H])([H])C([H])([H])[H]
But if you export to a SMARTS string, the molecule is written as [C:1]C, which is what you want.
All the best
Andras
ChemAxon 25dcd765a3
30-06-2005 19:02:47
One more thing I have just seen.
This is not a valid SMILES string: "[C:1][C]"
You probably mean one of these valid SMILES strings: "[CH3:1][CH3]" or "[CH3:1]C"
or your original string, imported as SMARTS: "[C:1][C]"
In the latter case you have to specify in the MolImporter constructor that you want to read the string as SMARTS, or set the "smarts" option on the MolImporter.
Code: |
MolImporter.setOptions("smarts"); |
(Your first string, "CC", is a valid SMILES.)
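For reference, the hydrogen rule behind this can be sketched as follows. This is only an illustration of the general SMILES convention: a bracketed atom carries exactly the hydrogens written inside the brackets, while an unbracketed atom is filled up to its normal valence. It is not Marvin code, the class and method names are made up for the example, and the parser handles only simple bracket atoms.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch of the general SMILES hydrogen convention.
// Hypothetical helper class; this is NOT part of the Marvin API.
final class ImplicitHydrogens {

    // Normal valences for the organic-subset elements.
    private static final Map<String, int[]> NORMAL_VALENCES = Map.of(
        "B", new int[] {3},
        "C", new int[] {4},
        "N", new int[] {3, 5},
        "O", new int[] {2},
        "P", new int[] {3, 5},
        "S", new int[] {2, 4, 6},
        "F", new int[] {1},
        "Cl", new int[] {1},
        "Br", new int[] {1},
        "I", new int[] {1});

    // Optional isotope, element symbol, optional chirality, optional H count.
    private static final Pattern BRACKET_ATOM =
        Pattern.compile("\\[\\d*([A-Z][a-z]?)@{0,2}(?:H(\\d*))?");

    /** Unbracketed atom: fill up to the lowest normal valence >= the bond order sum. */
    static int unbracketed(String symbol, int bondOrderSum) {
        int[] valences = NORMAL_VALENCES.get(symbol);
        if (valences == null) {
            throw new IllegalArgumentException("Not an organic-subset element: " + symbol);
        }
        for (int v : valences) {
            if (v >= bondOrderSum) {
                return v - bondOrderSum;
            }
        }
        return 0; // bond order sum already exceeds every normal valence
    }

    /** Bracketed atom such as "[CH3:1]": the H count is exactly what is written. */
    static int bracketed(String atom) {
        Matcher m = BRACKET_ATOM.matcher(atom);
        if (!m.lookingAt()) {
            throw new IllegalArgumentException("Unsupported bracket atom: " + atom);
        }
        String digits = m.group(2);
        if (digits == null) {
            return 0;                  // no H written, e.g. [C:1]
        }
        return digits.isEmpty() ? 1 : Integer.parseInt(digits); // [CH] vs [CH3]
    }

    public static void main(String[] args) {
        // Each carbon in "CC" has one single bond to its neighbour.
        System.out.println("C in CC:  " + unbracketed("C", 1));
        System.out.println("[C:1]:    " + bracketed("[C:1]"));
        System.out.println("[CH3:1]:  " + bracketed("[CH3:1]"));
    }
}
```

Under this convention the carbons in "[C:1][C]" have no hydrogens at all, which is why the string does not describe ethane the way "CC" or "[CH3:1]C" does.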
All the best
Andras
User 6ef33138f9
30-06-2005 21:05:18
Thanks, Andras. I understand now that the atom map syntax is SMARTS only, not SMILES. For some reason I had thought it was supported by SMILES.
I'm still confused about the expected behavior with the different import and export options, though. I tried various tests importing "C[C][C:1]" as SMILES and SMARTS, and then exporting as SMILES and SMARTS.
1) If I read "C[C][C:1]" as SMILES and export it as SMILES, it exports "CC[CH3:1]". Why is the hydrogen count added only when the atom map number is present?
2) In the above example, since [C:1] is not valid SMILES, I assume that it's automatically importing and exporting as SMARTS even though the options say "smiles". If that's correct, then why is the result in #1 above different from the result in #3 below (when explicitly exporting as SMARTS)?
3) If I read "C[C][C:1]" as SMILES and export it as SMARTS, it exports "[#6]C[#6:1]". What determines when the atomic number is used instead of the symbol? Is there a way to get it to export "CC[C:1]" instead?
4) If I create an importer, call setOptions("smarts"), and read the molecule, I get exactly the same results as above: "CC[CH3:1]" when exported as SMILES, "[#6]C[#6:1]" when exported as SMARTS. Does setting the import option make any difference (between SMILES and SMARTS)?
5) If I create the importer using 'new MolImporter(inputStream, "smarts")', I get an exception that "marts" is not a valid format when reading the molecule. Is it expecting a different syntax for the options in this case?
Thanks,
Chris
ChemAxon 25dcd765a3
01-07-2005 15:40:00
Hi Chris,
I think you now know quite a lot, so I'll be brief.
Quote: |
I guess Marvin in this version will automatically correct the improper valence for atoms in brackets when importing as SMILES? (But in 4.0 will allow the improper valences to remain as-is?) |
Exactly.
Quote: |
- If I want to import and export SMILES, I should use H correctly for bracketed atoms. Then I can import "C[CH2][CH3:1]" and export it as "CC[CH3:1]" (which is syntactically different but semantically the same).
- If I want to import and export SMARTS, I should use "smarts:" format and call setQueryMode(true) when importing to get the expected result.
|
Correct.
All the best
Andras