User c4e58ee8b2
27-01-2011 22:10:57
Hi, I have three SMILES strings:
C[C@H](N[*])C([*])=O |$;;;_R1;;_R2;$|
C[C@H](N[*])C([*])=O
C[C@H](N*)C(*)=O
and I expect to get the same result when I pass each of them through Molecule.toFormat("smiles:u"), but I get different results. What should I do to ensure a unique result regardless of the input?
The following is my test code and run results.
Thanks
Tianhong
---------------------------------------
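import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

import chemaxon.formats.MolImporter;
import chemaxon.struc.Molecule;

public class UniqueSmilesTest {
    public static void main(String[] args) {
        try {
            // Same structure written three ways: with R-group labels,
            // with bracketed wildcard atoms, and with bare wildcard atoms.
            String smiles1 = "C[C@H](N[*])C([*])=O |$;;;_R1;;_R2;$|";
            toUniqueSmiles(smiles1);
            String smiles2 = "C[C@H](N[*])C([*])=O";
            toUniqueSmiles(smiles2);
            String smiles3 = "C[C@H](N*)C(*)=O";
            toUniqueSmiles(smiles3);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Imports the string as a molecule and prints its unique SMILES.
    private static void toUniqueSmiles(String smiles) throws IOException {
        System.out.println("Input: " + smiles);
        InputStream is = new ByteArrayInputStream(smiles.getBytes());
        MolImporter importer = new MolImporter(is);
        Molecule molecule = importer.read();
        String smilesU = molecule.toFormat("smiles:u");
        System.out.println("Output: " + smilesU);
    }
}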
------------------------------------------
Input: C[C@H](N[*])C([*])=O |$;;;_R1;;_R2;$|
Output: C[C@H](N[*])C([*])=O
Input: C[C@H](N[*])C([*])=O
Output: C[C@H](N*)C(*)=O
Input: C[C@H](N*)C(*)=O
Output: C[C@H](N*)C(*)=O
User c4e58ee8b2
28-01-2011 13:20:16
Volfi, thanks for your response.
What really concerns me is that the second structure is the canonical form of the first structure, but it can be further canonicalized to the third structure.
It seems that I need to call the export function twice to get the 'true' canonical form, which is what I am really looking for.
Tianhong
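The "export twice" idea above can be sketched as a loop that repeats the canonicalization until the output stops changing. This is only an illustration: `toFixedPoint` and `standIn` are hypothetical names, and `standIn` is a plain-string stand-in that mimics the two-step behaviour reported in this thread, not a Marvin call. In real use the operator would presumably wrap the import and `toFormat("smiles:u")` steps from the test code.

```java
import java.util.function.UnaryOperator;

public class CanonicalFixedPoint {

    // Re-applies canonicalize until the result no longer changes.
    // Gives up after maxRounds so a non-converging exporter cannot loop forever.
    static String toFixedPoint(String input, UnaryOperator<String> canonicalize, int maxRounds) {
        String current = input;
        for (int i = 0; i < maxRounds; i++) {
            String next = canonicalize.apply(current);
            if (next.equals(current)) {
                return current;
            }
            current = next;
        }
        throw new IllegalStateException("No stable form after " + maxRounds + " rounds");
    }

    // Hypothetical stand-in for the real exporter (assumption, not Marvin code):
    // the first pass drops the extended field, a later pass rewrites [*] as *.
    static String standIn(String s) {
        int bar = s.indexOf('|');
        if (bar >= 0) {
            return s.substring(0, bar).trim();
        }
        return s.replace("[*]", "*");
    }

    public static void main(String[] args) {
        String start = "C[C@H](N[*])C([*])=O |$;;;_R1;;_R2;$|";
        System.out.println(toFixedPoint(start, CanonicalFixedPoint::standIn, 5));
    }
}
```

The round limit is a design choice: if an exporter ever alternated between two forms, the loop would fail loudly instead of hanging.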
ChemAxon 25dcd765a3
28-01-2011 13:31:50
Dear Tianhong,
You are right, the second structure is the unique SMILES of the first structure. We lose information during unique SMILES export, in this case the R-group info (which cannot be stored in SMILES format). Instead of exporting twice and losing information during export, you may consider not using the R-group information during import, which is the solution I suggested, but the choice is yours.
Volfi
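One possible reading of "not using the R-group information during import", as a rough sketch: cut off the extended field (the part between the `|` characters) before the string reaches the importer. The helper name `stripExtension` is hypothetical, and this is plain string handling rather than a Marvin API; Marvin may well provide its own import option for the same purpose.

```java
public class StripCxExtension {

    // Returns the plain SMILES part of the input by dropping everything from
    // the first " |" onward. Input without such a field is returned trimmed.
    static String stripExtension(String smiles) {
        String trimmed = smiles.trim();
        int bar = trimmed.indexOf(" |");
        return bar < 0 ? trimmed : trimmed.substring(0, bar).trim();
    }

    public static void main(String[] args) {
        // The R-group labels _R1 and _R2 live only in the extended field.
        System.out.println(stripExtension("C[C@H](N[*])C([*])=O |$;;;_R1;;_R2;$|"));
        // prints: C[C@H](N[*])C([*])=O
    }
}
```

The stripped string is the second structure from the original question, so whether it then exports to the third form still depends on the Marvin version in use.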
User c4e58ee8b2
28-01-2011 13:38:05
Even if I specify the file format as 'smiles' in my importer, the first structure still produces the second structure, not the third. I am using MarvinBeans version 5.0.
Tianhong
ChemAxon 25dcd765a3
28-01-2011 13:48:46
Dear Tianhong,
At first sight I thought you had found a bug, but I have rechecked it and it works as expected.
I have attached the test code.
java Test
Input: C[C@H](N
User c4e58ee8b2
28-01-2011 14:39:16
This first one still produces smiles with
ChemAxon 25dcd765a3
28-01-2011 15:00:39
Dear Tianhong,
You are right. The problem is that 5.0 is already 3 years old, and we have fixed plenty of issues since then.
Please update your Marvin.
All the best
Volfi