SMILES starting with a parenthesis

User 6ef33138f9

07-02-2006 19:23:58

We've observed some unexpected behavior when importing SMILES that start with parentheses (using either MarvinSketch or MolImporter directly). In most cases the importer treats the input as SMARTS, not SMILES.





Consider the following examples. (These are contrived examples; obviously there are other ways of writing these molecules without the leading parentheses.)


Code:



(OC)C           Treated as SMARTS
(C)C1CC(N)CC1   Also treated as SMARTS
(N)C1CC(C)CC1   Treated as SMILES





All the examples should be valid SMILES, at least according to Daylight.


http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html#RTFToC18





So, two questions:


1) Why are the first two examples above treated as SMARTS?


2) What's different about the third example that makes it work?





Thanks,


Chris

ChemAxon 25dcd765a3

08-02-2006 08:09:50

Hi Chris,





I tried with Marvin 4.0.4 and couldn't reproduce the problem with the following code, which prints the input format of each molecule (an excerpt from SomeImport.java):


Code:
        // Open the file named by the first command-line argument
        InputStream is = new BufferedInputStream(new FileInputStream(args[0]));
        MolImporter molimp = new MolImporter(new MolInputStream(is));
        MolImporter strimp = new MolImporter();
        Molecule m = new Molecule();
        Molecule molFromSmi = new Molecule();
        int n = 0;
        // Read each molecule and print the format the importer detected
        while (molimp.read(m)) {
            System.err.println(m.getInputFormat());
        }
        System.err.println();
    }   // end of the enclosing method (excerpt)








Saving the three molecule strings to '1.txt' and running the code shows that all three strings are imported as SMILES:


Code:
java SomeImport 1.txt
smiles
smiles
smiles








How did you import the strings?





All the best


Andras

User 6ef33138f9

08-02-2006 15:15:50

Hello Andras,





We've tried several things. This is what I did for the tests above:





1) Create a .smi file for one of the molecules above, using a text editor


2) Open the .smi file in MarvinSketch


3) Look at how MarvinSketch draws the image: if it imported as SMARTS, it draws an explicit "C" in some cases


4) Go to File > Save As... > SMILES. If it imported as SMARTS, you'll get an exception like the following (for (OC)C, the first example above):





Code:
Chemaxon.marvin.util.MolExportException:


Some features of [#6]C[#8]) cannot be converted to smiles/cxsmiles. Use the smarts or cxsmarts format.






I did these tests after observing similar things in our code, with more complex molecules. Our import code is similar to yours. I didn't explicitly check the input format (as you did); I just observed that it failed with a MolExportException when I tried to convert the Molecule back to SMILES, with mol.toObject("smiles").





I added the check for the input format, and you're right: it says "smiles". But the molecule can't be converted to SMILES!





Code:



Molecule mol = MolImporter.importMol("(OC)C");
System.err.println(mol.getInputFormat());     // prints "smiles"
System.err.println(mol.toObject("smiles"));   // throws exception








Thanks,


Chris

ChemAxon 25dcd765a3

08-02-2006 22:48:41

Hi Chris,





I have found the cause of the problem:


A SMILES string that starts with a parenthesis '(' is interpreted as SMARTS with component-level grouping. We will fix this bug.
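
Until a fixed release is installed, input that begins with '(' could be flagged before it reaches the importer. The following is only a minimal sketch in plain Java: the class LeadingParenCheck and the method startsWithParenthesis are hypothetical names for illustration, not part of the Marvin API, and the check only detects the leading parenthesis. It does not rewrite the SMILES.

```java
class LeadingParenCheck {

    // Hypothetical helper (not a Marvin API call): returns true when the
    // string, ignoring leading whitespace, begins with '('.
    static boolean startsWithParenthesis(String smiles) {
        if (smiles == null) {
            return false;
        }
        String trimmed = smiles.trim();
        return !trimmed.isEmpty() && trimmed.charAt(0) == '(';
    }

    public static void main(String[] args) {
        // The three strings from this thread plus one without a leading '('
        String[] examples = { "(OC)C", "(C)C1CC(N)CC1", "(N)C1CC(C)CC1", "CC(O)C" };
        for (String s : examples) {
            String note = startsWithParenthesis(s) ? "leading parenthesis" : "ok";
            System.out.println(s + "\t" + note);
        }
    }
}
```

Strings flagged this way can then be reviewed or rewritten by hand before they are passed to MolImporter.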


Thank you for the report.





Andras

User 6ef33138f9

09-02-2006 20:54:06

Thank you, Andras!





Do you know when the fix will be released?





Chris

ChemAxon 25dcd765a3

12-02-2006 20:51:29

Hi Chris,





The fix is ready, and the next release will include it.


I hope it will be available in a few weeks.





All the best


Andras