jc_standardize does not validate output

User 7f33ec9a5c

06-11-2012 21:59:30

Hi,


I am looking for a way to validate SMILES and SMARTS in JChem Cartridge.  


In the event of invalid SMILES, jc_standardize does not behave as expected. Please see the example of using the jc_standardize operator below:



SQL> SELECT jc_standardize('BOB THE BUILDER', 'sep=~ config:removeexplicitH....aromatize~outFormat:smiles:0') AS example
  2    FROM DUAL;

EXAMPLE
--------------------------------------------------------------------------------
BOB

SQL>


With this input, shouldn't jc_standardize throw an error instead of trimming the string to find a valid SMILES?

Any guidance on this would be appreciated.


~mike


User 7f33ec9a5c

06-11-2012 22:01:45

SQL> SELECT jc_standardize('bob the builder', 'sep=~ config:removeexplicitH....aromatize~outFormat:smiles:0') AS example
  2    FROM DUAL;

EXAMPLE
--------------------------------------------------------------------------------

[#5&aH2]:o:[#5&aH2]

SQL>

ChemAxon a3d59b832c

07-11-2012 08:28:38

Hi Mike,


 


I am sorry to say it, but both of these strings do conform to the SMILES syntax.


The character string before the first space is interpreted as SMILES or SMARTS, and the rest is taken as the molecule name.
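To make the splitting rule concrete, here is a minimal Python sketch. It is not JChem code, and the helper name `split_smiles_line` is hypothetical; it only separates an input line into a structure token and a name at the first whitespace, which is why "BOB THE BUILDER" yields the structure "BOB".

```python
def split_smiles_line(line: str) -> tuple[str, str]:
    """Split an input line into (structure, name) at the first whitespace.

    Hypothetical helper illustrating the rule described above; it does not
    parse or validate the structure itself.
    """
    # Drop surrounding whitespace so a trailing space does not end up in the name.
    parts = line.strip().split(maxsplit=1)
    if not parts:
        return "", ""
    structure = parts[0]
    # Everything after the first run of whitespace is treated as the name.
    name = parts[1] if len(parts) > 1 else ""
    return structure, name
```

For example, `split_smiles_line("BOB THE BUILDER")` returns `("BOB", "THE BUILDER")`, and `split_smiles_line("c1ccccc1")` returns `("c1ccccc1", "")`.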


 


What kind of validation exactly do you have in mind?


Our Structure Checker product can probably help with some criteria, but it won't help with genuinely ambiguous cases.


 


Szabolcs

User 7f33ec9a5c

08-11-2012 05:24:18

This function does fail correctly on bad input in every other case I have tested.


I know it's a really unique case, but I would point out that I asked the function to output SMILES and it gave me SMARTS, so in this case the function did not respond properly.


I'd prefer that the function fail when it sees invalid SMILES, instead of parsing only a fraction of the input.
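One way to get this fail-fast behaviour is an application-side check around the conversion. The Python sketch below is hypothetical: neither function is part of JChem Cartridge. It rejects input with anything after the first whitespace, and it rejects output whose bracket atoms contain SMARTS-only primitives such as `&` or `#5`, as in the `[#5&aH2]:o:[#5&aH2]` result above. It is a heuristic under those assumptions, not a SMILES validator, and it does not look at SMARTS-only bond primitives.

```python
import re

# Heuristic assumption: inside a bracket atom, plain SMILES never uses the
# SMARTS logical operators (& , ; !) or atomic-number notation (#n).
_SMARTS_ONLY = re.compile(r"[&,;!#]")
_BRACKET_ATOM = re.compile(r"\[([^\]]*)\]")


def looks_like_smarts(text: str) -> bool:
    """Return True if any bracket atom contains a SMARTS-only primitive."""
    return any(_SMARTS_ONLY.search(atom) for atom in _BRACKET_ATOM.findall(text))


def require_plain_smiles(source: str, converted: str) -> str:
    """Hypothetical fail-fast check around a SMILES conversion.

    `source` is the string passed in; `converted` is the string returned.
    """
    # Anything after the first whitespace would be read as a molecule name.
    if len(source.split()) > 1:
        raise ValueError(f"unexpected text after the structure: {source!r}")
    # SMILES was requested, so SMARTS primitives in the output are an error.
    if looks_like_smarts(converted):
        raise ValueError(f"output is SMARTS, not SMILES: {converted!r}")
    return converted
```

For example, `require_plain_smiles("bob", "[#5&aH2]:o:[#5&aH2]")` raises `ValueError`, while `require_plain_smiles("BOB", "BOB")` returns `"BOB"`. Note that `C#N` is not flagged, because the `#` check only applies inside brackets.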


~mike


 

ChemAxon 9c0afc9aaf

09-11-2012 02:14:51

Mike,


In general, any additional string after the first whitespace is treated as a data field, so it can contain anything.


http://www.chemaxon.com/marvin/help/formats/smiles-doc.html#smiles_with_info


When you mentioned "BOB THE BUILDER" on the phone, I certainly did not realize this, nor that "BOB" is actually a valid SMILES; some things just work better in writing :)


However, "bob" is really not a valid SMILES: lowercase letters denote aromatic atoms, which are not legal in this context, so we interpret it as SMARTS on input.
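As a rough illustration of the lowercase-means-aromatic rule, the Python sketch below flags lowercase organic-subset symbols outside brackets that fall outside an assumed allowed set. The set `{c, n, o, p, s}` is an assumption chosen to match the 5.11 error message in this thread, which rejects aromatic B (other toolkits or specifications may accept aromatic `b`). The function name is hypothetical, and the sketch does not reproduce JChem's actual SMILES/SMARTS detection.

```python
import re

# Assumed set of aromatic symbols allowed outside brackets. Boron ("b") is
# deliberately excluded, in line with the molconvert 5.11 message in this
# thread; other toolkits may differ.
ALLOWED_AROMATIC = {"c", "n", "o", "p", "s"}

# Outside brackets, two-letter symbols (Cl, Br) are matched before single
# letters so that their second letter is not read as a separate atom.
_TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[A-Za-z]")


def illegal_aromatic_atoms(smiles: str) -> list[str]:
    """Return lowercase (aromatic) symbols outside brackets not in ALLOWED_AROMATIC.

    Bracket atoms are skipped; this is an illustrative check, not a parser.
    """
    bad = []
    for token in _TOKEN.findall(smiles):
        if token.startswith("["):
            continue  # bracket atoms follow different rules; not checked here
        if token.islower() and token not in ALLOWED_AROMATIC:
            bad.append(token)
    return bad
```

For example, `illegal_aromatic_atoms("bob")` returns `['b', 'b']`, while `illegal_aromatic_atoms("BOB")` and `illegal_aromatic_atoms("c1ccccc1")` both return an empty list.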


Indeed, we should not export it as "smiles:0".


This is a bug in 5.10.x; it has been fixed in 5.11.x.


5.10:

dorant@T500 /cygdrive/c/jchem5103/bin
$ ./molconvert smiles:0 -s "bob"
[#5&aH2]:o:[#5&aH2]

5.11:

dorant@T500 /cygdrive/c/jchem5113/bin
$ ./molconvert smiles -s "bob"
 cannot convert molecule 1 to smiles: The following atom cannot be aromatic according to the SMILES definition: B


Best regards,


 


Szilard
