SMILES generated by JChem Base don't match themselves in a FULL search

User f88f01065f

02-11-2011 21:50:18

Hi,




We have found a few examples where a JChem Base FULL search does not
match a SMILES string taken from the cd_smiles column of the JChem Base table itself.

For some particular molecules:

- I load a molfile (that looks OK to me) and save it in JChem Base
- I then read the SMILES that JChem Base produces
- and do a FULL search using this SMILES
- and I get back no results

I am attaching a fully reproducible Java project. Get it at:

[Moderator edit: link removed as contains license information]

Just untar it and run ant, then see the code.

The molfile that causes this behaviour is below (it is also included in the code):




093
  RCSB PDB11021110093D
Ideal coordinates from Chemical Component Dictionary
 40 41 0 0 0 0 999 V2000
   -1.7480 2.5500 -0.1700 O 0 0 0 0 0 0 0 0 0 0 0 0
   -2.5980 1.5450 0.3640 S 0 0 0 0 0 0 0 0 0 0 0 0
   -3.3260 1.6450 1.5810 O 0 0 0 0 0 0 0 0 0 0 0 0
   -3.7250 1.2370 -0.8100 N 0 0 0 0 0 0 0 0 0 0 0 0
   -4.7360 0.1980 -0.6000 C 0 0 0 0 0 0 0 0 0 0 0 0
   -5.8550 0.3610 -1.6300 C 0 0 0 0 0 0 0 0 0 0 0 0
   -5.3360 0.1240 -2.9400 O 0 0 0 0 0 0 0 0 0 0 0 0
   -1.5960 0.1040 0.5230 C 0 0 0 0 0 0 0 0 0 0 0 0
   -2.1530 -1.0790 0.9770 C 0 0 0 0 0 0 0 0 0 0 0 0
   -3.8390 -1.1380 1.3890 Cl 0 0 0 0 0 0 0 0 0 0 0 0
   -1.3720 -2.2150 1.1040 C 0 0 0 0 0 0 0 0 0 0 0 0
   -0.0320 -2.1740 0.7780 C 0 0 0 0 0 0 0 0 0 0 0 0
   -0.2560 0.1590 0.1990 C 0 0 0 0 0 0 0 0 0 0 0 0
    0.5350 -0.9840 0.3200 C 0 0 0 0 0 0 0 0 0 0 0 0
    1.9720 -0.9330 -0.0310 C 0 0 0 0 0 0 0 0 0 0 0 0
    2.7250 -1.8540 -0.6730 C 0 0 0 0 0 0 0 0 0 0 0 0
    2.1220 -3.1520 -1.1440 C 0 0 0 0 0 0 0 0 0 0 0 0
    4.0350 -1.5800 -0.8840 N 0 0 0 0 0 0 0 0 0 0 0 0
    3.1630 0.3560 0.2890 S 0 0 0 0 0 0 0 0 0 0 0 0
    4.5450 -0.3950 -0.4450 C 0 0 0 0 0 0 0 0 0 0 0 0
    5.7920 0.0680 -0.5470 N 0 0 0 0 0 0 0 0 0 0 0 0
    6.1000 1.2600 -0.0520 C 0 0 0 0 0 0 0 0 0 0 0 0
    5.2450 1.9300 0.4960 O 0 0 0 0 0 0 0 0 0 0 0 0
    7.5080 1.7830 -0.1680 C 0 0 0 0 0 0 0 0 0 0 0 0
   -3.7150 1.7450 -1.6360 H 0 0 0 0 0 0 0 0 0 0 0 0
   -4.2770 -0.7840 -0.7140 H 0 0 0 0 0 0 0 0 0 0 0 0
   -5.1500 0.2920 0.4040 H 0 0 0 0 0 0 0 0 0 0 0 0
   -6.6500 -0.3550 -1.4200 H 0 0 0 0 0 0 0 0 0 0 0 0
   -6.2550 1.3740 -1.5760 H 0 0 0 0 0 0 0 0 0 0 0 0
   -6.0710 0.2360 -3.5580 H 0 0 0 0 0 0 0 0 0 0 0 0
   -1.8120 -3.1350 1.4580 H 0 0 0 0 0 0 0 0 0 0 0 0
    0.5760 -3.0610 0.8760 H 0 0 0 0 0 0 0 0 0 0 0 0
    0.1780 1.0820 -0.1540 H 0 0 0 0 0 0 0 0 0 0 0 0
    2.8890 -3.7520 -1.6350 H 0 0 0 0 0 0 0 0 0 0 0 0
    1.7250 -3.6990 -0.2890 H 0 0 0 0 0 0 0 0 0 0 0 0
    1.3170 -2.9450 -1.8490 H 0 0 0 0 0 0 0 0 0 0 0 0
    4.6030 -2.2200 -1.3420 H 0 0 0 0 0 0 0 0 0 0 0 0
    7.5690 2.7690 0.2930 H 0 0 0 0 0 0 0 0 0 0 0 0
    8.1920 1.1020 0.3400 H 0 0 0 0 0 0 0 0 0 0 0 0
    7.7830 1.8560 -1.2200 H 0 0 0 0 0 0 0 0 0 0 0 0
  1 2 2 0 0 0 0
  2 3 2 0 0 0 0
  2 4 1 0 0 0 0
  2 8 1 0 0 0 0
  4 5 1 0 0 0 0
  4 25 1 0 0 0 0
  5 6 1 0 0 0 0
  5 26 1 0 0 0 0
  5 27 1 0 0 0 0
  6 7 1 0 0 0 0
  6 28 1 0 0 0 0
  6 29 1 0 0 0 0
  7 30 1 0 0 0 0
  8 9 2 0 0 0 0
  8 13 1 0 0 0 0
  9 10 1 0 0 0 0
  9 11 1 0 0 0 0
 11 12 2 0 0 0 0
 11 31 1 0 0 0 0
 12 14 1 0 0 0 0
 12 32 1 0 0 0 0
 13 14 2 0 0 0 0
 13 33 1 0 0 0 0
 14 15 1 0 0 0 0
 15 16 2 0 0 0 0
 15 19 1 0 0 0 0
 16 17 1 0 0 0 0
 16 18 1 0 0 0 0
 17 34 1 0 0 0 0
 17 35 1 0 0 0 0
 17 36 1 0 0 0 0
 18 20 1 0 0 0 0
 18 37 1 0 0 0 0
 19 20 1 0 0 0 0
 20 21 2 0 0 0 0
 21 22 1 0 0 0 0
 22 23 2 0 0 0 0
 22 24 1 0 0 0 0
 24 38 1 0 0 0 0
 24 39 1 0 0 0 0
 24 40 1 0 0 0 0
M END
$$$$

ChemAxon 9c0afc9aaf

02-11-2011 22:27:10

Hi,


Please let us know the exact JChem version you are using.


It can be accessed and printed in the code by referring to:


http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/jchem/version/VersionInfo.html#JCHEM_VERSION


Also, it would save us some time and labor if you could attach the mol file as a file (spaces can be lost on the web, making it invalid).
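To illustrate why lost spaces matter: a V2000 molfile is a fixed-width format, and its counts line consists of eleven 3-character fields followed by a 6-character version field. The sketch below is a minimal, plain-Java check of that layout. It is not JChem code; the class and method names are hypothetical, and the "intact" line is built only for illustration rather than recovered from the original file.

```java
// Sketch only: fixed-width check of a V2000 counts line (hypothetical helper, not part of JChem).
class CountsLineCheck {

    /** Returns {atomCount, bondCount}; throws if the fixed-width layout is broken. */
    static int[] parseCountsLine(String line) {
        // Eleven 3-character fields (33 columns) followed by a 6-character version field.
        if (line.length() < 39) {
            throw new IllegalArgumentException("counts line too short: " + line.length() + " chars");
        }
        String version = line.substring(33, 39).trim();
        if (!version.equals("V2000")) {
            throw new IllegalArgumentException("unexpected version field: '" + version + "'");
        }
        int atoms = Integer.parseInt(line.substring(0, 3).trim());
        int bonds = Integer.parseInt(line.substring(3, 6).trim());
        return new int[] { atoms, bonds };
    }

    /** True if the line parses as a fixed-width V2000 counts line. */
    static boolean isValidCountsLine(String line) {
        try {
            parseCountsLine(line);
            return true;
        } catch (IllegalArgumentException e) { // also covers NumberFormatException
            return false;
        }
    }

    /** Builds an illustrative counts line with the remaining fields set to 0 / 999. */
    static String buildCountsLine(int atoms, int bonds) {
        StringBuilder sb = new StringBuilder(String.format("%3d%3d", atoms, bonds));
        for (int i = 0; i < 8; i++) {
            sb.append("  0");
        }
        return sb.append("999 V2000").toString();
    }

    public static void main(String[] args) {
        String intact = buildCountsLine(40, 41);
        String collapsed = " 40 41 0 0 0 0 999 V2000"; // as it appears after pasting into the forum
        System.out.println(isValidCountsLine(intact));    // prints true
        System.out.println(isValidCountsLine(collapsed)); // prints false
    }
}
```

The pasted counts line is only 24 characters long, so a strict fixed-width reader has no way to locate the version field, which is why attaching the file itself is the safer option.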


Some quick comments on the code (not necessarily related to the problem):


UpdateHandler.close() should be called after modifications, see:


http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/jchem/db/UpdateHandler.html#close()


Directly modifying JChem database objects is not allowed or supported, e.g.:


 


// Clean the database from all tables
try { statement.execute("create schema me"); } catch(Exception e){}
try { statement.execute("drop table "+DatabaseProperties.DEFAULT_PROPERTY_TABLE); } catch(Exception e){}
try { statement.execute("drop table "+tableName); } catch(Exception e){}
try { statement.execute("drop table "+tableName+"_ul"); } catch(Exception e){}

 


Modifying properties in the property table is not allowed or supported:


 


DatabaseProperties properties = new DatabaseProperties(connectionHandler);
properties.addProperty("db.autoIncrementPropertyName", "AUTO_INCREMENT");
properties.addProperty("db.existsBitwiseAND", "true");
properties.addProperty("db.isAutoIncrementProperty", "true");
properties.addProperty("registration.code", "xxxxxxx");

 


 


Best regards,


 


Szilard

User f88f01065f

25-05-2012 17:00:18

Hi


Sorry for the confusion. I had actually prepared a small standalone project that you could use to independently reproduce the problem at your end.

But somehow the attachment did not work (maybe the file was too big).


You can download the example java program (that includes the mol files) from


[Moderator edit: link removed as contains license information]


Now about the version:


I got the following from the jar file (included in the small project) looking inside jchem.jar at


chemaxon/jchem/version/version.properties


VERSION=5.3.0.1
MAJOR_VERSION=5.3
TABLE_VERSION=5030002
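The same lookup can be done programmatically with the standard java.util.Properties class. This is a minimal sketch: the resource path is the one quoted above and is only present when jchem.jar is on the classpath, while the class name VersionProps and the fallback sample text are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Sketch only: reads the VERSION key from a version.properties source.
class VersionProps {

    /** Loads key=value pairs from the reader and returns the VERSION entry (or null). */
    static String readVersion(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("VERSION");
    }

    public static void main(String[] args) throws IOException {
        // Path taken from the post; getResourceAsStream returns null if jchem.jar is absent.
        InputStream in = VersionProps.class
                .getResourceAsStream("/chemaxon/jchem/version/version.properties");
        if (in != null) {
            try (Reader r = new InputStreamReader(in, StandardCharsets.ISO_8859_1)) {
                System.out.println("JChem version: " + readVersion(r));
            }
        } else {
            // Fallback: the three lines quoted in the post, used as sample input.
            String sample = "VERSION=5.3.0.1\nMAJOR_VERSION=5.3\nTABLE_VERSION=5030002\n";
            System.out.println("Sample version: " + readVersion(new StringReader(sample)));
        }
    }
}
```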

ChemAxon a3d59b832c

30-05-2012 09:34:23

Hi Dimitris,


The thing that you are doing should not work as is.


 


The query passed to JChemSearch is interpreted as SMARTS in the case of a full structure search, and you are passing a SMILES string. Please note that SMARTS and SMILES differ in meaning in some respects, so the same string may describe different information.


 


As an alternative, I suggest passing the original molfile representation or using the DUPLICATE search type.

In that case, JChemSearch will try to interpret the string as SMILES first.


 


Furthermore, the JChem version used is quite old (more than 2 years old). We continuously add bugfixes, so it is worth trying the latest version.


 


Best regards,


Szabolcs

User f88f01065f

05-06-2012 22:05:38

Hi Szabolcs


 


Thanks for your help. I followed your advice and used the latest (5.9.4) JChem base and the "DUPLICATE" search type which indeed seemed to improve things a lot.


We still have problems with some molecules that "don't match themselves" but I noticed that all of these cases have problematic oxidation states (typically sulfur atoms with oxidation state 3).


So the smiles that gets generated by JChem has "red markers" when I look at it in the Marvin applet and then the search does not work.


Examples of such smiles are:


NCS=O


CS(C)CCC(O)=O


CC1=SC(=S)N(CCS(O)(=O)=O)C1=O


I guess we need to look into our molecules, but is this expected behaviour from JChem Base?


Thanks again for your help

ChemAxon a3d59b832c

06-06-2012 10:14:52

Hi Dimitris,


 


Yes, the valence errors can cause problems in this regard.


 


Those can be filtered out using Structure Checker or Chemical Terms:


http://www.chemaxon.com/marvin/help/structurechecker/checkerlist.html#valence


http://www.chemaxon.com/marvin/help/chemicalterms/EvaluatorFunctions.html#hasvalenceerrorex


 


 


I will ask my colleagues who work on the internal representation to check these structures.


 


Best regards,


Szabolcs

ChemAxon 60613ab728

06-06-2012 13:36:08

Hi Dimitris,


The first two SMILES can be found as positively charged molecules in PubChem.


C(C)CCC(=O)O


C(N)=O


Best Regards,


Miklos

ChemAxon 25dcd765a3

08-06-2012 11:06:03

Hi Dimitris,


As you mentioned, an uncharged sulfur atom cannot have valence 3.

SMILES cannot handle structures that are not in a chemically correct valence state, so a valence error cannot be distinguished from a radical in the SMILES string.

We are trying to solve this problem in CXSMILES.
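As a rough illustration of the valence rule discussed in this thread, the sketch below checks a sulfur atom's bond-order sum against the normal valences of neutral sulfur in the SMILES organic subset (2, 4 and 6). It is plain Java and not a substitute for Structure Checker or Chemical Terms. The class name is hypothetical, and the bond-order sums are tallied by hand from the SMILES strings quoted earlier, assuming no implicit hydrogens on sulfur.

```java
// Sketch only: toy valence check for uncharged sulfur (hypothetical helper, not JChem code).
class SulfurValenceCheck {

    /** True if the bond-order sum is one of the normal valences of neutral sulfur. */
    static boolean isNormalNeutralSulfurValence(int bondOrderSum) {
        return bondOrderSum == 2 || bondOrderSum == 4 || bondOrderSum == 6;
    }

    public static void main(String[] args) {
        // Hand-tallied bond-order sums (assumption: no implicit hydrogens on sulfur).
        // NCS=O          : one single bond + one double bond = 3
        // CS(C)CCC(O)=O  : three single bonds                = 3
        // ring S in CC1=SC(=S)N(CCS(O)(=O)=O)C1=O : double + single = 3
        // sulfonic S in the same molecule: 1 + 1 + 2 + 2     = 6
        int[] sums = { 3, 3, 3, 6 };
        for (int sum : sums) {
            System.out.println(sum + " -> " + isNormalNeutralSulfurValence(sum));
        }
    }
}
```

All three problem sulfurs come out with a sum of 3, which matches the observation that these are the structures flagged with valence errors.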