Extended Smiles

User 7b0ee04e66

08-02-2007 16:04:31

Good afternoon,





I am trying to use extended SMILES and would like to understand better how it works.





I have a chiral molecule with one flat bond and one absolute chiral bond.


[#6]C1CCNC[C@H]1[#6] |a:6|





However, when I try to insert it into an Oracle table with a JChem index (version 3.1.7), I get the following error message:
Quote:
ORA-29875: failed in the execution of the ODCIINDEXINSERT routine


ORA-29532: Java call terminated by uncaught Java exception: java.lang.RuntimeException: The following exception has been thrown by the servlet:


Exception:


Some features of [#6]C1CCNC[C@H]1[#6] cannot be converted to smiles/cxsmiles. Use the smarts or cxsmarts format.


ORA-06512: at "JCHEM.JCHEM_CORE_PKG", line 191


ORA-06512: at "JCHEM.JC_IDXTYPE_IM", line 633
I have also tried to insert the molecule below and get the same error message:


C[C@H]1CCNC[C@H]1C |o1:1,o2:6|





However, I can insert this molecule without any problem:


[#6]C1CCNC[C@H]1[#6] |r|





As far as I can see the "Or" option is equivalent to a "relative configuration" (2 bonds on the same side of the ring, but could be Up Up or Down Down).


The "Absolute" means that we know exactly which enantiomer is present.





What does the |r| mean exactly? What extra information is attached that is not included in "AND", "OR", "ABSOLUTE"? Or is it a different way to specify the stereochemistry?


Is a "wiggly" bond equivalent to "AND" or "OR"?





Thanks for your help


Catherine

ChemAxon aa7c50abf8

08-02-2007 17:47:12

I can index AND insert both [#6]C1CCNC[C@H]1[#6] and C[C@H]1CCNC[C@H]1C |o1:1,o2:6| with JChem 3.2.3.





It is possible to index these structures in both JChem 3.1.7 and 3.1.7.1. The problem occurs when trying to insert them into a jc_idxtype-indexed table.





I suggest upgrading to the latest JChem version.





Peter

ChemAxon a3d59b832c

08-02-2007 23:09:42

ctravers wrote:
As far as I can see the "Or" option is equivalent to a "relative configuration" (2 bonds on the same side of the ring, but could be Up Up or Down Down).


The "Absolute" means that we know exactly which enantiomer is present.
That's correct.
ctravers wrote:
What does the |r| mean exactly? What extra information is attached that is not included in "AND", "OR", "ABSOLUTE"? Or is it a different way to specify the stereochemistry?
Yes, it is a different way. |r| means the absence of the chiral flag, which was the old MDL way to specify relative configuration. (Unfortunately, it was not defined if the two diastereomers were in "AND" or "OR" relation.) By convention, we treat these as "AND".





See these links for more info:


http://www.chemaxon.com/jchem/doc/user/Query.html#mdl_enhanced_stereo


http://www.mdl.com/products/pdfs/Enhanced_Stereochemical_Representation.pdf
ctravers wrote:
Is a "wiggly" bond equivalent to "AND" or "OR"?
The wiggly bond can be treated as an "OR" group. (A separate group for each stereo center with a wiggly bond.)
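
To make the notation above concrete, here is a minimal Python sketch that splits an extended SMILES string into its SMILES part and the stereo fields seen in this thread (`a:` for absolute, `o<n>:` for "OR" groups, `&<n>:` for "AND" groups, and the bare `r` flag). The function name `parse_stereo_groups` and the returned dictionary layout are hypothetical, chosen only for illustration. This is not ChemAxon's parser: any other field (such as `w:`) is simply ignored, and the atom indices are taken as zero-based, as the strings in this thread suggest.

```python
import re

def parse_stereo_groups(cxsmiles):
    """Split 'SMILES |ext|' and collect the enhanced stereo fields."""
    smiles, _, rest = cxsmiles.strip().partition(" |")
    fields = rest.rstrip("|")
    groups = {"absolute": [], "or": {}, "and": {}, "relative_flag": False}
    # Fields are comma separated, but atom lists also use commas, so a bare
    # number is appended to whichever stereo field was opened last.
    current = None
    for token in fields.split(","):
        token = token.strip()
        if not token:
            continue
        if token == "r":
            groups["relative_flag"] = True
            current = None
            continue
        m = re.fullmatch(r"(a|o\d+|&\d+):(\d+)", token)
        if m:
            name, atom = m.group(1), int(m.group(2))
            if name == "a":
                current = groups["absolute"]
            elif name[0] == "o":
                current = groups["or"].setdefault(int(name[1:]), [])
            else:
                current = groups["and"].setdefault(int(name[1:]), [])
            current.append(atom)
        elif token.isdigit() and current is not None:
            current.append(int(token))
        else:
            current = None  # some other field (e.g. w:...) that is ignored here
    return smiles, groups

if __name__ == "__main__":
    for s in ("[#6][C@H]1CCNC[C@H]1[#6] |a:1,6|",
              "C[C@H]1CCNC[C@H]1C |o1:1,o2:6|",
              "[#6]C1CCNC[C@H]1[#6] |r|"):
        print(parse_stereo_groups(s))
```

For `C[C@H]1CCNC[C@H]1C |o1:1,o2:6|` the sketch reports two separate "OR" groups, one per stereo center, which is the same shape described above for wiggly bonds.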





I hope this helps.





Szabolcs

User 7b0ee04e66

09-02-2007 10:05:13

Hi,





I have dropped the index and inserted the molecules. I can then re-create the index. Thanks





We tried to upgrade to JChem 3.2 a few weeks ago without success.


We will try again as we will need to be able to insert molecules on a regular basis in the new database.





Catherine

ChemAxon a3d59b832c

09-02-2007 10:15:54

Good. Let us know if we can help with anything in the upgrade.

User 7b0ee04e66

12-02-2007 11:42:41

Good morning,





I have a few more questions about finding all enantiomers in a table using jc_compare.





Code:
select * from ccd_extended_smiles where jc_compare (ext_smiles, '[#6][C@H]1CCNC[C@H]1[#6] |a:1,6|', 't:p') = 1;



I only find the same molecule which is what I expected.





Code:
select * from ccd_extended_smiles where jc_compare (ext_smiles, '[#6][C@H]1CCNC[C@H]1[#6] |a:1,6|', 't:p stereoSearch:n') = 1;



I only find the same record again, although I expected to retrieve more, because my test table also contains the following:


[#6][C@H]1CCNC[C@H]1[#6] |r|


[#6]C1CCNCC1[#6]


[#6]C1CCNC[C@H]1[#6] |r|


CC1CCNC[C@H]1C |r,w:1,0|


C[C@H]1CCNC[C@H]1C |o1:1,o2:6|





I thought I should be able to retrieve some of the records above by playing with the stereo options, but no luck.





exactStereoMatching:<exact-stereo-matching>


I understand this option is looking at stereo on the target structure, so would not help.





stereoSearch:<stereo-search>


What is different in that option? Does it remove stereochemistry from the query?





When I try the query below in the same table





Code:
select * from ccd_extended_smiles where jc_compare (ext_smiles, 'C[C@H]1CCNC[C@H]1C |o1:1,o2:6|', 't:p stereoSearch:n') = 1;



I retrieve all the examples above including [#6][C@H]1CCNC[C@H]1[#6] |a:1,6|.





How can it be explained?


Thanks


Catherine





PS: I am using JChem 3.2.3 on an Oracle 10g database.

ChemAxon a3d59b832c

12-02-2007 15:22:30

We will check it.

ChemAxon a9ded07333

14-02-2007 07:20:40

Hi Catherine,





The problem is that you use the extended smarts format ([#6]) to describe the first query molecule ([#6][C@H]1CCNC[C@H]1[#6] |a:1,6|), whereas the second one (C[C@H]1CCNC[C@H]1C |o1:1,o2:6|) is interpreted as extended smiles. The difference between the two formats is that smarts import does not set implicit hydrogens. [To see the difference in MarvinSketch, paste [#6][C@H]1CCNC[C@H]1[#6] |a:1,6| and its smiles equivalent, C[C@H]1CCNC[C@H]1C |a:1,6|, then use the View / Hydrogens / All option.]





The database representation uses the cxsmiles format of the molecules. If you perform a perfect search, the differences in implicit hydrogens will prevent hits.





We advise against using smarts- or cxsmarts-formatted molecules as database structures, as they can be confused with the internal cxsmiles representation. Using smiles, mol, mrv, and all other formats that set implicit hydrogens is safe.
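
As an illustration of this advice, the following Python sketch flags strings that use the `[#6]`-style atomic-number atoms discussed in this thread, so they can be caught before being inserted as database structures. The name `looks_like_smarts` is hypothetical, and the check is only a heuristic: it assumes the extension block starts at `" |"` and it does not recognise any other SMARTS-only feature.

```python
import re

# Atomic-number atoms such as [#6] are the smarts notation pointed out in
# this thread; a '#' outside brackets (a triple bond, e.g. C#N) is not matched.
_ATOMIC_NUMBER_ATOM = re.compile(r"\[#\d+")

def looks_like_smarts(structure):
    """Return True if the structure part contains a [#n]-style atom."""
    core = structure.split(" |", 1)[0]  # drop the |...| extension block
    return bool(_ATOMIC_NUMBER_ATOM.search(core))

if __name__ == "__main__":
    for s in ("[#6]C1CCNC[C@H]1[#6] |a:6|",
              "C[C@H]1CCNC[C@H]1C |o1:1,o2:6|"):
        print(s, "->", looks_like_smarts(s))
```

A string that passes this check is not guaranteed to be plain smiles; the sketch only covers the case that caused the mismatch here.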





We will solve the handling of implicit hydrogens during perfect search when we introduce query tables in the database.





Tamás

ChemAxon a3d59b832c

14-02-2007 10:28:31

Just a small addition: the current version (JChem 3.2.3) expects the structures put in the database to be specific molecules, which is why it uses the cxsmiles format internally. In the future, with the query tables, all format issues will be solved and even smarts/cxsmarts formats will be fully supported.





Certainly, SMARTS and extended SMARTS formats are absolutely OK to use as substructure queries even in the current version.

User 7b0ee04e66

14-02-2007 14:06:44

OK, Thanks, I understand the difference.


I cannot use 'p' to search a Smarts against a table of Smiles. But I can use 'e' instead.
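
The rule above can be sketched as a small Python helper that builds the option string passed to jc_compare. The helper name `jc_compare_options` is hypothetical. The `t:p` and `stereoSearch:n` spellings are taken from the queries earlier in this thread, while writing the 'e' search type as `t:e` is an assumption made by analogy with `t:p`. Treating any `[#` in the structure part as a sign of smarts is likewise only a rough heuristic.

```python
def jc_compare_options(query, stereo_search=True):
    """Pick a search type for a query against a table of smiles structures."""
    core = query.split(" |", 1)[0]  # ignore the |...| extension block
    # Heuristic: [#6]-style atoms mean the query is smarts, so avoid 'p'.
    search_type = "e" if "[#" in core else "p"
    options = "t:" + search_type
    if not stereo_search:
        options += " stereoSearch:n"
    return options

if __name__ == "__main__":
    print(jc_compare_options("[#6][C@H]1CCNC[C@H]1[#6] |a:1,6|"))
    print(jc_compare_options("C[C@H]1CCNC[C@H]1C |o1:1,o2:6|",
                             stereo_search=False))
```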





Thanks


Catherine

ChemAxon a3d59b832c

14-02-2007 21:09:50

Very good insight!

ChemAxon a9ded07333

16-01-2008 14:34:13

Since JChem 5.0 we support a new search option, implicitHMatching. This option controls matching between implicit and explicit hydrogens. More information can be found in the JChem Query Guide (look for the feature name "Implicit H matching mode").





Tamás

ChemAxon 9c0afc9aaf

22-01-2008 13:16:34

Quote:
In the future, with the query tables, all format issues will be solved and even smarts/cxsmarts formats will be fully supported.
Query tables are now available in the latest JChem version (5.0).





Szilard