Trouble matching aromatic and Kekule benzene

User f05f6b8c05

23-05-2012 03:42:50

 


Hi,


I'm using the JChem 5.8.3 API.


I don't understand why the search code below cannot match Kekule benzene with an aromatic benzene query, even after both structures have been standardized and dearomatized.


If I comment out so.setStereoSearchType(SearchConstants.STEREO_EXACT); below, a match is found, but I don't know why that is necessary.


Any help would be very much appreciated.


Thanks,


Andrew


    public void Process() throws Exception {
        MolSearch s = new MolSearch();
        MolSearchOptions so = new MolSearchOptions(SearchConstants.FULL_FRAGMENT);
        so.setQueryAbsoluteStereo(true);
        so.setChargeMatching(SearchConstants.CHARGE_MATCHING_EXACT);
        so.setDoubleBondStereoMatchingMode(StereoConstants.DBS_ALL);
        so.setImplicitHMatching(SearchConstants.IMPLICIT_H_MATCHING_ENABLED);
        so.setExactBondMatching(true);
        so.setStereoSearchType(SearchConstants.STEREO_EXACT);
        so.setTautomerSearch(SearchConstants.TAUTOMER_SEARCH_OFF);
        s.setSearchOptions(so);

        Standardizer standardizer = new Standardizer("neutralize");

        MolHandler mhl = new MolHandler("c1ccccc1", false);
        Molecule mm = mhl.getMolecule();
        mm.dearomatize();
        MakeHsImplicit(mm);

        standardizer.standardize(mm);
        System.out.println(mm.toFormat("smiles"));
        s.setQuery(mm);

        MolHandler mhl2 = new MolHandler("C1=CC=CC=C1", false);
        Molecule tarm = mhl2.getMolecule();
        standardizer.standardize(tarm);
        System.out.println(tarm.toFormat("smiles"));

        s.setTarget(tarm);
        int[][] hits = s.findAll();
        if (hits != null) {
            System.out.println("found it");
        }
    }

    private void MakeHsImplicit(Molecule m) throws Exception {
        m.addExplicitHydrogens(0);
        m.implicitizeHydrogens(MolAtom.ALL_H & ~MolAtom.LONELY_H & ~MolAtom.ISOTOPE_H
                & ~MolAtom.CHARGED_H & ~MolAtom.RADICAL_H & ~MolAtom.MAPPED_H
                & ~MolAtom.HCONNECTED_H & ~MolAtom.CTSPECIFIC_H);
    }



 

User f05f6b8c05

27-05-2012 16:06:56

Hi,


I think the answer is related to another post of mine: I needed to use StandardizedMolSearch here.


Thanks,


Andrew

ChemAxon 42004978e8

29-05-2012 09:03:05

Hi Andrew,


We are happy that you could resolve the issue. Feel free to ask further questions if you have any.


Regards,


Robi

User f05f6b8c05

30-05-2012 19:52:03

 


Hi,


Unfortunately, I am still confused by this problem.


I'm using the JChem 5.8.3 API.


I don't understand why the attached Java code cannot match Kekule benzene with an aromatic benzene query, even after both structures have been standardized and dearomatized.


If I comment out so.setStereoSearchType(SearchConstants.STEREO_EXACT); in that code, a match is found, but I don't know why that is necessary.


Any help would be very much appreciated.


Thanks,


Andrew


 

ChemAxon a3d59b832c

01-06-2012 09:48:36

Hi Andrew,


 


Your latest code still seems to explicitly dearomatize the structures and use MolSearch.


Please note that multiple different dearomatized resonance structures could be possible from the same substituted benzene input, so this practice seems incorrect to me.
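

The point about resonance structures can be illustrated with a small, self-contained Java sketch. It does not use the JChem API: the class KekuleFormsDemo, its bond codes, and its matching routine are hypothetical and only model a six-membered ring as an array of bond orders. Under those assumptions it shows that the two Kekule forms of plain benzene map onto each other by rotating the ring, while the two forms of an ortho-disubstituted benzene do not match under exact bond matching until both are aromatized.

```java
import java.util.Arrays;

// Toy model only (not JChem): a six-membered ring as an array of bond orders.
// Ring bond i joins ring atom i and ring atom (i + 1) % 6.
class KekuleFormsDemo {
    static final int SINGLE = 1, DOUBLE = 2, AROMATIC = 4; // arbitrary codes

    // Exact bond matching over all 12 rotations and reflections of the ring.
    // subst[i] is true when ring atom i carries a substituent.
    static boolean matchesExactly(int[] qBonds, boolean[] qSubst,
                                  int[] tBonds, boolean[] tSubst) {
        final int n = 6;
        for (int shift = 0; shift < n; shift++) {
            for (int dir = 0; dir < 2; dir++) { // 0: same direction, 1: mirrored
                boolean ok = true;
                for (int i = 0; i < n && ok; i++) {
                    // Query atom i maps to target atom "atom"; its ring
                    // neighbour i + 1 maps to target atom "next".
                    int atom = dir == 0 ? (i + shift) % n : (shift - i + n) % n;
                    int next = dir == 0 ? (atom + 1) % n : (atom - 1 + n) % n;
                    int bond = dir == 0 ? atom : next; // target bond joining them
                    ok = qSubst[i] == tSubst[atom] && qBonds[i] == tBonds[bond];
                }
                if (ok) return true;
            }
        }
        return false;
    }

    // Replaces a strictly alternating single/double ring with aromatic bonds.
    static int[] aromatize(int[] bonds) {
        for (int i = 0; i < bonds.length; i++) {
            int a = bonds[i], b = bonds[(i + 1) % bonds.length];
            boolean alternating = (a == SINGLE && b == DOUBLE) || (a == DOUBLE && b == SINGLE);
            if (!alternating) return bonds.clone();
        }
        int[] out = new int[bonds.length];
        Arrays.fill(out, AROMATIC);
        return out;
    }

    public static void main(String[] args) {
        int[] formA = {DOUBLE, SINGLE, DOUBLE, SINGLE, DOUBLE, SINGLE};
        int[] formB = {SINGLE, DOUBLE, SINGLE, DOUBLE, SINGLE, DOUBLE};
        boolean[] none = new boolean[6];                              // plain benzene
        boolean[] ortho = {true, true, false, false, false, false};  // substituents on atoms 0 and 1

        System.out.println(matchesExactly(formA, none, formB, none));    // true
        System.out.println(matchesExactly(formA, ortho, formB, ortho));  // false
        System.out.println(matchesExactly(aromatize(formA), ortho,
                                          aromatize(formB), ortho));     // true
    }
}
```

In the ortho case the bond between the two substituted atoms is double in one form and single in the other, so no ring symmetry can reconcile them. Real structure searching is far more general than this sketch.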


However, I think that this is a different issue. It seems to relate to the fact that double bond stereochemistry is set on the double bonds coming from the SMILES import, but it is not set by the dearomatization routine.


See the corresponding sections from our smiles documentation:


 


Cis-trans isomerism

The default stereoisomers in small rings (size < 8) are cis, which are not written explicitly.

See import option c to override this feature.

http://www.chemaxon.com/marvin/help/formats/smiles-doc.html#parity


 


And the related smiles import option:


c


Ignore fixing of double bond stereo information in small rings, also ignore fixing of aromatic bonds to aliphatic if necessary.

Double bonds in small rings (ring size < 8) are imported automatically with CIS stereo information. If the c option is set, the double bond stereo information is not changed to CIS during the import.





http://www.chemaxon.com/marvin/help/formats/smiles-doc.html#ioption_c
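

The stereo explanation above can be pictured with a toy model. The Java sketch below is an assumption-laden illustration, not JChem code: the class RingStereoDemo, its Stereo flags, and its matches method are hypothetical, and JChem's actual stereo matching rules are more nuanced. It only models the idea that an imported Kekule ring carries CIS flags on its double bonds, a dearomatized ring does not, and an exact stereo comparison treats that difference as a mismatch.

```java
// Toy model only (not JChem): each ring bond has an order and a stereo flag.
class RingStereoDemo {
    enum Stereo { UNSPECIFIED, CIS }

    static final class Bond {
        final int order;
        final Stereo stereo;
        Bond(int order, Stereo stereo) {
            this.order = order;
            this.stereo = stereo;
        }
    }

    // Mimics the documented SMILES import default:
    // double bonds in small rings receive CIS stereo information.
    static Bond[] kekuleRingFromSmilesImport() {
        return ring(Stereo.CIS);
    }

    // Mimics a dearomatization step that assigns bond orders but no stereo flags.
    static Bond[] kekuleRingFromDearomatization() {
        return ring(Stereo.UNSPECIFIED);
    }

    private static Bond[] ring(Stereo doubleBondStereo) {
        Bond[] bonds = new Bond[6];
        for (int i = 0; i < 6; i++) {
            boolean isDouble = i % 2 == 0;
            bonds[i] = new Bond(isDouble ? 2 : 1,
                                isDouble ? doubleBondStereo : Stereo.UNSPECIFIED);
        }
        return bonds;
    }

    // Position-by-position comparison; with exactStereo the flags must agree too.
    static boolean matches(Bond[] query, Bond[] target, boolean exactStereo) {
        for (int i = 0; i < query.length; i++) {
            if (query[i].order != target[i].order) return false;
            if (exactStereo && query[i].stereo != target[i].stereo) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Bond[] query = kekuleRingFromDearomatization(); // c1ccccc1, then dearomatized
        Bond[] target = kekuleRingFromSmilesImport();   // C1=CC=CC=C1 as imported
        System.out.println(matches(query, target, true));  // false
        System.out.println(matches(query, target, false)); // true
    }
}
```

This mirrors the behaviour reported in the thread: the match appears only once the exact stereo requirement is dropped.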


 


Best regards,


Szabolcs

User f05f6b8c05

01-06-2012 11:01:29

Thank you for the explanation.  Very helpful.


Very good point about different dearomatized resonance structures from same substituted benzene input!


 





User f05f6b8c05

02-06-2012 04:05:43


Hi,


 


I have studied your response to better understand it.


 


If the program's job is to take a structure from a user and search it for benzene, do you advise doing this with all structures aromatized? The program will not know how the user generated their structure (maybe they imported it from SMILES without the c option set, as in this example). Does this make it more robust than using all structures dearomatized?


For the same reasons, do you recommend that JChem tables be set up with aromatization in their standardization?


This issue came up because our users prefer the Kekule representation over the aromatic one.


 


Thank you for guidance with this matter.


 


Andrew


ChemAxon a3d59b832c

04-06-2012 12:17:46

Dear Andrew,




If the program's job is to take a structure from a user and search it for benzene, do you advise doing this with all structures aromatized? The program will not know how the user generated their structure (maybe they imported it from SMILES without the c option set, as in this example). Does this make it more robust than using all structures dearomatized?



Yes, that is correct. Both the query and target (database) structures should be aromatized for correct search results.


 



For the same reasons, do you recommend that JChem tables be set up with aromatization in their standardization?



Indeed. A standardizer configuration for the JChem table should include aromatization.


 


This issue came up because our users prefer the Kekule representation over the aromatic one.


Yes, we are aware of this user preference as well. So the recommended presentation is to use the cd_structure column, which contains the originally inserted structure. The hit display routines (hit coloring, etc.) use this original form for the display, while the standardized form is used for the searching operation - ensuring correct results.


 


So it is OK to dearomatize the structure before inserting into a JChem table, but the table standardization must still contain aromatization.
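

The split between a stored original and a standardized search form can be sketched in plain Java. This is only an architectural illustration under loud assumptions: DualFormTableDemo, its Row class, and toyAromatize are hypothetical names, string equality stands in for a real structure search, and the "standardizer" only knows about benzene. It is not how JChem implements its tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model only (not JChem): keep the original for display,
// and a standardized form for searching.
class DualFormTableDemo {
    static final class Row {
        final String original;   // what the user inserted; used for display
        final String searchForm; // standardized form; used for matching
        Row(String original, String searchForm) {
            this.original = original;
            this.searchForm = searchForm;
        }
    }

    private final UnaryOperator<String> standardizer;
    private final List<Row> rows = new ArrayList<>();

    DualFormTableDemo(UnaryOperator<String> standardizer) {
        this.standardizer = standardizer;
    }

    // Stores the structure as given and also its standardized form.
    void insert(String structure) {
        rows.add(new Row(structure, standardizer.apply(structure)));
    }

    // Standardizes the query the same way, then returns the ORIGINAL forms of the hits.
    List<String> search(String query) {
        String q = standardizer.apply(query);
        List<String> hits = new ArrayList<>();
        for (Row r : rows) {
            if (r.searchForm.equals(q)) hits.add(r.original);
        }
        return hits;
    }

    // Hypothetical stand-in for aromatization: handles one hard-coded case only.
    static String toyAromatize(String smiles) {
        return smiles.equals("C1=CC=CC=C1") ? "c1ccccc1" : smiles;
    }

    public static void main(String[] args) {
        DualFormTableDemo table = new DualFormTableDemo(DualFormTableDemo::toyAromatize);
        table.insert("C1=CC=CC=C1");                    // user prefers the Kekule form
        System.out.println(table.search("c1ccccc1"));   // [C1=CC=CC=C1]
    }
}
```

The design choice is that query and stored structures pass through the same standardization, so the search is representation-independent, while users still see the Kekule form they entered.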


 


Another approach is to prepare the molecules for presentation after the search. A different, "beautification" Standardizer configuration can be used for this, or you can simply call dearomatization. This approach is not integrated into JChem; it must be developed within your own workflow.


 


See also the corresponding sections of the manual:


http://www.chemaxon.com/jchem/doc/dev/dbconcepts/index.html#standardizerintegration


http://www.chemaxon.com/jchem/doc/user/query_standard.html


 


Best regards,


Szabolcs


User f05f6b8c05

05-06-2012 18:49:25

Thanks, this all makes sense now :)