Problems (BUG??) with superstructure search

User 3cdafe845f

13-10-2009 12:52:51

When we use the superstructure search for the following two molecules:


CCN(CCO)C(=NC(=N)C1=CC=C(S1)[N](O)=O)C1=CC=CC=C1   (query)


CCN(CCO)C(=NCC1=CC=C(S1)[N](O)=O)C1=CC=CC=C1           (target)


we get no match, although we think there certainly should be one. Interestingly, if you remove the ring structure at the end of the SMILES string, you do get a match, even though the part that differs between the two structures is the same.


CCN(CCO)C(=NC(=N)C1=CC=C(S1)[N](O)=O)   (query)


CCN(CCO)C(=NCC1=CC=C(S1)[N](O)=O)           (target)


 


Here is a short part of our code:


import chemaxon.formats.MolImporter;
import chemaxon.sss.SearchConstants;
import chemaxon.sss.search.MolSearch;
import chemaxon.struc.Molecule;

public static void main(String[] args) throws Exception {
    MolSearch ms = new MolSearch();
    ms.setSearchType(SearchConstants.SUPERSTRUCTURE);

    // mol1 is the larger structure (query), mol2 the smaller one (target).
    Molecule mol1 = MolImporter.importMol("CCN(CCO)C(=NC(=N)C1=CC=C(S1)[N](O)=O)C1=CC=CC=C1");
    Molecule mol2 = MolImporter.importMol("CCN(CCO)C(=NCC1=CC=C(S1)[N](O)=O)C1=CC=CC=C1");

    ms.setQuery(mol1);
    ms.setTarget(mol2);

    // findAll() returns null when there are no hits.
    if (ms.findAll() == null) {
        System.out.println("no matches");
    }
}

If we use SUBSTRUCTURE instead of SUPERSTRUCTURE and switch mol1 and mol2, the same problem occurs. The smaller molecule is (in our opinion incorrectly) not recognized as a substructure of the bigger molecule.


Any ideas? Is our assumption wrong that these two are substructure and superstructure of each other? Or is it really a bug?


 


Thanks and regards,


Tobias

ChemAxon a3d59b832c

13-10-2009 14:05:16

Hi Tobias,


 


I moved this topic to the appropriate forum section. We will investigate this problem.


 


Best regards,


Szabolcs

ChemAxon 42004978e8

14-10-2009 13:52:43

Hello Tobias,


The matching depends on aromaticity handling:


If you aromatize (standardize) both structures before searching, you get a match. We always advise standardizing structures in order to achieve matching between different representations.


You may ask why this matters here, given that the two structures have the same Kekule form. The explanation is as follows:


Both structures contain a six-membered aromatic ring (phenyl).


They also contain a five-membered ring whose aromaticity depends on the surrounding structure (ambiguous aromaticity); see http://chemaxon.com/jchem/doc/user/query_searchoptions.html#vaguebond at level 1.


When handling such structures, we aromatize the original query molecule (the query, or the target in the case of a superstructure search) and then build its different representations. The five-membered ring will therefore match both the Kekule and the aromatized form on the other side (the target, or the query in a superstructure search). That is why there is a match if you delete the phenyl ring.


This step is not repeated for the target (the query in a superstructure search), so its phenyl ring remains in Kekule form, which will not match the already aromatized phenyl ring of the other molecule.
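
The asymmetry described above can be illustrated with a toy model. This is only a sketch and not JChem's actual algorithm: the class `AromaticityToy`, its methods, and the simplified rule "a six-membered ring with alternating single and double bonds becomes aromatic" are all assumptions made for illustration. Bonds are compared strictly by type code, so a ring aromatized on one side only no longer matches its Kekule form on the other side.

```java
import java.util.Arrays;

public class AromaticityToy {
    // Bond-type codes in the common MDL style: 1 = single, 2 = double, 4 = aromatic.
    public static final int SINGLE = 1, DOUBLE = 2, AROMATIC = 4;

    /**
     * Toy aromatization (hypothetical rule): a six-membered ring whose bonds
     * strictly alternate single/double is returned with all bonds aromatic.
     * Any other ring is returned as an unchanged copy.
     */
    public static int[] aromatize(int[] ring) {
        int[] copy = ring.clone();
        if (ring.length != 6) {
            return copy;
        }
        for (int i = 0; i < 6; i++) {
            int next = ring[(i + 1) % 6];
            boolean alternates = (ring[i] == SINGLE && next == DOUBLE)
                    || (ring[i] == DOUBLE && next == SINGLE);
            if (!alternates) {
                return copy;
            }
        }
        Arrays.fill(copy, AROMATIC);
        return copy;
    }

    /** Strict comparison: two rings match only if every bond type is identical. */
    public static boolean ringsMatch(int[] query, int[] target) {
        return Arrays.equals(query, target);
    }

    public static void main(String[] args) {
        int[] kekulePhenyl = {SINGLE, DOUBLE, SINGLE, DOUBLE, SINGLE, DOUBLE};

        // Only one side is aromatized: aromatic bonds vs. single/double bonds.
        System.out.println(ringsMatch(aromatize(kekulePhenyl), kekulePhenyl)); // prints "false"

        // Both sides are aromatized: the representations agree.
        System.out.println(ringsMatch(aromatize(kekulePhenyl), aromatize(kekulePhenyl))); // prints "true"
    }
}
```

In this model the only way to restore the match is to bring both sides into the same representation before comparing, which is what standardizing both structures achieves.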


 


So, to summarize: please aromatize (standardize) your structures before searching. Alternatively, you can use StandardizedMolSearch instead of MolSearch. This class will do the job for you, so explicit aromatization is not needed.
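
Both suggestions might look roughly like the sketch below. It assumes the JChem library is on the classpath; the class name `AromatizedSearchSketch` is made up for illustration, and the searches use the default substructure mode with the smaller molecule as query (the swapped setup from the original post) instead of setting SUPERSTRUCTURE explicitly. According to the explanation above, both variants should report a match, but this has not been verified here.

```java
import chemaxon.formats.MolImporter;
import chemaxon.sss.search.MolSearch;
import chemaxon.sss.search.StandardizedMolSearch;
import chemaxon.struc.Molecule;

public class AromatizedSearchSketch {
    static final String LARGER = "CCN(CCO)C(=NC(=N)C1=CC=C(S1)[N](O)=O)C1=CC=CC=C1";
    static final String SMALLER = "CCN(CCO)C(=NCC1=CC=C(S1)[N](O)=O)C1=CC=CC=C1";

    public static void main(String[] args) throws Exception {
        // Option 1: aromatize both structures explicitly, then use MolSearch.
        Molecule target = MolImporter.importMol(LARGER);
        Molecule query = MolImporter.importMol(SMALLER);
        target.aromatize();
        query.aromatize();

        MolSearch ms = new MolSearch(); // substructure search by default
        ms.setQuery(query);
        ms.setTarget(target);
        System.out.println("MolSearch after aromatize(): " + ms.isMatching());

        // Option 2: let StandardizedMolSearch handle aromatization itself.
        // Fresh, unmodified molecules are imported to keep the two options independent.
        StandardizedMolSearch sms = new StandardizedMolSearch();
        sms.setQuery(MolImporter.importMol(SMALLER));
        sms.setTarget(MolImporter.importMol(LARGER));
        System.out.println("StandardizedMolSearch: " + sms.isMatching());
    }
}
```

Option 2 is usually the less error-prone choice, since it removes the need to remember to aromatize every structure before it reaches the search.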


Regards,


Robert


 


 

User 3cdafe845f

14-10-2009 14:26:40

Thanks Robert,


that solves the problem.


Regards,


Tobias