Error at matching query smarts to target molecule

User 55ffa2f197

20-03-2012 21:25:19

Hi,


I am trying to match a large number of molecules against 147 SMARTS patterns and get a match count for each pattern in each molecule. I keep the query molecules in a queryMol map and use two nested loops, as shown below. However, for a molecule such as the SMILES c1cc(ccc1S(=O)(=O)NC(CCC(F)(F)F)C(=O)N)Cl I get the following error message. The molecule itself is OK, and the SMARTS were converted to molecules without problems.


It dies on the first molecule, at int mc = ms.getMatchCount().


How do I fix the problem? Or at least, how do I prevent the process from dying so that it runs through the whole molecule file?


Here is the code snippet:


    try {
        statement = connection.createStatement();
        rs = statement.executeQuery(
              "  select m.achiral_smi,i.name,i.mol_id "
            + " from ca_molname i, ca_mol m "
            + " where i.vendor_id=1 and i.mol_id=m.mol_id");

        while (rs.next()) {
            Molecule t = MolImporter.importMol(rs.getString("achiral_smi"));
            t.aromatize();
            for (Map.Entry<String, Molecule> entry : queryMol.entrySet()) {
                Molecule q = (Molecule) entry.getValue();
                ms = new MolSearch();
                ms.setQuery(q);
                ms.setTarget(t);
                int mc = ms.getMatchCount();
            }
        }
    } catch (SQLException e) {
        e.printStackTrace();
    } catch (MolFormatException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } catch (SearchException e) {
        e.printStackTrace();
    }


Error Message:


chemaxon.sss.search.SearchException: chemaxon.formats.MolFormatException: Unmatched ring closure number 2 in SMILES string CF2
Caused by:
Unmatched ring closure number 2 in SMILES string CF2
 at chemaxon.sss.search.SmartsAtomMatcher.createMolHandlerFromSMARTS(SmartsAtomMatcher.java:111)
 at chemaxon.sss.search.SmartsAtomMatcher.isRecursiveMatching(SmartsAtomMatcher.java:74)
 at chemaxon.sss.search.SmartsAtomMatcher.evaluateSmartsAtomTree(SmartsAtomMatcher.java:368)
 at chemaxon.sss.search.SmartsAtomMatcher.evaluateSmartsAtomTree(SmartsAtomMatcher.java:318)
 at chemaxon.sss.search.SmartsAtomMatcher.evaluateSmartsAtomTree(SmartsAtomMatcher.java:291)
 at chemaxon.sss.search.StructureSearch.compareAtomQueryProperties(StructureSearch.java:4548)
 at chemaxon.sss.search.StructureSearch.compareAtoms(StructureSearch.java:4801)
 at chemaxon.sss.search.StructureSearch.initMaps(StructureSearch.java:3694)
 at chemaxon.sss.search.StructureSearch.findFirst0(StructureSearch.java:6565)
 at chemaxon.sss.search.StructureSearch.findFirstHit(StructureSearch.java:6524)
 at chemaxon.sss.search.MolSearch.findNextEnumeratedHit(MolSearch.java:979)
 at chemaxon.sss.search.MolSearch.findNextFilteredHit(MolSearch.java:837)
 at chemaxon.sss.search.MolSearch.findFirstHit(MolSearch.java:708)
 at chemaxon.sss.search.MolSearch.findAllHits(MolSearch.java:777)
 at chemaxon.sss.search.Search.findAll(Search.java:200)
 at com.bms.mscoi.molProps.MolProps.sssMapping(MolProps.java:100)
 at com.bms.mscoi.molProps.MolProps.main(MolProps.java:122)
Caused by: chemaxon.formats.MolFormatException: Unmatched ring closure number 2 in SMILES string CF2
 at chemaxon.marvin.io.formats.smiles.SmilesImport.readMol0(SmilesImport.java:1034)
 at chemaxon.marvin.io.formats.smiles.SmilesImport.readMol(SmilesImport.java:566)
 at chemaxon.marvin.io.formats.smiles.SmilesImport.readMol(SmilesImport.java:526)
 at chemaxon.marvin.io.MRecordImporter.readStructure(MRecordImporter.java:764)
 at chemaxon.marvin.io.MRecordImporter.readMol(MRecordImporter.java:709)
 at chemaxon.marvin.io.MRecordImporter.readMol(MRecordImporter.java:678)
 at chemaxon.marvin.io.MRecordImporter.readMol0(MRecordImporter.java:593)
 at chemaxon.marvin.io.MRecordImporter.readMol(MRecordImporter.java:509)
 at chemaxon.formats.MolImporter.readMol(MolImporter.java:859)
 at chemaxon.formats.MolImporter.read(MolImporter.java:747)
 at chemaxon.formats.MolImporter.read(MolImporter.java:717)
 at chemaxon.sss.search.SmartsAtomMatcher.createMolHandlerFromSMARTS(SmartsAtomMatcher.java:109)
 ... 16 more

ChemAxon 42004978e8

21-03-2012 17:48:06

Hi,


The problem is not with the target but with the query. It seems that a group (CF2) is interpreted as recursive SMARTS.


To find out why this occurs, could you please send the query for which the error occurs? You may send all 145 and we can find the problematic ones. If they are confidential, you can send them to support at chemaxon.com.


Thanks,


Robert

User 55ffa2f197

21-03-2012 17:55:22

Hi Rob,


I have sent the SMARTS to the support address. Please take a look. However, they are translated into mol objects without problems. I am using the following to translate SMARTS to mol:



Molecule q = MolImporter.importMol( elements[0] );


For validity checking I also write them out as cxsmarts, and they seem OK.


Thanks


Dong


ChemAxon a3d59b832c

22-03-2012 09:01:01

Hi Dong,


 


Thanks for sending the SMARTS file to us.


It indeed contains invalid SMARTS strings. The "CF2" used in the recursive SMARTS expressions is not correct.


(It looks like SMARTS import does not look into the recursive parts, for efficiency reasons. Only the searching code tries to import these recursive parts when needed, that is, when the outer parts of the SMARTS match. This is why you received the error only during searching.)
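
Since the recursive parts are only parsed at search time, one option is to pull them out of each SMARTS string when the query file is loaded and check them separately. The following is a minimal sketch using plain Java only; the class and method names are made up for illustration and are not part of any ChemAxon API, and the example strings are invented rather than taken from the actual query file. Each extracted body could then be passed to the same MolImporter.importMol call used elsewhere in this thread, on the assumption that a body like CF2 fails there at load time instead of during searching.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: pulls the bodies of recursive SMARTS, $(...), out of a SMARTS string. */
final class RecursiveSmartsParts {

    /** Returns the text inside each top-level $( ... ), pairing parentheses by depth. */
    static List<String> extract(String smarts) {
        List<String> parts = new ArrayList<>();
        int i = 0;
        while (i < smarts.length() - 1) {
            if (smarts.charAt(i) == '$' && smarts.charAt(i + 1) == '(') {
                int start = i + 2;   // first character after "$("
                int depth = 1;       // currently inside one open parenthesis
                int j = start;
                while (j < smarts.length() && depth > 0) {
                    char c = smarts.charAt(j);
                    if (c == '(') {
                        depth++;
                    } else if (c == ')') {
                        depth--;
                    }
                    j++;
                }
                if (depth != 0) {
                    throw new IllegalArgumentException("Unbalanced $( in: " + smarts);
                }
                parts.add(smarts.substring(start, j - 1)); // drop the closing ')'
                i = j;               // continue scanning after this expression
            } else {
                i++;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Invented example strings; each printed body is a candidate for a separate import check.
        System.out.println(extract("[C;$(CF2)]"));
        System.out.println(extract("[$(C(F)F),$(CCl)]N"));
    }
}
```

Only top-level expressions are returned; a recursive expression nested inside another one can be reached by calling extract again on each returned body.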


 


The substrings "$(CF2)" should be replaced by the correct "$(C(F)F)" representation, if that is the intended substructure group.
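
On the question of keeping the run alive when a single query is bad: in the posted snippet the try block wraps both loops, so the first SearchException ends the whole run. A common approach is to catch the exception per query inside the inner loop and record a marker value instead. The sketch below shows that pattern in plain Java. MatchCounter is a hypothetical stand-in for the MolSearch setQuery / setTarget / getMatchCount sequence, and toyCount is a toy replacement that only counts literal text, so none of this is ChemAxon code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: a failing query is logged and skipped instead of ending the whole run. */
final class TolerantMatchCounts {

    /** Hypothetical stand-in for the MolSearch setQuery/setTarget/getMatchCount sequence. */
    interface MatchCounter {
        int count(String query, String target) throws Exception;
    }

    /** Counts every query against one target; a query that throws is recorded as -1. */
    static Map<String, Integer> countAll(Map<String, String> queries, String target,
                                         MatchCounter counter) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : queries.entrySet()) {
            try {
                counts.put(e.getKey(), counter.count(e.getValue(), target));
            } catch (Exception ex) {
                // Report the bad query and carry on with the next one.
                System.err.println("Skipping query " + e.getKey() + ": " + ex.getMessage());
                counts.put(e.getKey(), -1);
            }
        }
        return counts;
    }

    /** Toy counter: fails on the malformed group, otherwise counts literal occurrences. */
    static int toyCount(String query, String target) {
        if (query.contains("CF2")) {
            throw new IllegalStateException("Unmatched ring closure number 2");
        }
        int n = 0;
        for (int i = target.indexOf(query); i >= 0; i = target.indexOf(query, i + 1)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Map<String, String> queries = new LinkedHashMap<>();
        queries.put("good", "F");
        queries.put("bad", "[$(CF2)]");
        String target = "c1cc(ccc1S(=O)(=O)NC(CCC(F)(F)F)C(=O)N)Cl";
        System.out.println(countAll(queries, target, TolerantMatchCounts::toyCount));
    }
}
```

Recording -1 rather than 0 keeps "query failed" distinguishable from "query did not match", which matters once the counts are stored.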


 


Best regards,


Szabolcs

User 55ffa2f197

22-03-2012 12:30:01

Thanks for elaborating. Good to know the subtle difference in SMARTS checking between parsing and searching. It would be good to apply the same criteria.


Thanks


Dong

ChemAxon a3d59b832c

22-03-2012 14:50:06

Hi Dong,


 


it would be good to apply the same criteria.


Yes, I agree. Our colleagues working on SMARTS import have put this issue into their backlog.


(To look inside the recursive smarts during import.)


Best regards,


Szabolcs

User 55ffa2f197

22-03-2012 15:40:49

Hi Szabolcs, since I have your attention I will push on this topic a bit. I know the way I did the matching may not be efficient for a large number of molecules, i.e. going through every target molecule and then looping through all SMARTS. I vaguely remember Tim Dudgeon mentioning that the cartridge can turn the SMARTS into customized fingerprints, so that the counts can be calculated when a new molecule is loaded. Here we do need the counts, not just on and off. Is it possible to use a cartridge function for this?


Thanks


Dong

ChemAxon a3d59b832c

23-03-2012 09:50:54

Hi Dong,


 


Yes, that is correct. When searching on a JChem table or JChem index, fingerprints are used, so it will be fast. It is also true that this kind of searching will only give you a yes/no answer. But this can already be used to quickly fill in the 0 match counts (non-matching SMARTS patterns). For the matching queries, you will need another step to determine the exact match counts.
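
The two-step idea (a fast yes/no screen first, an exact count only for the hits) can be sketched in plain Java as follows. Both steps are passed in as functions; in a real setup the screen would be the table or index search and the exact count would be a MolSearch-style match count, but here they are toy text-based stand-ins, and all names are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.ToIntBiFunction;

/** Sketch: cheap yes/no screen first, expensive exact count only for queries that pass. */
final class ScreenThenCount {

    static Map<String, Integer> counts(List<String> queries, String target,
                                       BiPredicate<String, String> screen,
                                       ToIntBiFunction<String, String> exactCount) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String q : queries) {
            // A "no" from the screen settles the count at 0 without running the exact step.
            result.put(q, screen.test(q, target) ? exactCount.applyAsInt(q, target) : 0);
        }
        return result;
    }

    /** Toy exact count: number of literal occurrences of a non-empty query in the target. */
    static int occurrences(String query, String target) {
        int n = 0;
        for (int i = target.indexOf(query); i >= 0; i = target.indexOf(query, i + 1)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String target = "c1cc(ccc1S(=O)(=O)NC(CCC(F)(F)F)C(=O)N)Cl";
        // Toy screen: plain substring test standing in for the fast yes/no search.
        BiPredicate<String, String> screen = (q, t) -> t.contains(q);
        System.out.println(counts(List.of("F", "Cl", "Br"), target, screen,
                ScreenThenCount::occurrences));
    }
}
```

The benefit depends on how many query/target pairs the screen rejects: the more zero counts there are, the fewer exact counts have to be computed.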


 


To bring this speedup into the workflow, you have two choices:


1. Put the molecules into a table or index of type molecule or any, and then run a substructure search with each individual SMARTS string, OR


2. Put the SMARTS strings into a table or index of the query type, and then run a superstructure search with each individual molecule.


 


This could be done either with JChem Base (even in a temporary Derby database) or the Cartridge.


 


With JChem Base, you should check this documentation and these methods:


http://www.chemaxon.com/jchem/doc/dev/


http://www.chemaxon.com/jchem/doc/dev/search/index.html#searchmem


http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/sss/search/Search.html#getMatchCount%28%29


 


If you would like to use the cartridge, please check out these methods / functions:


http://www.chemaxon.com/jchem/doc/dev/cartridge/cartapi.html#jc_compare


http://www.chemaxon.com/jchem/doc/dev/cartridge/cartapi.html#jc_matchcount


 


Best regards,


Szabolcs