Bug in MolSearch re atom wildcards? - ChemAxon Forum Archive

User c30a4b265a

02-12-2009 22:03:53

Hi - I am using version 5.2.4 of JChem, JRE version 6 update 17 on Windows XP. I am doing an in-memory molecule substructure comparison using the MolSearch object. Query molecules are being drawn with MarvinSketch version 5.2.3.

It works fine with a basic substructure query. But, if I replace the heavy atoms in my query molecule with the atom wildcard A in MarvinSketch (off the atom toolbar, "More", then "Advanced" tab, at right), then nothing is matched - while of course the results should be a superset of those found using all atom literals (C, N, etc).

I don't think it's a MarvinSketch problem, because when I pass the same kind of query molecule from MarvinSketch to our ISIS data cartridge in Oracle to do a substructure search, that seems to work just fine.

Below is a fragment from my code that is doing the searching in-memory, which should be sufficient for you to see how I am making the call...

Thanks,

Barry

================================

MolSearch ms = new MolSearch(); // search object creation

    queryMol.aromatize();      // aromatization of query molecule
    ms.setQuery(queryMol);     // assignment of query to search
    ms.getSearchOptions().setSearchType(chemaxon.sss.SearchConstants.SUBSTRUCTURE);

    numRows= getRowCount();
    Boolean curr= null;
    boolean sssMatch;
    Molecule targetMol= null;
    int numMatches= 0;
    for (i=0;i<numRows;i++) {
      targetMol= (Molecule) molColumn.getObjectAt(i);
      if (targetMol == null || targetMol.getAtomCount() == 0) continue;

      targetMol= targetMol.cloneMolecule();
      targetMol.aromatize();   // aromatization of target molecule
      ms.setTarget(targetMol); // assignment of target molecule to search

      sssMatch= false;
      try {
        sssMatch= ms.isMatching();
      }
      catch (chemaxon.sss.search.SearchException se)
      {
        se.printStackTrace();
      }
      if (sssMatch) numMatches++;
    }

================================

ChemAxon a9ded07333

03-12-2009 09:54:12

Hi Barry,

Could you send a query and a target that don't match (mrv files if possible)?

Best regards,
Tamás

User c30a4b265a

03-12-2009 18:31:29

Attached are a query molecule, which is just indole with A substituted for the heavy atoms, and two target molecules that don't match: indole itself with the original atoms, and methyl-indole, which contains indole.

When I use plain indole for the query molecule, it works just as it should though, matching both.

ChemAxon a9ded07333

06-12-2009 20:27:11

Hi Barry,

I checked your code and it's OK. The search fails because your query molecule is not aromatized correctly during the search. We are examining the problem and will get back to you soon.

Thank you for your report,
Tamás

ChemAxon a3d59b832c

07-12-2009 21:38:57

Hi Barry,

There are two possible workarounds for this problem:

You could use explicit aromatic bonds in the query. That prevents the problems in aromatization.

If you are migrating from ISIS, you may find that the loose aromatization method behaves more like the old system. If you use it instead of the default method (named "general"), you will get a hit again. See more information here:

http://www.chemaxon.com/marvin/help/sci/aromatization-doc.html

http://www.chemaxon.com/marvin/help/developer/beans/api/chemaxon/struc/MoleculeGraph.html#aromatize%28int%29

Let us know if you have any questions.

Best regards,

Szabolcs
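
As a rough illustration of the first workaround, the toy Java below compares two six-membered rings bond by bond. It is not JChem code: `BondMatchToy`, `ringMatches`, `uniformRing` and `kekuleRing` are hypothetical names, and the rule that a query bond only matches a target bond of the same type is a simplifying assumption. Under that assumption, a query left with alternating single and double bonds cannot hit a target whose ring has been aromatized, while a query drawn with explicit aromatic bonds can.

```java
import java.util.Arrays;

// Toy model only: rings are plain arrays of bond types compared position by
// position. A real substructure search does graph matching instead.
class BondMatchToy {
    enum Bond { SINGLE, DOUBLE, AROMATIC }

    // Ring in which every bond has the same type (e.g. a fully aromatic ring).
    static Bond[] uniformRing(Bond type, int size) {
        Bond[] ring = new Bond[size];
        Arrays.fill(ring, type);
        return ring;
    }

    // Ring drawn with alternating double and single bonds (Kekule form).
    static Bond[] kekuleRing(int size) {
        Bond[] ring = new Bond[size];
        for (int i = 0; i < size; i++) {
            ring[i] = (i % 2 == 0) ? Bond.DOUBLE : Bond.SINGLE;
        }
        return ring;
    }

    // Assumed rule: a query bond matches only a target bond of the same type.
    static boolean ringMatches(Bond[] query, Bond[] target) {
        if (query.length != target.length) return false;
        for (int i = 0; i < query.length; i++) {
            if (query[i] != target[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Bond[] aromatizedTarget = uniformRing(Bond.AROMATIC, 6);
        System.out.println("Kekule query matches:   "
                + ringMatches(kekuleRing(6), aromatizedTarget));
        System.out.println("Aromatic query matches: "
                + ringMatches(uniformRing(Bond.AROMATIC, 6), aromatizedTarget));
    }
}
```

The first line reports `false` and the second `true`, which is the kind of mismatch that explicit aromatic bonds in the query are meant to avoid.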

User c30a4b265a

09-12-2009 00:35:10

Hi Szabolcs - I replaced the vanilla aromatization with a line like so:
queryMol.aromatize(MoleculeGraph.AROM_LOOSE,true); // aromatization of query molecule

...this seems to work. I used this option because it sounded like otherwise I'd have to detect aromaticity myself in order to set aromatic bond types manually.

Can you give me more information though on what this is doing? My main concern is, might there be any problems with using this (perhaps in false positives now, instead of false negatives)? I am not familiar with the rules for how the loose aromatization algorithm operates. Also, along these lines - should the results obtained with loose aromatization in MolSearch be consistent with what we should observe with ISISDirect?

Thank you,
Barry

ChemAxon a3d59b832c

09-12-2009 13:18:10

bwythoff wrote:

Hi Szabolcs - I replaced the vanilla aromatization with a line like so:
queryMol.aromatize(MoleculeGraph.AROM_LOOSE,true); // aromatization of query molecule

...this seems to work. I used this option because it sounded like otherwise I'd have to detect aromaticity myself in order to set aromatic bond types manually.

Can you give me more information though on what this is doing? My main concern is, might there be any problems with using this (perhaps in false positives now, instead of false negatives)?

Handling aromaticity is one of the difficult parts of structure searching. Aromatization identifies the aromatic rings in the query or target molecule and changes their bonds to aromatic.

If you want to avoid false positives, you may prefer to use StandardizedMolSearch, with vague bond options when necessary.

You can read more about the aromatization theory in the following links:

http://www.chemaxon.com/jchem/doc/user/query_standard.html

http://www.chemaxon.com/jchem/doc/user/query_searchoptions.html#vaguebond

http://www.chemaxon.com/marvin/help/sci/aromatization-doc.html

bwythoff wrote:

I am not familiar with the rules for how the loose aromatization algorithm operates. Also, along these lines - should the results obtained with loose aromatization in MolSearch be consistent with what we should observe with ISISDirect?

Not entirely. Any aromaticity detection algorithm is just a model and can fail for certain chemical classes. This is true of the ISIS model and of all our models as well.

This presentation from a user compares the ISIS and the default ChemAxon aromaticity models (amongst other aspects - see particularly slides 10-13 and 23-24):

http://www.chemaxon.com/forum/viewpost12226.html#12226

Since then, we have added many improvements, especially in query aromatization and vague bond searching.

Loose aromaticity is special in that it does not try to be very accurate chemically but uses a simple pattern, so it is more likely that the query and the target molecule will be treated the same way.

Best regards,

Szabolcs
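
The point about a simple pattern treating query and target alike can be sketched with a toy Java example. This is not ChemAxon's algorithm, and the thread does not state the exact cause of the failure inside JChem. `LoosePatternToy`, `alternatingSixRing` and `elementAwareSixRing` are hypothetical names, and both rules are assumptions made for illustration. The pattern rule looks only at bond orders, so it gives the same answer for a ring of carbons and a ring of `A` wildcards. The stand-in "element-aware" rule refuses to decide when it meets a wildcard, so query and target end up perceived differently.

```java
// Toy model only: a ring is an array of atom symbols plus an array of bond
// orders listed in ring order (1 = single, 2 = double).
class LoosePatternToy {

    // Pattern rule: a six-membered ring with strictly alternating single and
    // double bonds is flagged aromatic. Atom symbols are never consulted.
    static boolean alternatingSixRing(int[] bondOrders) {
        if (bondOrders.length != 6) return false;
        for (int i = 0; i < 6; i++) {
            int current = bondOrders[i];
            int next = bondOrders[(i + 1) % 6];
            boolean alternates = (current == 1 && next == 2)
                    || (current == 2 && next == 1);
            if (!alternates) return false;
        }
        return true;
    }

    // Hypothetical element-aware rule: it needs to know every element, so a
    // wildcard atom "A" makes it give up and report "not aromatic".
    static boolean elementAwareSixRing(String[] atoms, int[] bondOrders) {
        for (String atom : atoms) {
            if ("A".equals(atom)) return false;
        }
        return alternatingSixRing(bondOrders);
    }

    public static void main(String[] args) {
        int[] kekule = {2, 1, 2, 1, 2, 1};
        String[] target = {"C", "C", "C", "C", "C", "C"};
        String[] query = {"A", "A", "A", "A", "A", "A"};

        System.out.println("element-aware, target: " + elementAwareSixRing(target, kekule));
        System.out.println("element-aware, query:  " + elementAwareSixRing(query, kekule));
        System.out.println("pattern, either ring:  " + alternatingSixRing(kekule));
    }
}
```

For the actual rules of the loose method, the aromatization document linked earlier in the thread is the reference; the sketch only shows why a rule that ignores atom types cannot disagree between a wildcard query and its literal target.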