RGroupDecomposition: stopping Java heap space errors

User c2ffbfa8f8

07-10-2010 16:14:54

JChem version: 5.3.2

Hi,

I am running out of memory when running a decomposition on a single target. Please see the attached code and stack trace.

Unfortunately I can't provide the SMILES and SMARTS but could send them confidentially.

Basically, if I remove the line

RGroupDecomposition.addRGroups(query);

then the decomposition at least finishes, but I no longer get the matching behaviour I am looking for. The vast majority of decompositions work fine with this code. Apart from giving Java more memory, is there a way to quit the decomposition before Java falls over?

Sorry for the sketchy posting.

Thanks

User c2ffbfa8f8

09-10-2010 19:07:50

I suppose what I really need to know is whether you can reliably add R-groups and then use the UNDEF_R_MATCHING_GROUP_H_EMPTY flag on the search:

RGroupDecomposition.addRGroups(query);

rgd.setQuery(query, MolSearchOptions.UNDEF_R_MATCHING_GROUP_H_EMPTY);

This gives me the behaviour I want, but from the docs ( http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/sss/search/RGroupDecomposition.html ) I'm not sure I am using the toolkit the right way, and occasionally I run into the problem above.

User c2ffbfa8f8

10-10-2010 17:15:08

Ah, it looks like using setTimeoutLimit(30) stops the heap space problem for the examples I have.
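Where no built-in limit is available, a generic JDK-only watchdog can stop waiting on a long-running call. This is a sketch under stated assumptions, not part of JChem: the class DecompositionWatchdog and the method runWithTimeout are hypothetical, and the slow task is a stand-in for the real decomposition call. Note that cancel(true) only interrupts the worker thread, so it stops the work only if the task responds to interruption, and a timeout helps only if it fires before the heap is exhausted; JChem's own setTimeoutLimit is the more direct option where it applies.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class DecompositionWatchdog {

    /**
     * Hypothetical helper: runs the task on a worker thread and returns its
     * result, or null if it does not finish within timeoutSeconds.
     */
    static <T> T runWithTimeout(Callable<T> task, long timeoutSeconds) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; only effective if the task checks for interruption.
            future.cancel(true);
            return null;
        } catch (InterruptedException e) {
            // Restore the caller's interrupt flag and abandon the task.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            // The task itself failed (this would include an OutOfMemoryError).
            throw new RuntimeException("Task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a slow decomposition call.
        Callable<String> slowTask = () -> {
            Thread.sleep(5_000);
            return "finished";
        };
        String result = runWithTimeout(slowTask, 1);
        System.out.println(result == null ? "gave up after 1 s" : result);
    }
}
```

Running main waits about one second and then prints "gave up after 1 s", because Thread.sleep reacts to the interrupt sent by cancel(true).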

ChemAxon a3d59b832c

11-10-2010 09:03:19

Hi Derek,

Could you send the query and target structures to scsepregi at chemaxon dot com?

We will check what is going on.

Thank you,

Szabolcs

ChemAxon fb166edcbd

11-10-2010 15:14:18

The problem is that there is no limit on caching the group extensions of a single hit, and this can cause an out-of-memory error if there are too many possible extensions. That is the case here: adding R-atoms to the original query introduces many R-atoms that can match both H and empty, and taking all combinations leads to a combinatorial explosion.

I am setting a limit of 1000 on the maximum number of group extensions, and this seems to solve the problem. The fix will be available in the 5.4 release.
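To see why the cache blows up, here is a toy Java model (hypothetical; it is not ChemAxon's implementation). If each of n added R-atoms can independently match either H or empty, a single hit has 2^n possible extensions, so 10 R-atoms already give 1024 combinations. The sketch enumerates those assignments recursively and stops once a cap is reached, with the cap set to 1000 to mirror the limit mentioned in the post. The names ExtensionCap and enumerateExtensions are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtensionCap {

    /**
     * Enumerates H/empty assignments for rAtomCount R-atoms,
     * stopping once maxExtensions assignments have been collected.
     */
    static List<boolean[]> enumerateExtensions(int rAtomCount, int maxExtensions) {
        List<boolean[]> out = new ArrayList<>();
        enumerate(new boolean[rAtomCount], 0, maxExtensions, out);
        return out;
    }

    private static void enumerate(boolean[] current, int index, int max, List<boolean[]> out) {
        if (out.size() >= max) {
            return;                       // cap reached: stop expanding this branch
        }
        if (index == current.length) {
            out.add(current.clone());     // true = matched H, false = matched empty
            return;
        }
        current[index] = true;            // this R-atom matches H
        enumerate(current, index + 1, max, out);
        current[index] = false;           // this R-atom matches empty
        enumerate(current, index + 1, max, out);
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 10, 20}) {
            long uncapped = 1L << n;      // 2^n combinations without a cap
            int kept = enumerateExtensions(n, 1000).size();
            System.out.println(n + " R-atoms: " + uncapped + " combinations, " + kept + " kept");
        }
    }
}
```

With 5 R-atoms all 32 combinations are kept; from 10 R-atoms upward the list stops at 1000 entries, while the uncapped count keeps doubling with every extra R-atom.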

Thanks for the report.

User c2ffbfa8f8

11-10-2010 15:31:41

Thanks, Nora, for having a look so quickly.

On a related note...

Since we ideally don't want to add extra R-groups to the query structure (we remove those ligands from the final decomposition later anyway), do you think a good workaround would be to add explicit hydrogens to the query molecule? That would still give RGroupDecomposition a full match, but ligands would only be created for the handful of R-groups originally specified on the query. Could this cause unexpected results in any situation? I will give it a try anyway; I just wondered whether you had approached decomposition this way before.

Hope that made sense!

Derek

ChemAxon a3d59b832c

13-10-2010 07:32:52

Hi Derek,

I think R-group decomposition will return the same answers whether you pass the unmodified query or the query with explicit hydrogens added, because of the way R-atom queries are handled:

If the query contains at least one undefined R-atom, then substitution is blocked at all other positions.

In other words, a full fragment match is performed even when substructure search was specified.

(Note that when there is no undefined R-atom in the query, individual positions can be blocked using the s* query property.)

http://www.chemaxon.com/jchem/doc/user/query_features.html#undefined_ratoms

With both methods, only substitutions at the R-atoms will be allowed.
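The rule quoted above can be written as a small predicate. This is a toy Java model under stated assumptions: QueryAtom, acceptsSubstituent and the boolean flags are hypothetical and do not correspond to JChem classes, and the substitutionAsDrawn flag merely stands in for the s* query property. It encodes only the two cases described in the post: with an undefined R-atom present, only R-atom positions accept a substituent; without one, every position accepts one unless it is individually blocked.

```java
import java.util.List;

public class SubstitutionRule {

    /** Hypothetical toy model of a query atom; not a JChem class. */
    static final class QueryAtom {
        final String label;
        final boolean undefinedRAtom;      // R-atom without an R-group definition
        final boolean substitutionAsDrawn; // toy stand-in for the s* query property

        QueryAtom(String label, boolean undefinedRAtom, boolean substitutionAsDrawn) {
            this.label = label;
            this.undefinedRAtom = undefinedRAtom;
            this.substitutionAsDrawn = substitutionAsDrawn;
        }
    }

    static boolean hasUndefinedRAtom(List<QueryAtom> query) {
        for (QueryAtom atom : query) {
            if (atom.undefinedRAtom) {
                return true;
            }
        }
        return false;
    }

    /** Whether a substituent may attach at this position under the toy rule. */
    static boolean acceptsSubstituent(List<QueryAtom> query, QueryAtom atom) {
        if (hasUndefinedRAtom(query)) {
            // Any undefined R-atom blocks substitution everywhere except R-atoms.
            return atom.undefinedRAtom;
        }
        // No undefined R-atom: positions are open unless individually blocked (s*).
        return !atom.substitutionAsDrawn;
    }

    public static void main(String[] args) {
        QueryAtom c1 = new QueryAtom("C1", false, false);
        QueryAtom c2 = new QueryAtom("C2", false, true);
        QueryAtom r1 = new QueryAtom("R1", true, false);

        List<QueryAtom> withR = List.of(c1, c2, r1);
        List<QueryAtom> withoutR = List.of(c1, c2);

        for (QueryAtom atom : withR) {
            System.out.println("with R-atom, " + atom.label + ": " + acceptsSubstituent(withR, atom));
        }
        for (QueryAtom atom : withoutR) {
            System.out.println("without R-atom, " + atom.label + ": " + acceptsSubstituent(withoutR, atom));
        }
    }
}
```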

Best regards,

Szabolcs