RGroup Decomposition using API - molfile target problem

User c2ffbfa8f8

25-08-2010 19:15:48

Hi again,

I am seeing some differences in decompositions when my molfiles have come from different sources. Attached is a snippet showing how two seemingly identical structures decompose differently (the successful decomposition returns 3 ligands, and the unsuccessful one returns 2). Is there a way in the API to standardise these molfiles so that I get consistent results?

I have also attached the molfiles used in the snippet. The only difference I can see is the order of atoms in the connection table.

I am able to work around these differences by importing the molfile, converting it to a SMILES, then re-importing, but obviously this is not a realistic solution.

I am using "SearchConstants.UNDEF_R_MATCHING_ALL" again, but I think this is the only way I can get a successful decomposition for this particular query and target.

I am using JChem 5.3.2.

Thanks, hope this makes sense.

ChemAxon fb166edcbd

25-08-2010 23:01:43

The problem here is again that RGroupDecomposition is prepared to handle only the group variants of R-atom matching (UNDEF_R_MATCHING_GROUP, UNDEF_R_MATCHING_GROUP_H, UNDEF_R_MATCHING_GROUP_H_EMPTY). The solution is to add an R4 to match the remaining ligand, since RGroupDecomposition always requires full matching.

I have modified your code again to leave the R-atom matching behavior unchanged and to use the q1.mol query:

Expected:
SCAFFOLD: N1C=CC=N1
        LIGANDS: 5, *N1C=CC=C1
        LIGANDS: 6, S*
        LIGANDS: 7, *C1=CC=CC=C1
        LIGANDS: 8, CC(C)(C)OC(*)=O


Not expected:
SCAFFOLD: N1C=CC=N1
        LIGANDS: 5, *N1C=CC=C1
        LIGANDS: 6, S*
        LIGANDS: 7, *C1=CC=CC=C1
        LIGANDS: 8, CC(C)(C)OC(*)=O

Now you have the same result for both.

There were big changes in RGroupDecomposition in the 5.3 release. The Decomposition object was introduced, and the whole API and the algorithm behind it went through major refactoring. Full matching was not a requirement before 5.3.

User c2ffbfa8f8

06-09-2010 08:46:13

Hi Nora, is there a preferred way to use the API to do non-full matching of RGroups, even though RGroupDecomposition requires full matching? Apparently this functionality is quite common in other RGroup decomposition tools. Thanks again.

ChemAxon fb166edcbd

06-09-2010 10:24:16

You can use RGroupDecomposition.addRGroups(Molecule query) or RGroupDecomposition.addRGroups(Molecule query, int bondtype) to add the missing R-atoms automatically. This method is called internally whenever the query contains no R-atoms at all, e.g. if you simply use "N1C=CC=N1" as the query.

If you have some R-atoms, but not at every possible position, then you should call one of the above methods explicitly before you set the query. I attach the modified code (look at line 48) and a query with one R1 attached to the N atom. If you do not need the other R-atom matches, simply ignore them when processing the output.

Example run:

java ChemaxonForumPostMolfileDecompositionMod2
Expected:
SCAFFOLD: N1C=CC=N1
        LIGANDS: 5, CC(C)(C)OC(*)=O
        LIGANDS: 6, *C1=CC=CC=C1
        LIGANDS: 7, S*
        LIGANDS: 8, *N1C=CC=C1


Not expected:
SCAFFOLD: N1C=CC=N1
        LIGANDS: 5, CC(C)(C)OC(*)=O
        LIGANDS: 6, *C1=CC=CC=C1
        LIGANDS: 7, S*
        LIGANDS: 8, *N1C=CC=C1
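
For the "simply ignore them when processing the output" step, here is a minimal sketch in plain Java (9 or later). It does not call JChem at all: it only parses text in the format printed above and keeps the ligands you ask for. The class LigandFilter and its methods are hypothetical helpers, not part of the JChem API, and picking 8 as the ligand of interest is purely for illustration, since which number corresponds to your own R1 depends on the attached code and query.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of JChem): filters printed decomposition output.
class LigandFilter {

    // Parses lines such as "LIGANDS: 5, CC(C)(C)OC(*)=O" into number -> SMILES.
    // Lines that do not start with "LIGANDS:" (e.g. SCAFFOLD lines) are skipped.
    static Map<Integer, String> parse(List<String> lines) {
        Map<Integer, String> ligands = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("LIGANDS:")) {
                continue;
            }
            // Split only at the first comma: left is the number, right is the SMILES.
            String[] parts = trimmed.substring("LIGANDS:".length()).split(",", 2);
            ligands.put(Integer.parseInt(parts[0].trim()), parts[1].trim());
        }
        return ligands;
    }

    // Returns a copy that keeps only the ligands whose number is in 'wanted'.
    static Map<Integer, String> keepOnly(Map<Integer, String> ligands, Set<Integer> wanted) {
        Map<Integer, String> kept = new LinkedHashMap<>(ligands);
        kept.keySet().retainAll(wanted);
        return kept;
    }

    public static void main(String[] args) {
        List<String> output = List.of(
            "SCAFFOLD: N1C=CC=N1",
            "        LIGANDS: 5, CC(C)(C)OC(*)=O",
            "        LIGANDS: 6, *C1=CC=CC=C1",
            "        LIGANDS: 7, S*",
            "        LIGANDS: 8, *N1C=CC=C1");

        // Assumption for illustration only: ligand 8 is the one we care about.
        System.out.println(keepOnly(parse(output), Set.of(8)));
    }
}
```

In a real program you would more likely filter the ligand array returned by the decomposition directly rather than its printed form; the text-based version is used here only so the sketch runs without JChem on the classpath.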