MCS results change if target and query are switched

User 3cdafe845f

20-01-2010 12:40:01

Hi,

I found some weird behavior in the MCS that I don't understand. Here is my code:

Molecule mol1 = MolImporter.importMol("COC1=C(N)C=C(N)C=C1");
Molecule mol2 = MolImporter.importMol("COC1=CC=CC=C1");

MCS mcs = new MCS();

mol1.implicitizeHydrogens(MolAtom.ALL_H);
mol2.implicitizeHydrogens(MolAtom.ALL_H);
mol1.aromatize(2);
mol2.aromatize(2);
mcs.setMolecules(mol1,mol2);
mcs.setMode(MCS.MODE_EXACT);
mcs.setDontBreakRingBonds(false);

mcs.setMinimumCommonSize(1);
mcs.search();

I tried two versions of the first molecule:

a) mol1 = MolImporter.importMol("COC1=CC=C(N)C=C1(N)");
   mol2 = MolImporter.importMol("COC1=CC=CC=C1");

b) mol1 = MolImporter.importMol("COC1=C(N)C=C(N)C=C1");
   mol2 = MolImporter.importMol("COC1=CC=CC=C1");

The result is different, although it should be the same in my opinion:

a) COc1ccccc1

b) COc(:c:c):c:c

If I now switch target and query and use

mcs.setMolecules(mol2,mol1);

the result is always COc1ccccc1, regardless of whether I use version a) or b) of the molecule.

Is this a bug? Or is there something wrong with my understanding of the procedure or the chemistry?

Best regards

Tobias

ChemAxon efa1591b5a

29-01-2010 14:41:54

Hi Tobias,

I can confirm that we managed to reproduce the behaviour you experienced, and it is indeed a bug. We will try to fix it in the next minor release, 5.3.1 (due around the end of Q1/2010).

Thank you for your bug report and apologies for the inconvenience this bug might cause.

Regards,
Miklos

ChemAxon 990acf0dec

25-02-2010 18:34:37

Hi Tobias,

I would like to inform you that we had to make an urgent patch release, which was named 5.3.1. The fix promised in this topic is therefore now targeted for the patch release coming at the end of March (probably named 5.3.2).

Best regards,

Akos

ChemAxon 4a2fc68cd1

10-01-2011 15:09:54

Hi Tobias,

I would like to inform you that an entirely new MCS algorithm has been introduced in JChem 5.4. It is more effective and more robust.

The results no longer depend on the order of the query and target molecules, but they can still depend on the order of atoms and bonds within the molecules. This means that you can still obtain different results for equivalent molecules if they were imported from different SMILES strings. The reason is that the algorithm is not exact but an approximation method. Eliminating all these discrepancies would make the algorithm much more complex and much slower. However, our tests show that the new algorithm is more stable and robust in this respect as well.
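To make the two kinds of dependence concrete, here is a minimal, self-contained sketch of a check you could run yourself. It is not JChem code: `commonPrefix` is a purely hypothetical stand-in for the real MCS call (it just takes the longest common prefix of the two SMILES strings and knows nothing about chemistry), and the class and method names are made up for illustration. The stand-in happens to behave like the situation described above: swapping the arguments does not change the result, but a different spelling of the same molecule does.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BinaryOperator;

public class McsOrderCheck {

    /**
     * Runs the search for every equivalent spelling of the query, in both
     * argument orders, and collects the distinct results. A search that is
     * independent of argument order AND of input spelling yields one result.
     */
    static Set<String> distinctResults(BinaryOperator<String> search,
                                       List<String> equivalentQueries,
                                       String target) {
        Set<String> results = new LinkedHashSet<>();
        for (String query : equivalentQueries) {
            results.add(search.apply(query, target)); // query first
            results.add(search.apply(target, query)); // arguments swapped
        }
        return results;
    }

    /**
     * Hypothetical stand-in for an MCS search (assumption, not chemistry):
     * the longest common prefix of the two strings. Symmetric in its
     * arguments, but sensitive to how each molecule is written.
     */
    static String commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return a.substring(0, i);
    }

    public static void main(String[] args) {
        // Two SMILES spellings of the same molecule, taken from this thread.
        List<String> queries = Arrays.asList(
                "COC1=CC=C(N)C=C1(N)",
                "COC1=C(N)C=C(N)C=C1");
        String target = "COC1=CC=CC=C1";

        Set<String> results =
                distinctResults(McsOrderCheck::commonPrefix, queries, target);
        System.out.println(results); // prints [COC1=CC=C, COC1=C]
    }
}
```

To use the same harness against a real search, you would replace `commonPrefix` with a function that runs the MCS and returns the result as a canonical (unique) SMILES string; without canonicalization, two identical substructures could still be reported as different strings.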

Would you like to test the new implementation? It is available through the API and through a simple command-line tool as well. You can try, for example, the following:

mcs -q "COC1=CC=C(N)C=C1(N)" -t "COC1=CC=CC=C1" -w
mcs -q "COC1=C(N)C=C(N)C=C1" -t "COC1=CC=CC=C1" -w

Peter