I am using the MCS Java class in development and could not find any information on how the approximations/heuristics in the MCS fast and turbo modes work. This matters to me because I need to know how far the result can be from the exact MCS of the two molecules (especially regarding the size of the MCS).
If I understand correctly, the MCS (or rather the MCES) is calculated via backtracking?
Indeed, the MCES is calculated by a backtracking algorithm, and heuristics prune various possible search paths. In a nutshell, the key heuristic in turbo mode omits some atoms of the largest MCES found so far as starting points for the next match, while in fast mode not all possible bonds of the atoms that are part of the MCES are traversed.
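To make the difference between a safe bound and a lossy heuristic concrete, here is a toy Java sketch of a backtracking MCES search. It is a hypothetical illustration, not ChemAxon's implementation: molecules are reduced to element symbols plus bond-order matrices, connectivity of the common subgraph is not enforced, and the names `ToyMces`, `bondMatrix` and `maxCandidates` are invented. The "heuristic" here is a simple per-atom branching limit standing in for the turbo/fast rules described above; the bound check, by contrast, never discards the optimum.

```java
import java.util.Arrays;

/**
 * Toy backtracking search for the size (in bonds) of a maximum common edge
 * subgraph. Hypothetical illustration only; not ChemAxon's implementation.
 * Connectivity of the common subgraph is not enforced.
 */
public class ToyMces {
    private final String[] atoms1, atoms2; // element symbols per atom
    private final int[][] bonds1, bonds2;  // symmetric bond-order matrices, 0 = no bond
    private final int maxCandidates;       // branching limit per atom (the lossy "heuristic")
    private final int[] map;               // map[i] = partner atom in molecule 2, or -1
    private final boolean[] used;          // atoms of molecule 2 already mapped
    private final int[] undecided;         // undecided[i] = bonds (a,b), a<b, of molecule 1 with b >= i
    private int best = 0;

    public ToyMces(String[] atoms1, int[][] bonds1, String[] atoms2, int[][] bonds2, int maxCandidates) {
        this.atoms1 = atoms1; this.bonds1 = bonds1;
        this.atoms2 = atoms2; this.bonds2 = bonds2;
        this.maxCandidates = maxCandidates;
        int n = atoms1.length;
        map = new int[n];
        Arrays.fill(map, -1);
        used = new boolean[atoms2.length];
        undecided = new int[n + 1];
        for (int i = n - 1; i >= 0; i--) {
            int c = 0; // bonds whose larger endpoint is atom i
            for (int a = 0; a < i; a++) if (bonds1[a][i] > 0) c++;
            undecided[i] = undecided[i + 1] + c;
        }
    }

    /** Builds a symmetric bond-order matrix from {a, b, order} triples. */
    public static int[][] bondMatrix(int n, int[][] bonds) {
        int[][] m = new int[n][n];
        for (int[] b : bonds) { m[b[0]][b[1]] = b[2]; m[b[1]][b[0]] = b[2]; }
        return m;
    }

    public int solve() {
        best = 0;
        search(0, 0);
        return best;
    }

    private void search(int i, int score) {
        if (score > best) best = score;
        // Exact bound: only bonds not lying entirely among atoms 0..i-1 can still become common.
        if (i == atoms1.length || score + undecided[i] <= best) return;
        int tried = 0;
        for (int j = 0; j < atoms2.length && tried < maxCandidates; j++) {
            if (used[j] || !atoms1[i].equals(atoms2[j])) continue;
            tried++; // heuristic: stop after maxCandidates element-matching partners
            int gain = 0; // bonds to already-mapped atoms that also exist (same order) in molecule 2
            for (int a = 0; a < i; a++) {
                int order = bonds1[a][i];
                if (order > 0 && map[a] >= 0 && bonds2[map[a]][j] == order) gain++;
            }
            map[i] = j; used[j] = true;
            search(i + 1, score + gain);
            map[i] = -1; used[j] = false;
        }
        search(i + 1, score); // also try leaving atom i unmapped
    }

    public static void main(String[] args) {
        String[] a1 = {"C", "C", "O"}; // C0-C1-O2
        int[][] b1 = bondMatrix(3, new int[][] {{0, 1, 1}, {1, 2, 1}});
        String[] a2 = {"C", "C", "O"}; // C1-C0-O2 (O bonded to atom 0)
        int[][] b2 = bondMatrix(3, new int[][] {{0, 1, 1}, {0, 2, 1}});
        System.out.println("exact:     " + new ToyMces(a1, b1, a2, b2, Integer.MAX_VALUE).solve());
        System.out.println("heuristic: " + new ToyMces(a1, b1, a2, b2, 1).solve());
    }
}
```

With these two three-atom fragments the unrestricted search finds 2 common bonds, while the branching limit of 1 only reaches 1: the optimal mapping starts with the second carbon, which the limited search never tries. This is a small instance of the kind of solution loss described in the reply.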
Both heuristics may lead to loss of the exact solution, although the likelihood of missing the exact MCES depends strongly on the particular structures. We have not yet published a rigorous statistical analysis, but as a rough estimate (inferred from LibraryMCS runs) the heuristics miss the exact solution in less than 5% of molecule pairs. In these cases the MCES found is typically smaller than the exact one by fewer than 6 atoms.
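To check figures like these (loss rate and size gap) on your own data, a minimal Java sketch such as the following could summarize paired results. It assumes you have already run both a heuristic mode and the exact search on each molecule pair and recorded the two MCES sizes; the class and method names are hypothetical and are not part of the ChemAxon API.

```java
/** Hypothetical helper: summarizes how often, and by how much, a heuristic
 *  MCES size falls short of the exact one over a set of molecule pairs. */
public class HeuristicGapStats {
    /** Fraction of pairs where the heuristic MCES is smaller than the exact one. */
    public static double lossRate(int[] heuristic, int[] exact) {
        if (heuristic.length != exact.length) throw new IllegalArgumentException("length mismatch");
        int missed = 0;
        for (int k = 0; k < exact.length; k++) {
            if (heuristic[k] < exact[k]) missed++;
        }
        return exact.length == 0 ? 0.0 : (double) missed / exact.length;
    }

    /** Largest shortfall (exact - heuristic) over all pairs; 0 if nothing was missed. */
    public static int maxGap(int[] heuristic, int[] exact) {
        if (heuristic.length != exact.length) throw new IllegalArgumentException("length mismatch");
        int max = 0;
        for (int k = 0; k < exact.length; k++) {
            max = Math.max(max, exact[k] - heuristic[k]);
        }
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical illustrative sizes, not measured data.
        int[] heuristic = {12, 8, 15, 9};
        int[] exact     = {12, 8, 17, 9};
        System.out.println("loss rate: " + lossRate(heuristic, exact));
        System.out.println("max gap:   " + maxGap(heuristic, exact));
    }
}
```

The maximum gap is the conservative quantity to carry forward into any threshold decision; a mean or percentile over the missed pairs would describe the typical case instead.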
We apologize that we cannot provide more comprehensive statistics at present. However, we would appreciate your contribution to such an analysis. All comments and suggestions are warmly welcome.
First of all, thanks for the reply. I am going to run some experiments on smaller datasets (~500 instances) to check how large the difference is and what threshold I should use. What I want to do is use the size of the FAST or TURBO MCES as a decision criterion to reduce the number of EXACT MCES calculations.
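The decision criterion described here could be sketched in Java roughly as follows. This is a hypothetical triage rule, not a ChemAxon feature, and it rests on two assumptions: that a FAST/TURBO result is a genuine common substructure (so, as the reply implies, the exact MCES is never smaller than it), and that `maxGap` is an upper bound on the heuristic shortfall that you estimate yourself, for example from a calibration run. The REJECT branch is only as safe as that estimate.

```java
/** Hypothetical triage rule: decide from a FAST/TURBO MCES size whether the
 *  exact MCES still has to be computed for a given size threshold. */
public class McesTriage {
    public enum Decision { ACCEPT, REJECT, COMPUTE_EXACT }

    /**
     * @param heuristicSize size of the MCES found in FAST or TURBO mode
     * @param threshold     minimum exact MCES size of interest
     * @param maxGap        assumed upper bound on (exact - heuristic), estimated by you
     */
    public static Decision decide(int heuristicSize, int threshold, int maxGap) {
        // The heuristic result is assumed to be a valid common substructure,
        // so the exact MCES is at least this large: the threshold is already met.
        if (heuristicSize >= threshold) return Decision.ACCEPT;
        // Even the assumed worst-case shortfall cannot reach the threshold.
        if (heuristicSize + maxGap < threshold) return Decision.REJECT;
        // Ambiguous zone: only these pairs need the exact search.
        return Decision.COMPUTE_EXACT;
    }

    public static void main(String[] args) {
        int threshold = 10; // hypothetical size threshold
        int maxGap = 6;     // assumption: conservative bound on the heuristic shortfall
        for (int h : new int[] {12, 7, 3}) {
            System.out.println(h + " -> " + decide(h, threshold, maxGap));
        }
    }
}
```

Only pairs in the ambiguous zone between `threshold - maxGap` and `threshold - 1` fall through to the exact calculation, so a smaller, well-measured `maxGap` directly reduces the number of exact runs.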
I'll let you know when the results are in or if another problem comes up.