multicenter attachment point makes match fail

User 870ab5b546

05-08-2014 20:27:22

Hello,

In my code, the target *.CCOC(=O)C=C |m:0:6.7| fails to match the query CCOC(=O)C=C. Is there an option to set so that a multicenter attachment point that is not actually attached to anything is ignored during a search? Or do I have to manually remove such attachment points before doing the comparison?

-- Bob

P.S. This appears to be a relatively new behavior. I don't know exactly which version of JChem it was, but in a fairly recent previous version, this was not an issue.

ChemAxon abe887c64e

06-08-2014 09:50:34

Hi Bob,

You are right, we could reproduce the change in behavior between 6.0 and the latest version. However, we don't think the present behavior is incorrect. Unfortunately, there isn't any search option to hide the multicenter * atom in the target, so we suggest deleting it manually.

Best regards,

Krisztina

User 870ab5b546

06-08-2014 13:23:12

OK, I've modified our code to remove bondless multicenter attachment points before comparing molecules, but I have two comments.

It would be nice to have a SearchOption, ignoreBondlessMulticenterAttachmentPoints(), so I don't need to modify the Molecule.

I don't understand why you think the current matching behavior is correct. A multicenter attachment point has no chemical meaning if it isn't actually making a bond to anything. Without a bond, it's a conceptual grouping of atoms in a molecule, but not an actual chemical feature. It makes no sense to me that ethyl acrylate with the two pi bond C atoms in a group won't match to ethyl acrylate.

ChemAxon abe887c64e

06-08-2014 13:55:25

I see your point. Please have a look at our Attached Data function. It allows you to attach any data to any part of a structure (in MarvinSketch: Structure > Add > Data...), and by default the search ignores these data. See the description of the relevant search options here.

Krisztina

User 870ab5b546

06-08-2014 14:13:06

How is attached data relevant? If you're suggesting that I can use attached data instead of a multicenter attachment point, it won't work for us. In order to draw mechanisms involving transition metals, we sometimes need a pi bond to have a multicenter attachment point, as in the following mechanistic step:

<?xml version="1.0" encoding="UTF-8"?>

<cml xmlns="http://www.chemaxon.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.chemaxon.com/marvin/help/formats/schema/mrvSchema_6_2_0.xsd" version="ChemAxon file format v6.2, generated by v6.2.1">

<MDocument>

  <MRectangle id="o1">

    <MPoint x="-10.414816856384277" y="6.210670471191406"/>

    <MPoint x="0.6210671067237854" y="6.210670471191406"/>

    <MPoint x="0.6210671067237854" y="-0.38219529390335083"/>

    <MPoint x="-10.414816856384277" y="-0.38219529390335083"/>

  </MRectangle>

  <MRectangle id="o2">

    <MPoint x="1.958749696262462" y="6.21067055119374"/>

    <MPoint x="12.23024366761027" y="6.21067055119374"/>

    <MPoint x="12.23024366761027" y="-0.3821952139010172"/>

    <MPoint x="1.958749696262462" y="-0.3821952139010172"/>

  </MRectangle>

  <MPolyline id="o3" headLength="0.6" headWidth="0.4">

    <MPoint x="0.6210671067237854" y="2.9142375886440277"/>

    <MRectanglePoint pos="7" rectRef="o2"/>

  </MPolyline>

  <MChemicalStruct>

    <molecule molID="m1">

      <atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15 a16 a17 a18" elementType="Pd C Br P P C C Cl Pd C Br P P C C Cl X X" sgroupRef="0 0 0 0 0 sg2 sg2 0 0 0 0 0 0 sg1 sg1 0 0 0" x2="-6.448750019073486 -5.3598055760462024 -5.359805576046203 -7.537694462100769 -7.537694462100769 -2.887500047683716 -2.887500047683716 -1.55382092585568 6.288241350820014 7.377185793847298 7.377185793847297 5.199296907792731 5.199296907792731 8.846229756047675 8.846229756047675 10.179908877875711 8.846229756047675 -2.887500047683716" y2="2.887500047683716 3.9764444907109984 1.7985556046564326 1.7985556046564326 3.976444490710999 3.70562481880188 2.16562481880188 1.3956248188018798 2.9390625681195948 4.028007011146878 1.8501181250923115 1.8501181250923115 4.028007011146878 3.757187339237759 2.217187339237759 1.4471873392377588 2.987187339237759 2.93562481880188"/>

      <bondArray>

        <bond id="b1" atomRefs2="a1 a2" order="1"/>

        <bond id="b2" atomRefs2="a1 a3" order="1"/>

        <bond id="b3" atomRefs2="a4 a1" convention="cxn:coord"/>

        <bond id="b4" atomRefs2="a5 a1" convention="cxn:coord"/>

        <bond id="b5" atomRefs2="a6 a7" order="2"/>

        <bond id="b6" atomRefs2="a7 a8" order="1"/>

        <bond id="b7" atomRefs2="a9 a10" order="1"/>

        <bond id="b8" atomRefs2="a9 a11" order="1"/>

        <bond id="b9" atomRefs2="a12 a9" convention="cxn:coord"/>

        <bond id="b10" atomRefs2="a13 a9" convention="cxn:coord"/>

        <bond id="b11" atomRefs2="a14 a15" order="2"/>

        <bond id="b12" atomRefs2="a15 a16" order="1"/>

        <bond id="b13" atomRefs2="a17 a9" convention="cxn:coord"/>

      </bondArray>

      <molecule id="sg1" role="MulticenterSgroup" molID="m2" atomRefs="a14 a15" center="a17"/>

      <molecule id="sg2" role="MulticenterSgroup" molID="m3" atomRefs="a6 a7" center="a18"/>

    </molecule>

  </MChemicalStruct>

  <MEFlow id="o5" arcAngle="140.0" headSkip="0.25" headLength="0.5" headWidth="0.4" tailSkip="0.25" baseElectronContainerIndex="-1" baseElectronIndexInContainer="0">

    <MEFlowBasePoint atomRef="m1.a18"/>

    <MAtomSetPoint atomRefs="m1.a18 m1.a1" weights="0.25 0.75"/>

  </MEFlow>

</MDocument>

</cml>

At the same time, we want to be able to match ClCH=CH₂ to the structure in the drawing, which may (or may not) have the multicenter attachment point.

ChemAxon abe887c64e

08-08-2014 07:56:21

Hi Bob,

In a substructure search, both structures (with and without the multicenter atom) match ClCH=CH2. If you would like to exclude substituted ClCH=CH2 targets, applying the s* query atom property could help.

Best regards,

Krisztina

User 870ab5b546

11-08-2014 17:30:06

kvajda wrote:

In a substructure search, both structures (with and without the multicenter atom) match ClCH=CH2. If you would like to exclude substituted ClCH=CH2 targets, applying the s* query atom property could help.

Yes, I guess I could do that, but that would require going through every atom of the query molecule and setting the query property to s*. And that's no more efficient, and maybe a lot less efficient depending on how your search algorithm works, than going through every atom of the target molecule and removing it if it's a bondless multicenter attachment point. (I use MolSearch only.) I would also worry about changing from full searches to substructure searches and how that might affect other kinds of matching, such as stereochemistry.

Again, I don't understand why you think it's desirable to have the target *.CCOC(=O)C=C |m:0:6.7| fail to match the query CCOC(=O)C=C. It makes no chemical sense. But even if you want to have this behavior, I respectfully ask you to implement an option to disable it. And I would encourage you to have the default behavior of this option be for the match to be found.

-- Bob

ChemAxon 25dcd765a3

12-08-2014 15:29:01

Hi Bob,

Let me explain our problem with your request (which may be a representation problem).

In a full structure search, every atom of the query must match one of the target atoms, and every target atom must match one of the query atoms (no exceptions). The multicenter atom that you use in this case is a MolAtom object, which can match only another multicenter MolAtom object. Changing the search algorithm so that it does not follow these rules would be really painful.

But let's not give up and try to find a solution to the problem.

To find a solution I would like to understand the use case in detail.

I understand that you would like to represent mechanisms involving transition metals. What is not clear to me is why you need a multicenter atom in the representation. I can draw an electron flow from the double bond without any problem; see the attached screenshot. What additional information would you like to represent with the multicenter atom?

User 870ab5b546

12-08-2014 15:55:01

volfi wrote:

In a full structure search, every atom of the query must match one of the target atoms, and every target atom must match one of the query atoms (no exceptions). The multicenter atom that you use in this case is a MolAtom object, which can match only another multicenter MolAtom object. Changing the search algorithm so that it does not follow these rules would be really painful.

We agree that every atom of the query should match an atom of the target, and vice versa. However, even though you represent the multicenter attachment point as a MolAtom, it is not an atom. It is a fictional object created for the purpose of creating multicenter bonds. So I would say that all target atoms should match query atoms, but not all target MolAtoms should necessarily match query MolAtoms.

Of course, I can't speak to the cost-benefit analysis, and obviously it's your prerogative to say that a particular change is too costly or painful for you to implement.

volfi wrote:

I understand that you would like to represent mechanisms involving transition metals. What is not clear to me is why you need a multicenter atom in the representation. I can draw an electron flow from the double bond without any problem; see the attached screenshot. What additional information would you like to represent with the multicenter atom?

Your picture represents the cleavage of the alkene pi bond and the formation of a bond between one of the C atoms of the alkene and the Pd (with concomitant formation of a carbocation), not the formation of a pi complex between the alkene and the Pd. See the two pictures below.

ChemAxon 25dcd765a3

13-08-2014 10:32:46

I have modified my example according to your need.

Is it acceptable for you?

I have also attached the mrv representation.

User 870ab5b546

13-08-2014 16:56:16

As far as I can tell, you just drew a text asterisk next to the double bond. The appearance of the picture is not what matters. What matters is that there should be a difference between the MRV representations of, on the one hand, a π bond being used to make a σ bond and a carbocation, and, on the other hand, a π bond being used to make a π complex. We use the electron-flow arrow sources and sinks to calculate the products of the electron-flow arrows that a student drew. Whether you use a text box to draw an asterisk doesn't affect the electron-flow arrow sources and sinks.

Anyway, the issue is not the electron-flow arrows per se, it's the fact that the matching algorithm considers a fictional atom.

ChemAxon 25dcd765a3

14-08-2014 11:13:39

Yes, I thought the depiction was what mattered, but I understand that this is not the case.

So you would like to differentiate a π bond being used to make a σ bond and a carbocation, and a π bond being used to make a π complex. Are these chemically differentiable?

On the other hand, you are right that the problem is that the matching algorithm considers a fictional atom, so we should figure out something that can differentiate the two bonds without a fictional atom. What about adding a property to the bond? In this case you could use the electron-flow arrow source to calculate the products.

I have attached an example mrv.

User 870ab5b546

14-08-2014 14:07:49

volfi wrote:

So you would like to differentiate a π bond being used to make a σ bond and a carbocation, and a π bond being used to make a π complex. Are these chemically differentiable?

Yes, they are differentiable and very different. If you consider X=Y making a bond to Z, in one case you end up with only an X-Z bond, and in the other you end up with both X-Y and X-Z bonds (expressed as a dative bond from the X=Y π bond).

volfi wrote:

On the other hand, you are right that the problem is that the matching algorithm considers a fictional atom, so we should figure out something that can differentiate the two bonds without a fictional atom. What about adding a property to the bond? In this case you could use the electron-flow arrow source to calculate the products.

So you're saying, instead of making the X=Y bond into a multicenter group, and thereby adding a fictional atom to the molecule, we should ask the user to add a property to the X=Y bond, and then use the property to change the meaning of the X=Y to X-Z electron-flow arrow? I can imagine a lot of problems with this workaround. For one, the process involves the user learning two MarvinSketch features, not just one. (Remember, I am working with students, not MarvinSketch experts.) For another, if the user then draws the products of the electron-flow arrow, he or she is going to have to add the multicenter group anyway, and then remove the bond property, so the process requires extra steps. No, I would rather just look for and remove any multicenter attachment points that have zero bonds (i.e., have no chemical meaning) before comparing structures. But I still wonder why you can't do this step internally in the search algorithm.

ChemAxon 25dcd765a3

15-08-2014 11:18:35

bobgr wrote:

volfi wrote:

So you would like to differentiate a π bond being used to make a σ bond and a carbocation, and a π bond being used to make a π complex. Are these chemically differentiable?

Yes, they are differentiable and very different. If you consider X=Y making a bond to Z, in one case you end up with only an X-Z bond, and in the other you end up with both X-Y and X-Z bonds (expressed as a dative bond from the X=Y π bond).

Yes, it is quite obvious that they are differentiable after the reaction has taken place, but my question is whether they are differentiable beforehand. I think they are not.

bobgr wrote:

volfi wrote:

On the other hand, you are right that the problem is that the matching algorithm considers a fictional atom, so we should figure out something that can differentiate the two bonds without a fictional atom. What about adding a property to the bond? In this case you could use the electron-flow arrow source to calculate the products.

So you're saying, instead of making the X=Y bond into a multicenter group, and thereby adding a fictional atom to the molecule, we should ask the user to add a property to the X=Y bond, and then use the property to change the meaning of the X=Y to X-Z electron-flow arrow? I can imagine a lot of problems with this workaround. For one, the process involves the user learning two MarvinSketch features, not just one. (Remember, I am working with students, not MarvinSketch experts.) For another, if the user then draws the products of the electron-flow arrow, he or she is going to have to add the multicenter group anyway, and then remove the bond property, so the process requires extra steps. No, I would rather just look for and remove any multicenter attachment points that have zero bonds (i.e., have no chemical meaning) before comparing structures. But I still wonder why you can't do this step internally in the search algorithm.

I would like to focus first on the problem and not on the solution. I understand that one solution to the problem is changing the search algorithm. I was making suggestions according to my knowledge. I'm sorry if they do not solve your problem, but at least they give me more insight into it. I will try to summarize your use case now, and I would be glad if you could add comments so that the original problem becomes clear to me:

You have a student who sketches a chemical reaction.

You want to validate the result of the sketch (check programmatically whether it is correct).

The programmatic check is based on a full structure search; you are searching for the reagents in the database.

You compare the sketched result with the hit available in the database.

You want to mark the double bond of one reactant in the reaction, for a reason that is not clear to me.

I feel I am missing an important part of the use case. Could you please describe it in detail, step by step?

You know the use case and I would like to help, but rewriting the search algorithm for something which is not totally clear for us is not possible.

User 870ab5b546

15-08-2014 15:21:33

OK, I'll try. First, it is important to note that we do not look up any compounds in JChem tables; we use MolSearch to compare targets derived from a student response to queries that a question author specifies.

The student sketches the mechanism of a given reaction. The mechanism consists of a series of steps, each in a box, connected by graphical arrows. In each step are some compounds and electron-flow arrows.

We evaluate the mechanism that the student drew in many different ways. One way is to make sure that the student used all of the starting materials that they were supposed to use. We classify the compounds in the student's mechanism as response products (have no electron-flow arrows touching them), response starting materials (have electron-flow arrows touching them and not produced by electron-flow arrows in a previous step), and response intermediates (neither response starting materials nor response products). We then compare the response starting materials to the starting materials that the question author requires them to use. (In some cases, a starting material may have different acceptable forms; e.g., HCl or H⁺ may both be acceptable.)
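The classification rule in the paragraph above can be sketched as a small Java function. This is only an illustration under stated assumptions: the `Role` enum, the `CompoundClassifier` class, and the two boolean inputs are hypothetical names, not part of JChem or of the code discussed in this thread.

```java
/** Hypothetical roles a compound can play in a student's mechanism. */
enum Role { STARTING_MATERIAL, INTERMEDIATE, PRODUCT }

/** Illustrative only; not part of JChem or of the poster's code base. */
final class CompoundClassifier {

    /**
     * @param touchedByArrow        an electron-flow arrow starts or ends on the compound
     * @param producedByEarlierStep the compound results from arrows in a previous step
     */
    static Role classify(boolean touchedByArrow, boolean producedByEarlierStep) {
        if (!touchedByArrow) {
            return Role.PRODUCT;            // no arrows touch it
        }
        if (!producedByEarlierStep) {
            return Role.STARTING_MATERIAL;  // arrows touch it, and it was not made earlier
        }
        return Role.INTERMEDIATE;           // neither of the above
    }
}
```

Only the compounds classified as starting materials would then be compared against the author's required starting materials.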

Now, consider the question, "Draw the mechanism of the Heck reaction between PhBr and CH₂=CHCO₂Et catalyzed by (Ph₃P)₄Pd." An author writing the question would require the student's mechanism to contain the starting materials PhBr, CH₂=CHCO₂Et, and (Ph₃P)₄Pd. But if the student writes a mechanism in which CH₂=CHCO₂Et coordinates to (Ph₃P)₂Pd(Ph)Br, he would place a multicenter attachment point on the π bond so that he could show the formation of a dative bond. And then the search for CH₂=CHCO₂Et in the mechanism would fail.

Now, the author could add the multicenter attachment point to the CH₂=CHCO₂Et that JChem is supposed to look for in the mechanism. But, aside from this additional step being counterintuitive for the author, suppose the student draws a different mechanism, one in which the insertion of the π bond into the Pd-Ph bond occurs without prior coordination of the π bond to Pd. Here the student would not add a multicenter attachment point to the π bond. Now, you might not want to mark such a mechanism correct, but regardless, you wouldn't want the program to say that the student failed to use CH₂=CHCO₂Et as a starting material.

Finally, it would be possible to require the author to write every starting material twice: once with the multicenter attachment point, and once without, and then compare the response starting materials to each one separately. But this would be very awkward, and besides it would require the author to anticipate any of the multicenter attachment points that a student might draw, which could be legion.

So, what I am now doing is looking at every atom in a target and removing it (and its associated Sgroup) if it is a bondless multicenter attachment point. But again, the sensible approach is to modify the search algorithm to say that a bondless multicenter attachment point is a ghost and is ignored, at least by default.
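A minimal sketch of the workaround described above, written against a toy graph model and not against the JChem API. The `ToyMolecule` class, its methods, and the use of "*" to mark a multicenter attachment point are all assumptions made for illustration; real code would use the library's own atom and S-group classes, which are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a molecule graph; NOT the JChem Molecule class. */
final class ToyMolecule {
    /** Element symbols; "*" marks a multicenter attachment point (assumption). */
    final List<String> atoms = new ArrayList<>();
    /** Bonds as pairs of atom indices. */
    final List<int[]> bonds = new ArrayList<>();

    int addAtom(String symbol) {
        atoms.add(symbol);
        return atoms.size() - 1;
    }

    void addBond(int a, int b) {
        bonds.add(new int[] {a, b});
    }

    /** Number of bonds that touch the given atom. */
    int degree(int atom) {
        int n = 0;
        for (int[] b : bonds) {
            if (b[0] == atom || b[1] == atom) n++;
        }
        return n;
    }

    /** Returns a copy that omits every multicenter point with zero bonds. */
    ToyMolecule withoutBondlessMulticenters() {
        ToyMolecule copy = new ToyMolecule();
        int[] newIndex = new int[atoms.size()];
        for (int i = 0; i < atoms.size(); i++) {
            boolean ghost = atoms.get(i).equals("*") && degree(i) == 0;
            newIndex[i] = ghost ? -1 : copy.addAtom(atoms.get(i));
        }
        // Ghost atoms have no bonds, so every bond endpoint has a valid new index.
        for (int[] b : bonds) {
            copy.addBond(newIndex[b[0]], newIndex[b[1]]);
        }
        return copy;
    }
}
```

Working on a copy leaves the student's original drawing untouched, so a multicenter point that does carry a bond (for example, to Pd) is preserved in both the original and the copy.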

ChemAxon 25dcd765a3

19-08-2014 16:19:24

Thank you for the detailed use case explanation. I think we have understood the problem.

Since you always compare just reactants, products, or intermediates, and compare just fragments, the solution would be to use a full fragment search instead of a full search. What do you think: would it work for you, or are there some hidden facts that rule out this solution?

User 870ab5b546

19-08-2014 16:36:44

I guess that might work if JChem did not consider the multicenter attachment point to be part of the same fragment as the molecule to which it belonged. But when we get the student's response, we already fragment the imported Molecule to generate the individual molecules in the response, and the multicenter attachment point stays with its parent molecule. Are you saying that the multicenter attachment point is considered a separate fragment under some circumstances, but not under others?

ChemAxon 25dcd765a3

20-08-2014 07:47:30

Yes, you are right. The fragmentation method can be parametrized; see the findFrags() method of Molecule. So you can choose which type of fragmentation you want to follow.

User 870ab5b546

20-08-2014 17:59:34

Interesting... We have been using Molecule.convertToFrags(), which does not allow parametrization, as findFrags() does.

So far we have mentioned only the cases where the multicenter attachment point has no bonds. Suppose, though, it does have a bond, as in the target *[Pd](Cl)Cl.C=C |m:0:4.5,C:0.0|. I would not want this target to match the query CH₂=CH₂, because the multicenter attachment point is no longer fictional; there is a real bond between the alkene and the Pd, and hence we do not have the compound CH₂=CH₂. I gather from the ChemAxon Extended SMILES representation of the π complex that in a full fragment search, the target would match the query. Is that correct? If so, then a full fragment search would not solve my problem.

ChemAxon d4fff15f08

22-08-2014 14:51:03

Hi Bob,

A full fragment search will clearly not consider CH2=CH2 a hit for the target *[Pd](Cl)Cl.C=C |m:0:4.5,C:0.0|. CH2=CH2 is a substructure of the target, but not a full fragment match, because the bond that has formed makes the whole structure a single fragment.
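The fragment behavior described in this thread can be illustrated with a toy fragment counter. This is a sketch under explicit assumptions, not the JChem fragmentation code: it assumes that a multicenter attachment point joins the fragment of its member atoms only once it carries at least one bond, and that a bondless one stands alone as its own fragment. The `ToyFragmenter` class and its parameter layout are hypothetical.

```java
/** Toy fragment counter; NOT the JChem fragmentation code. */
final class ToyFragmenter {

    private static int find(int[] parent, int i) {
        while (parent[i] != i) {
            parent[i] = parent[parent[i]]; // path halving
            i = parent[i];
        }
        return i;
    }

    private static void union(int[] parent, int a, int b) {
        parent[find(parent, a)] = find(parent, b);
    }

    /**
     * @param atomCount number of atoms, including multicenter points
     * @param bonds     pairs of bonded atom indices
     * @param groups    one row per multicenter group: {center, member, member, ...}
     */
    static int countFragments(int atomCount, int[][] bonds, int[][] groups) {
        int[] parent = new int[atomCount];
        boolean[] bonded = new boolean[atomCount];
        for (int i = 0; i < atomCount; i++) parent[i] = i;
        for (int[] b : bonds) {
            union(parent, b[0], b[1]);
            bonded[b[0]] = true;
            bonded[b[1]] = true;
        }
        for (int[] g : groups) {
            if (!bonded[g[0]]) continue; // assumption: a bondless center links nothing
            for (int k = 1; k < g.length; k++) union(parent, g[0], g[k]);
        }
        int fragments = 0;
        for (int i = 0; i < atomCount; i++) {
            if (find(parent, i) == i) fragments++;
        }
        return fragments;
    }
}
```

Under these assumptions, the π complex *[Pd](Cl)Cl.C=C |m:0:4.5,C:0.0| counts as one fragment, so the query CH2=CH2 cannot match a whole fragment of it, whereas *.CCOC(=O)C=C |m:0:6.7| splits into the lone attachment point and an intact ethyl acrylate fragment.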

Best regards,

Norbert

User 870ab5b546

25-08-2014 13:49:46

OK, then full fragment should satisfy our needs. Thanks.