not getting unique / canonical smiles from r-group decomp

User 22337819af

07-08-2012 13:32:56

Hello


I am doing R-group decomposition (using the RGroupDecomposition class) and we have noticed that we are getting R-groups that are not unique: two different ChemAxon SMILES are produced for the same molecule.  Details and an example are below.  How can I get the same ChemAxon SMILES for both?  Is there a flag or setting in RGroupDecomposition to handle this, or is there another way?  Thank you,


Dave


 


Details / example:  The R-group identified is a tolyl group (phenyl with a methyl attached).  In both of the produced ligands the R1 attachment is ortho to the methyl on the phenyl, but the two are mirror images of each other.  Since the molecule (R-group) is planar, these are exactly the same molecule.  (See also the attached example.mrv.)


Chemaxon smiles for the core / search molecule:


ChemAxon a3d59b832c

07-08-2012 14:14:00

Hi Dave,


 


I think this is because the smiles canonicalization algorithm does not consider the label for the attachment point.


It seems to me that if you use another type of attachment point, for example the ATTACHMENT_POINT constant, then the canonicalization algorithm will be able to consider it.


See:


http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/sss/search/RGroupDecomposition.html#setAttachmentType%28int%29


 


Best regards,


Szabolcs

User 22337819af

07-08-2012 14:43:21

Hello Szabolcs


Thanks for the quick response.  That is probably it; I see that we have done:


rGroupDecomp.setAttachmentType(RGroupDecomposition.ATTACHMENT_RLABEL);


in our code.  Unfortunately I don't remember why we did that, so I can't immediately change it.  Is there any other way to achieve the proper symmetry?  Is there anything in the API where I can test for / achieve symmetry with the labels?


Thanks,
Dave 

ChemAxon a3d59b832c

07-08-2012 17:04:08

Hi Dave,


"Is there any other way to achieve the proper symmetry?  Is there anything in the API where I can test for / achieve symmetry with the labels?"


Not that I am aware of. I will pass on the question to my colleagues working with smiles export.


 


Best regards,


Szabolcs

User 22337819af

07-08-2012 17:49:47

Hi Szabolcs


Thanks, I appreciate that.  I'm also wondering if this is a bug.  There is no difference between the 2 R1 ligands / molecules above, whether R1 is a label or an atom.  Even if it is a label, from symmetry there is no difference.


Dave

ChemAxon a3d59b832c

08-08-2012 09:21:58

Hi Dave,


 


I'll leave it to my smiles expert colleagues to answer...


 


Szabolcs

User 22337819af

09-08-2012 13:58:29

Thanks.  Should I post this question elsewhere to get it in front of smiles experts?  

ChemAxon a3d59b832c

10-08-2012 08:45:43

Hi Dave,


 


I have forwarded the question to my colleagues, but unfortunately the expert is on holiday at the moment.


He will be able to answer next week.


 


Best regards,


Szabolcs

ChemAxon d26931946c

13-08-2012 12:17:33

Hi Dave, 


 


I'm sorry for the late answer.


The extended part of the unique cxsmiles/cxsmarts is not unique. The smiles part has to be unique, therefore the atom order depends only on properties that can be represented in smiles. The extended part is generated only once we have the final atom order.


Regards, 


Peter

User 22337819af

22-08-2012 20:53:55

I'm confused.  What do you mean by "therefore the atom order depends only on properties that can be represented in smiles"?  The smiles for both of these R-groups should be identical.  Why is the "final atom order" different for them?

ChemAxon 25dcd765a3

23-08-2012 10:58:59

"Is there anything in the API where I can test for / achieve
symmetry with the labels?"



SMILES cannot represent labels. That is why gezapeti wrote "the atom order depends only on properties that can be represented in smiles". The canonicalization algorithm does not distinguish properties that cannot be represented in SMILES, so the output for molecules with labels is not guaranteed to be unique.
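
To make the point concrete, here is a minimal, self-contained Java sketch (plain string handling, no ChemAxon classes) that splits a CXSMILES string into its SMILES part and its extended part. The class and method names are made up for illustration, and the two tolyl strings are only examples of the kind of output discussed in this thread. It assumes the extended part is separated from the SMILES by a space and enclosed in | characters.

```java
class CxSmilesParts {
    /**
     * Splits a CXSMILES string into { smilesPart, extendedPart }.
     * extendedPart is "" when the string has no "|...|" section.
     * Hypothetical helper for illustration; not part of any ChemAxon API.
     */
    static String[] split(String cxsmiles) {
        String s = cxsmiles.trim();
        int bar = s.indexOf(" |");
        if (bar < 0 || !s.endsWith("|")) {
            return new String[] { s, "" };
        }
        return new String[] { s.substring(0, bar), s.substring(bar + 1) };
    }

    public static void main(String[] args) {
        // Two tolyl ligands with the R1 label on either ortho ring atom.
        String a = "Cc1ccccc1 |$_AV:;;;;;;R1$|";
        String b = "Cc1ccccc1 |$_AV:;;R1;;;;$|";
        String[] pa = split(a);
        String[] pb = split(b);
        System.out.println("SMILES parts equal:   " + pa[0].equals(pb[0])); // true
        System.out.println("Extended parts equal: " + pa[1].equals(pb[1])); // false
        System.out.println("Whole strings equal:  " + a.equals(b));         // false
    }
}
```

The SMILES parts agree, but because the label is written only after the atom order has been fixed, it can end up on either of the two symmetric ring atoms, so comparing the whole strings reports a difference.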

User 22337819af

24-08-2012 21:02:49

OK, thank you, I understand that.  Let me ask a related follow-up question:  Why is it that the R-group decomposition calculation generates molecule objects that are different?  Is there any way to handle the molecule objects generated by the R-group decomposition that will identify that these are the same and allow us to output a truly canonical set of smiles for the R-groups produced by R-group decomposition?


Thank you again for your patience answering my questions.

ChemAxon fb166edcbd

29-08-2012 13:10:17

You can use MolSearch for duplicate filtering, but it is not that simple to check atom labels with it. You should set a user-defined MolComparator object for this. I attach some sample code using your input molecules.


java rgdecomp.RGDecompWithDuplicateCheck

Ligands of Cc1ccccc1-c1ccc(cc1)[C@H]1[C@H]2CN(Cc3ccc4OCOc4c3)C[C@@H]1N2:
R1: Cc1ccccc1 |$_AV:;;;;;;R1$|
R2: Cc1ccc2OCOc2c1 |$_AV:R2;;;;;;;;;$|
R3: [H] |$_AV:R3$|

Ligands of COc1cccc(CN2C[C@H]3N[C@@H](C2)[C@@H]3c2ccc(cc2)-c2ccccc2C)c1:
R1: Cc1ccccc1 |$_AV:;;R1;;;;$|
R2: COc1cccc(C)c1 |$_AV:;;;;;;;R2;$|
R3: [H] |$_AV:R3$|

Ligands for R1 are the same.
Ligands for R2 are different.
Ligands for R3 are the same.
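
For readers without the attachment, the idea behind a label-aware duplicate check can be sketched in plain Java. This is not the attached RGDecompWithDuplicateCheck code and it does not use MolSearch or MolComparator; it is a small backtracking graph match in which two atoms are compatible only if both their element and their label agree. All names are hypothetical, hydrogens are ignored, and the bond codes (1 = single, 4 = aromatic) are an arbitrary choice for this sketch.

```java
import java.util.Arrays;

class LabeledGraphMatch {
    // atoms[i] is an element symbol plus an optional label, e.g. "c" or "c:R1".
    // bonds[i][j] is 0 for "no bond", otherwise a bond type code.
    static boolean sameLigand(String[] atomsA, int[][] bondsA,
                              String[] atomsB, int[][] bondsB) {
        if (atomsA.length != atomsB.length) {
            return false;
        }
        int[] map = new int[atomsA.length];
        Arrays.fill(map, -1);
        return extend(0, map, new boolean[atomsB.length], atomsA, bondsA, atomsB, bondsB);
    }

    // Backtracking: map atom i of A onto each unused, compatible atom of B.
    private static boolean extend(int i, int[] map, boolean[] used,
                                  String[] atomsA, int[][] bondsA,
                                  String[] atomsB, int[][] bondsB) {
        if (i == atomsA.length) {
            return true; // every atom mapped consistently
        }
        for (int j = 0; j < atomsB.length; j++) {
            if (used[j] || !atomsA[i].equals(atomsB[j])) {
                continue; // element or label differs
            }
            boolean ok = true;
            for (int k = 0; k < i && ok; k++) {
                ok = bondsA[i][k] == bondsB[j][map[k]]; // bonds (and non-bonds) must agree
            }
            if (!ok) {
                continue;
            }
            map[i] = j;
            used[j] = true;
            if (extend(i + 1, map, used, atomsA, bondsA, atomsB, bondsB)) {
                return true;
            }
            map[i] = -1;
            used[j] = false;
        }
        return false;
    }

    // Toluene skeleton: atom 0 = methyl C, atoms 1..6 = ring; one ring atom carries the label.
    static String[] tolylAtoms(int labeledAtom, String label) {
        String[] atoms = { "C", "c", "c", "c", "c", "c", "c" };
        atoms[labeledAtom] = atoms[labeledAtom] + ":" + label;
        return atoms;
    }

    static int[][] tolylBonds() {
        int[][] b = new int[7][7];
        b[0][1] = b[1][0] = 1; // methyl to ring atom 1, single bond
        for (int i = 1; i <= 6; i++) {
            int j = (i == 6) ? 1 : i + 1;
            b[i][j] = b[j][i] = 4; // aromatic ring bonds, closing 6 back to 1
        }
        return b;
    }

    public static void main(String[] args) {
        int[][] bonds = tolylBonds();
        // Label on atom 6 vs. atom 2: both ortho to the methyl-bearing carbon (atom 1).
        System.out.println(sameLigand(tolylAtoms(6, "R1"), bonds, tolylAtoms(2, "R1"), bonds));
        // Label on atom 3 is meta, which is a genuinely different ligand.
        System.out.println(sameLigand(tolylAtoms(6, "R1"), bonds, tolylAtoms(3, "R1"), bonds));
    }
}
```

The first comparison succeeds because the mirror-image mapping of the ring carries the labeled atom onto the labeled atom; the second fails because no mapping can place a meta label on an ortho position. A real implementation would presumably compare more atom properties than this sketch does.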

User 22337819af

29-08-2012 15:11:36

Wow!! Thank you for the very extensive work and thank you for the source code.  That is really great.