jcsearch inconsistency in SSSR perception?

User 7c177bab3b

14-10-2011 12:36:37

Hi


Running jcsearch on different atom orderings for the same molecule produces different results, as below.


jcsearch -q '[R3]' -s 'C1CC23CCN1CN2CCc1c3oc2ccccc12' --allHits

    Query has 1 match:

        Match 1:[    3 ]

C1CC23CCN1CN2CCc1c3oc2ccccc12


------


jcsearch -q '[R3]' -s 'c1ccc2oc3c(CCN4CN5CCC43CC5)c2c1' --allHits

    Query has 2 matches:

        Match 1:[   10 ]

        Match 2:[   15 ]


C1CC23CCN1CN2CCc1c3oc2ccccc12


 


Regards


Stephen


 

User 7c177bab3b

14-10-2011 12:53:24

Another example. In this case both SMILES strings are "canonical"; they differ only by the presence of the methoxy groups on the phenyl ring, and the core ring system is the same.


jcsearch -q '[R3]' -s 'C1Cc2ccccc2C23CCN(CC2)CN13' --allHits

    Query has 1 match:

        Match 1:[    9 ]

C1Cc2ccccc2C23CCN(CC2)CN13


 


jcsearch -q '[R3]' -s 'COc1cc2CCN3CN4CCC3(CC4)c2cc1OC' --allHits

    Query has 2 matches:

        Match 1:[    8 ]

        Match 2:[   13 ]

COc1cc2CCN3CN4CCC3(CC4)c2cc1OC


Regards


Stephen

ChemAxon 9c0afc9aaf

14-10-2011 19:54:16

Hi,


Could you please let us know which JChem version you are using?


(This is also displayed at the top of the output of "jcsearch -h".)


 


Best regards,


 


Szilard

User 7c177bab3b

17-10-2011 08:23:32

Sorry, should have included that.


This is with JChem 5.5.1.0

ChemAxon 8407015329

17-10-2011 11:53:10

Hi,


 


We were able to reproduce the issue with JChem 5.5.1.0 and with other versions as well. When the molecule searcher determines the ring count for a given atom, it uses the SSSR (Smallest Set of Smallest Rings). Unfortunately the SSSR is not uniquely defined, so in some cases the resulting set of rings depends on the order of atoms in the molecule. After some research we concluded that using canonical SMILES strings can solve the issue in some cases, but there are exceptions. The second example you posted is such an exception:


- our SSSR algorithm gives different results for the two SMILES strings


- Daylight gives different results as well (http://www.daylight.com/daycgi_tutorials/depictmatch.cgi)


 


A global solution to this problem would be to use the CSSR (Complete Set of Smallest Rings), since it is well defined. In that case the query for hitting the bridgehead C atom would have to be [R4], which may confuse chemists. Would this solution be acceptable for you?
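To make the ambiguity concrete, here is a minimal Python sketch on bicyclo[2.2.2]octane, which as far as I can tell is the all-carbon analogue of the cage in the second example. It is not JChem's algorithm: the atom numbering is made up, and "all smallest rings" is only my assumption of what the complete set amounts to for this cage. The sketch counts ring membership per atom for each possible SSSR choice and for all three rings together.

```python
from itertools import combinations
from collections import Counter

# Bicyclo[2.2.2]octane: bridgeheads 0 and 7 joined by three two-atom bridges.
# The atom numbering is hypothetical, chosen only for this sketch.
BRIDGES = [(1, 2), (3, 4), (5, 6)]
BRIDGEHEADS = (0, 7)

# Every pair of bridges closes one six-membered ring through both bridgeheads.
RINGS = [frozenset(BRIDGEHEADS + a + b) for a, b in combinations(BRIDGES, 2)]


def ring_counts(rings):
    """Return {atom: number of the given rings that contain it}."""
    counts = Counter()
    for ring in rings:
        counts.update(ring)
    return dict(counts)


# 9 bonds - 8 atoms + 1 = 2 rings in an SSSR, but there are three equally
# small candidates, so any two of them form a valid SSSR.
sssr_choices = [ring_counts(pair) for pair in combinations(RINGS, 2)]
all_rings = ring_counts(RINGS)

for counts in sssr_choices:
    print("SSSR choice:       ", sorted(counts.items()))
print("all smallest rings:", sorted(all_rings.items()))
```

The bridgeheads are in two rings under every SSSR choice, while a bridge atom is in one or two rings depending on which pair was picked. With all three rings counted, every atom gets a fixed value. In the molecule from the thread the bridgehead carbon appears to sit in one further ring, which would take its count from 2 to 3 under SSSR and from 3 to 4 with all rings counted, in line with the [R3] and [R4] queries discussed here.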


 


We will investigate other solutions to this issue further.


 


Regards,


Vencel

User 7c177bab3b

17-10-2011 12:42:59

Hi


As a user I'm looking for consistency, but given the nature of the SSSR I'm guessing there will always be cases where such issues arise. For example, canonicalising the ring group independently of the whole molecule might fix the particular instance above but would still lead to other very similar rings having different SSSRs. However, it may be a step forward?


I'm not sure the CSSR is the best approach. Maybe others have thoughts on this. The bridgehead can be found using the 'x' primitive, taking care to exclude spiro atoms: [Cx4;!R2]
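The 'x' primitive counts ring bonds, and whether a bond is in a ring does not depend on which ring set is chosen. A rough Python sketch of that count, using a bond list I derived by hand from the SMILES C1Cc2ccccc2C23CCN(CC2)CN13 posted earlier (atoms numbered 1 to 16 in SMILES order; the helper names are made up for illustration, and this is not how any toolkit implements it):

```python
from collections import defaultdict, deque

# Bonds of C1Cc2ccccc2C23CCN(CC2)CN13, atoms numbered 1..16 in SMILES order.
# Hand-derived for this sketch; bond orders are ignored.
BONDS = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 3),
         (8, 9), (9, 10), (10, 11), (11, 12), (12, 13), (13, 14), (14, 9),
         (12, 15), (15, 16), (16, 1), (16, 9)]


def neighbours(bonds):
    """Build an adjacency map {atom: set of bonded atoms}."""
    adj = defaultdict(set)
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def is_ring_bond(bonds, bond):
    """A bond is in a ring if its ends stay connected once it is removed."""
    adj = neighbours(b for b in bonds if b != bond)
    start, goal = bond
    seen, queue = {start}, deque([start])
    while queue:
        atom = queue.popleft()
        if atom == goal:
            return True
        for nxt in adj[atom] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False


def ring_bond_counts(bonds):
    """Return {atom: number of ring bonds on that atom}."""
    counts = defaultdict(int)
    for bond in bonds:
        if is_ring_bond(bonds, bond):
            for atom in bond:
                counts[atom] += 1
    return dict(counts)


counts = ring_bond_counts(BONDS)
print(sorted(atom for atom, n in counts.items() if n == 4))
```

Only atom 9, the bridgehead carbon, has four ring bonds in this graph, and that holds whatever the atom order. The !R2 part of the pattern still relies on the ring set, so it is the x4 part that is order-independent.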


I'm just going to have to rewrite the SMARTS in a way that gives the superset of matches between the different toolkit implementations I was comparing.


Thanks


Stephen

ChemAxon a3d59b832c

19-10-2011 13:27:39

Hi Stephen,


 


Thanks for your thoughts. We have been discussing this issue internally. As you write, SSSR will never give a consistent solution, not even with a canonicalized ring system.


For example, let's consider cubane (C12C3C4C1C1C2C3C41). Using SSSR, this molecule always gives 4 hits for [R3] and another 4 for [R2]. This is true regardless of substitution and of the choice of SSSR rings (canonical or not).


On the other hand, the CSSR approach would give 8 hits for [R3] and 0 hits for [R2].
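The cubane numbers can be checked with a short Python sketch. It assumes that the six four-membered faces are the smallest rings, that an SSSR is any five of them (12 bonds - 8 atoms + 1 = 5), and that the complete set is all six. This is only an illustration of the counting, not JChem's ring perception code.

```python
from itertools import combinations
from collections import Counter

# Cubane as the corners of a unit cube; atoms are bonded when they differ
# in exactly one coordinate.
ATOMS = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]

# The six faces are the four-membered rings: fix one coordinate at 0 or 1.
FACES = [frozenset(a for a in ATOMS if a[axis] == side)
         for axis in range(3) for side in (0, 1)]


def hits(rings):
    """Return {ring count: number of atoms with that ring count}."""
    per_atom = Counter(atom for ring in rings for atom in ring)
    return dict(Counter(per_atom[a] for a in ATOMS))


# Every way of leaving out one face gives a valid five-ring SSSR.
sssr_results = [hits(choice) for choice in combinations(FACES, 5)]
complete_result = hits(FACES)

print(sssr_results)     # one {ring count: atoms} dict per SSSR choice
print(complete_result)  # the same summary with all six faces counted
```

Each of the six SSSR choices gives four atoms in three rings and four atoms in two rings, but which four atoms get the lower count depends on the face that was left out. Counting all six faces puts every atom in three rings.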


 


Could you tell us why you thought CSSR would not be a good approach?


 


It seems to me that it changes the semantics only in those cases where SSSR is not well defined. (CSSR chooses the higher ring count, whereas with SSSR it is down to chance which result comes out.)


 


Thanks,


Szabolcs

User 7c177bab3b

21-10-2011 08:54:53

Hi Szabolcs


As you say, using CSSR resolves the ambiguity, and that should be a good thing. CSSR makes the ring perception more local, more a property of the atom, and so much more relevant in a SMARTS matching (substructure searching) context.


I was just being cautious on behalf of all those applications built on the (necessarily imperfect) SSSR.


However, I'd welcome the use of CSSR as the default or as an option.


Regards


Stephen

ChemAxon a3d59b832c

26-10-2011 13:54:36

Hi Stephen,


 


I am sorry for the late answer.


 


In this case we will make a cautious step: we will add a new search option that switches the SMARTS ring property matching to use CSSR. The default will stay SSSR. This way we can easily evaluate the changes between the two.


JChem 5.7 is in beta testing now and is feature-frozen. Maybe we could add this option in 5.8, which should be released around the end of the year.


 


Best regards,


Szabolcs

ChemAxon 4a2fc68cd1

25-09-2012 07:22:59

Hi Stephen,


We are sorry for not having notified you for so long. You may have already noticed that the new search option was introduced in JChem 5.8, so you can use the CSSR option in recent JChem versions (SSSR remains the default). If you have already tried it, did it solve your problems?


Best regards,
Peter