R-group decomposition and canonicalization of SMILES results

User b701b293b4

03-07-2011 04:14:08

Hi,


I ran rgdecomp on an SDF file with the option -a R. [Note: the resulting .txt file could not be read by Excel (Version X), but Word could open it.] On examining the resulting SMILES, I discovered that the same substituent was encoded by two different SMILES strings. Needless to say, this complicated the analysis. Sorry, I can't supply structures. The errant substituent was ortho-fluorobenzyl.


Yvonne

ChemAxon fb166edcbd

05-07-2011 13:31:30

Can you give us the two SMILES forms of this structure?

User b701b293b4

07-07-2011 16:58:29

Hi,


I cannot give you the full structures of the starting compounds, but here are the two SMILES that refer to the same substituent:



CC1=CC=CC=C1F


CC1=C(F)C=CC=C1


I can see no pattern in the structures of the starting compounds that yielded one or the other decompositions.




ChemAxon fb166edcbd

10-07-2011 17:13:57

You can request unique SMILES with -f smiles:u, which outputs an aromatized unique form. Example:


    rgdecomp -q '[*]C1CCCCC1 |$_R1;;;;;;$|' "CC1=CC=C(C=C1F)C1CCCCC1" -f smiles:u
    [*]C1CCCCC1 [*]
    Cc1ccc(cc1F)C1CCCCC1 Cc1ccc(*)cc1F
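
To see why the two strings reported above describe the same substituent, here is a minimal Python sketch that compares SMILES by connectivity only. It is not how rgdecomp or -f smiles:u works internally. The names parse and topology_key are made up for this illustration, the parser only understands single-letter atoms, the * any-atom, parentheses, single-digit ring closures and the bond symbols - = #, and it ignores bond orders (which is what makes the two Kekule forms compare equal). The key is a Weisfeiler-Lehman style fingerprint, so equal keys are a strong hint of identity rather than a proof.

```python
def parse(smiles):
    """Parse a tiny SMILES subset into (atom symbols, set of bonded index pairs)."""
    atoms, bonds = [], set()
    prev, branch_stack, open_rings = None, [], {}
    for ch in smiles:
        if ch in "=#-":
            continue  # assumption: bond orders are ignored on purpose
        if ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch.isdigit():
            if ch in open_rings:  # second occurrence closes the ring
                bonds.add(frozenset((open_rings.pop(ch), prev)))
            else:                 # first occurrence opens the ring
                open_rings[ch] = prev
        elif ch.isalpha() or ch == "*":
            atoms.append(ch.upper())  # aromatic 'c' is treated like 'C'
            idx = len(atoms) - 1
            if prev is not None:
                bonds.add(frozenset((prev, idx)))
            prev = idx
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return atoms, bonds


def topology_key(smiles, rounds=4):
    """Return an order-independent fingerprint of the atom connectivity."""
    atoms, bonds = parse(smiles)
    neighbours = {i: [] for i in range(len(atoms))}
    for bond in bonds:
        a, b = tuple(bond)
        neighbours[a].append(b)
        neighbours[b].append(a)
    labels = list(atoms)
    for _ in range(rounds):
        # Each atom's new label is its old label plus its sorted neighbour labels.
        labels = [
            f"{labels[i]}({','.join(sorted(labels[j] for j in neighbours[i]))})"
            for i in range(len(atoms))
        ]
    return tuple(sorted(labels))


ortho_a = topology_key("CC1=CC=CC=C1F")
ortho_b = topology_key("CC1=C(F)C=CC=C1")
meta = topology_key("CC1=CC(F)=CC=C1")
print(ortho_a == ortho_b)  # True: same substituent, different atom order
print(ortho_a == meta)     # False: fluorine is on a different ring position
```

For real data sets the aromatized unique SMILES from -f smiles:u is the better grouping key; the sketch only shows that the two strings differ in atom order and Kekule form, not in structure.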

If you prefer the default SMILES table output:

    rgdecomp -q '[*]C1CCCCC1 |$_R1;;;;;;$|' "CC1=CC=C(C=C1F)C1CCCCC1"
    [*:1]C1CCCCC1 [*:1]
    CC1=CC=C(C=C1F)C1CCCCC1 CC1=CC=C(*)C=C1F

then I can modify this to output the unique SMILES form as well.

Please let me know if you think this would be reasonable.


Note that the -a R option (attachment in atom label with R prefix) gives the same result as -a N (no attachment in output) here, because atom labels are not written in SMILES:

    rgdecomp -a R -q '[*]C1CCCCC1 |$_R1;;;;;;$|' "CC1=CC=C(C=C1F)C1CCCCC1"
    [*:1]C1CCCCC1 [*:1]
    CC1=CC=C(C=C1F)C1CCCCC1 CC1=CC=CC=C1F

By default, attachments are written as any-atoms, which can be seen in SMILES.


You can see the effect of -a R in MRV or SDF:

    rgdecomp -a R -q '[*]C1CCCCC1 |$_R1;;;;;;$|' "CC1=CC=C(C=C1F)C1CCCCC1" -f mrv | mview -c 2 - &