User b701b293b4
03-07-2011 04:14:08
Hi,
I ran rgdecomp on an sdf file with the option -a R. [Note, the resulting .txt file couldn't be read by Excel (Version X), but could be by Word.] Upon examining the resulting SMILES I discovered that the same substituent was encoded by two different SMILES. Needless to say, this complicated the analysis. Sorry, I can't supply structures. The errant substituent was ortho-fluoro benzyl.
Yvonne
ChemAxon fb166edcbd
05-07-2011 13:31:30
Can you give us the 2 SMILES forms of this structure?
User b701b293b4
07-07-2011 16:58:29
Hi,
I cannot give you the full structures of the starting compounds, but here are the two SMILES that refer to the same substituent:
CC1=CC=CC=C1F
CC1=C(F)C=CC=C1
I can see no pattern in the structures of the starting compounds that yielded one or the other decompositions.
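As a quick sanity check that the two strings above really describe the same substituent, the sketch below compares their atom connectivity while ignoring bond orders. It is only an illustration under stated assumptions: `parse_smiles` and `same_connectivity` are hypothetical helper names, the parser handles just the tiny SMILES subset used here (single-letter atoms, `=`, branches, single-digit ring closures), and the brute-force matching is not how rgdecomp or any ChemAxon tool canonicalizes structures.

```python
from itertools import permutations


def parse_smiles(smiles):
    """Parse a tiny SMILES subset into (elements, edges).

    Supported: single-letter atoms (e.g. C, F), '=', '(' ')', ring digits 1-9.
    Bond orders are deliberately dropped, so only connectivity is kept.
    """
    elements = []    # element symbol per atom index
    edges = set()    # undirected bonds as frozenset({i, j})
    branch_stack = []
    open_rings = {}  # ring digit -> atom index that opened it
    prev = None      # index of the atom the next atom bonds to
    for ch in smiles:
        if ch.isalpha():
            elements.append(ch)
            idx = len(elements) - 1
            if prev is not None:
                edges.add(frozenset((prev, idx)))
            prev = idx
        elif ch.isdigit():
            if ch in open_rings:
                edges.add(frozenset((open_rings.pop(ch), prev)))
            else:
                open_rings[ch] = prev
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch == "=":
            continue  # bond order ignored on purpose
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return elements, edges


def same_connectivity(smiles_a, smiles_b):
    """Return True if both strings describe the same atom graph (bond orders ignored)."""
    elems_a, edges_a = parse_smiles(smiles_a)
    elems_b, edges_b = parse_smiles(smiles_b)
    if sorted(elems_a) != sorted(elems_b) or len(edges_a) != len(edges_b):
        return False
    n = len(elems_a)
    # Brute force over atom renumberings; fine for a handful of atoms only.
    for perm in permutations(range(n)):
        if all(elems_a[i] == elems_b[perm[i]] for i in range(n)):
            mapped = {frozenset(perm[i] for i in edge) for edge in edges_a}
            if mapped == edges_b:
                return True
    return False


print(same_connectivity("CC1=CC=CC=C1F", "CC1=C(F)C=CC=C1"))  # True
```

The two strings share the same connectivity and differ in which ring atom is written first and where the alternating double bonds are drawn, that is, they are two Kekulé forms of the same aromatic ring. That is why an aromatized, unique SMILES form is needed to make them compare equal as text.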
ChemAxon fb166edcbd
10-07-2011 17:13:57
You can get unique SMILES by setting -f smiles:u, which will output an aromatized unique form. Example:
rgdecomp -q '*C1CCCCC1 |$_R1;;;;;;$|' "CC1=CC=C(C=C1F)C1CCCCC1" -f smiles:u
*C1CCCCC1
Cc1ccc(cc1F)C1CCCCC1
Cc1ccc(*)cc1F
If you prefer the default SMILES table output:
rgdecomp -q '*C1CCCCC1 |$_R1;;;;;;$|' "CC1=CC=C(C=C1F)C1CCCCC1"
[*:1]C1CCCCC1 [*:1]
CC1=CC=C(C=C1F)C1CCCCC1 CC1=CC=C(*)C=C1F
then I can modify this to output the unique SMILES form as well.
Please let me know if you think this would be reasonable.
Note that the -a R option (attachment in atom label with R prefix) gives the same result as -a N (no attachment in output) here, because atom labels are not written in SMILES:
rgdecomp -a R -q '*C1CCCCC1 |$_R1;;;;;;$|' "CC1=CC=C(C=C1F)C1CCCCC1"
[*:1]C1CCCCC1 [*:1]
CC1=CC=C(C=C1F)C1CCCCC1 CC1=CC=CC=C1F
By default, attachments are written as any-atoms, which can be seen in SMILES. You can see the effect of -a R in MRV or SDF:
rgdecomp -a R -q '*C1CCCCC1 |$_R1;;;;;;$|' "CC1=CC=C(C=C1F)C1CCCCC1" -f mrv | mview -c 2 - &