Unique SMILES with labels

User d83ec9d6e4

21-01-2005 17:40:27

The unique SMILES algorithm is very powerful, but there is one limitation I don't quite understand: why can't labels (atom maps) be respected during canonicalization?

I notice that the unique SMILES for

[C:1]NC is [CH3:1]NC

while for CN[C:1] it is CN[CH3:1], although both inputs describe the same labeled molecule.

If you substitute a charge for the atom map, it works fine:

[C+]NC -> CN[CH2+] and CN[C+] -> CN[CH2+]

I've hacked together a way to use other properties, such as charge, to resolve this kind of issue when I just need a unique SMILES for a labeled molecule. But shouldn't it be straightforward to include these tags directly in the canonicalization algorithm (thus making it unique for all CXSMILES)?

User f359e526a1

21-01-2005 21:07:10

Yeah, that seems like a good idea. It can be included if you need it badly: there is a way to include maps as atomic invariants (since they should be invariant). If you want this feature, I can include it and you can have it in the next release (in three to four weeks or so). Or you can even have a build "especially for you" if you need it by yesterday. How are you getting the unique SMILES: using molconvert, cxcalc, or MView/MSketch?
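To illustrate what "maps as atomic invariants" means, here is a minimal toy sketch in Python. It is not ChemAxon's algorithm. It assumes small, connected, acyclic molecules with single bonds only and default valences, and the function name canonical_smiles and the include_maps flag are hypothetical. Atoms get an initial invariant (element, degree, and optionally the map number), the ranks are refined by neighbour ranks in the spirit of the Morgan algorithm, and any remaining ties between symmetric atoms are broken by input position.

```python
# Hypothetical toy canonicalizer, NOT ChemAxon's implementation.
# Assumes connected, acyclic molecules with single bonds and default valences.

VALENCE = {"C": 4, "N": 3, "O": 2}  # assumed default valences


def _dense_ranks(keys):
    """Map each key to its position among the sorted distinct keys."""
    order = {k: r for r, k in enumerate(sorted(set(keys)))}
    return [order[k] for k in keys]


def canonical_smiles(atoms, bonds, include_maps):
    """atoms: list of (element, map number or 0); bonds: list of (i, j) pairs."""
    n = len(atoms)
    nbrs = [[] for _ in range(n)]
    for a, b in bonds:
        nbrs[a].append(b)
        nbrs[b].append(a)

    # Initial atomic invariant: element and degree, plus the map if requested.
    ranks = _dense_ranks([
        (el, len(nbrs[i]), amap if include_maps else 0)
        for i, (el, amap) in enumerate(atoms)
    ])

    # Refine classes by the sorted ranks of each atom's neighbours until the
    # number of classes stops growing.
    while True:
        new = _dense_ranks([
            (ranks[i], tuple(sorted(ranks[j] for j in nbrs[i]))) for i in range(n)
        ])
        if len(set(new)) == len(set(ranks)):
            break
        ranks = new

    # Remaining ties are between symmetric atoms; break them by input position.
    # A property left out of the invariant (e.g. a map) leaks input order here.
    order = sorted(range(n), key=lambda i: (ranks[i], i))
    final = {i: r for r, i in enumerate(order)}

    def atom_text(i):
        el, amap = atoms[i]
        if not amap:
            return el
        h = VALENCE[el] - len(nbrs[i])  # implicit hydrogens, single bonds only
        hs = "" if h == 0 else "H" if h == 1 else f"H{h}"
        return f"[{el}{hs}:{amap}]"

    visited = set()

    def visit(i):
        # Depth-first walk; neighbours in canonical order, all but the last
        # written as parenthesised branches.
        visited.add(i)
        kids = sorted((j for j in nbrs[i] if j not in visited), key=final.get)
        parts = [visit(j) for j in kids]
        branches = "".join(f"({p})" for p in parts[:-1])
        return atom_text(i) + branches + (parts[-1] if parts else "")

    return visit(order[0])


a = [("C", 1), ("N", 0), ("C", 0)]  # [C:1]NC
b = [("C", 0), ("N", 0), ("C", 1)]  # CN[C:1]
bonds = [(0, 1), (1, 2)]
print(canonical_smiles(a, bonds, False), canonical_smiles(b, bonds, False))
print(canonical_smiles(a, bonds, True), canonical_smiles(b, bonds, True))
```

With include_maps=False the two inputs from the first post give [CH3:1]NC and CN[CH3:1]: the two terminal carbons tie, so input order decides which one is written first. With include_maps=True both give CN[CH3:1], because the map separates the two carbons before any tie-breaking happens.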

21-01-2005 21:26:00

szilva wrote:
How are you getting the unique SMILES, using molconvert or cxcalc or MView/MSketch ?
I generally use Molecule.toFormat("smiles:u"), though I have also occasionally needed "cxsmiles:u".

I have a work-around at the moment, but I'd love to see it in the next release!

User f359e526a1

25-01-2005 10:50:03

$ echo "CN[C:1]" | ./scripts/molconvert smiles
CN[CH3:1]

$ echo "[C:1]NC" | ./scripts/molconvert smiles
CN[CH3:1]

Note that it now takes maps into account even in the default (non-unique) case. It is in CVS now, but it still needs testing and some extra integration.