User 21b7e0228c
10-11-2010 14:23:38
Hi,
I wonder whether there is (or there will be, in a glorious future) some :option to generate a unique SMILES string which, unlike the current :u option, does NOT attempt to tamper with bond orders/aromaticity. In other words, I'd like an option which, if provided with some arbitrary Kekulé "cyclohexatriene" structure, would return... a canonical way to write cyclohexatriene as SMILES, and not smartass aromatized benzene. After all, the user has other tools at hand (aromatization, the mesomerize option, tautomer management, etc.) to make sure bond orders are set coherently in the molecules BEFORE using smiles:u. In other words, I don't think it's a good idea to let the smiles:u option play the role of Standardizer Almighty. For example:
echo "O=C1C=CNC=C1" | molconvert smiles:u --- > produces "O=c1cc[nH]cc1"
with a five-legged carbonyl carbon! I positively hate this overaggressive aromatization, which may cause a lot of trouble if you try to read the output in some other god-fearing software, etc. That's why we decided to stick with the softer aromatize:b/l option in our standardization approach. Unfortunately,
echo "O=C1C=CNC=C1" | molconvert smiles:u | standardize -c "aromatize:l"
does NOT reverse the aggressive aromatization; you actually need "dearomatize..aromatize:l". Of course, one could live with this in virtually all real-life situations, so my post is basically just for the sake of academic haggling: in my view, SMILES "canonicalization" is about generating a unique string for a specified molecular graph, with the bond orders as input, and not an attempt to beautify the graph itself. For that matter, if you push split-charge nitro groups through smiles:u, it will not convert them to pentavalent N - so smiles:u cannot be used directly to detect "duplicate" molecules in a collection by checking for identical smiles:u strings - unless you explicitly take care of standardization issues. smiles:u thus somewhat uncomfortably overlaps with the standardizer, and I'd prefer to keep them apart, so we know who's doing what.
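To make that distinction concrete, here is a minimal Python sketch of a canonical key that depends only on the graph as given (elements plus bond orders) and never touches aromaticity. It is a brute-force illustration for small molecules, not how molconvert works internally; the function name `canonical_key` and the input format (heavy atoms only, implicit hydrogens, no charges) are assumptions made for this example.

```python
from itertools import permutations

def canonical_key(atoms, bonds):
    """Return a numbering-independent key for a small molecular graph.

    atoms: list of element symbols (heavy atoms only).
    bonds: list of (i, j, order) tuples with 0-based atom indices.
    Bond orders are used exactly as given; nothing is aromatized.
    Brute force over all atom orderings, so only usable for tiny molecules.
    """
    n = len(atoms)
    best = None
    for perm in permutations(range(n)):
        # perm[old_index] is the new index of that atom
        labels = [None] * n
        for old, new in enumerate(perm):
            labels[new] = atoms[old]
        edges = sorted(
            (min(perm[i], perm[j]), max(perm[i], perm[j]), order)
            for i, j, order in bonds
        )
        candidate = (tuple(labels), tuple(edges))
        if best is None or candidate < best:
            best = candidate
    return best

# 4-pyridone written as O=C1C=CNC=C1
pyridone_a = (["O", "C", "C", "C", "N", "C", "C"],
              [(0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1),
               (4, 5, 1), (5, 6, 2), (6, 1, 1)])

# The same Kekule structure numbered from the nitrogen: N1C=CC(=O)C=C1
pyridone_b = (["N", "C", "C", "C", "O", "C", "C"],
              [(0, 1, 1), (1, 2, 2), (2, 3, 1), (3, 4, 2),
               (3, 5, 1), (5, 6, 2), (6, 0, 1)])

# The 4-hydroxypyridine tautomer: OC1=CC=NC=C1
hydroxypyridine = (["O", "C", "C", "C", "N", "C", "C"],
                   [(0, 1, 1), (1, 2, 2), (2, 3, 1), (3, 4, 2),
                    (4, 5, 1), (5, 6, 2), (6, 1, 1)])

same_numbering_independent = canonical_key(*pyridone_a) == canonical_key(*pyridone_b)
tautomers_kept_apart = canonical_key(*pyridone_a) != canonical_key(*hydroxypyridine)
```

The two numberings of the pyridone give the same key, while the hydroxypyridine tautomer gives a different one: the key reflects the bond orders it was handed and leaves any aromatization or tautomer handling to a separate standardization step.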
At least... if you keep smiles:u working the way it is (I'm happy with that, too), it would be necessary to explain in the documentation that smiles:u tampers with the aromatization status... my slow neuron needed some time to track down the inconsistencies in my standardization protocols.
Cheers!
Dragos