jc_standardize best practices. - ChemAxon Forum Archive

User 7f33ec9a5c

26-10-2012 19:24:12

Hi,

We would like to do our best to standardize SMILES prior to inserting them into our structure table, with the following goal: have one unique SMILES for each unique chemical entity. So we'd like to standardize tautomers, aromatize the molecule, preserve stereochemistry (but move the wedges as appropriate to standardize), and remove explicit hydrogens. We do want to allow and preserve dot-separated SMILES, with no change to stoichiometry.

My question is: what is your suggested best practice for achieving this using jc_standardize()? Using aromatize and removeexplicitH is obvious. I suspect that wedgeclean and convertwedgeinterpretation would be good ideas as well, since I interpret the manual to say that these will remove unnecessary wedge bonds and clean up necessary ones, but this is a bit murky. I'm definitely not sure how many of the SMIRKS below would be covered by your Tautomerize option, but getting this option correct seems key to our success.

Currently we are using Daylight's smi2cansmi, then applying the following SMIRKS:

[NX3+:1](=[O:2])-[O-;X1:3]>>[N:1](=[O:2])=[O:3] --fix charge-separated nitro groups
[SX4+:1](=[O:2])-[O-;X1:3]>>[S:1](=[O:2])=[O:3] --fix charge-separated sulfonyl groups
[CX1H0:1]=[NX2H0:2]>>[C-:1]#[N+:2] --charge-separated isocyanide groups
[NX1H0:1]#[NX2H0:2]=[N:3]>>[N-:1]=[N+:2]=[N:3] --charge-separated azide groups
[P+:1][O-:2]>>[P:1]=[O:2] --fix charge-separated phosphoryl groups
[n:1]-[O-:2]>>[n:1]=[O:2] --charge-separated aromatic N

The Daylight standardization represents the minimum level of functionality that we need, but any further normalization (especially moving wedge bonds and removing meaningless wedge bonds) would be appreciated.

Thank you,

~mike

ChemAxon e08c317633

27-10-2012 07:39:44

Hi,

If your input and output formats are both SMILES, then the wedge-related Standardizer actions (like wedgeclean and convertwedgeinterpretation) are not required, because they will not modify anything during standardization. In SMILES the stereo information is stored explicitly ("@" and "@@" define the stereo configuration); it is not stored in wedges.

Considering your requirements, the suggested Standardizer configuration for you is:

"removeexplicitH..[NX3+:1](=[O:2])-[O-;X1:3]>>[N:1](=[O:2])=[O:3]..[SX4+:1](=[O:2])-[O-;X1:3]>>[S:1](=[O:2])=[O:3]..[CX1H0:1]=[NX2H0:2]>>[C-:1]#[N+:2]..[NX1H0:1]#[NX2H0:2]=[N:3]>>[N-:1]=[N+:2]=[N:3]..[P+:1][O-:2]>>[P:1]=[O:2]..[n:1]-[O-:2]>>[n:1]=[O:2]..tautomerize..aromatize"

Note: "tautomerize" dearomatizes the molecules, so the aromatize action should be performed after the tautomerize.
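For readers who assemble the configuration in client code, the ordering rule in the note can be checked before the string is sent to the database. The following is only a sketch in Python: build_config is a hypothetical helper (not part of any ChemAxon API), and the only things it takes from this thread are the ".." delimiter between actions and the rule that aromatize should follow tautomerize.

```python
def build_config(actions):
    """Join Standardizer actions with '..', the delimiter used in this thread.

    Hypothetical helper: it only checks that the last 'aromatize' comes
    after the last 'tautomerize', because tautomerize dearomatizes.
    """
    names = list(actions)
    if "tautomerize" in names and "aromatize" in names:
        last_taut = len(names) - 1 - names[::-1].index("tautomerize")
        last_arom = len(names) - 1 - names[::-1].index("aromatize")
        if last_arom < last_taut:
            raise ValueError("aromatize must come after tautomerize")
    return "..".join(names)


config = build_config([
    "removeexplicitH",
    "[P+:1][O-:2]>>[P:1]=[O:2]",
    "tautomerize",
    "aromatize",
])
print(config)
```

Listing the actions as separate items also makes it easier to add or drop a single SMIRKS rule without editing one very long string by hand.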

If the duplicate handling in your system is based on SMILES string comparison, then use unique SMILES ("smiles:u") as the output format.

Please let us know if the configuration above works for you.

Zsolt

User 7f33ec9a5c

27-10-2012 20:02:54

COOL! That does it.

...I'm still learning the ChemAxon way, so for others reading this post, the final function call looks like

jc_standardize(<INPUT_SMILES>, 'sep=~ config:removeexplicitH..[NX3+:1](=[O:2])-[O-;X1:3]>>[N:1](=[O:2])=[O:3]..[SX4+:1](=[O:2])-[O-;X1:3]>>[S:1](=[O:2])=[O:3]..[CX1H0:1]=[NX2H0:2]>>[C-:1]#[N+:2]..[NX1H0:1]#[NX2H0:2]=[N:3]>>[N-:1]=[N+:2]=[N:3]..[P+:1][O-:2]>>[P:1]=[O:2]..[n:1]-[O-:2]>>[n:1]=[O:2]..tautomerize..aromatize~outFormat:smiles:u');
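The options argument in that call has three parts: a separator declaration (sep=~), the config: value, and the outFormat: value, with the latter two divided by the declared separator. A rough Python sketch of putting it together follows; build_options is a hypothetical helper, and the layout is inferred only from the call shown in this thread. One thing worth guarding against: "~" is also the SMARTS symbol for "any bond", so a rule that used it would clash with this separator choice.

```python
def build_options(config, out_format="smiles:u", sep="~"):
    """Assemble the options string in the layout used in this thread.

    Hypothetical helper; the layout 'sep=<c> config:<...><c>outFormat:<...>'
    is copied from the posted call, not from the product documentation.
    """
    for value in (config, out_format):
        if sep in value:
            # The separator would split the value in the wrong place.
            raise ValueError(f"separator {sep!r} occurs inside {value!r}")
    return f"sep={sep} config:{config}{sep}outFormat:{out_format}"


options = build_options("removeexplicitH..aromatize")
print(options)
```

The resulting string would then be passed as the second argument of jc_standardize, ideally as a bind variable rather than pasted into the SQL text.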

User 7f33ec9a5c

03-12-2012 17:08:57

1). We removed the 'tautomerize' argument from the options. It is great for grouping all tautomers, but we found that it leads to depictions that the chemists do not like.

2). One of our SMIRKS was incorrect. [N:1]-[O-:2]>>[N:1]=[O:2] should be [N+:1]-[O-:2]>>[N:1]=[O:2]

so our chosen option list is below:

'sep=~ config:removeexplicitH..[NX3+:1](=[O:2])-[O-;X1:3]>>[N:1](=[O:2])=[O:3]..[SX4+:1](=[O:2])-[O-;X1:3]>>[S:1](=[O:2])=[O:3]..[CX1H0:1]=[NX2H0:2]>>[C-:1]#[N+:2]..[NX1H0:1]#[NX2H0:2]=[N:3]>>[N-:1]=[N+:2]=[N:3]..[P+:1][O-:2]>>[P:1]=[O:2]..[n+:1]-[O-:2]>>[n:1]=[O:2]..[N+:1]-[O-:2]>>[N:1]=[O:2]..aromatize~outFormat:smiles:u'
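Since a mistyped rule is easy to miss in a string this long, a quick sanity check before deployment may help. The Python sketch below is a heuristic, not a SMIRKS parser: it only compares the atom-map numbers on the two sides of ">>". That fits charge-normalization rules like the ones here, where every mapped atom is kept, but it would wrongly flag legitimate transforms that create or delete mapped atoms. The names atom_maps and maps_balanced are hypothetical.

```python
import re

# Matches the map number in atom expressions such as [N+:1] or [O-;X1:3].
MAP_RE = re.compile(r":(\d+)\]")


def atom_maps(side):
    """Return the sorted atom-map numbers found on one side of a rule."""
    return sorted(int(m) for m in MAP_RE.findall(side))


def maps_balanced(smirks):
    """True if both sides of '>>' carry the same set of atom-map numbers."""
    reactants, _, products = smirks.partition(">>")
    return atom_maps(reactants) == atom_maps(products)


rules = [
    "[N+:1]-[O-:2]>>[N:1]=[O:2]",  # rule quoted in this thread
    "[P+:1][O-:2]>>=[O:2]",        # made-up broken rule: atom 1 lost
]
for rule in rules:
    print(rule, maps_balanced(rule))
```

A check like this says nothing about whether a rule is chemically right (it would not have caught the missing charge on the N-oxide nitrogen), only that no mapped atom has dropped out of the product side.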

ChemAxon e08c317633

04-12-2012 09:45:43

1). We removed the 'tautomerize' argument from the options.  It is great for grouping all tautomers, but we found that it leads to depictions that the chemists do not like.

Could you tell us what they don't like in the depiction?

Thanks for the feedback and for the update.

User 7f33ec9a5c

10-12-2012 22:03:11

The following "tautomerize" action produces Oc1ccc(O)n1OC(=O)C(=O)On1c(O)ccc1O

select jcf.standardize('O=C(ON1C(=O)CCC1=O)C(=O)ON1C(=O)CCC1=O','sep=~ config:tautomerize~outFormat:smiles:u') from dual

Aromatizing the ring with the N's is a result our chemists don't like.

See the attached screenshot for before/after pictures.

User 851ac690a0

11-12-2012 21:38:47

Hi,

Thanks for reporting this bug.

The fix will be available in the 5.12 version.

The prediction may fail whenever a heteroatom "X" is connected to the "N" atom. See the attached figure.

If you have any other "ugly" structures, please let me know here or via the support e-mail.

Thanks.

Jozsi