User 21b7e0228c
06-05-2014 09:27:50
This one was almost fatal for us, because it concerns the automated handling of molecules on the QSAR predictions server. Consider the following:
echo "[O-][N+](=O)c1ccc(cc1)C(=O)Nc1cc2[n][n]([n]c2cc1)C1CCCC1" | standardize --unstandardized-mol-on-error -f smiles:u-a -c "mesomerize..[O;D1:2]=[N:1]=O>>[O;D1-:2][N+:1]=O..[O;D1:2][N:1]=O>>[O;D1-:2][N+:1]=O..[#7;h0v4:1][O;D1:2]>>[#7;h0v4+:1]-[#8;D1-:2]..[#7;h0v4:1]>>[#7;h0v4+:1]..[*:3]=,:[#7:1]([*:4])=[O:2]>>[*:4][#7+:1](=,:[*:3])-[#8-:2]"
proudly produces [O-][N+](=O)C1=CC=C(C=C1)C(=O)NC1=CC2=NN(N=C2C=C1)C1CCCC1 on an older installation (version 5.11.5).
Now, Satan incited me to upgrade to 6.1.7 on the web server... and I realized with horror that, on that version, the above command returns... nothing!
I could have lived with an EMPTY LINE, but nothing at all is a no-go: it shifts all the molecular IDs, and since the offending SMILES sat among 2000 others, I could no longer tell which line had been zapped during standardization. Note that the command uses --unstandardized-mol-on-error: all I want is SOME placeholder that lets me assign the initial IDs correctly to the compounds that were standardized successfully, and tells me which ones failed.
This is a critical issue for us: if a tester submits molecules for prediction, they may at some point get the predictions of compound i+1 assigned to compound i... I therefore have no other solution but to wrap the standardizer in a molreader loop and make damned sure it WILL spit something out for every line it reads in! But... wouldn't it be better to make it work that way by default?
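For anyone hitting the same problem, here is a rough sketch of the kind of wrapper loop I mean. It is only an illustration, not a tested production script: the function name standardize_each and the PLACEHOLDER variable are my own inventions, and it assumes one SMILES per input line and at most one output line per molecule. It feeds the molecules to the given command one at a time and prints a placeholder whenever the command returns nothing, so the output always has exactly as many lines as the input.

```shell
#!/bin/sh
# Hypothetical wrapper (names are assumptions, not part of any ChemAxon tool):
# run the given command once per input line and always print exactly one
# output line per input line, so molecule IDs cannot shift.

standardize_each() {
    # "$@" is the per-molecule command, e.g. standardize with its options.
    placeholder=${PLACEHOLDER:-FAILED}
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS= read -r smi || [ -n "$smi" ]; do
        # Feed a single molecule to the command; keep only the first line
        # of whatever it prints, and ignore its error messages.
        out=$(printf '%s\n' "$smi" | "$@" 2>/dev/null | head -n 1)
        if [ -n "$out" ]; then
            printf '%s\n' "$out"
        else
            # The command printed nothing: emit a marker instead of
            # silently dropping the line.
            printf '%s\n' "$placeholder"
        fi
    done
}
```

It would be called along the lines of: standardize_each standardize --unstandardized-mol-on-error -f smiles:u-a -c "..." < input.smi > output.smi (with the full -c string from above). Line k of the output then corresponds to line k of the input, and failures are marked with FAILED or whatever PLACEHOLDER is set to. The obvious price is that the standardizer is launched once per molecule, which is much slower than a single batch run.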
Cheers!