Hi! I wrote a small program TmpPka.java that reads in mols, standardizes them, send them to pKa plugin, fetches the predominant mikrospecies at given pH, then builds a pharmacophore property map according to the ionization status set by the pka plugin and writes a 5-column output:
1 - molnr.
2 - standardized smiles of input species
3 - standardized smiles of predominant mikrospecies
4 - percentage of the latter at pH=7 (default)
5 - pharmacophore property list
However, the order in which the atoms appear in the latter depends on whether I enter an .sdf or a smiles. Consider, for example, beztetrazole. I do:
echo "c1ccc(cc1)-c2nnn[nH]2"|pka_pharm (I dressed up the java class in the same way you did pmapper and other jchem/bins)
1 c1ccc(cc1)-c2nnn[nH]2 c1ccc(cc1)-c2nnn[n-]2 98.9 Ar;Ar;Ar;Ar;Ar;Ar;Ar;Ar/HA;Ar/HA;Ar/HA;Ar/HA/NC
How nice! It charges tetrazole negatively, and decides that the latest n in the smiles is aromatic, acceptor AND negatively charged (NC), while the other Ns are Ar/HA
Now, however, when I dump an .mol file of the same tetrazole on input (see tetrazole.mol), then i get:
1 c1ccc(cc1)-c2nnn[nH]2 c1ccc(cc1)-c2nnn[n-]2 98.9 Ar;Ar/HA/NC;Ar/HA;Ar/HA;Ar/HA;Ar;Ar;Ar;Ar;Ar;Ar
which is the same except that the flags are written out in the ancient order of atoms in the original .mol file - in spite of all the undertaken standardization! So, how can I know what flag belongs to what atom in a pmap???
Actually, the intended behaviour is to output pmap in the original
atom order, optionally in the standardized molecule atom order, but neither of these will coincide with the SMILES format atom order in general. The problem is that SMILES export may output atoms in a different order because we want to output unique SMILES - therefore compute graph invariant numbers for each atom and use these to determine the output atom order.
The original atom order in the input molecule is the atom order the user already knows - therefore by default our tools use this atom order for atomic output. And for the reason described above, this may differ from the atom order in the SMILES form of the molecule.
Standardization cannot explicitely change the atom order (we do not have a "change atom order" action), although there are some standardization tasks that change the atom order as a side effect (e.g. dehydrogenize). By default, plugin calculations always output results in the original atom order, even if their built-in standardization changes the atom order. You can also use the atom order of the standardized molecule if you set the input molecule by
|Molecule setMolecule(Molecule mol, boolean st, boolean om) |
This method returns the standardized molecule.
In PMapper the built-in standardization is configured in the <StandardizerConfiguration> subsection of the configuration XML.
Unfortunately in the current version the standardized molecule is used for atom indexing, but it is not returned to the caller.
This is a bug which will be fixed in the next major JChem release. I am planning to implement a similar solution as in the case of plugins: pmaps will be returned either in the original atom order or the atom order of the standardized molecule, depending on a parameter.
There is a related topic in this subject:
However, there are two more problems that remain:
(1) If you use outside standardization with a separate Standardizer object then PMapper and the plugins have no chance to know the original molecule and therefore the standardized molecule will determine the atom order
(2) In case of microspecies, the plugin returns the microsepcies molecules generated from the standardized molecule. In this case standardization include dehydrogenization which changes the atom order. Therefore if you use these microspecies as an input to PMapper, then the output cannot be returned in the original atom order (which is the atom order of the input molecule, unknown to PMapper) - and since these microspecies exist as molecule objects in memory, there is no atom order the user knows. I have no solution to this - suggestions are welcome.
PS: you did not attach you pmapper and standardizer config XML - but I could run your program with my sample XMLs.
Thanks, Nora - it's like i imagined; i guessed that cannonical smiles have an atom numbering that is independent on initial atom order BUT i was hopin' for some global atom naming procedure allowing me to store in a DB a molecule together with its most populated microspecies in such a way as to ensure that atom #5 always refers to the same heavy atom in all of them... this is, if i understood your answer right, never granted since some microspecies may have shifted atoms. What if you introduce an atom ID, different from the atom NR, which every atom carries with him through standardization. These atom IDs should be cannonical - therefore perhaps related to the atom position in the smiles string, calculated at input and then inherited throughout clone/standardization. There was a mechanism like that in Cerius2 SDK, where you had a function to retrieve the list of atomIDs iatom(1...natoms); then to get the charge of the i-th atom in your list you had charge(iatom(i)) rather than charge(i) - with the obvious advantage that asking for charge(given_atom_index) you knew exactly which atom was meant.
Every reference to an atom property should be done through the ID; for example, CNode should return the ID rather than the atom# - by the way, what DOES it return - the doc on this is sketchy, at best - how can I get my hands on the atom list of a molecule? How can I retrieve the symbol, charge, coordinates of each atom - API doc is too "fractal" and I didn't hit the right page yet.... and how would THOSE classes depend on standardization?
I see what you mean, but there's no need to create any "magical" ID that really encodes the NATURE of the atom - just give them a number that can be unambiguously determined once the molecular structure is specified: if you can check for identity of two structures (and you can) then there is no reason not being able to design an unambiguous ID. Now that u mentioned MolAtom: how does it actually work (doc, again, is meager - got my full understanding, writing doc is my most hated item on the to do list as well ;-). Do you have an example of a code of someone shuffling through all the atoms in a molecule? At some point, one needs a list of all the object pointers for atoms - (how do you get that?) but does this really solve the problem? Am i sure that pointer#1 points to the same oxygen in mol and its microspecies? Eventually, my problem would go away if there would be a MolAtom.getPhamFlag() thing similar to the gets you mentioned - then I'll know who's who!