atom numbering??

User 21b7e0228c

04-03-2005 17:34:11

Hi! I wrote a small program, TmpPka.java, that reads in molecules, standardizes them, sends them to the pKa plugin, fetches the predominant microspecies at a given pH, then builds a pharmacophore property map according to the ionization state set by the pKa plugin, and writes a 5-column output:

1 - mol number
2 - standardized SMILES of the input species
3 - standardized SMILES of the predominant microspecies
4 - percentage of the latter at pH = 7 (default)
5 - pharmacophore property list
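
The pmap column is purely positional. The i-th ';'-separated entry belongs to the i-th atom of whatever atom order the pmap was computed in, and several features of one atom are joined with '/'. Below is a minimal sketch of producing such a line. It assumes the per-atom feature labels are already available as lists. PmapColumn and its methods are hypothetical helpers, not JChem API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of JChem): formats the 5-column output line described above. */
public class PmapColumn {

    /** Joins each atom's labels with '/', then the atoms with ';', keeping the given atom order. */
    public static String format(List<List<String>> labelsPerAtom) {
        List<String> atoms = new ArrayList<>();
        for (List<String> labels : labelsPerAtom) {
            atoms.add(String.join("/", labels));
        }
        return String.join(";", atoms);
    }

    /** Mol number, input SMILES, microspecies SMILES, percentage and pmap, separated by spaces. */
    public static String line(int molNr, String inputSmiles, String msSmiles,
                              double percent, List<List<String>> labelsPerAtom) {
        return molNr + " " + inputSmiles + " " + msSmiles + " "
                + String.format(Locale.ROOT, "%.1f", percent) + " "
                + format(labelsPerAtom);
    }
}
```

The string itself carries no atom indices. The entries can therefore only be matched to atoms if you know which atom order was used.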





However, the order of the entries in the pharmacophore property list depends on whether I input an .sdf or a SMILES. Consider, for example, phenyltetrazole. I do:

echo "c1ccc(cc1)-c2nnn[nH]2" | pka_pharm

(I wrapped the Java class in a launcher script, the same way you did for pmapper and the other jchem/bin tools.)

I get:

1 c1ccc(cc1)-c2nnn[nH]2 c1ccc(cc1)-c2nnn[n-]2 98.9 Ar;Ar;Ar;Ar;Ar;Ar;Ar;Ar/HA;Ar/HA;Ar/HA;Ar/HA/NC

How nice! It charges the tetrazole negatively and decides that the last n in the SMILES is aromatic, an acceptor AND negatively charged (NC), while the other Ns are Ar/HA.





Now, however, when I feed a .mol file of the same tetrazole as input (see tetrazole.mol), I get:

1 c1ccc(cc1)-c2nnn[nH]2 c1ccc(cc1)-c2nnn[n-]2 98.9 Ar;Ar/HA/NC;Ar/HA;Ar/HA;Ar/HA;Ar;Ar;Ar;Ar;Ar;Ar

which is the same, except that the flags are written out in the original atom order of the .mol file, in spite of all the standardization! So how can I know which flag belongs to which atom in a pmap???

ChemAxon fb166edcbd

04-03-2005 23:09:03

Actually, the intended behaviour is to output the pmap in the original atom order, or optionally in the atom order of the standardized molecule. In general, however, neither of these coincides with the atom order of the SMILES string. The problem is that SMILES export may write the atoms in a different order, because we want to output unique SMILES. To do this, we compute graph invariants for each atom and use them to determine the output atom order.

The atom order of the input molecule is the one the user already knows, so by default our tools use it for per-atom output. For the reason described above, this may differ from the atom order in the SMILES form of the molecule.
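
If the relation between two atom orders is known, converting a pmap from one to the other is just a permutation of its entries. Below is a minimal sketch. It assumes an array where order[k] is the original (input file) index of the atom written at position k of the other order. This thread does not cover how to obtain that array from the SMILES export. PmapReorder is a hypothetical helper. The permutation in main() was chosen only to be consistent with the two tetrazole outputs quoted above; it was not read from tetrazole.mol.

```java
/** Hypothetical helper (not part of JChem): re-indexes a ';'-separated pmap by a permutation. */
public class PmapReorder {

    /**
     * Returns the pmap rearranged so that entry k is the entry of original atom order[k].
     * Throws if order is not a permutation of 0..n-1 for the n entries of the pmap.
     */
    public static String reorder(String pmap, int[] order) {
        String[] entries = pmap.split(";", -1);
        if (order.length != entries.length) {
            throw new IllegalArgumentException(
                    "order has " + order.length + " entries, pmap has " + entries.length);
        }
        boolean[] seen = new boolean[entries.length];
        String[] out = new String[entries.length];
        for (int k = 0; k < order.length; k++) {
            int src = order[k];
            if (src < 0 || src >= entries.length || seen[src]) {
                throw new IllegalArgumentException("not a permutation at position " + k);
            }
            seen[src] = true;
            out[k] = entries[src];
        }
        return String.join(";", out);
    }

    public static void main(String[] args) {
        // pmap as printed for the .mol input (original atom order)
        String molOrder = "Ar;Ar/HA/NC;Ar/HA;Ar/HA;Ar/HA;Ar;Ar;Ar;Ar;Ar;Ar";
        // Hypothetical mapping, chosen only to be consistent with the two outputs above:
        // SMILES atoms 0-5 (phenyl) <- mol atoms 5-10, atom 6 (tetrazole C) <- mol atom 0,
        // atoms 7-9 (n) <- mol atoms 2-4, atom 10 ([n-]) <- mol atom 1.
        int[] order = {5, 6, 7, 8, 9, 10, 0, 2, 3, 4, 1};
        System.out.println(reorder(molOrder, order));
    }
}
```

Run as is, main() prints the same flag string as the SMILES-input run. The permutation check makes a wrong or incomplete mapping fail loudly instead of silently mislabelling atoms.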





Standardization cannot explicitly change the atom order (we do not have a "change atom order" action). However, some standardization tasks change the atom order as a side effect (e.g. dehydrogenize). By default, plugin calculations always output results in the original atom order, even if their built-in standardization changes the atom order. You can also use the atom order of the standardized molecule if you set the input molecule with

Code:
Molecule setMolecule(Molecule mol, boolean st, boolean om)

This method returns the standardized molecule.


See


http://www.chemaxon.com/marvin/doc/api/chemaxon/marvin/plugin/CalculatorPlugin.html#setMolecule(chemaxon.struc.Molecule,boolean,boolean)





In PMapper, the built-in standardization is configured in the <StandardizerConfiguration> subsection of the configuration XML.

Unfortunately, in the current version the standardized molecule is used for atom indexing, but it is not returned to the caller.

This is a bug which will be fixed in the next major JChem release. I plan to implement a solution similar to the one for plugins: pmaps will be returned either in the original atom order or in the atom order of the standardized molecule, depending on a parameter.

There is a related topic on this subject:

http://www.chemaxon.hu/forum/viewpost1502.html#1502

However, two more problems remain:





(1) If you standardize outside, with a separate Standardizer object, then PMapper and the plugins have no way of knowing the original molecule, so the standardized molecule determines the atom order.

(2) For microspecies, the plugin returns microspecies molecules generated from the standardized molecule. Here standardization includes dehydrogenization, which changes the atom order. If you use these microspecies as input to PMapper, the output cannot be returned in the original atom order (the atom order of the input molecule, which PMapper does not know). Moreover, these microspecies exist only as molecule objects in memory, so there is no atom order the user knows. I have no solution to this; suggestions are welcome.

PS: you did not attach your pmapper and standardizer config XMLs, but I could run your program with my sample XMLs.

User 21b7e0228c

07-03-2005 08:41:29

Thanks, Nora, it's as I imagined. I guessed that canonical SMILES have an atom numbering that is independent of the initial atom order. BUT I was hoping for some global atom naming procedure that would let me store a molecule in a DB together with its most populated microspecies, so that atom #5 always refers to the same heavy atom in all of them. If I understood your answer right, this is never guaranteed, since some microspecies may have shifted atoms.

What if you introduced an atom ID, distinct from the atom number, which every atom carries with it through standardization? These atom IDs should be canonical, perhaps related to the atom position in the SMILES string, calculated at input and then inherited through cloning and standardization. There was a mechanism like that in the Cerius2 SDK, where you had a function to retrieve the list of atom IDs iatom(1...natoms). To get the charge of the i-th atom in your list you used charge(iatom(i)) rather than charge(i). The obvious advantage was that when asking for charge(given_atom_index) you knew exactly which atom was meant.
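
The Cerius2-style indirection can be sketched as follows. Properties are stored per atom ID, and a separate list maps positions to IDs. Reordering the atoms therefore changes what iatom(i) returns but not the answer to charge(id). This is a minimal stdlib sketch with 0-based positions. The class, IDs and charges are hypothetical, and nothing like it is claimed to exist in JChem.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical Cerius2-style indirection (0-based here): position -> atom ID -> properties. */
public class AtomIdIndirection {
    private final List<Integer> iatom = new ArrayList<>();            // position i -> atom ID
    private final Map<Integer, Integer> chargeById = new HashMap<>(); // atom ID -> formal charge

    public void addAtom(int id, int charge) {
        if (chargeById.containsKey(id)) {
            throw new IllegalArgumentException("duplicate ID " + id);
        }
        iatom.add(id);
        chargeById.put(id, charge);
    }

    public int natoms() { return iatom.size(); }

    /** ID of the atom currently at position i. */
    public int iatom(int i) { return iatom.get(i); }

    /** Charge looked up by ID, so it does not depend on the current atom order. */
    public int charge(int id) { return chargeById.get(id); }

    /** Imitates a step that reorders atoms: positions change, IDs and properties do not. */
    public void reorder(long seed) { Collections.shuffle(iatom, new Random(seed)); }
}
```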





Every reference to an atom property should go through the ID. For example, CNode should return the ID rather than the atom number. By the way, what DOES it return? The doc on this is sketchy, at best. How can I get my hands on the atom list of a molecule? How can I retrieve the symbol, charge and coordinates of each atom? The API doc is too "fractal" and I haven't hit the right page yet... And how would THOSE classes depend on standardization?





Cheers,


Dragos

ChemAxon fb166edcbd

07-03-2005 12:21:36

dragos wrote:



Every reference to an atom property should be done through the ID; for example, CNode should return the ID rather than the atom# - by the way, what DOES it return - the doc on this is sketchy, at best - how can I get my hands on the atom list of a molecule? How can I retrieve the symbol, charge, coordinates of each atom - API doc is too "fractal" and I didn't hit the right page yet.... and how would THOSE classes depend on standardization?


I agree that some sort of atom identification mechanism would be useful. We have already considered this, but it would require a lot of work, and it is not clear to me how to generate these IDs for symmetrical atoms. The SMILES form is not a good basis, for two reasons. First, not every molecule can be exported to SMILES. Second, the order in the SMILES string may change as the molecule changes, even if the atoms themselves are left unchanged.

An obvious identifier is the MolAtom object pointer itself. I know this is very programmatic and not user-friendly for a printout, but you can use it in your program. The plugins also use object pointers to identify atoms in the molecule before and after standardization; this is how they can refer to atoms by their index in the original molecule.
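
The object-pointer approach can be sketched with an IdentityHashMap. Before standardization, record the index of every atom object; afterwards, look up the surviving objects. The sketch uses a tiny stand-in Atom class instead of chemaxon.struc.MolAtom so that it runs without JChem. fakeStandardize is a hypothetical imitation of in-place standardization (it removes H atoms and sets the charge of the first N), not a JChem call.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class PointerIdentity {

    /** Stand-in for MolAtom; equals/hashCode are not overridden, so identity is what counts. */
    static final class Atom {
        final String symbol;
        int charge;
        Atom(String symbol, int charge) { this.symbol = symbol; this.charge = charge; }
    }

    /** Before standardization: remember the index of every atom object. */
    static Map<Atom, Integer> indexByIdentity(List<Atom> atoms) {
        Map<Atom, Integer> map = new IdentityHashMap<>();
        for (int i = 0; i < atoms.size(); i++) {
            map.put(atoms.get(i), i);
        }
        return map;
    }

    /** Hypothetical in-place "standardization": removes H atoms and charges the first N. */
    static void fakeStandardize(List<Atom> atoms) {
        atoms.removeIf(a -> a.symbol.equals("H"));
        for (Atom a : atoms) {
            if (a.symbol.equals("N")) { a.charge = -1; break; }
        }
    }

    /** After standardization: original index of each atom, or -1 for atoms created meanwhile. */
    static int[] originalIndices(List<Atom> standardized, Map<Atom, Integer> before) {
        int[] result = new int[standardized.size()];
        for (int i = 0; i < result.length; i++) {
            Integer idx = before.get(standardized.get(i));
            result[i] = (idx == null) ? -1 : idx;
        }
        return result;
    }
}
```

An IdentityHashMap is used so that only the very same object matches, even if an atom class were to define equals() on its properties.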





The atom symbol, charge and coordinates can be retrieved through the MolAtom API:

http://www.chemaxon.com/marvin/doc/api/chemaxon/struc/MolAtom.html

Symbol:
String getSymbol()
http://www.chemaxon.com/marvin/doc/api/chemaxon/struc/MolAtom.html#getSymbol()

Charge:
int getCharge()
http://www.chemaxon.com/marvin/doc/api/chemaxon/struc/MolAtom.html#getCharge()

Coordinates:
double getX(), getY(), getZ()
http://www.chemaxon.com/marvin/doc/api/chemaxon/struc/MolAtom.html#getX()
http://www.chemaxon.com/marvin/doc/api/chemaxon/struc/MolAtom.html#getY()
http://www.chemaxon.com/marvin/doc/api/chemaxon/struc/MolAtom.html#getZ()

How these are changed by standardization depends on your standardization tasks. Basically, standardization modifies your original molecule in place, that is, it keeps the original Molecule object, so identification by MolAtom object pointers works. MolAtom objects may disappear or have some of their properties changed (e.g. charge), and new MolAtom objects may be created. Atom coordinates are changed by Clean actions.

User 21b7e0228c

07-03-2005 13:04:04

I see what you mean, but there's no need to create any "magical" ID that really encodes the NATURE of the atom. Just give the atoms a number that can be unambiguously determined once the molecular structure is specified. If you can check two structures for identity (and you can), then there is no reason you could not design an unambiguous ID.

Now that you mention MolAtom: how does it actually work? (The doc, again, is meager. You have my full understanding; writing docs is my most hated item on the to-do list as well ;-) Do you have an example of code that walks through all the atoms in a molecule? At some point one needs a list of all the object pointers for the atoms (how do you get that?), but does this really solve the problem? Can I be sure that pointer #1 points to the same oxygen in the molecule and in its microspecies? Ultimately, my problem would go away if there were a MolAtom.getPharmFlag() method, similar to the getters you mentioned; then I'd know who's who!

ChemAxon fb166edcbd

07-03-2005 15:37:44

dragos wrote:
I see what you mean, but there's no need to create any "magical" ID that really encodes the NATURE of the atom - just give them a number that can be unambiguously determined once the molecular structure is specified: if you can check for identity of two structures (and you can) then there is no reason not being able to design an unambiguous ID. Now that u mentioned MolAtom: how does it actually work (doc, again, is meager - got my full understanding, writing doc is my most hated item on the to do list as well ;-). Do you have an example of a code of someone shuffling through all the atoms in a molecule? At some point, one needs a list of all the object pointers for atoms - (how do you get that?) but does this really solve the problem? Am i sure that pointer#1 points to the same oxygen in mol and its microspecies? Eventually, my problem would go away if there would be a MolAtom.getPhamFlag() thing similar to the gets you mentioned - then I'll know who's who!

To get the atom objects:

http://www.chemaxon.com/marvin/doc/api/chemaxon/struc/MoleculeGraph.html#getAtomArray()
http://www.chemaxon.com/marvin/doc/api/chemaxon/struc/MoleculeGraph.html#getAtomCount()
http://www.chemaxon.com/marvin/doc/api/chemaxon/struc/MoleculeGraph.html#getAtom(int)
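
With the real classes, a walk over all atoms has the shape `for (int i = 0; i < mol.getAtomCount(); i++) { MolAtom a = mol.getAtom(i); ... }`, or a for-each loop over getAtomArray(). That needs the JChem jar, so the runnable sketch below reproduces the same pattern with minimal stand-in classes (ToyMolecule, ToyAtom). Their getters are named after the methods linked in this thread, but the classes themselves are hypothetical and much simpler than the real Molecule and MolAtom.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AtomWalk {

    /** Stand-in for MolAtom with the getters named in this thread. */
    static final class ToyAtom {
        private final String symbol;
        private final int charge;
        private final double x, y, z;
        ToyAtom(String symbol, int charge, double x, double y, double z) {
            this.symbol = symbol; this.charge = charge; this.x = x; this.y = y; this.z = z;
        }
        String getSymbol() { return symbol; }
        int getCharge() { return charge; }
        double getX() { return x; }
        double getY() { return y; }
        double getZ() { return z; }
    }

    /** Stand-in for Molecule/MoleculeGraph with the three atom accessors linked above. */
    static final class ToyMolecule {
        private final List<ToyAtom> atoms = new ArrayList<>();
        void add(ToyAtom a) { atoms.add(a); }
        int getAtomCount() { return atoms.size(); }
        ToyAtom getAtom(int i) { return atoms.get(i); }
        ToyAtom[] getAtomArray() { return atoms.toArray(new ToyAtom[0]); }
    }

    /** One line per atom, in the molecule's current atom order: index, symbol, charge, x, y, z. */
    static List<String> describe(ToyMolecule mol) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < mol.getAtomCount(); i++) {
            ToyAtom a = mol.getAtom(i);
            lines.add(String.format(Locale.ROOT, "%d %s %d %.3f %.3f %.3f",
                    i, a.getSymbol(), a.getCharge(), a.getX(), a.getY(), a.getZ()));
        }
        return lines;
    }
}
```

The index i here is always the molecule's current atom order. This is why it changes whenever standardization reorders the atoms.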





Back to the microspecies problem: unfortunately, the microspecies are newly generated molecules whose atom objects differ from those of the original molecule.

I attach some "tricky" example code that shows the correspondence between the atoms of the original molecule and the atoms of the microspecies. The tricky part is that it relies on the fact that, in our case, the only standardization task that changes the atom order is dehydrogenize. The correspondence is based on the following facts:

1) Although the microspecies contain different atom objects from the original molecule, the microspecies atom order corresponds to the atom order of the dehydrogenized input molecule.

2) The correspondence between atoms of the original molecule and atoms of the dehydrogenized molecule can be established in three steps. First, collect the atoms in a list. Then perform dehydrogenize. Finally, look up the list positions of the atoms of the dehydrogenized molecule.

The example code maps the oxygen and nitrogen atom objects of the original molecule to the corresponding atom objects in the microspecies, and outputs the atom object pointer, the atom symbol and the charge.
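
The two facts above can be sketched as follows: save the original atom objects in a list, dehydrogenize, find each remaining atom's position in the saved list, and then pair microspecies atom i with saved atom pos[i]. This is not the attached MsTest code. It uses a stand-in Atom class and a hypothetical dehydrogenize that only removes explicit H atoms in place. It is only valid under the same assumption as above, namely that dehydrogenize is the only step that changes the atom order.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class MicrospeciesMapping {

    /** Stand-in atom; no equals() override, so List.indexOf compares object identity. */
    static final class Atom {
        final String symbol;
        Atom(String symbol) { this.symbol = symbol; }
    }

    /** Hypothetical dehydrogenize: removes H atoms in place, keeping the others in order. */
    static void dehydrogenize(List<Atom> molecule) {
        molecule.removeIf(a -> a.symbol.equals("H"));
    }

    /** Fact 2: position of each dehydrogenized atom in the list saved before dehydrogenize. */
    static int[] savedPositions(List<Atom> saved, List<Atom> dehydrogenized) {
        int[] pos = new int[dehydrogenized.size()];
        for (int i = 0; i < pos.length; i++) {
            pos[i] = saved.indexOf(dehydrogenized.get(i));
        }
        return pos;
    }

    /** Fact 1: microspecies atom i corresponds to dehydrogenized atom i, hence to saved.get(pos[i]). */
    static Map<Atom, Atom> originalToMicrospecies(List<Atom> saved, int[] pos, List<Atom> microspecies) {
        if (microspecies.size() != pos.length) {
            throw new IllegalArgumentException(
                    "atom counts differ: " + microspecies.size() + " vs " + pos.length);
        }
        Map<Atom, Atom> map = new IdentityHashMap<>();
        for (int i = 0; i < pos.length; i++) {
            map.put(saved.get(pos[i]), microspecies.get(i));
        }
        return map;
    }
}
```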








Run it with

Code:
java MsTest test.smiles

or

Code:
java MsTest test.mrv








We cannot add a MolAtom.getPharmFlag() method because we do not want to put overly specific data into the general MolAtom class: pharmacophore feature calculation is only needed in specific cases. Likewise, plugin results such as pKa, partial charge, etc. are not stored in the MolAtom object.

ChemAxon fb166edcbd

07-03-2005 15:48:16

Also, to identify atoms, you can assign atom maps:


http://www.chemaxon.com/marvin/doc/api/chemaxon/struc/MolAtom.html#setAtomMap(int)
http://www.chemaxon.com/marvin/doc/api/chemaxon/struc/MolAtom.html#getAtomMap()

I attach the example molecules with atom maps on the ionizable atoms, and the test code, now printing the atom map in <atom symbol>:<atom map> form.
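
A minimal sketch of the atom-map idea: number the ionizable atoms before processing, then find atoms by map number instead of by index, and print them in <atom symbol>:<atom map> form. The Atom class is a stand-in whose setAtomMap/getAtomMap are named after the MolAtom methods linked above. regenerate is hypothetical: it copies the map numbers explicitly into new, reordered atom objects, whereas the attached example relies on the microspecies carrying the maps over. Treating 0 as "unmapped" is a convention of this sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AtomMapLookup {

    /** Stand-in atom with setAtomMap/getAtomMap; map number 0 means "unmapped" in this sketch. */
    static final class Atom {
        final String symbol;
        private int atomMap;
        Atom(String symbol) { this.symbol = symbol; }
        void setAtomMap(int map) { atomMap = map; }
        int getAtomMap() { return atomMap; }
    }

    /** Hypothetical regeneration step: new atom objects in reversed order, map numbers copied over. */
    static List<Atom> regenerate(List<Atom> atoms) {
        List<Atom> out = new ArrayList<>();
        for (int i = atoms.size() - 1; i >= 0; i--) {
            Atom copy = new Atom(atoms.get(i).symbol);
            copy.setAtomMap(atoms.get(i).getAtomMap());
            out.add(copy);
        }
        return out;
    }

    /** "<symbol>:<map>" for every mapped atom, sorted by map number so the result ignores atom order. */
    static List<String> mappedLabels(List<Atom> atoms) {
        List<Atom> mapped = new ArrayList<>();
        for (Atom a : atoms) {
            if (a.getAtomMap() != 0) mapped.add(a);
        }
        mapped.sort(Comparator.comparingInt(Atom::getAtomMap));
        List<String> labels = new ArrayList<>();
        for (Atom a : mapped) labels.add(a.symbol + ":" + a.getAtomMap());
        return labels;
    }
}
```

Because the lookup goes through the map number, the labels come out the same before and after regeneration, even though the atom objects and their positions differ.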