odd little Marvin bug

User 870ab5b546

10-09-2005 12:06:00

Turn on implicit hydrogens: hetero and terminal. Draw this compound:





Code:



<?xml version="1.0" encoding="MacRoman" ?>
<MDocument>
  <MChemicalStruct>
    <molecule molID="m1">
      <atomArray
          atomID="a1 a2 a3 a4 a5"
          elementType="C C O C C"
          x2="-0.8730979326513149 -0.5820652884342099 -1.7461958653026302 0.8730979326513151 2.3282611537368396"
          y2="-1.713883309913484 -0.20163333057805666 0.8065333223122277 0.3024499958670852 0.8065333223122277"
          />
      <bondArray>
        <bond atomRefs2="a1 a2" order="1" />
        <bond atomRefs2="a2 a3" order="1" />
        <bond atomRefs2="a2 a4" order="1" />
        <bond atomRefs2="a4 a5" order="3" />
      </bondArray>
    </molecule>
  </MChemicalStruct>
</MDocument>








Now go to Edit -> Source, change to SMARTS, and import. Note that the implicit H atoms no longer show. If you erase atoms and redraw them, the implicit H atoms show properly on the newly drawn atoms.

ChemAxon 25dcd765a3

11-09-2005 09:55:15

Hi Bob!





Actually this is not a bug but rather a feature :-)





The point is the following:


If you draw a molecule in Marvin, then valenceCheck automatically assigns the necessary number of implicit Hydrogens.


Of course, if you are drawing a whole molecule this is perfect, but if you just want to draw a part of a molecule (for a query), then you don't need the implicit Hydrogens at all (you can turn them off).


During the drawing process it is not possible to find out if you are drawing a molecule or a molecule fragment so Marvin assumes molecules and assigns the number of implicit Hydrogens.


The SMARTS format is designed to store queries (molecule fragments) rather than molecules. (For molecules, the SMILES format should be used.) So if you import a fragment from SMARTS, that means you are dealing with a query (molecule fragment) where the implicit Hydrogens are not needed. That is why in this case valenceCheck is not used to assign the number of implicit Hydrogens.





Let's take the following simple example:


SMARTS string:


[#15]


This is a Phosphorus atom query.


This atom matches molecules containing: an aromatic Phosphorus atom, a Phosphorus atom with valence 3, a Phosphorus atom with valence 5, elemental Phosphorus, etc.


It would be misleading to assign any number of implicit Hydrogens to this atom.


So Marvin doesn't assign any.





All the best


Andras

User 870ab5b546

11-09-2005 18:56:27

Ahhh.... I see.





Shouldn't SMARTS be able to handle everything that SMILES does? Although SMILES can now handle radicals by indicating the number of explicit H atoms on an atom, SMARTS still can't. Consider CH3CH2•. In SMILES it is [CH2]C, but in SMARTS, it is [#6]-[#6].

ChemAxon 25dcd765a3

14-09-2005 06:49:00

Hi Bob,


I would be happy to solve this problem with the implicit Hs.


The problem is the following:


- in SMILES, for atoms in brackets any attached Hs must be specified explicitly; for atoms without brackets, the number of attached hydrogens conforms to the lowest normal valence consistent with the explicit bonds.


- in SMARTS, "H<number>" defines an atom that has <number> attached hydrogens ("implicit" or "explicit", i.e. H property or H atom count)
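As a rough illustration of the SMILES rule for unbracketed atoms, here is a minimal Python sketch. The helper name `implicit_h_count` is made up for this post and is not part of Marvin; the valence table is the usual "organic subset" list from the SMILES rules. The sample calls use the atoms of the structure from the first post.

```python
# Normal valences for the SMILES "organic subset" atoms.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,),
    "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_h_count(element, bond_order_sum):
    """Hydrogens implied for an unbracketed SMILES atom (hypothetical helper).

    Picks the lowest normal valence that is not smaller than the sum of
    the explicit bond orders and fills the remainder with hydrogens.
    """
    for valence in NORMAL_VALENCES[element]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # explicit bonds already exceed every normal valence


# Atoms a1..a5 of the drawn structure: (element, sum of explicit bond orders)
drawn_atoms = [("C", 1), ("C", 3), ("O", 1), ("C", 4), ("C", 3)]
h_counts = [implicit_h_count(el, bonds) for el, bonds in drawn_atoms]
print(h_counts)
```

For a SMARTS atom such as [#15] no such calculation is done, because the query says nothing about which valence is intended.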





You can see that if someone draws a substructure (query), in general they don't want the implicit Hs to be converted to the H property.





Let's take the following example: draw a C atom as a query (we are searching for molecules containing a carbon atom).


If you export it to SMILES (as a whole molecule), it is methane, CH4.


If you export it to SMARTS, it is [#6], a carbon atom (with any valence).


If I exported it to SMARTS as [#6H4], this would mean a carbon with four Hs, which is not the expected query.


So we shouldn't store the implicit Hs like this.
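To make the difference concrete, here is a toy Python sketch of atom-level matching. It is not how Marvin or any real SMARTS engine is implemented; `Atom`, `AtomQuery` and `matches` are hypothetical names, and an atom is reduced to its atomic number and total H count.

```python
from typing import NamedTuple, Optional


class Atom(NamedTuple):
    atomic_number: int
    total_h: int  # attached hydrogens, implicit or explicit


class AtomQuery(NamedTuple):
    atomic_number: int
    h_count: Optional[int] = None  # None: the H count is not constrained


def matches(query, atom):
    """True if the target atom satisfies the (toy) query atom."""
    if query.atomic_number != atom.atomic_number:
        return False
    return query.h_count is None or query.h_count == atom.total_h


any_carbon = AtomQuery(6)     # plays the role of [#6]
carbon_h4 = AtomQuery(6, 4)   # plays the role of [#6H4]

methane_c = Atom(6, 4)        # the carbon of CH4
ethanol_ch2 = Atom(6, 2)      # the CH2 carbon of CH3CH2OH

print(matches(any_carbon, methane_c), matches(any_carbon, ethanol_ch2))
print(matches(carbon_h4, methane_c), matches(carbon_h4, ethanol_ch2))
```

The unconstrained query hits both carbons, while the H4 version only hits methane, so writing the drawn implicit Hs into the query would silently narrow the search.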





The radicals are calculated from the valence and from the number of implicit Hs, which is not possible in the case of SMARTS (usually neither the valence nor the number of implicit Hs is known in SMARTS).
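A simplified Python sketch of that calculation, under the assumption that the radical count is just the valence left over after bonds and hydrogens are counted (the function name is hypothetical, and real toolkits handle more cases such as charges and multiple normal valences):

```python
def radical_electrons(valence, bond_order_sum, h_count):
    """Unpaired electrons left over after bonds and Hs are counted.

    Returns None when the H count (or the valence) is unknown, which is
    the usual situation for an atom read from SMARTS.
    """
    if valence is None or h_count is None:
        return None
    return max(valence - bond_order_sum - h_count, 0)


# Ethyl radical, SMILES [CH2]C: the bracketed carbon has one bond and two Hs.
ethyl_radical_c = radical_electrons(4, 1, 2)
# The neighbouring CH3 carbon: one bond and three Hs.
methyl_c = radical_electrons(4, 1, 3)
# The same carbon read from SMARTS [#6]-[#6]: the H count is not known.
smarts_c = radical_electrons(4, 1, None)

print(ethyl_radical_c, methyl_c, smarts_c)
```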





You can overcome this problem if you use CXSMARTS (ChemAxon extended SMARTS) as it stores the radicals in a separate field.





All the best


Andras