Molfile format query (M APO)

User 6bec43dc6c

14-02-2006 17:31:17

Hi,





One of our users has had problems parsing one of the molfiles created by MarvinSketch. The feature our curators used for this specific molfile is the 'Groups' functionality available via the 'More' button of MarvinSketch. After using 'Groups', we noticed that an additional attribute is written to the molfile.


I've copied and pasted a simple 'Groups' molfile below. You will notice an 'M APO' attribute, which according to the molfile standard is the 'Atom Order Attachment Point'. The documentation says that this refers to R# groups, but there are no R# groups in the molfile. This has caused an error in our user's parser. If we remove the 'M APO' line, the molfile is still generated correctly, which leads us to believe that this line is superfluous.





Any input on this would be greatly appreciated.





Best Regards,


Paula





Code:

  Marvin  02140617222D

  2  1  0  0  0  0            999 V2000
   -1.9750    1.6563    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.1500    1.6563    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  APO  1   2   1
M  STY  1   1 SUP
M  SAL   1  2   1   2
M  SMT   1 Et
M  END

User 870ab5b546

15-02-2006 03:24:24

Hello,





I think I can help on this one. The Groups menu refers to molecule fragments with free valences, like the ethyl group (Et) of your example, or iPr, or the amino acid residue Gly, -NHCH2C(=O)-. The M APO line is parsed as follows: the first number is the number of attachment points (usually 1, but 2 for amino acid residues like Gly), the second number is the number of the atom with the free valence, and the third number designates which free valence is filled first, which second, etc., when new atoms are attached to the group. If you delete the M APO line, the shortcut group will no longer have any atoms with free valences.

For example, type iPr and click in the Marvin window. Set View -> Hydrogens -> Implicit to All. Then choose Edit -> Groups -> Expand. You will see that the central C atom has only one H and an asterisk (designating the attachment point). If you delete the M APO line from the MOL file, the asterisk will go away and the middle C will have two H atoms.





Whether your MOL file is correct after deleting M APO will depend on whether (1) the shortcut group is already connected to something else, and (2) you want to preserve the information about the shortcut group. If the answer to (1) is yes and to (2) is no, you can delete M APO and the rest of the special attributes following it. Doing so is equivalent to choosing Edit -> Groups -> Ungroup.
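
As a rough illustration of the manual deletion described above, here is a minimal sketch in plain Java (it does not use the Marvin API). It drops the M APO line and the S-group lines (M STY, M SAL, M SMT) from V2000 molfile text. The class and method names are hypothetical, and the tag list is an assumption that covers only the tags seen in the example file in this thread; real files may carry further S-group tags.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of any ChemAxon API.
class SgroupLineStripper {
    // Tags seen in the example file in this thread (an assumption, not a complete list).
    private static final Set<String> TAGS_TO_DROP =
            Set.of("M  APO", "M  STY", "M  SAL", "M  SMT");

    /** Returns the molfile text without the property lines listed above. */
    static String strip(String molfile) {
        List<String> kept = new ArrayList<>();
        for (String line : molfile.split("\n", -1)) {
            // V2000 property lines start with a six-character tag such as "M  APO".
            String tag = line.length() >= 6 ? line.substring(0, 6) : line;
            if (!TAGS_TO_DROP.contains(tag)) {
                kept.add(line);
            }
        }
        return String.join("\n", kept);
    }
}
```

This is only sensible under the two conditions above: the group is already connected to something else, and the abbreviation itself does not need to be preserved.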





-- Bob Grossman

ChemAxon a3d59b832c

15-02-2006 08:39:06

Yes, Bob is absolutely right. There is an attachment point left in the structure.





You can also delete the attachment point from Marvin. Expand the group and right-click on the atom with the attachment point. Select the menu item 'Group/Attachment point'. That toggles between the 'one connection point', 'two connection points' and 'no connection point' states.





If you don't need the abbreviation name (grouping), you can also ungroup it; that erases all attachment points as well. When you draw the abbreviation with the Shift key pressed, the group is drawn ungrouped straight away.





And just one more tip: you don't need to go to the More window for the abbreviations. It is enough to type the name and click.





Best regards,


Szabolcs

User 6bec43dc6c

15-02-2006 09:40:14

Hi again,





Thanks for all your replies so far, but they still do not solve my problem.





According to my user, and I quote:





"The file contains a reference to an undefined R-group for atom 1 (and also atoms 13 and 15 - but the input is aborted before this is reached). There are no R-groups defined at all in that file - only S-groups. The file is broken and seems to be confused about the difference between R- and S-groups. I checked the MDL docs - "M APO" is clearly only valid for R-groups, and there are none."





I checked the MDL molfile docs and 'M APO' seems to be specified for Rgroups only. Here is an excerpt from the spec:


"Attachment Point [Rgroup]


M APOnn2 aaa vvv ...


vvv Indicates whether atom aaa of the Rgroup member is the first attachment point (vvv = 1), second attachment point (vvv = 2), both attachment points (vvv = 3); default of 0 = no attachment."
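
To make the excerpt concrete, here is a minimal sketch in plain Java of how a line of this shape could be read. It assumes that nn2 is the count of (aaa, vvv) pairs on the line, which is my reading of the excerpt, and it splits on whitespace rather than on the fixed-width columns a strict V2000 reader would use. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the "M  APO" line quoted above; names are illustrative.
class ApoLineParser {
    /** One (aaa, vvv) pair: atom number and attachment value. */
    static final class Entry {
        final int atom;
        final int value;

        Entry(int atom, int value) {
            this.atom = atom;
            this.value = value;
        }

        /** Meaning of vvv as given in the quoted excerpt. */
        String meaning() {
            switch (value) {
                case 0: return "no attachment";
                case 1: return "first attachment point";
                case 2: return "second attachment point";
                case 3: return "both attachment points";
                default: return "unrecognised value";
            }
        }
    }

    static List<Entry> parse(String line) {
        if (!line.startsWith("M  APO")) {
            throw new IllegalArgumentException("not an M  APO line: " + line);
        }
        // Simplification: whitespace splitting instead of fixed-width fields.
        String[] fields = line.substring(6).trim().split("\\s+");
        int count = Integer.parseInt(fields[0]); // nn2, assumed to be the number of pairs
        if (fields.length != 1 + 2 * count) {
            throw new IllegalArgumentException("entry count does not match: " + line);
        }
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int atom = Integer.parseInt(fields[1 + 2 * i]);
            int value = Integer.parseInt(fields[2 + 2 * i]);
            entries.add(new Entry(atom, value));
        }
        return entries;
    }
}
```

Applied to the line from the file in the first post, `ApoLineParser.parse("M  APO  1   2   1")` yields a single entry: atom 2 with value 1, i.e. the first attachment point.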





Any light you could shed on this would be greatly appreciated.





Best regards,


Paula

ChemAxon a3d59b832c

17-02-2006 10:16:07

Hi Paula,





Sorry for the delay, we had to walk round the problem thoroughly.
PauladeMatos wrote:
The file is broken and seems to be confused about the difference between R- and S-groups. I checked the MDL docs - "M APO" is clearly only valid for R-groups, and there are none.
It is true that the CTfile documentation mentions R-groups, but I don't think that means M APO can only apply to R-groups. I just received the file below, which was saved by ISIS/Draw and contains M APO without any R-groups. ISIS/Draw can also import the file you sent without any problem.





I think it is the third-party molfile parser that is broken. It should simply ignore the molfile tags it does not understand. This is how all the different molfile importers that I have encountered work.
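
A minimal sketch of that tolerant behaviour, in plain Java: the reader handles the tags it knows and skips the rest instead of aborting. The class name and the set of known tags are hypothetical; this is not how any particular importer is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical reader for illustration; not taken from any real importer.
class TolerantPropertyReader {
    // Tags this imaginary parser understands; everything else is skipped.
    private static final Set<String> KNOWN_TAGS = Set.of("M  CHG", "M  ISO", "M  RAD");

    /** Walks the properties block and returns the lines it skipped as unknown. */
    static List<String> skippedLines(List<String> propertyBlock) {
        List<String> skipped = new ArrayList<>();
        for (String line : propertyBlock) {
            if (line.startsWith("M  END")) {
                break; // end of the properties block
            }
            String tag = line.length() >= 6 ? line.substring(0, 6) : line;
            if (KNOWN_TAGS.contains(tag)) {
                continue; // a real parser would interpret the line here
            }
            skipped.add(line); // unknown tag: note it and carry on
        }
        return skipped;
    }
}
```

With a reader of this kind, the M APO, M STY, M SAL and M SMT lines of the example file would simply be skipped and the two atoms and one bond would still be read.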





I can imagine the following options for you now:


1. Contact the authors of the third party molfile parser to fix their bug.


2. The users may remove all dangling attachment points from the drawing as described above.


3. We can provide you some Java code that removes these using our API.





Please let us know if option 3 would be any help.





Best regards,


Szabolcs

ChemAxon a3d59b832c

19-02-2006 13:54:22

Szabolcs wrote:
2. The users may remove all dangling attachment points from the drawing as described above.


3. We can provide you some Java code that removes these using our API.
Just a few easy options for this if you don't need the abbreviations: ungroup the groups in Marvin (Edit/Groups/Ungroup), or convert to SMILES or CML and back (Marvin: Edit/Source).





Best regards,


Szabolcs

User 6bec43dc6c

23-02-2006 13:30:42

Thanks for all your feedback. We will continue to use this functionality but we will try to contact MDL and see if we can get further clarification on this.





Thanks again,


Paula

User 9d72be7cd4

05-05-2007 03:54:47

I have a similar issue, posted here





http://www.chemaxon.com/forum/viewpost11555.html#11555





It seems the attachment points are corrupting the MOL file. This only occurs for S-groups, and not for manually created Groups.





Is there an API for removing the attachment points so that I can script an export? This is NEEDED functionality (i.e., correct MOL files) so that we can generate ISISdb files and subsequently SDfiles.





I'm going to try removing them manually with an S-group and see if that works.





NCM


FSCI





------------- UPDATE





Manually removing the attachment point DOES allow a SAVE AS and a correct MOL file that ISIS does not complain about. The R-group in ISIS expands and works fine.





It sounds like this might be a bug in the MOLfile export. Obviously, ISIS does not care for the attachment point...please let me know if there is an automated way to remove these on saving as MOL file.





Thanks for all the work!





We're in the last phase of our evaluation, and hope to purchase several licenses shortly if we can work out these last few kinks.





NCM


FSCI

User f359e526a1

05-05-2007 07:28:49

Hello, you can manually set attachment points (see MolAtom.setAttach() in the API documentation) but that will not solve your other problem ;)





http://www.chemaxon.com/marvin/doc/api/chemaxon/struc/MolAtom.html#setAttach(int)





Is that suitable for you?

User 9d72be7cd4

05-05-2007 18:22:37

So to use the setAttach method, I'd need to iterate through all the S-groups, identify the id, run setAttach, move to the next S-group.





What would be the best method to automatically iterate through all S-groups in a molecule?





A global molecule method to remove all attachment points would be nice...





OR offer the option upon export to MOL as another parameter, to remove attachment points.





Thanks for any further hints on how to automate this.





Manual is fine for one or two, but not thousands :-)





NCM


FSCI

User f359e526a1

07-05-2007 09:21:52

Molecule.getSgroupArray() will give you back all the Sgroups:


http://www.chemaxon.com/marvin/doc/api/chemaxon/struc/Molecule.html#getSgroupArray()


but, as the manual says, you have to be cautious. The ultimate solution is to fix the bug we introduced; it is already on the task list and will be fixed soon.

User 9d72be7cd4

08-05-2007 17:05:37

What is the bug referring to exactly?





I'm testing using:





// Ungroup all S-groups first, then export the expanded structure as a molfile.
mol.ungroupSgroups();
mol.toFormat("mol");





This seems to work fine in my tests, and provides a nice clean mol file for inclusion in our SDfile generation.





Thanks for the hint to look in the API. The ungroup function is exactly what we needed.





This way, we can save in two formats - MRV for a visual representation, and MOL for the fully drawn out version - with one click through a scripted save.





FEATURE REQUEST - on MOL export via the GUI give the option to Ungroup all groups and/or Remove Attachment Points.





Thanks,





NCM


FSCI

User f359e526a1

10-05-2007 08:57:47

I was referring to


http://www.chemaxon.com/forum/viewpost11555.html#11555


as it is an Sgroup bug.


We will consider the feature request.