User c0e481a82c
14-08-2006 10:06:59
Hi,
We've had quite a few problems recently with SD files containing abbreviations that are not understood correctly, for example "Me" for methyl and "CF3" for trifluoromethyl. We read the files in using the MolImporter class in our applications, but they fail when a record containing abbreviations is encountered. Is there anything I can do about this, or is it simply that an SD file cannot contain such abbreviations?
Regards,
Phil.
User c0e481a82c
14-08-2006 10:22:36
Actually, to add to this question: it's not that MolImporter fails to read the file, as I stated; it's that the molecule it creates can't be converted to a SMILES string. Sorry for the confusion.
User f359e526a1
14-08-2006 11:33:21
Hello, could you please send some examples? I can export SDF with abbreviated groups to SMILES without any problem. What version of Marvin are you using?
User c0e481a82c
14-08-2006 12:03:03
Hi,
It's not in Marvin that I have the problem. As you say, Marvin will deal with the abbreviations fine and can display them. It's when I try to convert the abbreviated structures to SMILES strings that I have the problem. Therefore, I guess the question is more concerned with parsing the SD file into something from which a SMILES string could be generated. To reproduce the problem in Marvin, you can draw something with a methyl group defined as Me. Then choose Edit | Source, and choose the SMILES format. It will tell you that it cannot convert that structure to a SMILES string. This is the same stack trace as I see from within my applications. Does that help at all?
Regards,
Phil.
User f359e526a1
14-08-2006 12:37:31
I just tried that, and it works fine. Note that I am using the latest version (4.1), which was released recently.
User c0e481a82c
14-08-2006 13:18:14
Ah yes, I see. I was using 4.0.6. I've updated and now I see a different problem: if I choose CF3 as my alias, it is treated as just methyl (so trifluoromethylbenzene becomes toluene). I suspect this is because the alias atom in the CTAB is represented by a carbon atom as a placeholder. This is more or less confirmed if you use COOH or CO2H as abbreviations in the sketch; both come out as toluene when they should be benzoic acid.
I can appreciate that it's not reasonable to expect all aliases to be represented "out of the box". Therefore, in the same way as one can define new abbreviations in ISIS Draw, would it be possible to do the same for Marvin, and in the underlying API for custom applications which use Marvin?
Regards,
Phil.
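One way to check the placeholder hypothesis is to look at the SD record itself. The following is a minimal sketch, assuming V2000 molfile records, where a plain atom alias is written as an "A  " line followed by the alias text, and an abbreviated group is written as a superatom Sgroup ("M  STY" with type SUP, label in "M  SMT"). The class and method names are hypothetical, and nothing here uses the Marvin or JChem API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AbbreviationScan {
    /** Alias texts stored on single atoms ("A  " line, text on the next line). */
    static List<String> atomAliases(String molfile) {
        String[] lines = molfile.split("\r?\n");
        List<String> out = new ArrayList<>();
        // Skip the three header lines and the counts line (indices 0..3).
        for (int i = 4; i + 1 < lines.length; i++) {
            if (lines[i].startsWith("A  ")) {
                out.add(lines[i + 1].trim());
                i++; // the next line is the alias text, not a new property
            }
        }
        return out;
    }

    /** Labels of superatom (abbreviation) Sgroups. */
    static List<String> superatomLabels(String molfile) {
        String[] lines = molfile.split("\r?\n");
        Set<String> supIds = new HashSet<>();
        for (String line : lines) {
            if (line.startsWith("M  STY")) {
                // Tokens: M, STY, count, then (sgroup index, type) pairs.
                String[] t = line.trim().split("\\s+");
                for (int k = 3; k + 1 < t.length; k += 2) {
                    if (t[k + 1].equals("SUP")) supIds.add(t[k]);
                }
            }
        }
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("M  SMT")) {
                // Tokens: M, SMT, sgroup index, label.
                String[] t = line.trim().split("\\s+", 4);
                if (t.length == 4 && supIds.contains(t[2])) out.add(t[3]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical one-atom record, used only to exercise the scan.
        String record = String.join("\n",
            "", "  sketch", "",
            "  1  0  0  0  0  0            999 V2000",
            "    0.0000    0.0000    0.0000 C   0  0",
            "A    1", "CF3", "M  END");
        System.out.println("atom aliases: " + atomAliases(record));
        System.out.println("superatom groups: " + superatomLabels(record));
    }
}
```

If a label such as CF3 shows up only under atom aliases, the record carries a single placeholder atom with a text label, and an exporter has only that atom to work with.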
User f359e526a1
14-08-2006 14:01:02
It is possible to create your own abbreviations for Marvin: draw the structure, right-click, choose "Create Group", then type its name. By right-clicking again on an atom inside the blue brackets you can define the (up to two) attachment points.
Alternatively, you can dive into the abbreviated groups file:
http://www.chemaxon.com/marvin/doc/user/abbrevgroup-doc.html
Still, I cannot reproduce the toluene bug you mention. Does it happen only when using the API?
User c0e481a82c
14-08-2006 14:16:02
I'll be sure to have a look at those documents. The issue I'm having with non-Me examples is that the abbreviation (CF3, COOH and CO2H are the only three examples I've tried) ends up being simply a carbon atom in the SMILES string. This is in Marvin, as I'm in the process of updating the client in which this is seen, along with the server (the server uses JChem libraries, and so must be updated to be compatible with the new version of Marvin on the client). So after I draw a molecule such as the one attached, go to Edit | Source and choose SMILES as the format, I see the SMILES string for toluene, not for the molecule that is drawn.
Regards,
Phil.
User f359e526a1
14-08-2006 14:39:27
Now I see. The confusion is probably that we are using the word "alias" with different meanings ;)
In Marvin, an alias is not an abbreviated group but a string attached to an atom. In this case you are assigning the string "CF3" to a carbon atom, which remains a carbon but carries an alias string. To make it a real abbreviated group, forget the "More/Alias" menu: just draw a toluene and type CF3 on the keyboard. Matching strings will appear at the left side of the sketching area, and your mouse pointer will change to CF3 as well. You can then click the dangling carbon, which will be changed to CF3. It is now an abbreviated group; by right-clicking on it you can expand and then contract it.
In the Marvin file it is saved as
<molecule id="sg1" role="SuperatomSgroup" title="CF3" leftName="F3C" molID="m2"> ....
instead of "mrvAlias". Is this what you are looking for?
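To check which of the two a saved Marvin file contains, one can look for molecule elements with role="SuperatomSgroup". The following is a minimal sketch using only the JDK's XML parser; the class name is hypothetical, and the document passed in below is just the element quoted above, closed so that it parses, not a complete Marvin file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class MrvGroupCheck {
    /** Titles of all molecule elements whose role is SuperatomSgroup. */
    static List<String> superatomTitles(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        List<String> titles = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("molecule");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element el = (Element) nodes.item(i);
            if ("SuperatomSgroup".equals(el.getAttribute("role"))) {
                titles.add(el.getAttribute("title"));
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        // Only the element quoted in the post, self-closed for parsing.
        String xml = "<molecule id=\"sg1\" role=\"SuperatomSgroup\" "
            + "title=\"CF3\" leftName=\"F3C\" molID=\"m2\"/>";
        System.out.println(superatomTitles(xml));
    }
}
```

An empty result for a structure that shows CF3 on screen would suggest the label was entered as an alias string rather than as an abbreviated group.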
User c0e481a82c
14-08-2006 14:48:55
That sounds a lot like what I want to do. Thanks for your help!
Regards,
Phil.