How to expand protein labels in molfile

User 73531e86ff

02-12-2010 10:50:21

We are having an issue with molfiles stored in our database that contain protein labels in the connection table instead of explicit atoms.


This is possibly a non-standard molfile since the specification doesn't mention that this can be done.  However, Marvin and other tools seem happy to import and display these files.


Our database also stores a SMILES string for each compound, and that is the column which is indexed for searching.  Therefore, when we need to search for a structure, we use JChem to convert the molfile into SMILES.  Unfortunately, JChem doesn't automatically expand the groups, so protein labels from the molfile are represented in the SMILES by a star and thus the structure is not found.
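
For anyone wanting to check which stored molfiles are affected, here is a minimal Python sketch (plain text parsing, not the JChem API) that lists atom-block entries whose symbol is not a chemical element. It assumes V2000 molfiles with whitespace-separated atom lines; the function name find_label_atoms and the "Ala" label in the example are hypothetical and are not taken from our data. Legitimate query symbols such as "A", "Q" or "R#" would also be reported, so treat the result as a hint only.

```python
# Standard element symbols (plus D and T); any other atom symbol is treated as a label.
ELEMENTS = set("""
H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn
Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La
Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po
At Rn Fr Ra Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg
Cn Nh Fl Mc Lv Ts Og D T
""".split())


def find_label_atoms(molfile_text):
    """Return (atom_number, symbol) pairs for non-element symbols in a V2000 atom block."""
    lines = molfile_text.splitlines()
    counts = lines[3]               # line 4 of a molfile is the counts line
    n_atoms = int(counts[0:3])      # its first three characters hold the atom count
    labels = []
    for number, line in enumerate(lines[4:4 + n_atoms], start=1):
        symbol = line.split()[3]    # fields are x, y, z, then the atom symbol
        if symbol not in ELEMENTS:
            labels.append((number, symbol))
    return labels


# Hypothetical three-atom molfile whose third "atom" is a residue label.
EXAMPLE = "\n".join([
    "",
    "  hypothetical",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.0000    0.0000    0.0000 Ala 0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])

print(find_label_atoms(EXAMPLE))
```

Any molfile for which this returns a non-empty list is one where the generated SMILES is likely to contain a star in place of the labelled atoms.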


If we retrieve the SMILES from the database and then search with it, the search works fine because the SMILES explicitly specifies all the atoms in the protein.  However, users mostly retrieve the molfile since it contains the preferred coordinates for rendering.


Without changing *all* the protein molfiles in our database, is there a way that JChem or the Standardizer can convert these molfiles into the corresponding SMILES?  I did try the "Alias to Atom" and "Expand Group" Standardizer functions, but they didn't work for the example molfile I have.

ChemAxon e08c317633

03-12-2010 15:25:30

Could you attach an example molfile? If it's confidential, please modify the molecules in the file; only the format and the "protein labels in the connection table" are important.


I think you have to convert the molfile to ChemAxon Extended SMILES ("cxsmiles") format (JChem stores molecules in this format), and maybe you have to use some cxsmiles export options to export the protein labels. We can tell more after we see the molfile.


See also: http://www.chemaxon.com/marvin/help/formats/cxsmiles-doc.html


Zsolt