Extracting embedded ChemDraw structures from Word documents

User 55ffa2f197

21-02-2013 19:21:49

Hi, we have a couple of thousand Word documents that contain analytical reports on compounds. We would like to extract some of the information from the documents and put it into a database. The documents contain structures that chemists drew in ChemDraw and pasted in. When I process these Word docs with VB.NET, VB sees the structures as images, and I can save each image as a GIF. I expect there is a function in the Marvin/JChem .NET API that would recognize the ChemDraw mol object and convert it to SMILES. I usually code against the Java API, but in this case I have to use VB.NET, since I need to open the MS Word docs dynamically and get things out of them.


Can you give me a hand with this task, and be as specific as you can?


I have attached a sample Word document with a structure and a chromatogram in it. This is the typical document I am dealing with. I have also attached a separate cdx file that contains the mol.


This is urgent, and I hope to get an answer quickly. My last resort would be to save the structure as an image and then convert the image to a molecule using the Java API.


Thanks


Dong

User 55ffa2f197

21-02-2013 19:32:04

I am adding more information to my question. The following is the code snippet I use to traverse the Word documents and find the InlineShapes in each doc; some of these InlineShapes are actually ChemDraw mol objects. I am using the freeware IDE SharpDevelop.


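Imports System.Drawing.Imaging
Imports System.Windows.Forms

Public Partial Class MainForm
    Public Sub New()
        ' The Me.InitializeComponent call is required for Windows Forms designer support.
        Me.InitializeComponent()
        Dim folder As String = "C:\Documents and Settings\lid17\Desktop\analytical\"
        Dim app As New Microsoft.Office.Interop.Word.Application()
        Dim doc As Microsoft.Office.Interop.Word.Document = app.Documents.Open(folder & "test.doc", ReadOnly:=True)
        ' Copy each inline shape to the clipboard and save it as an image.
        ' Some of these shapes are embedded ChemDraw objects, but only their picture is saved here.
        For i As Integer = 1 To doc.InlineShapes.Count
            doc.InlineShapes(i).Range.Copy()
            Dim img As System.Drawing.Image = Clipboard.GetImage()
            ' GetImage returns Nothing when the clipboard holds no image data.
            If img IsNot Nothing Then img.Save(folder & i & ".jpeg", ImageFormat.Jpeg)
        Next
        ' Close the document without saving and shut down the Word instance.
        doc.Close(False)
        app.Quit()
    End Sub
End Class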

ChemAxon eb65a25631

25-02-2013 10:03:42

Hi,


I was able to extract the molecule from the document you attached in the following way, using MolImporter:


var mi = new MolImporter(@"D:\temporary\Downloads\test-doc.docx");
var mol = mi.read();


You can get detailed information on using MolImporter in the developer section of the site.


(You may need to use the latest JChem.NET API.)
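

As background, and independent of the JChem API: a .docx file is a ZIP package, and Word normally stores embedded OLE objects (for example pasted ChemDraw drawings) as entries under word/embeddings/. Below is a minimal Java sketch for checking whether a document contains such objects. The class and method names are made up for illustration. It only lists entry names and does not parse any chemistry, and it does not apply to the older binary .doc format.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration; not part of any ChemAxon API.
class DocxEmbeddings {

    // Returns the names of all ZIP entries stored under word/embeddings/.
    static List<String> listEmbeddedObjects(String docxPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(docxPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (!entry.isDirectory() && entry.getName().startsWith("word/embeddings/")) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java DocxEmbeddings <file.docx>");
            return;
        }
        // Print one embedded-object entry name per line.
        for (String name : listEmbeddedObjects(args[0])) {
            System.out.println(name);
        }
    }
}
```

An empty result suggests the structures were pasted as plain pictures rather than as embedded objects, in which case there is probably no structure data to recover from the file.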


 


Regards,


Andras

User 55ffa2f197

25-02-2013 13:02:10

Andras,


Thanks for the information. I found a similar function in the Marvin Java API (5.11.3), called document2structure, or d2s. It works fine.


Dong