User 4aada85f0d
25-06-2015 13:03:20
Hello,
I have a problem with Document to Structure.
If I try to run molconvert on a PDF, I get the following (truncated):
$ molconvert smiles:a test.pdf -o test.smi
Jun 25, 2015 1:53:03 PM chemaxon.naming.document.TesseractProcessOCR isAvailable
WARNING: Tesseract could not be installed, OCR is disabled
java.io.IOException: I'm sorry, I could not find tesseract-unknown-3.01.jar
at chemaxon.marvin.util.InstalledComponent.findLocalJar(InstalledComponent.java:195)
at chemaxon.marvin.util.InstalledComponent.nonAppletInstall(InstalledComponent.java:171)
MarvinBeans: marvinbeans-15.6.15.0-macos
OS: Mac OS X Yosemite (10.10.3).
Java: 1.6.0.jdk
tesseract (via Homebrew): tesseract-3.02.02_3
Is it that this functionality is not supported on OS X? I see 'tesseract-X-3.01.jar' files for Windows and Linux but not Mac.
Thanks,
Francis
ChemAxon e7b9408ca1
25-06-2015 13:47:20
Hi Francis,
This should work on Mac OS. However, there might be a few issues to solve to get it working. First, I suspect you are actually running an older version of Marvin. Could you please check what this command prints:
molconvert | head -1
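If more than one Marvin installation might be on your PATH, it can also help to see every copy of molconvert the shell could pick up, in lookup order. The following is only a rough sketch for sh or bash: list_on_path is a made-up helper name and not part of Marvin, and it relies on nothing beyond standard shell built-ins. In bash, `type -a molconvert` gives similar information.

```shell
# Print every executable named "$1" found on PATH, in lookup order.
# NOTE: list_on_path is a hypothetical helper for illustration only.
list_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:                      # split PATH on colons
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        if [ -z "$dir" ]; then
            dir=.
        fi
        # Report regular executables only, not directories.
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs               # restore the original field separator
}

list_on_path molconvert
```

The first path printed is the one that runs when you type molconvert; any later entries are shadowed by it.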
User 4aada85f0d
25-06-2015 14:14:45
Oh dear, I *was* running an old version (/Applications/MarvinBeans/bin/molconvert) by mistake and not the new version I'd installed.
However, now that's fixed, I have a slightly different problem...
$ whence molconvert
/Applications/ChemAxon/MarvinBeans/bin/molconvert
$ molconvert -h | head -1
Molecule File Converter, version 15.6.15.0, (C) 1999-2015 ChemAxon Ltd.
$ molconvert smiles:a test.pdf -o test.smi
Jun 25, 2015 3:09:55 PM chemaxon.naming.document.TesseractProcessOCR isAvailable
WARNING: Tesseract could not be installed, OCR is disabled
java.io.IOException: Missing resource: /tesseract-macosx-3.01.zip
at chemaxon.util.InstalledComponent.installInto(InstalledComponent.java:224)
I do have a Mac OS X jar now, though...
-rw-r--r-- 1 francis admin 1451901 15 Jun 13:32 /Applications/ChemAxon/MarvinBeans/lib/tesseract-macosx-3.01_1.jar
ChemAxon e7b9408ca1
26-06-2015 06:29:51
Good, we're making progress :) There is indeed an issue with tesseract on Mac OS in our current version. This should be fixed in next week's version. Is this OK for you, or would you need a workaround sooner?
User 4aada85f0d
26-06-2015 08:34:41
Next week would be fine! Any particular release number I should look out for?
ChemAxon e7b9408ca1
26-06-2015 12:00:42
This fix should be included in version 15.06.29.
ChemAxon e7b9408ca1
01-07-2015 06:25:49
Version 15.06.29 is now released. Francis, could you please confirm whether it fixes the issue for you?
User 4aada85f0d
01-07-2015 13:54:13
I think that's fixed, yes: I've no problems with any of the PDFs I've tried it on so far. Many thanks!
ChemAxon e7b9408ca1
03-07-2015 06:23:21
Great! Thanks for your report and your patience.