JChem issue with MacRoman encoding

User f67d4188b6

15-11-2012 11:02:51

Instant JChem molecule cells can hold structures in several formats. One of them is:

<?xml version="1.0"?><cml version="ChemAxon file format v5.10.0, generated by v5.11.3">
<MDocument><MChemicalStruct><molecule molID="m1"><propertyList><property dictRef="CdId" title="CdId"><scalar>18</scalar></property><property dictRef="Mol Weight" title="Mol Weight"><scalar>93.12</scalar></property><property dictRef="Formula" title="Formula"><scalar>C6H7N</scalar></property></propertyList><atomArray atomID="a1 a2 a3 a4 a5 a6 a7" elementType="C C C C C C N" x2="31.803101997879107 30.469422876051073 30.469422876051073 33.136781119707145 33.136781119707145 31.80310199787911 31.80310199787911" y2="-2.106169580076351 -1.33616958007635 0.20383041992365047 0.20383041992365047 -1.3361695800763493 0.9738304199236496 2.51383041992365"></atomArray><bondArray><bond atomRefs2="a5 a1" order="2"></bond><bond atomRefs2="a2 a1" order="1"></bond><bond atomRefs2="a3 a2" order="2"></bond><bond atomRefs2="a5 a4" order="1"></bond><bond atomRefs2="a6 a3" order="1"></bond><bond atomRefs2="a4 a6" order="2"></bond><bond atomRefs2="a7 a6" order="1"></bond></bondArray></molecule></MChemicalStruct></MDocument>
</cml>

Another one is:

<?xml version="1.0" encoding="MacRoman"?><cml version="ChemAxon file format v5.10.0, generated by v5.11.3">
<MDocument><MChemicalStruct><molecule molID="m1"><propertyList><property dictRef="CdId" title="CdId"><scalar>18</scalar></property><property dictRef="Mol Weight" title="Mol Weight"><scalar>93.12</scalar></property><property dictRef="Formula" title="Formula"><scalar>C6H7N</scalar></property></propertyList><atomArray atomID="a1 a2 a3 a4 a5 a6 a7" elementType="C C C C C C N" x2="31.803101997879107 30.469422876051073 30.469422876051073 33.136781119707145 33.136781119707145 31.80310199787911 31.80310199787911" y2="-2.106169580076351 -1.33616958007635 0.20383041992365047 0.20383041992365047 -1.3361695800763493 0.9738304199236496 2.51383041992365"></atomArray><bondArray><bond atomRefs2="a5 a1" order="2"></bond><bond atomRefs2="a2 a1" order="1"></bond><bond atomRefs2="a3 a2" order="2"></bond><bond atomRefs2="a5 a4" order="1"></bond><bond atomRefs2="a6 a3" order="1"></bond><bond atomRefs2="a4 a6" order="2"></bond><bond atomRefs2="a7 a6" order="1"></bond></bondArray></molecule></MChemicalStruct></MDocument>
</cml>

The only difference is the encoding declaration.
The MacRoman declaration chokes the JChem Search node (and possibly other nodes as well; I have not tested that) with the following rather unhelpful error:


ERROR     JChem Search     Execute failed: Could not read molecule from byte array.


A workaround I have found is to simply remove the encoding declaration:


update TEST_COMPOUNDS set cd_structure = REPLACE(cd_structure, ' encoding="MacRoman"', '')


But that is quite silly as well; I cannot expect the chemists to know about encodings. An encoding declaration should not make the JChem node go cry in the corner.
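
For anyone who would rather clean the structures before they reach the database, the same idea as the SQL REPLACE above can be sketched in Python. This is only a sketch under assumptions: the function name strip_encoding_declaration is made up for illustration, it is not part of JChem or KNIME, and it assumes the structure is already held as text (not raw bytes), so dropping the declaration does not change how the characters are interpreted. Unlike the REPLACE, it removes whatever encoding is declared, not only MacRoman, and it only touches the XML declaration at the start of the document.

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
# Group 1 keeps everything in the declaration that precedes it.
_XML_DECL_ENCODING = re.compile(
    r'^(\s*<\?xml\b[^?]*?)\s+encoding\s*=\s*(["\'])[^"\']*\2'
)


def strip_encoding_declaration(structure: str) -> str:
    """Return the structure with any encoding="..." removed from its XML declaration.

    Text without an XML declaration, or without an encoding in it, is returned unchanged.
    """
    return _XML_DECL_ENCODING.sub(r"\1", structure, count=1)


# Shortened stand-in for the cd_structure value shown earlier in this thread.
sample = '<?xml version="1.0" encoding="MacRoman"?><cml version="..."></cml>'
print(strip_encoding_declaration(sample))
```

The result for the sample is the first of the two forms shown above, which is the one the JChem Search node accepts.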

User 5458277630

15-11-2012 11:22:19

Hi,

Thank you for reporting this.
Could you post the stack trace for this error message?

Thank you for your patience and kind cooperation.

Best,
Taka

User f67d4188b6

15-11-2012 11:41:56

The essential part:

2012-11-15 12:37:14,736 ERROR KNIME-Worker-1 JChem Search : Execute failed: Could not read molecule from byte array.
2012-11-15 12:37:14,737 DEBUG KNIME-Worker-1 JChem Search : Execute failed: Could not read molecule from byte array.
chemaxon.formats.MolFormatException: Could not read molecule from byte array.
        at chemaxon.jchem.db.JChemSearch.getMolecule(JChemSearch.java:6517)
        at chemaxon.jchem.db.JChemSearch.getHitsAsHitDisplayTool(JChemSearch.java:3671)
        at chemaxon.jchem.db.JChemSearch.getHitsAsMolecules(JChemSearch.java:3567)
        at jp.co.infocom.cheminfo.jchem.jchemsearch.JChemSearchNodeModel.execute(JChemSearchNodeModel.java:614)
        at org.knime.core.node.NodeModel.executeModel(NodeModel.java:536)
        at org.knime.core.node.Node.invokeNodeModelExecute(Node.java:995)
        at org.knime.core.node.Node.execute(Node.java:889)
        at org.knime.core.node.workflow.SingleNodeContainer.performExecuteNode(SingleNodeContainer.java:894)
        at org.knime.core.node.exec.LocalNodeExecutionJob.mainExecute(LocalNodeExecutionJob.java:100)
        at org.knime.core.node.workflow.NodeExecutionJob.run(NodeExecutionJob.java:166)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at org.knime.core.util.ThreadPool$MyFuture.run(ThreadPool.java:124)
        at org.knime.core.util.ThreadPool$Worker.run(ThreadPool.java:239)
Caused by: chemaxon.formats.MolFormatException: Invalid encoding name "MacRoman".
        at chemaxon.marvin.io.formats.cml.MrvImport.readDocument(MrvImport.java:94)
        at chemaxon.marvin.io.MRecordImporter.readMol(MRecordImporter.java:700)
        at chemaxon.marvin.io.MRecordImporter.readMol(MRecordImporter.java:678)
        at chemaxon.marvin.io.MRecordImporter.readMol0(MRecordImporter.java:593)
        at chemaxon.marvin.io.MRecordImporter.readMol(MRecordImporter.java:509)
        at chemaxon.formats.MolImporter.readMol(MolImporter.java:860)
        at chemaxon.formats.MolImporter.read(MolImporter.java:747)
        at chemaxon.formats.MolImporter.read(MolImporter.java:717)
        at chemaxon.util.MolHandler.importMol(MolHandler.java:654)
        at chemaxon.util.MolHandler.setMolecule(MolHandler.java:172)
        at chemaxon.util.MolHandler.<init>(MolHandler.java:105)
        at chemaxon.jchem.db.JChemSearch.getMolecule(JChemSearch.java:6515)
        ... 14 more
Caused by: org.xml.sax.SAXParseException: Invalid encoding name "MacRoman".
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
        at chemaxon.marvin.io.formats.cml.MrvImport.readDocument(MrvImport.java:90)
        ... 25 more
2012-11-15 12:37:14,739 DEBUG KNIME-Worker-1 WorkflowManager : JChem Search 0:4 doBeforePostExecution
2012-11-15 12:37:14,739 DEBUG KNIME-Worker-1 NodeContainer : JChem Search 0:4 has new state: POSTEXECUTE
2012-11-15 12:37:14,739 DEBUG KNIME-Worker-1 KnimeResourceNavigator : Node message changed: ERROR: Error in sub flow.
2012-11-15 12:37:14,740 DEBUG KNIME-Worker-1 NodeContainer : MacRoman_error 0 has new state: EXECUTING
2012-11-15 12:37:14,740 DEBUG KNIME-Worker-1 WorkflowManager : JChem Search 0:4 doAfterExecute - failure
2012-11-15 12:37:14,740 DEBUG KNIME-Worker-1 JChem Search : reset
2012-11-15 12:37:14,740 DEBUG KNIME-Worker-1 JChem Search : clean output ports.
2012-11-15 12:37:14,740 DEBUG KNIME-Worker-1 WorkflowFileStoreHandlerRepository : Removing handler c77c616e-40e5-4b51-906e-a1aefa702896 (JChem Search 0:4: <no directory>) - 1 remaining


 


If you want the entire dump, I can send it by email.
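
The innermost "Caused by" in the trace is the XML parser rejecting the encoding name in the declaration, so the records at risk are the ones whose XML declaration names an encoding at all. A minimal sketch for spotting them follows. It assumes the structures have been fetched as text. The function name declared_encoding and the sample rows are hypothetical and are not taken from the real table.

```python
import re
from typing import Optional

# Captures the value of the encoding pseudo-attribute in a leading XML declaration.
_DECLARED_ENCODING = re.compile(
    r'^\s*<\?xml\b[^?]*?\sencoding\s*=\s*(["\'])([^"\']*)\1'
)


def declared_encoding(structure: str) -> Optional[str]:
    """Return the encoding named in the XML declaration, or None if none is declared."""
    match = _DECLARED_ENCODING.match(structure)
    return match.group(2) if match else None


# Hypothetical (cd_id, cd_structure) pairs standing in for rows of the molecule table.
rows = [
    (18, '<?xml version="1.0"?><cml version="..."></cml>'),
    (19, '<?xml version="1.0" encoding="MacRoman"?><cml version="..."></cml>'),
]

# Keep only the rows that declare an encoding, together with the declared name.
suspect = [(cd_id, declared_encoding(s)) for cd_id, s in rows if declared_encoding(s)]
print(suspect)
```

Only the second row is reported, with MacRoman as the declared encoding.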

User f67d4188b6

05-12-2012 09:52:14

Any progress yet?

User 5458277630

06-12-2012 08:56:10

Hi,

I am really sorry that it has taken me so long to reply.
I am still investigating this problem, but I have found a possible solution.

Hopefully the next version, which will be based on JChem 5.12, will fix it.
I apologize for the trouble and thank you for your patience.


Best,
Taka

User f67d4188b6

06-12-2012 09:48:31

As a temporary workaround, I put the following SQL in an insert trigger and an update trigger on the molecules table (MySQL):


SET NEW.cd_structure = REPLACE(NEW.cd_structure, ' encoding="MacRoman"', '');

User 5458277630

25-04-2013 11:03:27

This problem has been fixed in the latest version, 2.6.3.v0137, which uses JChem 5.12.3.0.
I deeply apologize for my neglect.

Best regards,
Taka