Document to Structure Bug

User 2347372188

21-05-2013 23:06:39

Hello. I've been pulling exemplar compounds out of patents that we've downloaded as XML files from Google Patents, and I've run into a bug: d2s hangs on certain patents. I've attached one of the patents that causes the hang. I've encountered the bug in versions 5.12.3 and 5.12.4. Here is the code I'm using to read the patent:


MolImporter importer = new MolImporter(file, "d2s");

while (true) {
    Molecule m = importer.read();
    if (m == null) {
        break;
    }
}


Thank you for the help.


-&


ChemAxon e7b9408ca1

22-05-2013 17:15:47

Hi Steven,


I was able to reproduce the issue, thank you for the clear report. This patent triggers a bug in d2s that increases the amount of work so much that it does indeed appear to hang. I have fixed this bug in the development version and will soon add the fix to the 6.0 branch as well. However, we are very close to the 6.0.0 release, so the fix will probably only make it into 6.0.1.


Are you doing batch conversion on a large number of patents? As a workaround until then, you can use the "d2s:timeout=N" option, where N is the maximum time in seconds. Set N high enough that normal processing can finish, but low enough that a case like this one does not run "forever". If the timeout is reached, a chemaxon.naming.document.TimeoutException (a subclass of IOException) is thrown, so you can catch and log it.
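
A minimal sketch of that catch-and-log pattern for a batch run is shown here. To keep it self-contained, the actual d2s call sits behind a hypothetical PatentReader interface; in real use its implementation would be the same new MolImporter(file, options) call and read loop from the first post. The names D2sTimeoutSketch, PatentReader, d2sOptions and convertAll are made up for illustration and are not part of any ChemAxon API.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class D2sTimeoutSketch {

    /** Hypothetical stand-in for the code that reads one patent with d2s. */
    interface PatentReader {
        int countStructures(File patent, String options) throws IOException;
    }

    /** Builds the import option string, e.g. "d2s:timeout=120". */
    static String d2sOptions(int timeoutSeconds) {
        if (timeoutSeconds <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        return "d2s:timeout=" + timeoutSeconds;
    }

    /** Reads every patent and returns the files that failed or timed out. */
    static List<File> convertAll(List<File> patents, int timeoutSeconds,
                                 PatentReader reader) {
        String options = d2sOptions(timeoutSeconds);
        List<File> failed = new ArrayList<>();
        for (File patent : patents) {
            try {
                int count = reader.countStructures(patent, options);
                System.out.println(patent + ": " + count + " structures");
            } catch (IOException e) {
                // The timeout exception is described above as a subclass of
                // IOException, so it is caught here too. Log it and move on
                // so one slow patent does not stall the whole batch.
                System.err.println("Skipping " + patent + ": " + e.getMessage());
                failed.add(patent);
            }
        }
        return failed;
    }
}
```

Returning the list of failed files makes it easy to rerun just those patents once a fixed version is available.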


On a separate note, the upcoming version 6.0 of d2s includes speed optimizations that should make text processing (including patent XML) around 20% faster on average. More optimizations are in the works for future versions.


Best regards,


Daniel

User 2347372188

22-05-2013 17:28:48


Thanks for getting back to me so quickly and for the workaround.


I'm processing about 8400 patents from Google. I multithreaded the code so I can process up to 16 patents at once (on a 16-core machine). It's very cool! It took me a while to convince myself that the bug wasn't mine.
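
For readers who want to try a similar setup, a rough sketch of one way to run a fixed number of patents at once with a thread pool is shown here. It is not the code used in this thread. The class name ParallelPatentSketch and the method processAll are hypothetical, and the task passed in stands for whatever reads a single patent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelPatentSketch {

    /**
     * Applies the task to every path, with at most `threads` tasks running
     * at once, and returns the results in input order.
     */
    static <R> List<R> processAll(List<String> paths, int threads,
                                  Function<String, R> task) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String path : paths) {
                Callable<R> job = () -> task.apply(path);
                futures.add(pool.submit(job));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that task is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown(); // stop accepting work and let the threads exit
        }
    }
}
```

With 16 cores, the call would look like processAll(paths, 16, task). If one task never returns, future.get() blocks on it indefinitely, which is one reason a hang in a single patent is hard to spot in a multithreaded run.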


 


-&


ChemAxon e7b9408ca1

23-05-2013 08:38:41

Yes, I know how debugging multithreaded code can be confusing. Sorry for the inconvenience. It's great that this bug will be gone in the near future, thank you for your help in identifying it!