Can we provide chemical names to "Document Extractor"

User a8852677c2

22-12-2010 06:15:55

Hi All,


We want to provide a list of chemical names to "Document Extractor" so that it ignores those names during the extraction process.


 


For example, suppose example.txt contains three chemical names: x, y and z. If we give this file to "Document Extractor", it extracts all three names, x, y and z.


But now we don't want z to be treated as a chemical name when we extract example.txt. Is there any provision for telling "Document Extractor" not to extract z as a chemical name?


 


Regards


Yogesh Dhawale
Java Developer
[email protected]

 

Patent iNSIGHT Pro
Gridlogics Technologies Pvt. Ltd.
4th Floor, Sunflower Commercials,
77/1 Baner Road,
Pune 411045, INDIA.
www.patentinsightpro.com

ChemAxon e7b9408ca1

22-12-2010 11:45:35

Hi Yogesh,


We have such an "exclude list" internally, but there is currently no way for clients to specify it dynamically. We are planning to add that functionality.


However, it should not be very hard, after you get a list of hits from Document Extractor, to go through the list and to remove any hit you want, for instance based on your list of excluded names.
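
As a rough illustration of that post-filtering step, here is a minimal Java sketch. It assumes the hits have already been reduced to plain name strings; the class and method names (ExcludeListFilter, buildExcludeSet, removeExcluded) are hypothetical and not part of the Document Extractor API. Storing the excluded names in a hash set means each hit costs one lookup, so the filtering time should depend on the number of hits rather than on the size of the exclude list.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of the Document Extractor API.
class ExcludeListFilter {

    // Normalizes a name so that lookups ignore case and surrounding whitespace.
    static String normalize(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }

    // Builds the lookup set once; it can be reused for every document.
    static Set<String> buildExcludeSet(List<String> excludedNames) {
        Set<String> set = new HashSet<>();
        for (String name : excludedNames) {
            set.add(normalize(name));
        }
        return set;
    }

    // Returns the hit names that are not in the exclude set, in their original order.
    static List<String> removeExcluded(List<String> hitNames, Set<String> excludeSet) {
        List<String> kept = new ArrayList<>();
        for (String hit : hitNames) {
            if (!excludeSet.contains(normalize(hit))) {
                kept.add(hit);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> exclude = buildExcludeSet(List.of("z"));
        System.out.println(removeExcluded(List.of("x", "y", "z"), exclude)); // prints [x, y]
    }
}
```

With real hit objects, the same loop would compare whatever name field the hit exposes instead of the string itself.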


Is this solution acceptable for you, or is it difficult/impossible to implement on your side?


Best regards,


Daniel

User a8852677c2

22-12-2010 12:58:33

Hi Daniel,


Yes, we can exclude the chemical names after the extraction process. But if our exclude list is big, removing the hits on our side takes more time. For us it is a performance issue, because extracting chemical names from files already takes a lot of time.


It is not difficult or impossible to implement on our side, but it affects performance.


 



Thanks & Regards




Yogesh Dhawale

ChemAxon e7b9408ca1

22-12-2010 14:03:57

We know about major performance issues on certain documents, and we are working on them. There is potential for major speedups (which I think would have much more impact than adding an exclude list, which I expect would only affect performance slightly).


If performance is affecting you now, would you be able to send me some example documents that are especially slow to process? The next version (due in about one month) would then surely be much faster. We could also send you a beta earlier if required.

User a8852677c2

27-12-2010 12:48:06

Hi,


Sorry for the late reply. I am sending a text file containing HTML, which takes 2 min 57 sec to extract.


After extraction we need to add a loop to remove the chemical names we don't want.


So it takes around 3 min 10 sec in total for a single HTML file, and the same for a text file.


Can you tell me how I can improve the performance of extraction?


 


Thanks & Regards


Yogesh Dhawale

ChemAxon e7b9408ca1

28-12-2010 09:12:50

Dear Yogesh,


Thanks for the example text. I'll investigate how I can make processing faster.


Daniel

ChemAxon e7b9408ca1

04-01-2011 09:49:18

Hi Yogesh,


I made speed optimizations; processing this document is currently 2.3 times faster than in 5.4.0. I see similar speedups for most documents I tested. There are still some cases that look slower than expected, so I hope for more improvements soon.


Daniel

User a8852677c2

04-01-2011 11:01:01











Hi Daniel,


Thanks for the speed optimizations. Which version should I download to test performance on my machine, or should I wait for the next version?


Thanks & Regards


Yogesh Dhawale.

ChemAxon e7b9408ca1

05-01-2011 17:02:43

Hi Yogesh,


You can download an alpha version.


(sorry, it's just a list of files, I hope you will find the version you need, it should match what you used previously from 5.4.0).


Do you see improvements? If you have other documents that are processed slowly (relative to their size) with this version, please forward them and I will analyze what further optimizations can be done.


Regards,


Daniel

User a8852677c2

06-01-2011 06:39:21











Hi Daniel,


Thanks for the alpha version. I will let you know about the improvements.


Thanks & Regards


Yogesh Dhawale

User a8852677c2

06-01-2011 12:28:36











Hi Daniel,


I have downloaded the alpha version and tested it. Performance has improved a lot; it is now about 2 times faster.


Previously one text file took 3 min; now it takes 1 min. But if we want to extract 60 text files, it takes 60 minutes.


So we have to wait 1 hour to extract 60 files, meaning the time grows with the file count. We are wondering how to decrease the extraction time.


Please let us know if there is any other way to decrease the extraction time.


Thanks & Regards


Yogesh Dhawale

ChemAxon e7b9408ca1

18-01-2011 14:40:50

Dear Yogesh,


I'm glad you could confirm a 2-3x speedup.


Regarding further improvements, I do believe it is possible to increase speed further; it will just take some time.


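Some things you might be able to do on your side:

  • "guess" which documents might not be related to chemistry at all, maybe by looking for keywords.

  • process the first 10% of the document; if it has no/few hits, it might be skipped.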

I will inform you when I have further improvements you can test.


Regards,


Daniel

User a8852677c2

19-01-2011 06:10:06










Daniel Bonniot wrote:

Dear Yogesh,


I'm glad you could confirm a 2-3x speedup.


Regarding further improvements, I do believe it is possible to increase speed further, it will just take some time.


Some things you might be able to do on your side:



  • "guess" which documents might not be related to chemistry at all, maybe by looking for keywords.



  • process the first 10% of the document, if it has no/few hits, it might be skipped.
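
The two suggestions in this list could be combined into a cheap pre-check along the following lines. This is only a sketch under stated assumptions: the keyword list is purely illustrative, the class ChemistryPrefilter is hypothetical, and hitCounter is a stand-in for a real extraction run that returns the number of hits found in a piece of text. It is not a Document Extractor call.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.ToIntFunction;

// Hypothetical pre-filter for illustration; not part of the Document Extractor API.
class ChemistryPrefilter {

    // Illustrative keywords only; choose ones that suit your own documents.
    static final List<String> KEYWORDS = List.of("compound", "acid", "methyl", "synthesis");

    // Cheap check: does the text mention any of the keywords (case-insensitive)?
    static boolean looksChemical(String text) {
        String lower = text.toLowerCase(Locale.ROOT);
        for (String keyword : KEYWORDS) {
            if (lower.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    // Returns the first 10% of the text, measured in characters.
    static String firstTenth(String text) {
        return text.substring(0, text.length() / 10);
    }

    // Decides whether a full extraction looks worthwhile:
    // the keyword check must pass, and the sample must yield at least minHits hits.
    static boolean worthProcessing(String text, ToIntFunction<String> hitCounter, int minHits) {
        if (!looksChemical(text)) {
            return false;
        }
        return hitCounter.applyAsInt(firstTenth(text)) >= minHits;
    }
}
```

Both checks are heuristics, so a document whose chemistry only appears late in the text would be skipped by the sampling step; the threshold and sample size are worth tuning on real data.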


I will inform you when I have further improvements you can test.


Regards,


Daniel


ChemAxon e7b9408ca1

04-02-2013 11:57:35

An update on this question about performance. I made speed measurements on the attached document (Temp.txt above) with several versions released since the original question:

Version            Time   Speed
5.4                65s    1.6
5.5                19s    5.6
5.9, 5.11, 5.12    9s     11.8

So there has been a major speed increase in the newer versions (about 7 times faster than 5.4). I hope that is satisfactory and helpful in your task.


Additionally, for processing more documents per unit of time, you can of course also process them in parallel, on the same machine (with multiple CPU cores) or separate machines.
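
For the same-machine case, a minimal sketch using a fixed thread pool from the standard java.util.concurrent package might look like this. The class ParallelExtraction and the extract function are hypothetical placeholders for whatever call runs the extraction on one document, and the sketch assumes that call is safe to run from several threads at once (for example with one extractor instance per task), which is not confirmed in this thread. If it is not, the same pattern applies with separate processes or machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of the Document Extractor API.
class ParallelExtraction {

    // Runs `extract` on every document using a fixed pool of worker threads
    // and returns the results in the same order as the input list.
    static <R> List<R> processAll(List<String> documents,
                                  Function<String, R> extract,
                                  int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (String doc : documents) {
                Callable<R> task = () -> extract.apply(doc);
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get()); // blocks until that document is finished
            }
            return results;
        } finally {
            pool.shutdown(); // lets the worker threads exit once all tasks are done
        }
    }

    public static void main(String[] args) throws Exception {
        int cores = Runtime.getRuntime().availableProcessors();
        // String::length stands in for a real extraction call.
        List<Integer> lengths = processAll(List.of("doc one", "doc two!"), String::length, cores);
        System.out.println(lengths); // prints [7, 8]
    }
}
```

With 60 files at about 1 minute each, a pool of N workers would ideally bring the total toward 60/N minutes, although memory use and shared resources may limit the actual gain.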