First comments on www.chemicalize.org

User 677b9c22ff

17-11-2008 23:16:54

Hello Daniel,


chemicalize.org is a useful service and the non-invasive output looks good.


The system as it is works well as a chemical reading enhancer.





Similar projects (with different scopes) are:


* Project Prospect


* OSCAR3/OPSIN which also power Project Prospect


* IBM Patent Search


* SureChem patent search powered by ACDName


* ChemMantis not online yet


* LEXICHEM which was used for PubChem


* CAS and Beilstein systems in use





Common problems are:


1) how to filter out non-chemical noise using stop lists,


2) how to detect complex names such as


2,15-dimethyl-14-(6-methylheptan-2-yl)tetracyclo[8.7.0.0^{2,7}.0^{11,15}]heptadec-7-en-5-ol


(which is cholesterol, as named by Marvin)


3) how to deal with different formats from chemistry publications and patents and websites


4) how to apply semantic filters and use curated vocabulary or ontology sets (IUPAC/CHEBI)
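As a rough illustration of problem 1), here is a minimal stop-list filter in Python. The stop list and the function name are hypothetical and only for illustration; nothing here reflects how chemicalize.org or any of the tools above actually filter noise. The example words are ordinary English words that coincide with element symbols.

```python
# Hypothetical stop list: common English words that collide with
# element symbols (As, In, At, He, Be, No). A real list would be far longer.
STOP_WORDS = {"as", "in", "at", "he", "be", "no"}


def filter_candidates(candidates, stop_words=STOP_WORDS):
    """Drop candidate names whose lowercase form is on the stop list."""
    return [c for c in candidates if c.lower() not in stop_words]


if __name__ == "__main__":
    print(filter_candidates(["As", "cholesterol", "In", "benzene"]))
```

The obvious trade-off is that a plain stop list also discards genuine mentions (a real "As" for arsenic), which is one reason the semantic filters from point 4) would still be needed.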





Applications are limitless


a) Build a web crawler which crawls the web and allows substructure search


This actually should be built within Google Scholar, the only website which has


access to most of the digitized chemical literature (except CAS and Beilstein)


b) Build chemistry-enhanced websites (as in Project Prospect)


c) Prepare documents prior to submission to journals


d) Analyze journals after submission to find chemicals


e) Chemical Text Mining on full texts (not only Medline abstracts)





I am quite sure that, using the ChemAxon API and the JChem cartridge, one could


build such a stand-alone service or program or, as you showcased,


one can use it as a web service. Still, having it as a standalone program


would be nice.





The question is where to go with chemicalize.org?


(I)


I would use chemicalize.org for existing documents


to read through them. The problem here is that the proxy


cannot access the subscription literature (which is the majority).


The solution to that problem would be to download the whole


document locally and run the service again, but that does not work.


The second problem is that most publications are in PDF, so PDF-->HTML


conversion is needed, which is even harder to perform (even with full


Acrobat it is a mess and usually fails). The system should also perform


OCR to convert chemical pictures to structures, as done with


systems like Kekule, Clide, OSRA and ChemOCR.





(II)


I would use chemicalize.org for existing web documents,


to obtain a list of names, canonical SMILES, InChIs and InChIKeys.


That list could be exported as tab-separated TXT or XLS using


a small button on top of the document. That would be a really helpful extension.
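To make the export idea in (II) concrete, here is a small Python sketch that writes such a list as tab-separated text. The column names and the `to_tsv` function are my own assumptions, not anything chemicalize.org provides; the two example records are just benzene and water.

```python
import csv
import io

# Assumed column layout for the export; not an existing chemicalize.org format.
FIELDS = ["name", "smiles", "inchi", "inchikey"]


def to_tsv(records):
    """Serialize a list of dicts (keys = FIELDS) as tab-separated text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    records = [
        {"name": "benzene", "smiles": "c1ccccc1",
         "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
         "inchikey": "UHOVQNZJYSORNB-UHFFFAOYSA-N"},
        {"name": "water", "smiles": "O",
         "inchi": "InChI=1S/H2O/h1H2",
         "inchikey": "XLYOFNOQVPJJNP-UHFFFAOYSA-N"},
    ]
    print(to_tsv(records), end="")
```

A file like this opens directly in a spreadsheet program, so a separate XLS export might not even be necessary.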





(III)


Attach the PubChem names, which are free to download,


as a lexicon for this service, because many common names


are not covered in this implementation.
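A lexicon lookup as suggested in (III) could look roughly like the Python sketch below. It assumes the names are available as tab-separated lines of the form "CID<TAB>name"; that layout, and the function names, are assumptions for illustration only and should be checked against the actual PubChem download. The sketch only matches single-word names, so multi-word or systematic names would need a smarter matcher.

```python
import re


def load_lexicon(lines):
    """Build a lowercase name -> CID dict from 'CID<TAB>name' lines (assumed format)."""
    lexicon = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        cid, name = line.split("\t", 1)
        # Keep the first CID seen for a name if it occurs more than once.
        lexicon.setdefault(name.lower(), int(cid))
    return lexicon


def annotate(text, lexicon):
    """Return (matched word, CID) pairs for single-word names found in text."""
    hits = []
    for match in re.finditer(r"[A-Za-z][A-Za-z0-9-]*", text):
        cid = lexicon.get(match.group().lower())
        if cid is not None:
            hits.append((match.group(), cid))
    return hits


if __name__ == "__main__":
    lexicon = load_lexicon(["2244\taspirin\n", "2519\tcaffeine\n"])
    print(annotate("Aspirin and caffeine were dissolved.", lexicon))
```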








BTW, there is also a nice PPT from David Wild from an ACS meeting:


Integrating text and literature sources


with traditional chemoinformatics tools






Cheers


Tobias

ChemAxon e7b9408ca1

27-11-2008 18:29:49

Hi Tobias,





Thanks a lot for your detailed analysis. Definitely a lot of food for thought! We are and will be looking into all these issues.





Regarding content that requires login, this is probably not possible to handle with chemicalize.org itself, for technical and legal reasons. But we will be thinking about offline and/or browser plugin modes that could handle this situation.





Cheers,





Daniel