First comments on www.chemicalize.org - ChemAxon Forum Archive

User 677b9c22ff

17-11-2008 23:16:54

Hello Daniel,

chemicalize.org is a useful service and the non-invasive output looks good.

The system as it stands works well as a chemical reading enhancer.

Similar projects (with different scopes) are:

* Project Prospect

* OSCAR3/OPSIN which also power Project Prospect

* IBM Patent Search

* SureChem patent search powered by ACDName

* ChemMantis not online yet

* LEXICHEM which was used for PubChem

* CAS and Beilstein systems in use

Common problems are

1) how to filter out non-chemical noise using stop lists,

2) how to detect complex names such as

2,15-dimethyl-14-(6-methylheptan-2-yl)tetracyclo[8.7.0.0^{2,7}.0^{11,15}]heptadec-7-en-5-ol

(which is cholesterol, as named by Marvin Name)

3) how to deal with different formats from chemistry publications and patents and websites

4) how to apply semantic filters and use curated vocabulary or ontology sets (IUPAC/CHEBI)
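Problem 1) can be illustrated with a minimal sketch. It assumes whitespace tokenization and a tiny, hand-made stop list; the function name `filter_candidates` and the word list are hypothetical and not part of chemicalize.org or any ChemAxon API. A real system would pass the surviving tokens on to a name-to-structure converter.

```python
# Illustrative stop list only: everyday words a name parser should never
# try to interpret as chemistry. A production list would be far longer.
STOP_LIST = {"the", "lead", "compound", "was", "dissolved", "in"}


def filter_candidates(text, stop_list=STOP_LIST):
    """Return the tokens of `text` that are not on the stop list."""
    kept = []
    for raw in text.split():
        # Strip sentence punctuation only; brackets and hyphens are left
        # alone because they are meaningful inside systematic names.
        token = raw.strip(".,;:!?")
        if token and token.lower() not in stop_list:
            kept.append(token)
    return kept


if __name__ == "__main__":
    print(filter_candidates("The lead compound was dissolved in ethanol."))
```

Note that "lead" is dropped here because it is on the stop list, although in another sentence it could be the metal; that ambiguity is exactly why stop lists alone are not enough.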

Applications are limitless

a) Build a web crawler which crawls the web and allows substructure search

This actually should be built within Google Scholar, the only website which has

access to most of the digitized chemical literature (except CAS and Beilstein)

b) Build chemistry-enhanced websites (as in Project Prospect)

c) Prepare documents prior to submission to journals

d) Analyze journals after submission to find chemicals

e) Chemical Text Mining on full texts (not only Medline abstracts)

I am quite sure that using the ChemAxon API and the JChem cartridge one could

build such a stand-alone service or program or, as you showcased,

one can use it as a web service. Still, having it as a standalone program

would be nice.

The question is where to go with chemicalize.org?

(I)

I would use chemicalize.org for existing documents

to read through them. (The problem here is that the proxy cannot

access the subscription literature, which is the majority.)

The solution to that problem would be to download the whole

document locally and run the service again. That does not work at present.

Second problem: most publications are in PDF, so PDF-->HTML conversion is

needed, which is even worse to perform (even with full Acrobat

it is a mess and usually fails). The system should also perform

OCR to convert chemical pictures to structures, as done with

systems like Kekule, Clide, OSRA and ChemOCR.

(II)

I would use chemicalize.org for existing web documents,

to obtain a list of names, canonical SMILES, InChIs and InChIKeys.

That could be exported as tab-separated TXT or XLS using

a small button on top of the document. That would be a really helpful extension.
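A minimal sketch of such an export, using only the Python standard library. The record layout and the function name `to_tsv` are assumptions made for illustration; how chemicalize.org stores its recognized structures internally is not known here. The single ethanol record is sample data, not output from the service.

```python
import csv
import io


def to_tsv(records):
    """Serialize recognized compounds as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Name", "SMILES", "InChI", "InChIKey"])
    for rec in records:
        writer.writerow(
            [rec["name"], rec["smiles"], rec["inchi"], rec["inchikey"]]
        )
    return buf.getvalue()


if __name__ == "__main__":
    # Sample record (ethanol) standing in for what the service would detect.
    found = [
        {
            "name": "ethanol",
            "smiles": "CCO",
            "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
            "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
        }
    ]
    print(to_tsv(found), end="")
```

The resulting text opens directly in a spreadsheet program, so a separate XLS writer would only be needed if native Excel formatting matters.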

(III)

Attach the PubChem names, which are free to download,

as a Lexicon for this service, because many common names

are not covered in this implementation.
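A minimal sketch of a lexicon lookup, assuming the downloaded names have already been loaded into a plain dictionary that maps a lowercase name to a SMILES string. The two-entry `LEXICON`, the function name `find_common_names` and the longest-match strategy are illustrative assumptions; the actual PubChem file format is not modeled here.

```python
# Hypothetical in-memory lexicon: lowercase common name -> SMILES.
LEXICON = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "acetic acid": "CC(=O)O",
}


def find_common_names(text, lexicon=LEXICON, max_words=3):
    """Scan `text` left to right, preferring the longest phrase in the lexicon."""
    words = [w.strip(".,;:!?") for w in text.split()]
    hits = []
    i = 0
    while i < len(words):
        # Try the longest window first so "acetic acid" wins over "acid".
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n]).lower()
            if phrase in lexicon:
                hits.append((phrase, lexicon[phrase]))
                i += n
                break
        else:
            i += 1  # no phrase starts at this word
    return hits


if __name__ == "__main__":
    print(find_common_names("Aspirin hydrolyses to salicylic acid and acetic acid."))
```

In this example "salicylic acid" is missed simply because it is not in the toy lexicon, which mirrors the coverage gap described above.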

BTW, there is also a nice PPT by David Wild from an ACS meeting:

Integrating text and literature sources

with traditional chemoinformatics tools

Cheers

Tobias