Best string form for search

User f5e6ccf034

08-08-2005 23:51:00

I would like to know which string version of a Molecule object should be used so that searches using the string and the molecule will always return the exact same results.

ChemAxon 9c0afc9aaf

09-08-2005 06:47:28

Hi,

I suggest the Marvin Document (mrv) format:

http://www.chemaxon.com/marvin/doc/user/mrv-doc.html

This format can store both MDL and Daylight-specific features.

Best regards,

Szilard

User f5e6ccf034

09-08-2005 15:00:18

Sorry, by string form I meant a one-liner: something that could be used in logs, caches, etc. and still capture all the information required to re-run the query.

What do you use internally? There are SMILES in the db, so you must be converting molecules to some sort of SMARTS or SMILES string for searching.

ChemAxon a3d59b832c

09-08-2005 16:43:25

olefevre wrote:
> Sorry, by string form I meant a one-liner: something that could be used in logs, caches, etc. and still capture all the information required to re-run the query.
>
> What do you use internally? There are SMILES in the db, so you must be converting molecules to some sort of SMARTS or SMILES string for searching.

The one-liner that can contain the most query features is ChemAxon Extended SMARTS (cxsmarts):

http://www.chemaxon.com/marvin/doc/user/cxsmiles-doc.html

For the communication between the client and the server we use the mrv format, but this is customizable in the sample JSP application.

You are right that in the database we use the more compact ChemAxon Extended SMILES (cxsmiles), but that format is designed only for representing molecules, so many query features are not supported.

Best regards,

Szabolcs

User f5e6ccf034

09-08-2005 17:51:52

But would I always get the exact same results from a cxsmarts and from a Molecule search?

ChemAxon a3d59b832c

10-08-2005 14:11:19

Not necessarily. There can be differences if the Molecule comes from a format other than SMARTS/SMILES. See the relevant section of the Query Guide:

http://www.chemaxon.com/jchem/doc/user/Query.html#daylMDLDiff

All the best,

Szabolcs

User f5e6ccf034

10-08-2005 18:43:17

So in general such a one-line string does not exist?

ChemAxon a3d59b832c

11-08-2005 08:28:38

That's correct.

User f5e6ccf034

11-08-2005 15:40:56

But then what do you use in your cache? The SMILES and SMARTS strings don't capture all the information, you told me, and thus are not suitable. OTOH formats such as *.mol or *.mrv capture way too much information (e.g., exact atomic positions) and thus are even more unsuitable. So how are queries represented in your cache??? This discussion would seem to imply that there is no way to implement caching.

ChemAxon a3d59b832c

11-08-2005 15:52:17

Please note that currently JChem does not support importing queries into the database, only molecules. The structure cache uses cxsmiles, which contains all structural features.

We plan to allow importing query structures into the database in a future version.

Best regards,

Szabolcs

User f5e6ccf034

11-08-2005 16:04:04

So what you call the cache in this context is just the content of the JCHEM_STRUCTURES table?

ChemAxon a3d59b832c

11-08-2005 16:11:09

Yes, that's correct.

ChemAxon 9c0afc9aaf

11-08-2005 16:43:20

Hi,

The structure cache contains the following information in memory for each structure:

- Fingerprints: these correspond to the cd_fp1, cd_fp2, ... columns in the table. Fingerprints make it possible to rapidly filter out most of the non-hit molecules in the first phase of the search.

- ChemAxon Extended SMILES: this corresponds to the cd_smiles column. In most cases it can describe the features required for the search, so no DB access is needed at all. In the rare cases where it cannot, cd_smiles is null and the cd_structure column is read from the table.

The SMILES part is compressed and the storage structure is optimized. This makes the cache very effective: only around 100 MB is needed for 1 million typical structures.
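
The two-phase lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names (`StructureCacheSketch`, `Entry`, `passesScreen`, `structureFor`) are hypothetical, and the bit-subset test is the usual way a substructure fingerprint screen works rather than something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.IntFunction;

/** Illustrative two-phase lookup; names and layout are hypothetical, not JChem's code. */
class StructureCacheSketch {

    /** One in-memory entry: a fingerprint plus a SMILES string that may be null. */
    static final class Entry {
        final int id;
        final BitSet fingerprint;
        final String smiles; // null when SMILES cannot describe the structure

        Entry(int id, BitSet fingerprint, String smiles) {
            this.id = id;
            this.fingerprint = fingerprint;
            this.smiles = smiles;
        }
    }

    /** Assumed screen: every bit set in the query must also be set in the target. */
    static boolean passesScreen(BitSet queryFp, BitSet targetFp) {
        BitSet missing = (BitSet) queryFp.clone();
        missing.andNot(targetFp); // bits the query requires but the target lacks
        return missing.isEmpty();
    }

    /** Phase 1: keep only the entries that survive the fingerprint screen. */
    static List<Entry> candidates(BitSet queryFp, List<Entry> cache) {
        List<Entry> survivors = new ArrayList<>();
        for (Entry e : cache) {
            if (passesScreen(queryFp, e.fingerprint)) {
                survivors.add(e);
            }
        }
        return survivors;
    }

    /** Phase 2 input: use the cached SMILES, or fall back to the table when it is null. */
    static String structureFor(Entry e, IntFunction<String> loadFromTable) {
        return e.smiles != null ? e.smiles : loadFromTable.apply(e.id);
    }

    /** Small helper for building fingerprints in examples. */
    static BitSet bits(int... positions) {
        BitSet b = new BitSet();
        for (int p : positions) {
            b.set(p);
        }
        return b;
    }
}
```

Here `loadFromTable` stands in for reading cd_structure by row id. Only entries that pass the screen would go on to the full (and much slower) atom-by-atom comparison, which is not shown.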





Best regards,

Szilard

User f5e6ccf034

11-08-2005 18:23:34

Ah ok, now it's finally becoming clearer. I wouldn't call the structures a cache: it's more a case of precomputing some aspects of the searches.

OTOH it would seem that at the moment one couldn't implement a proper cache anyway, since there is no "exact" representation of an arbitrary query molecule (in the sense that it captures neither too much nor too little information). Correct?

ChemAxon a3d59b832c

12-08-2005 10:21:47

olefevre wrote:
> Ah ok, now it's finally becoming clearer. I wouldn't call the structures a cache: it's more a case of precomputing some aspects of the searches.

That's correct, but it is within the meaning of cache. Below is a quote from the Oxford Dictionary:

Quote:
> cache n. & v.
> ...
> 3 (in full: cache memory) (Computing) an auxiliary memory from which high-speed retrieval is possible.

olefevre wrote:
> OTOH it would seem that at the moment one couldn't implement a proper cache anyway, since there is no "exact" representation of an arbitrary query molecule (in the sense that it captures neither too much nor too little information). Correct?

What are your intentions? Would you like to keep a search history of the queries performed?

If so, I suggest putting the queries in a separate table in mrv format. (Currently you cannot use a JChem table for that, but I suggest creating a simple table with a text column large enough to hold the mrv text of each query.)

Or do you want to store the hits of a previous search? You can do that with the following methods of JChemSearch:

http://www.chemaxon.com/jchem/doc/api/chemaxon/jchem/db/JChemSearch.html#setResultTable(java.lang.String)
http://www.chemaxon.com/jchem/doc/api/chemaxon/jchem/db/JChemSearch.html#setResultTableMode(int)

Best regards,

Szabolcs

User f5e6ccf034

18-08-2005 15:25:38

> Would you like to store a search history which would store the queries performed? [...] Or do you want to store the hits of a previous search?

No, I don't care about the history and past results as such. I was wondering if it was possible to create a true results cache so that equivalent queries don't have to be rerun. But since there is no format that exactly captures all the features of a query, neither too much nor too little, that won't be possible.
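
To make the limitation concrete, here is a minimal sketch of a results cache keyed on the exact query text (for example, the full mrv string). Everything in it is an assumption for illustration: `ExactTextResultCache` is a hypothetical name and the `search` function is a stand-in, not a JChem call. Such a cache only ever returns hits for the identical text, but it treats two serializations of the same query as unrelated, so equivalent queries are still rerun.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical results cache keyed on the exact serialized query text. */
class ExactTextResultCache {
    private final Map<String, List<Integer>> hitsByQueryText = new HashMap<>();
    private final Function<String, List<Integer>> search; // stand-in for the real search
    private int searchesRun = 0;

    ExactTextResultCache(Function<String, List<Integer>> search) {
        this.search = search;
    }

    /** Returns cached hits for identical text; otherwise runs the search and stores the result. */
    List<Integer> hits(String queryText) {
        List<Integer> cached = hitsByQueryText.get(queryText);
        if (cached != null) {
            return cached;
        }
        searchesRun++;
        List<Integer> fresh = search.apply(queryText);
        hitsByQueryText.put(queryText, fresh);
        return fresh;
    }

    /** Number of times the underlying search was actually executed. */
    int searchesRun() {
        return searchesRun;
    }

    /** Must be called whenever the searched table changes, or stale hits would be served. */
    void clear() {
        hitsByQueryText.clear();
    }
}
```

For instance, `c1ccccc1` and `C1=CC=CC=C1` both describe benzene, yet they would occupy two separate entries and trigger two searches.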