User f5e6ccf034
08-08-2005 23:51:00
I would like to know which string version of a Molecule object should be used so that searches using the string and the molecule will always return the exact same results.
ChemAxon 9c0afc9aaf
09-08-2005 06:47:28
User f5e6ccf034
09-08-2005 15:00:18
Sorry, by string form I meant a one-liner: something that could be used in logs, caches, etc. and still capture all the information required to re-run the query.
What do you use internally? There are SMILES in the db, so you must be converting molecules to some sort of SMARTS or SMILES string for searching.
User f5e6ccf034
09-08-2005 17:51:52
But would I always get the exact same results from a cxsmarts and from a Molecule search?
ChemAxon a3d59b832c
10-08-2005 14:11:19
User f5e6ccf034
10-08-2005 18:43:17
So in general such a one-line string does not exist?
User f5e6ccf034
11-08-2005 15:40:56
But then what do you use in your cache? The SMILES and SMARTS
strings don't capture all the information, you told me, and thus are
not suitable. OTOH formats such as *.mol or *.mrv capture way
too much information (e.g., exact atomic positions) and thus are
even more unsuitable. So how are queries represented in your
cache? This discussion would seem to imply that there is no way
to implement caching.
ChemAxon a3d59b832c
11-08-2005 15:52:17
Please note that currently JChem does not support importing queries into the database, only molecules. The structure cache uses cxsmiles, which contains all structural features.
We plan to allow importing query structures into the database in a future version.
Best regards,
Szabolcs
User f5e6ccf034
11-08-2005 16:04:04
So what you call the cache in this context is just the content of the JCHEM_STRUCTURES table?
ChemAxon 9c0afc9aaf
11-08-2005 16:43:20
Hi,
The structure cache contains the following information in memory for each structure:
- Fingerprints: These correspond to the cd_fp1, cd_fp2, ... columns in the table.
Fingerprints make it possible to rapidly filter out most of the non-hit molecules in the first phase of the search.
- ChemAxon Extended SMILES: this corresponds to the cd_smiles column. In most cases it can describe the features required for the search. In the rare cases where it cannot, cd_smiles is null and the cd_structure column is read from the table.
(In all other cases no DB access is required at all.)
The SMILES part is compressed and the storage structure is optimized.
This makes the cache very efficient: only around 100 MB is needed for 1 million typical structures.
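As a rough illustration of the two-phase lookup described above, here is a minimal Python sketch. It is not JChem code: the names `CacheEntry`, `search`, `matches` and `load_structure` are hypothetical, the fingerprint is assumed to be packed into a single integer, and the screen is assumed to be a simple "all query bits must be set in the target" test.

```python
from typing import Callable, Iterable, Iterator, NamedTuple, Optional


class CacheEntry(NamedTuple):
    """One in-memory record per structure (hypothetical layout)."""
    cd_id: int
    fingerprint: int            # cd_fp1, cd_fp2, ... assumed packed into one int
    cd_smiles: Optional[str]    # None when cxsmiles cannot describe the structure


def search(
    query_fp: int,
    entries: Iterable[CacheEntry],
    matches: Callable[[str], bool],
    load_structure: Callable[[int], str],
) -> Iterator[int]:
    """Yield the cd_id of every entry that passes both phases."""
    for entry in entries:
        # Phase 1: fingerprint screen. Assumed rule: every bit set in the
        # query fingerprint must also be set in the target fingerprint.
        if (query_fp & entry.fingerprint) != query_fp:
            continue
        # Phase 2: detailed match on the cached string; fall back to the
        # database (cd_structure) only when cd_smiles is null.
        structure = entry.cd_smiles
        if structure is None:
            structure = load_structure(entry.cd_id)
        if matches(structure):
            yield entry.cd_id
```

Here `matches` stands in for the real structure comparison and `load_structure` for the read of cd_structure from the table; both are placeholders supplied by the caller. The memory figure quoted above works out to roughly 100 bytes per structure.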
Best regards,
Szilard
User f5e6ccf034
11-08-2005 18:23:34
Ah, OK, now it's finally becoming clearer. I wouldn't call the structures
a cache: it's more a case of precomputing some aspects of the searches.
OTOH, it would seem that at the moment one couldn't implement
a proper cache anyway since there is no "exact" representation of
an arbitrary query molecule (in the sense that it captures neither
too much nor too little information). Correct?
User f5e6ccf034
18-08-2005 15:25:38
> Would you like to store a search history which would store the queries
> performed? [...] Or do you want to store the hits of a previous search?
No, I don't care about the history and past results as such. I was
wondering if it was possible to create a true results cache so that
equivalent queries don't have to be rerun. But since there is no
format that exactly captures all the features of a query, neither
too much nor too little, that won't be possible.
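To make the idea concrete, the kind of results cache I have in mind would look roughly like the Python sketch below. Everything in it is hypothetical: `ResultsCache`, `key_of` and `run_search` are illustration names, and `key_of` stands for exactly the one-line string form that, per this thread, does not exist. The sketch also assumes the database does not change between calls.

```python
from typing import Callable, Dict, Hashable, List


class ResultsCache:
    """Memoize search hits under a string key derived from the query."""

    def __init__(
        self,
        key_of: Callable[[Hashable], str],
        run_search: Callable[[Hashable], List[int]],
    ) -> None:
        self._key_of = key_of          # hypothetical query -> one-line string
        self._run_search = run_search  # the expensive database search
        self._hits: Dict[str, List[int]] = {}

    def search(self, query: Hashable) -> List[int]:
        key = self._key_of(query)
        # Run the real search only the first time a given key is seen.
        if key not in self._hits:
            self._hits[key] = self._run_search(query)
        return self._hits[key]
```

The cache is only as good as `key_of`: a key that captures too little would hand back another query's hits, while a key that captures too much (atomic positions, for example) would miss for queries that are in fact equivalent.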