User 8139ea8dbd
24-10-2008 18:06:09
Marvin Sketcher can draw polymer-containing compounds, which is very nice. However, we cannot convert them to either SMILES or SMARTS (extended JChem format). Our JChem cartridge index for our compound collection table is built on the smiles column, so it seems it won't be able to handle such registration requests. Any suggestions or workarounds you can think of? Thanks.
ChemAxon a3d59b832c
26-10-2008 22:29:56
We are planning to handle polymers in extended smiles format and in JChem databases from version 5.2. (Expected in the first half of 2009.)
Best regards,
Szabolcs
ChemAxon 9c0afc9aaf
28-10-2008 23:10:41
Hi,
An obvious workaround may be to switch to a format for registration that supports these compounds.
For both JChem and plain tables the format of the inserted structures may be mixed.
If one is indexing a JChem table with the cartridge, the reference to the cd_smiles column is merely symbolic: it does not matter if it is NULL for some rows, the structure will still be found.
Szilard
User 8139ea8dbd
29-10-2008 05:15:36
Excellent suggestion!
Two questions for my own education: 1) So you are saying cd_smiles is actually not used at all by the cartridge, right? (I initially thought that was what is cached in the structure search server.) 2) Does it mean the original structure column on which we build the index is the column that is cached in the memory of the structure search server? If that's the case, in general it is still preferable to index SMILES rather than MOL, because the latter would increase the memory requirement of the structure search program. Is that right? (We say 1 million structures use roughly 100 MB of server memory; that estimate was made on the assumption that we are indexing SMILES strings, right?)
ChemAxon 9c0afc9aaf
30-10-2008 15:13:21
Some further clarification:
One of my colleagues has informed me that the brackets of the repeating unit definition of these polymers are currently ignored during the search anyway (as if the unit were there only once), so you should consider this when planning a substructure search.
My colleague Peter explains caching in a more explicit way:
Quote:
The cd_structure column is not cached. It is always the cd_smiles column that is cached for jc_idxtype-indexes. If no extended smiles value is available for a given structure (cd_smiles IS NULL), only the fingerprints will be cached (which are cached anyway), and the memory requirement of the structure cache will actually decrease, rather than increase. In short: the penalty for structures not having a compact extended smiles representation is on the performance side, not on the memory side.
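Since the penalty for rows without a cd_smiles value is on the performance side, it may help to know how many rows in a table are affected. The following is only a rough sketch: it uses Python's built-in sqlite3 module and a hypothetical table named compounds as a stand-in for a real JChem table (the column names cd_id, cd_structure and cd_smiles follow the naming used in this thread, and the structure values are placeholders). Against an actual database the same SELECT would be run through the appropriate driver.

```python
import sqlite3


def null_smiles_fraction(conn, table="compounds"):
    """Return (rows_without_cd_smiles, total_rows, fraction_without).

    `table` is a hypothetical, trusted table name; it is interpolated
    into the SQL text, so do not pass user input here.
    """
    total, missing = conn.execute(
        "SELECT COUNT(*), "
        "SUM(CASE WHEN cd_smiles IS NULL THEN 1 ELSE 0 END) "
        f"FROM {table}"
    ).fetchone()
    missing = missing or 0  # SUM() yields NULL on an empty table
    fraction = missing / total if total else 0.0
    return missing, total, fraction


# Stand-in data: an in-memory table shaped like the one discussed above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compounds ("
    "cd_id INTEGER PRIMARY KEY, cd_structure TEXT, cd_smiles TEXT)"
)
conn.executemany(
    "INSERT INTO compounds (cd_structure, cd_smiles) VALUES (?, ?)",
    [
        ("<molfile placeholder 1>", "CCO"),
        ("<molfile placeholder 2>", "c1ccccc1"),
        ("<polymer molfile placeholder 1>", None),  # no smiles form
        ("<polymer molfile placeholder 2>", None),  # no smiles form
    ],
)

missing, total, fraction = null_smiles_fraction(conn)
print(f"{missing} of {total} rows have no cd_smiles ({fraction:.0%})")
```

A high fraction would suggest that a noticeable share of searches cannot use the cached smiles and will fall on the slower path described in the quote.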
Here is some additional explanation of why cd_smiles can be NULL:
http://www.chemaxon.com/forum/viewpost403.html
Best regards,
Szilard