standardize / cache question

ChemAxon 60ee1f1328

14-02-2007 11:22:28

I have re-standardized a particular table (using jcman and an .xml configuration), and the SMILES string in the cd_smiles column appears to be in the expected standardized state.

When I load up the vanilla application I still see the old (uncorrected) version of the molecule and if I "copy as SMILES" I also get the uncorrected version of the SMILES.

I have cleared every cache I know of, on both client and server (IE cache and Java cache), and have emptied the Tomcat work dir. I have also restarted Tomcat several times.

What can I do next?

db.

ChemAxon 60ee1f1328

14-02-2007 11:39:37

perhaps the jsp vanilla application reads from the cd_structure column?

Can I also complete a re-standardize against the cd_structure as well as the cd_smiles column?

db.

ChemAxon 60ee1f1328

14-02-2007 11:45:11

OK - we see that the cd_structure column contains the unfixed version of the SMILES, and the JSP vanilla application is evidently reading from this column. How can we get a re-standardization to update this column as well, please?

ChemAxon 60ee1f1328

14-02-2007 11:47:09

i.e. we assumed that a re-standardization step would render the entire record re-standardized. Why the difference in values between these columns?

ChemAxon 60ee1f1328

14-02-2007 12:21:44

We have also tried changing $cd_structure to cd_smiles in the setup.jsp page, but this does not appear to change what we view in the JSP application. Please help!

ChemAxon 60ee1f1328

14-02-2007 12:39:50

so to summarise, we would like to be able to either:

- re-standardize the cd_structure column, or
- point the JSP application at the cd_smiles column.

We could really do with an answer asap please.

Cheers,
Daniel.

ChemAxon 9c0afc9aaf

15-02-2007 08:38:41

Quote:
When I load up the vanilla application I still see the old (uncorrected) version of the molecule and if I "copy as SMILES" I also get the uncorrected version of the SMILES.
The "Copy as SMILES" menu copies the current (visible) structure as SMILES, so if you see an "uncorrected" structure, the SMILES will be copied accordingly.
Quote:
perhaps the jsp vanilla application reads from the cd_structure column?
Yes.
Quote:
Can I also complete a re-standardize against the cd_structure as well as the cd_smiles column?
No.
Quote:
we assumed that a re-standardization step would render the entire record re-standardized. Why the difference in values between these columns?
To preserve your structures.
JChem always keeps the original structures, so a bad standardization configuration (or a standardization bug) can never ruin them permanently.
Quote:
we have also tried changing $cd_structure to cd_smiles in the setup.jsp page and this does not appear to work in terms of what we view in the JSP application?
The documentation says:

Code:
FIELD_NAME     This can be the name of the structure table's field which value appears, or one of these:
    * $cd_structure - structure is shown
    * $dissimilarity - in case of similarity search the value of dissimilarity ratio is shown

So it's either a plain data field (without $), or one of the above two special identifiers. In short: this is not a viable option.
Quote:
so to summarise, we would like to be able to either:
re-standardize the cd_structure column or
Regeneration (whether you change the standardization or not) never touches the cd_structure field, and this is deliberate.
Quote:
be able to point the JSP application at the cd_smiles column.
You would have to modify the code for this, but this is not recommended, as this field can also be NULL in certain cases (if some feature cannot be described in cxsmiles without information loss, cd_structure must be used instead). This field is for internal use only, used by the search process.
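
To make the NULL caveat concrete, here is a minimal sketch of what any code reading cd_smiles would have to do. The class and method names are hypothetical (not part of JChem), and it assumes the two column values have already been read from the row and that cd_structure holds the original structure as text bytes:

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: chooses what to show for one table row. Not JChem API. */
public class DisplaySource {

    /**
     * cdSmiles may be null (the column is left NULL when the structure cannot
     * be written as cxsmiles without information loss), so the original
     * structure from cd_structure is the fallback.
     * Assumption: cdStructure contains the structure as UTF-8 text.
     */
    public static String pick(String cdSmiles, byte[] cdStructure) {
        if (cdSmiles != null && !cdSmiles.isEmpty()) {
            return cdSmiles;
        }
        return new String(cdStructure, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = "OC(=O)c1ccccc1".getBytes(StandardCharsets.UTF_8);
        System.out.println(pick("O=C(O)c1ccccc1", original)); // cd_smiles present
        System.out.println(pick(null, original));             // falls back to the original
    }
}
```

Note that the fallback rows would still show the original, non-standardized structure, which is one more reason this route is not recommended.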





If you want to display the standardized structures, one solution is to standardize them on-the-fly with Standardizer before displaying them:

http://www.chemaxon.com/jchem/doc/api/chemaxon/reaction/Standardizer.html
http://www.chemaxon.com/jchem/doc/guide/standardizer/index.html

The other possibility is to standardize the structures before import:

http://www.chemaxon.com/jchem/doc/user/Standardizer.html

Best regards,

Szilard

ChemAxon 60ee1f1328

15-02-2007 09:10:55

Szilard,

Thanks for your extensive and clear answer.

OK so it seems to me that I may now have two options.

1. Write a bespoke (MSketch/Mview) application that can access the cd_smiles column - we have done this now which is quite handy!
2. Export the cd_smiles column to a file and then perform an import into another "clean" jchem table.

Option 2 above is probably the best thing to do.

Question: Is the fingerprint based upon the contents of the cd_structure column or is this data based on the cd_smiles column? If I placed a jcidx on the cd_smiles, presumably the fingerprints in the associated table that is created with the index would be relevant to the indexed column i.e. cd_smiles.

Cheers,
Daniel.

ChemAxon 9c0afc9aaf

15-02-2007 17:03:58

Hi,
Quote:
OK so it seems to me that I may now have two options.

1. Write a bespoke (MSketch/Mview) application that can access the cd_smiles column - we have done this now which is quite handy!
I do not recommend this, as cd_smiles can sometimes be NULL.
As I wrote, we do not recommend using cd_smiles in general.
(This field is to be used by the JChem search process only.)

I also don't understand why you would need a new application for this - you could just as well modify the JSP application to use this field.
Quote:
2. Export the cd_smiles column to a file and then perform an import into another "clean" jchem table.
I do not recommend this either, partly for the reason above and partly because you may lose information, such as coordinates (if your original input had coordinates).

I would rather export the table with JChem Manager into MRV format (it can store every kind of information), then standardize it with the Standardizer GUI or command-line script (standardize), and then import it back into another table.

Shortcomings of this solution:
- if the standardization changes, it's not easy to convert the structures again, and if you don't keep the original structures, the new standardization will be applied to structures that were already modified before
- if someone inserts new structures, these won't be visibly standardized (although they will be standardized OK for search, just like now)

I think the best solution is to use the Standardizer API to standardize the structures during display.
Of course you should make sure it's the same standardizer that's currently used for searching.
There is no public API to get this yet, but until then you can use an unofficial solution from chemaxon.jchem.db.TableInfo:





Code:
public static Standardizer getStandardizer(ConnectionHandler ch,
            String tableName) throws SQLException, StandardizerException {
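
The display-time approach could be wired up roughly as follows. This is only a sketch of the flow, not JChem code: `UnaryOperator<String>` stands in for the real Standardizer object, the loader function stands in for whatever call returns the table's standardizer (such as the unofficial method above), and the class name is hypothetical. The point is that the standardizer is fetched once per table and applied to the original structure just before it is shown, while the stored cd_structure stays untouched.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/**
 * Illustrative only: standardizes structures at display time, reusing one
 * standardizer per table. UnaryOperator<String> is a stand-in for the real
 * Standardizer; the loader is a stand-in for the call that returns the
 * standardizer configured for a table.
 */
public class DisplayStandardizer {

    private final Function<String, UnaryOperator<String>> loader;
    private final Map<String, UnaryOperator<String>> cache = new ConcurrentHashMap<>();

    public DisplayStandardizer(Function<String, UnaryOperator<String>> loader) {
        this.loader = loader;
    }

    /** Returns the structure as it should be shown; the stored original is not modified. */
    public String forDisplay(String tableName, String originalStructure) {
        // Load the table's standardizer on first use, then reuse it.
        UnaryOperator<String> standardizer = cache.computeIfAbsent(tableName, loader);
        return standardizer.apply(originalStructure);
    }

    public static void main(String[] args) {
        // Toy "standardization" (trimming whitespace) used purely to exercise the flow.
        DisplayStandardizer display = new DisplayStandardizer(table -> s -> s.trim());
        System.out.println(display.forDisplay("mytable", "  CCO  "));
    }
}
```

Because the standardizer is looked up per table, newly inserted structures are displayed standardized as well, and a changed standardization only requires clearing the cache.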
Quote:
Question: Is the fingerprint based upon the contents of the cd_structure column or is this data based on the cd_smiles column?
I would rather put it like this: the fingerprints reflect the standardized state of the molecule, otherwise the search would fail. The cd_smiles column is also generated from the standardized molecule.
Quote:
If I placed a jcidx on the cd_smiles, presumably the fingerprints in the associated table that is created with the index would be relevant to the indexed column i.e. cd_smiles.
As the cd_smiles column is internal data for the search process, I strongly discourage you from indexing it with the cartridge.

Best regards,

Szilard

ChemAxon 60ee1f1328

16-02-2007 17:40:16

Szilard,

Once we have the "correctly" standardized SMILES, I intend to take a cut (csv, tab separated) and import it into a "clean" table. Contrary to your advice (sorry), this appears to be the quickest way round this - generating a .mrv file looked like it would take a while.

On import, both the cd_structure and cd_smiles columns will get the correct structure assigned, and I think I should then be where I need to be?
We have no 3D data as such in the "uncorrected" cd_structure column at present, so I don't expect to lose anything. Please comment if you think otherwise - this appears to be OK as far as I can see.

Cheers + nice weekend,
Daniel.