Export with customized DB field

User a7e5cba6fc

31-01-2007 15:27:08

I'm trying to export a hit list after a substructure search into SDF data.


My question is: is it possible to change the default exported field list so that I see the "CD_ID" in the SDF data instead of "CD_STRUCTURE"?





The "CD_ID" must be on the first line of each record, not in an SDF data field "> <CD_ID>".





thanks

ChemAxon a3d59b832c

31-01-2007 15:43:16

Hi Nejd,





I am afraid it is not possible. The SDF format must contain structural information. If you are using our JSP example (http://www.chemaxon.com/jchem/examples/jsp1_x), then at export you can specify the extra fields to export, including cd_id.





I hope this helps. If not, please explain what format you intend to generate.





Szabolcs

User a7e5cba6fc

01-02-2007 01:32:21

Hi Szabolcs,


thank you for your quick reply.





1) I'm using your JSP example and, after specifying the extra field (e.g. CD_ID), I export to SDF format.


2) The SDF file is then used as input to other tools for further processing, and in the end I get another ordered list that contains only the molecule names (without any further information).


3) Therefore I must run a new search to obtain the SDF data corresponding to the ordered list.








Nejd

ChemAxon a3d59b832c

01-02-2007 08:10:48

Hi,





Could you provide an example of the conversion you intend to make?





You have to be aware that the JSP code is an open source example, and you can modify the source code to suit your needs.





It may also interest you that we are working on an IUPAC naming module (structure to name), which is now in the testing phase. In the future it will be very easy to generate the IUPAC name automatically at database import (by way of the already available calculated columns) or at export, with some custom coding.





Szabolcs

User a7e5cba6fc

01-02-2007 12:24:52

Hi,





Attached is a list of molecule names; it is the output of step 2). The most important thing here is the order of the names (CD_STRUCTURE).


Now (in step 3) I am trying to get the SDF data with the same molecule order as in the list.


What I would like to know:


Is it possible to run a query such as "select * from table_name where cd_structure like 'MEPHENYTOIN%'", i.e. can the molecule name identify one and only one row in the DB? Otherwise I must somehow use the "CD_ID" field, since it is the primary key. Thanks.








Nejd

ChemAxon a3d59b832c

02-02-2007 09:02:46

Hi Nejd,





I am afraid it is not possible until our name-to-structure (or structure-to-name) project is complete.





One possibility is to store the molecule name in a custom column, but you have to be careful to keep this column unique (not to import the same molecule twice or not to use the same name for two different molecules).





Another possibility is to somehow keep track of the cd_id-s during the external process where you generate the molecule names list. If these are exactly the same structures as the hits themselves, you could keep the original exported SDF and write custom code that reads the two lists (structures + names) in parallel to make the association.
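
The parallel-read idea can be sketched in a few lines. This is only an illustration in plain Python, not Marvin/JChem API code. It assumes that the exported SDF carries the ID in a data field named CD_ID and that the name list is in exactly the same order as the SDF records; the function names are made up for this example.

```python
from typing import Dict, List


def split_sdf(text: str) -> List[str]:
    """Split SDF text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    return records


def read_field(record: str, field: str) -> str:
    """Return the first value line of a '> <FIELD>' data item, or '' if absent."""
    lines = record.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(">") and f"<{field}>" in line and i + 1 < len(lines):
            return lines[i + 1].strip()
    return ""


def map_names_to_ids(sdf_text: str, names: List[str]) -> Dict[str, str]:
    """Pair the n-th name with the CD_ID of the n-th SDF record.

    Assumption: both inputs list the same molecules in the same order.
    """
    records = split_sdf(sdf_text)
    if len(records) != len(names):
        raise ValueError("SDF file and name list differ in length")
    return {name: read_field(rec, "CD_ID") for rec, name in zip(records, names)}
```

With such a mapping, a reordered name list can be translated back into a list of cd_id values in the same order. Duplicate names would overwrite each other in the dictionary, so the names have to be unique for this to work.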





I hope this helps,





Szabolcs

User fdee5ee126

02-02-2007 13:05:24

Let me add some info about my colleague's problem:





We want to export a customized subset of our JChem database, which contains every molecule needed within our work group.


That is not the problem.


However, the problem occurs when we want to export a custom list of molecules (usually a hit list derived from self-written algorithms):


Is there any functionality that lets me supply a list of CD_IDs, compound names, or similar identifiers that defines both the molecules AND THE ORDER of the molecules exported to e.g. an SD file?





There is a variety of functionality in the search/export panels. However, we have some problems implementing this slightly different functionality.





While discussing this, we thought of another feature we would like to use, which in a way builds on Szabolcs's suggestion:


"Another possibility is to somehow keep track of the cd_id-s during the external process where you generate the molecule names list"





Is it possible to define another database field, e.g. CD_ID, that is then written to the first line of the header of an SD file?


By thus "naming" the molecules according to their CD_ID, we could then simply use some "Where CD_ID is..." commands for the export.


The problem of keeping the right order of the hit list would still have to be solved, though.





I've already written a Perl script that reads the CD_ID field of the SDF and replaces the name with this ID. It's just that we do not want to store a bunch of scripts, but would rather have the routine implemented in the database software.





Thanks in advance for your support,





Markus Kossner

ChemAxon a3d59b832c

02-02-2007 22:39:45

m.kossner wrote:
Let me add some info about my colleague's problem:


We want to export a customized subset of our JChem database, which contains every molecule needed within our work group.


That is not the problem.


However, the problem occurs when we want to export a custom list of molecules (usually a hit list derived from self-written algorithms):


Is there any functionality that lets me supply a list of CD_IDs, compound names, or similar identifiers that defines both the molecules AND THE ORDER of the molecules exported to e.g. an SD file?
I see now. It is probably not very difficult to write such code using the Marvin/JChem API. You should check out the following classes:





Molecule http://www.chemaxon.com/marvin/doc/api/chemaxon/struc/Molecule.html


MolImporter http://www.chemaxon.com/marvin/doc/api/chemaxon/formats/MolImporter.html


MolExporter http://www.chemaxon.com/marvin/doc/api/chemaxon/formats/MolExporter.html


Let us know if you have questions about the use of these. These examples may also be of interest: http://www.chemaxon.com/marvin/examples/index.html
m.kossner wrote:
There is a variety of functionality in the search/export panels. However, we have some problems implementing this slightly different functionality.


While discussing this, we thought of another feature we would like to use, which in a way builds on Szabolcs's suggestion:


"Another possibility is to somehow keep track of the cd_id-s during the external process where you generate the molecule names list"


Is it possible to define another database field, e.g. CD_ID, that is then written to the first line of the header of an SD file?


By thus "naming" the molecules according to their CD_ID, we could then simply use some "Where CD_ID is..." commands for the export.
You can set the first line of the structure with Molecule.setName().
m.kossner wrote:
The problem of keeping the right order of the hit list would still have to be solved, though.


I've already written a Perl script that reads the CD_ID field of the SDF and replaces the name with this ID. It's just that we do not want to store a bunch of scripts, but would rather have the routine implemented in the database software.
So would you like to implement the above Perl script as part of the JSP application? Hopefully the above links will give you enough guidance. Let us know if not.
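
For reference, the renaming step discussed here (copy the CD_ID value into the first line of each record) can be sketched as follows. This is a plain-Python illustration of what such a script might do, not the poster's Perl script and not JChem API code. It assumes the SDF contains a data field named CD_ID; the function name is hypothetical.

```python
def rename_records_by_field(sdf_text: str, field: str = "CD_ID") -> str:
    """Replace the title (first) line of every SDF record with the value
    of the given data field. Records without that field are left unchanged."""
    out, record = [], []
    for line in sdf_text.splitlines():
        record.append(line)
        if line.strip() == "$$$$":  # end of one record
            value = None
            for i, rec_line in enumerate(record):
                is_header = rec_line.startswith(">") and f"<{field}>" in rec_line
                if is_header and i + 1 < len(record):
                    value = record[i + 1].strip()
                    break
            if value:
                record[0] = value  # line 1 of a record is the molecule name
            out.extend(record)
            record = []
    out.extend(record)  # keep any trailing lines that lack a terminator
    return "\n".join(out) + "\n" if out else ""
```

Inside a Java application, the same effect would presumably be achieved per molecule with Molecule.setName(), as mentioned above.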





All the best,


Szabolcs

ChemAxon 9c0afc9aaf

06-02-2007 08:48:14

Hi,





Some additional information that may help:
Quote:
Is there any functionality, which enables me to define either a list of CD_IDs or Compound Names identifiers or so, by which I can define


the Molecules AND THE ORDER of the Molecules exported to e.g. a sdfile.
I suggest also taking a look at Exporter.setSelectStatement() in our API; here you can specify an SQL statement, which can contain an "ORDER BY" clause:





http://www.chemaxon.com/jchem/doc/api/chemaxon/jchem/db/Exporter.html#setSelectStatement(java.lang.String)
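
As an illustration of the kind of statement meant here, the sketch below builds a SELECT that returns only the given cd_id values, in exactly the given order, using a standard SQL "ORDER BY CASE" expression. It is plain Python that only produces the SQL text. The table name is hypothetical, and whether Exporter.setSelectStatement() accepts precisely this form has not been verified here.

```python
from typing import Sequence


def ordered_select(table: str, ids: Sequence[int]) -> str:
    """Build a SELECT that returns the rows for `ids` in the order of `ids`.

    `table` is a hypothetical structure table name; ids are forced to int
    so that no arbitrary text ends up in the SQL string.
    """
    if not ids:
        raise ValueError("ids must not be empty")
    id_list = ", ".join(str(int(i)) for i in ids)
    # Map each cd_id to its position in the hit list and sort by that position.
    cases = " ".join(
        f"WHEN {int(cd_id)} THEN {pos}" for pos, cd_id in enumerate(ids)
    )
    return (
        f"SELECT * FROM {table} WHERE cd_id IN ({id_list}) "
        f"ORDER BY CASE cd_id {cases} END"
    )
```

For example, ordered_select("hits", [5, 2, 9]) yields a statement that returns row 5 first, then row 2, then row 9.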





Otherwise, if you need to fetch the structures one by one from the database, you should fetch the cd_structure column yourself by executing a JDBC query.


Since more than one data type is allowed for the cd_structure column, I suggest using one of the following helper methods to retrieve the structure source from the ResultSet:





http://www.chemaxon.com/jchem/doc/api/chemaxon/util/DatabaseTools.html#readBytes(java.sql.ResultSet,%20int)


http://www.chemaxon.com/jchem/doc/api/chemaxon/util/DatabaseTools.html#readBytes(java.sql.ResultSet,%20java.lang.String)


http://www.chemaxon.com/jchem/doc/api/chemaxon/util/DatabaseTools.html#readString(java.sql.ResultSet,%20int)





I hope this helps.





Best regards,





Szilard