Exporter::setFormat: necessary?

User fa1369adab

22-02-2006 13:19:00

I sometimes store molecules in MOLFILE and sometimes in MRV format. I would like Exporter to extract molecules from the database in the same format in which I have entered them. But it seems that if I don't explicitly setFormat(), then MOLFILE format is returned. Am I mistaken?

ChemAxon 9c0afc9aaf

22-02-2006 18:01:45

Hi,

The basic concept of file export assumes that all of the exported structures are written into a single file.

If you want the exported file to be a legal structure file, you cannot mix different formats within a single file. Therefore the format must be provided.

Best regards,

Szilard

ChemAxon 587f88acea

22-02-2006 19:21:49

Szilard wrote:
The basic concept of file export assumes that all of the exported structures are written into a single file.
Even when the exports are done at different times?

ChemAxon 9c0afc9aaf

23-02-2006 08:48:44

Quote:
Even when the exports are done at different times?
I'm not sure what you mean, but for separate export processes you can specify different files of course.

ChemAxon 587f88acea

23-02-2006 12:43:59

Exactly! Each time we export a structure, we want its original format to be preserved in the output. But we don't know what the original format of that structure was, so we can't specify it. And if we don't specify it, it is always exported in MOL format.

It seems to me that the two principles here can be reconciled. Suppose a user wants to export n structures. If a format is specified, then all n structures are exported in that format. If no format is specified, then all n structures are exported in the original format of the first structure retrieved. That way, we can export a single structure in its original format, and multiple structures will always be exported in the same format.
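The rule proposed above can be sketched in a few lines of Java. This only illustrates the proposal; it is not how Exporter behaves. The class and method names (ExportFormatRule, resolve) and the "mol" fallback for an empty result set are assumptions made for the sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the proposed rule; not part of JChem.
final class ExportFormatRule {

    /**
     * Picks one format for a whole export.
     * @param explicitFormat format requested by the caller, or null if none was set
     * @param storedFormats  original formats of the structures, in retrieval order
     * @return the explicit format if given, otherwise the format of the first structure
     */
    static String resolve(String explicitFormat, List<String> storedFormats) {
        if (explicitFormat != null) {
            return explicitFormat;      // an explicit choice always wins
        }
        if (storedFormats.isEmpty()) {
            return "mol";               // assumed fallback when nothing is exported
        }
        return storedFormats.get(0);    // all structures follow the first one
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, Arrays.asList("mrv", "mol")));
        System.out.println(resolve("sdf", Arrays.asList("mrv", "mol")));
    }
}
```

With a single structure this returns that structure's own format, and with several structures the output still uses one format throughout.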

ChemAxon 9c0afc9aaf

24-02-2006 16:29:40

Hi,

I can see only one possible problem with the solution you suggested.
If the structures are exported into a file, the file extension and the file format may differ, making it a "faulty" structure file (as you don't know what format you would get).

Could you give some details on how you intend to use the exported structures? Do you want to save them as a file or use them in another way?

Regards,

Szilard

ChemAxon 587f88acea

24-02-2006 16:57:33

I believe we export them into a string, not a file. Furthermore, we always retrieve structures from the database one at a time.

Our problem is as follows: We want to store in the database both MOL structures that contain information not preserved upon conversion into MRV, and MRV structures that contain information not preserved upon conversion into MOL. The only way to preserve all the information is to retrieve all the structures in their original format.

The possible problem you raise is easily solved by proper programming. The programmer can easily set the extension of the file to depend on the file format.
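As a rough illustration of choosing the extension from the format, here is a minimal Java sketch. The names (StructureFileNames, guessFormat, fileNameFor) are hypothetical, and the detection is a deliberately crude assumption: MRV is XML, so its source starts with '<', and anything else is treated as a molfile. A real application would rely on the toolkit's own format recognition instead.

```java
// Hypothetical helper; the format check is a simplifying assumption.
final class StructureFileNames {

    /** Guesses "mrv" for XML-looking sources and "mol" for everything else. */
    static String guessFormat(String source) {
        return source.trim().startsWith("<") ? "mrv" : "mol";
    }

    /** Builds a file name whose extension matches the guessed format. */
    static String fileNameFor(String baseName, String source) {
        return baseName + "." + guessFormat(source);
    }

    public static void main(String[] args) {
        String mrv = "<?xml version=\"1.0\"?><cml><MDocument/></cml>";
        String mol = "benzene\n  Marvin\n\n  0  0  0  0  0  0            999 V2000\nM  END\n";
        System.out.println(fileNameFor("structure1", mrv));
        System.out.println(fileNameFor("structure2", mol));
    }
}
```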

ChemAxon 587f88acea

24-02-2006 17:55:02

Or, you can allow the format of the exported structure to depend on the original format only when the structure is exported to a string.

ChemAxon 9c0afc9aaf

25-02-2006 13:54:53

Hi,
Quote:
We want to store in the database both MOL structures that contain information not preserved upon conversion into MRV
The MRV format should be capable of storing all the information that is present in a molfile.
If you think that something has been lost during the conversion, please give us an example.
(Of course this only applies to standard MOL files.)

Back to the original problem:
I think you need a "table reader" rather than an "exporter" to fetch the structures one by one.
Actually, we are planning to provide a convenience API for this.

Until then you can use the following:

1. Read the structure from cd_structure with an SQL select.
Using DatabaseTools.readBytes() is recommended, as it can uniformly read multiple column types (e.g. cd_structure may be BLOB, CLOB, or LONG RAW under Oracle).
2. Decompress it if necessary with MdlCompressor. (BTW, the compression of cd_structure can be disabled in the options menu of jcman.)

Here is some sample code you can use:

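Code:
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
// DatabaseTools, MolInputStream and MdlCompressor come from the JChem/Marvin
// libraries; import them from the packages used by your version.

    /**
     * Method for getting the source of a structure from a JChem table.
     * @param con connection to the database
     * @param tableName the name of the structure table
     * @param cd_id the cd_id of the structure
     * @return the structure source, unpacked if necessary, or null if there
     *         is no such row or the source could not be unpacked
     */
    public static String getStructureSource(Connection con, String tableName,
                                            int cd_id) throws SQLException {
        String source = null;
        String sql = "SELECT cd_structure FROM " + tableName + " WHERE cd_id = "
                + cd_id;
        Statement stmt = con.createStatement();
        try {
            ResultSet rs = stmt.executeQuery(sql);
            if (rs.next()) {
                byte[] bytes = DatabaseTools.readBytes(rs, 1);
                try {
                    bytes = decompressIfNeeded(bytes);
                    // Turn the bytes into text (platform default charset);
                    // without this assignment the method always returns null.
                    source = new String(bytes);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
            rs.close();
        } finally {
            // Release the statement even if the query fails.
            stmt.close();
        }
        return source;
    }

    /** Unpacks the bytes if they are in one of the compressed MDL formats. */
    private static byte[] decompressIfNeeded(byte[] bytes) throws IOException {
        String formatString = (new MolInputStream(
                new ByteArrayInputStream(bytes))).getFormat();
        if (formatString.startsWith("csmol")
                || formatString.startsWith("cssdf")
                || formatString.startsWith("csrdf")
                || formatString.startsWith("csrxn")) {
            bytes = MdlCompressor.convert(bytes, MdlCompressor.DECOMPRESS);
        }
        return bytes;
    }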

You may decide to refine it, e.g. by using a PreparedStatement if performance is important, but I hope it shows the general idea.
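One possible shape of the PreparedStatement refinement, as a hedged sketch that uses only JDK classes. The names (StructureQuery, buildSql, readRawStructure) are hypothetical. A table name cannot be bound as a parameter, so it is checked against a simple identifier pattern (an assumption about what your table names look like) before being concatenated. rs.getBytes(1) stands in for DatabaseTools.readBytes(rs, 1) only to keep the sketch free of JChem dependencies; in a JChem setting the latter remains the recommended call.

```java
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.regex.Pattern;

// Hypothetical helper; prepare the statement once and reuse it per cd_id.
final class StructureQuery {

    // Assumed naming rule: letters, digits, '_', '$' and '.' (schema.table).
    private static final Pattern SAFE_NAME =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_$.]*");

    /** Builds the parameterized query, rejecting suspicious table names. */
    static String buildSql(String tableName) {
        if (!SAFE_NAME.matcher(tableName).matches()) {
            throw new IllegalArgumentException("Unexpected table name: " + tableName);
        }
        return "SELECT cd_structure FROM " + tableName + " WHERE cd_id = ?";
    }

    /** Runs the prepared query for one cd_id; returns null if no row matches. */
    static byte[] readRawStructure(PreparedStatement ps, int cdId)
            throws SQLException {
        ps.setInt(1, cdId);
        ResultSet rs = ps.executeQuery();
        try {
            return rs.next() ? rs.getBytes(1) : null;
        } finally {
            rs.close();
        }
    }
}
```

Typical use would be to call con.prepareStatement(StructureQuery.buildSql(tableName)) once, call readRawStructure for each cd_id, and pass the returned bytes through the same decompression step as in the sample code.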

Let me know if this helps.

Best regards,

Szilard

ChemAxon 587f88acea

26-02-2006 02:37:46

Szilard wrote:
The MRV format should be capable of storing all the information that is present in a molfile.
If you think that something has been lost during the conversion, please give us an example.
(Of course this only applies to standard MOL files.)
Exactly -- we use nonstandard MOL extensions to preserve information about unshared electrons. That information is not preserved upon conversion to MRV. And the MOL format doesn't preserve information about graphic objects contained in the MRV format.

(Perhaps your MOL-to-MRV converter could use the <property> tag to always preserve nonstandard information found in a MOL file?)

We'll discuss your solution and see if it works for us. And yes, a method in the API to accomplish the same task would be very useful to us.

ChemAxon 587f88acea

27-02-2006 13:53:17

Thanks, that did the trick for us.

Now we don't need to learn how to use <property> tags in MRV, either. Whew.

ChemAxon 9c0afc9aaf

27-02-2006 20:38:42

Hi,
Quote:
Now we don't need to learn how to use <property> tags in MRV, either.
I don't quite understand how exactly the MRV part has been solved (unless you store these properties in an extra DB field?).
Quote:
Exactly -- we use nonstandard MOL extensions to preserve information about unshared electrons. That information is not preserved upon conversion to MRV.
Good, so it's not a bug then :)

I doubt that using non-standard extensions is the ideal solution in either MOL or MRV format.

I still suggest using a standard way of storing associated data, one that works fine with both formats.
The main principle is to store your custom data in standard data fields.
(In the case of MOL files you will need SDF of course, but that is basically just a molfile plus data fields.)
During import these fields can be directed to a DB column.
In the database they can be stored separately: a structure column plus a data column.
During export the data stored in this column can be written to SDF or MRV in a fully compatible way.

You may also store data in Data Sgroups, which are supported by both formats. The downside is that this data would be visible when viewing the molecule.
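To make the data-field approach concrete, here is a minimal Java sketch that wraps a molfile and some custom values into one SDF record: the molfile, then one "> <NAME>" data item per field terminated by a blank line, then the "$$$$" record delimiter. The class name, the field name UNSHARED_ELECTRONS and the "N1:2" value encoding are purely hypothetical, and the sketch assumes that values contain no blank lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper showing the layout of an SDF record; not part of JChem.
final class SdfRecordSketch {

    /** Appends the given data fields to a molfile and closes the record. */
    static String toSdfRecord(String molfile, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder(molfile);
        if (!molfile.endsWith("\n")) {
            sb.append('\n');                 // "M  END" must end its own line
        }
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("> <").append(e.getKey()).append(">\n")  // data header
              .append(e.getValue()).append('\n')               // data value
              .append('\n');                                   // blank line ends the item
        }
        sb.append("$$$$\n");                 // record delimiter
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("UNSHARED_ELECTRONS", "N1:2");   // made-up encoding
        String molfile = "ammonia\n  sketch\n\n  0  0  0  0  0  0            999 V2000\nM  END";
        System.out.print(toSdfRecord(molfile, fields));
    }
}
```

On import, a field like this can be mapped to its own table column, so the structure itself stays a standard molfile.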

Szilard

ChemAxon 587f88acea

27-02-2006 22:59:29

Szilard wrote:
Hi,
Quote:
Now we don't need to learn how to use <property> tags in MRV, either.
I don't quite understand how exactly the MRV part has been solved (unless you store these properties in an extra DB field?).
We use MOL format for those structures that contain the extra information. JChem stores and retrieves those MOL structures without affecting the extra fields that we add.

We use MRV for other structures that do not require nonstandard information.

Your suggestion to use extra data fields probably makes sense, but it would require us to reengineer what is already working, so I doubt we'll implement it. We're happy with the MOL extensions. We're moving over to MRV only for mechanism questions because we need the graphic object information in it.