How to insert Markush and patent number into JChem Tables

User c1ce6b3d19

11-12-2010 12:57:08

A user would like to insert Markush structures and patent numbers from .vmn (Markush DARC format) files into a JChem table.




There are a large number of .vmn files, each holding one Markush
structure.  These structures need to be added to a database table, with an
additional table column (e.g. "patent_no") to hold the patent
number.  The JChem API is the most likely tool for this.  There may be
several ways to do it; using the UpdateHandler class is a good one.



The example uses the following classes: ConnectionHandler, StructureTableOptions, and UpdateHandler.


 



  1. ConnectionHandler is used to open a connection to the database.  Connection information and examples can be found here.

  2. StructureTableOptions is used to set the options for the new Markush table.

  3. UpdateHandler is used to create the new table, add each Markush structure, and insert the patent number into the additional column.




import chemaxon.jchem.db.StructureTableOptions;
import chemaxon.jchem.db.UpdateHandler;
import chemaxon.util.ConnectionHandler;

...
// Database table name to be created
String markushTableName = "myMarkushLibrary";
// Additional column to be created
String patentNumberColumn = "patent_no";

...
// Open a connection to your database
ConnectionHandler conh = new ConnectionHandler();
conh.setDriver("oracle.jdbc.driver.OracleDriver");
conh.setUrl("jdbc:oracle:thin:@inhale:1521:demodb");
conh.setLoginName("INTERNAL");
conh.setPassword("oracle");
conh.connect();

// Create a new Markush Library table if one does not exist.
StructureTableOptions sto = new StructureTableOptions();
sto.name = markushTableName;
sto.tableType = StructureTableOptions.TABLE_TYPE_MARKUSH_LIBRARIES;

UpdateHandler uhCreate = new UpdateHandler();
try {
    uhCreate.createStructureTable(conh, sto);
} finally {
    uhCreate.close();
}

// Read in from a vmn file. This section may need to be repeated inside a for loop to read all vmn files.
String filename = // the file name
String patentNumber = // the patent number parsed from the file name
byte[] vmnBinary = // the binary structure data read from the file

// INSERT mode adds new rows; patentNumberColumn is the only additional column.
UpdateHandler uh = new UpdateHandler(conh,
        UpdateHandler.INSERT, markushTableName, patentNumberColumn);
try {
    uh.setStructure(vmnBinary);

    // This sets the parsed patent number into the patent number column
    // (index 1 = first additional column).
    uh.setValueForAdditionalColumn(1, patentNumber);
    uh.execute();
} finally {
    uh.close();
}
...
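The "binary structure data read from the file" placeholder above can be filled with plain Java file I/O. The following is only a minimal sketch using the standard library; the class name VmnBytes, the method readAll, and the command-line handling are hypothetical and not part of the JChem API. It reads only the .vmn file itself.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of JChem: loads a .vmn file as raw bytes.
class VmnBytes {

    /** Reads the entire file into memory and returns its bytes unchanged. */
    static byte[] readAll(Path vmnFile) throws IOException {
        return Files.readAllBytes(vmnFile);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java VmnBytes <file.vmn>");
            return;
        }
        Path vmnFile = Paths.get(args[0]);
        byte[] vmnBinary = readAll(vmnFile);
        // vmnBinary is what would be passed to uh.setStructure(...) in the example above.
        System.out.println("Read " + vmnBinary.length + " bytes from " + vmnFile);
    }
}
```

Reading the whole file at once keeps the sketch short; it assumes each .vmn file is small enough to fit comfortably in memory.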

User 24ceacde10

14-12-2010 20:13:14

The flag UpdateHandler.INSERT should be used instead of UpdateHandler.UPDATE when configuring the UpdateHandler to load structures to a newly created database.

User c1ce6b3d19

15-12-2010 08:06:24

Chris,


You're right.  That was my mistake.  I have now changed the example to replace the UPDATE with INSERT.

User 24ceacde10

15-12-2010 22:30:27

Markush files (*.vmn) may have associated *.amn files, so it is important to read these Markush records completely before loading to the JChem database. The following code illustrates one approach that works.


Assume that files is an array of File objects, one for each *.vmn file in a single directory.
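How that array is built is not shown in this post. One possible way, using only the standard library, is sketched here; the class name VmnFileLister and the method listVmnFiles are hypothetical, and matching the extension case-insensitively is an assumption.

```java
import java.io.File;
import java.util.Arrays;

// Hypothetical helper, not part of JChem: collects the *.vmn files of one directory.
class VmnFileLister {

    /** Returns the .vmn files in dir, sorted by path; empty if dir cannot be listed. */
    static File[] listVmnFiles(File dir) {
        // Two-argument lambda resolves to java.io.FilenameFilter.
        File[] found = dir.listFiles((d, name) -> name.toLowerCase().endsWith(".vmn"));
        if (found == null) {
            // listFiles returns null when dir is not a directory or an I/O error occurs.
            return new File[0];
        }
        Arrays.sort(found);
        return found;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] files = listVmnFiles(dir);
        System.out.println("Found " + files.length + " .vmn file(s) in " + dir.getPath());
    }
}
```

Companion *.amn files are deliberately not listed, since the loop below only iterates over the *.vmn files.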


import chemaxon.formats.MolImporter;
import chemaxon.struc.Molecule;
import java.util.logging.Level;

...
// uh is an UpdateHandler opened in INSERT mode as in the first example;
// logger is a java.util.logging.Logger.
for (int i = 0; i < files.length; i++)
{
    try
    {
        // Read the contents of the Markush VMN file
        MolImporter molImporter = new MolImporter(files[i], "vmn");
        Molecule m = molImporter.read();
        molImporter.close();
        String mrv = m.toFormat("mrv");
        byte[] bytes = mrv.getBytes();

        // Get PatentNumber from filename
        String patentNumber = files[i].getName().toUpperCase();
        patentNumber = patentNumber.substring(0, patentNumber.lastIndexOf('.'));
        patentNumber = patentNumber.replaceFirst("[A-Z]*[0-9]*_[0-9]*$", "");
        logger.log(Level.INFO, "\tProcessing patent: " + patentNumber + " with Markush file " + files[i].getName().toUpperCase());

        // Upload the structure (as MRV) and the patent number to the database
        uh.setStructure(bytes);
        uh.setValueForAdditionalColumn(1, patentNumber);
        uh.execute();
    }
    catch (Exception ex)
    {
        logger.log(Level.SEVERE, "Error processing file " + files[i].getAbsolutePath());
        logger.log(Level.SEVERE, ex.getMessage());
    }
}
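The patent-number parsing in the loop can be tried in isolation. The sketch below wraps the same three steps (upper-case, strip extension, strip trailing suffix) in a hypothetical class PatentNumberParser; the file name US1234567A1_001.vmn is a made-up example, since the actual naming convention of the files is not stated in this thread. The guard for names without a dot is an addition.

```java
import java.util.Locale;

// Hypothetical helper: same parsing steps as in the loop above, no JChem needed.
class PatentNumberParser {

    /** Derives a patent number from a Markush file name. */
    static String fromFileName(String fileName) {
        String name = fileName.toUpperCase(Locale.ROOT);
        // Strip the extension if there is one (the loop above assumes there always is).
        int dot = name.lastIndexOf('.');
        if (dot >= 0) {
            name = name.substring(0, dot);
        }
        // Remove a trailing "<letters><digits>_<digits>" suffix, if present.
        return name.replaceFirst("[A-Z]*[0-9]*_[0-9]*$", "");
    }

    public static void main(String[] args) {
        // Made-up file name, for illustration only.
        System.out.println(fromFileName("US1234567A1_001.vmn")); // prints US1234567
    }
}
```

Because the pattern requires an underscore, a name without one is returned unchanged apart from the upper-casing and the removed extension.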

ChemAxon a3d59b832c

16-12-2010 08:54:27

Hi Chris,


JChem 5.4.0 already handles .AMN files natively. It assumes that the .AMN file is in the same directory as the .VMN file, and its contents are appended after the VMN data in cd_structure.


 


Best regards,


Szabolcs