How to import molfiles with own IDs? - ChemAxon Forum Archive

User a7faa21887

15-06-2011 15:11:14

Hi, everyone.

I have about 670,000 mol-files with own IDs from previous system.

Can I import it with 'jcman a ...' or something else with keep own ID of each structure?

Each MOL-file contains only MOL-structure and it's ID in filename (one file is attached as example).

Thanks,

Pavel

ChemAxon 9c0afc9aaf

15-06-2011 21:36:33

Hi,

There are two options I can think of, each require some scripting or programming.

A common step is to create a JChem table which has an additional data column that will suit your identifiers.

Solution A:

1. Convert the molfiles to SD files and add the ID into the data tag (with a script or small program).

It is pretty easy, since they are almost the same, just have to add the data to the end and close with $$$$, see:

http://www.mdl.com/downloads/public/ctfile/ctfile.pdf

2. Fuse the SDFiles, e.g. "cat *.* > bigfile.sdf"

3. Now you can use jcman to load the molecules with their ID.

Step #2 is needed because otherwise the import would be much too slow one-by-one.

Solution B:

You may choose to use our API directly for the import while iterating trough the molecule files in your own program.

Kay Java classes you will need:

Import:

http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/formats/MolImporter.html

Insert into DB:

http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/jchem/db/UpdateHandler.html

There are also means of accessing our API from non-Java environment, e.g. Web services (SOAP), .NET API, Cartridge for Oracle.

Best,

Szilard

User a7faa21887

16-06-2011 09:56:19

Thanks, Szilard for ideas.

Yesterday I have imported almost all my structures directly in the database - from another table witch CLOB field that contains MOLs (working with mol-files was too slow). And I just saved all generated IDs in separate table with link new ID and old ID.

May be it will be usefull for others (script include exterman monitoring of process by dbms_application_info):

declare
  l_new_id CD_ID_ARRAY;
  l_clob clob;
  l_old_id number(10);
  l_totalwork integer := 0;
  l_sofar     integer := 0;
  l_rindex    binary_integer;
  l_slno      binary_integer;
begin
  SELECT count(chem_id) INTO l_totalwork FROM sd_chemstructs;
  l_rindex := DBMS_APPLICATION_INFO.set_session_longops_nohint;
  l_sofar := 0;
  for f in (SELECT c.chem_id FROM sd_chemstructs c, ca_chemstructs_ids i WHERE c.chem_id = i.chem_id(+) AND i.chem_id is null ORDER BY 1) loop
    SELECT accord_mol INTO l_clob FROM sd_chemstructs WHERE chem_id = f.chem_id;
    begin
      l_new_id := jchem_table_pkg.jc_insert(l_clob, 'CA_CHEMSTRUCTS', null, null, 'false', 'haltOnBadFormat:y');
    exception when others then
      l_new_id := null;
    end;
    if l_new_id is not null and l_new_id.count > 0 then
      INSERT INTO ca_chemstructs_ids(cd_id, chem_id)
      VALUES (l_new_id(1), f.chem_id);
      if mod(l_sofar, 100) = 0 then
        commit;
      end if;
      dbms_application_info.set_session_longops(l_rindex, l_slno, 'CHEMAXON: MOL CONVERT', l_new_id(1), null, l_sofar, l_totalwork, f.chem_id, 'rows');
    end if;
    l_sofar := l_sofar + 1;
  end loop;
  commit;
end;
/

So, at the end I have some difference between source data and target - just for some errors.

And I will use your idea for additional column and just fill it with old IDs.

Best regards

Pavel