User a7faa21887
15-06-2011 15:11:14
Hi, everyone.
I have about 670,000 mol-files with own IDs from previous system.
Can I import it with 'jcman a ...' or something else with keep own ID of each structure?
Each MOL-file contains only MOL-structure and it's ID in filename (one file is attached as example).
Thanks,
Pavel
ChemAxon 9c0afc9aaf
15-06-2011 21:36:33
Hi,
There are two options I can think of, each require some scripting or programming.
A common step is to create a JChem table which has an additional data column that will suit your identifiers.
Solution A:
1. Convert the molfiles to SD files and add the ID into the data tag (with a script or small program).
It is pretty easy, since they are almost the same, just have to add the data to the end and close with $$$$, see:
http://www.mdl.com/downloads/public/ctfile/ctfile.pdf
2. Fuse the SDFiles, e.g. "cat *.* > bigfile.sdf"
3. Now you can use jcman to load the molecules with their ID.
Step #2 is needed because otherwise the import would be much too slow one-by-one.
Solution B:
You may choose to use our API directly for the import while iterating trough the molecule files in your own program.
Kay Java classes you will need:
Import:
http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/formats/MolImporter.html
Insert into DB:
http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/jchem/db/UpdateHandler.html
There are also means of accessing our API from non-Java environment, e.g. Web services (SOAP), .NET API, Cartridge for Oracle.
Best,
Szilard
User a7faa21887
16-06-2011 09:56:19
Thanks, Szilard for ideas.
Yesterday I have imported almost all my structures directly in the database - from another table witch CLOB field that contains MOLs (working with mol-files was too slow). And I just saved all generated IDs in separate table with link new ID and old ID.
May be it will be usefull for others (script include exterman monitoring of process by dbms_application_info):
declare
l_new_id CD_ID_ARRAY;
l_clob clob;
l_old_id number(10);
l_totalwork integer := 0;
l_sofar integer := 0;
l_rindex binary_integer;
l_slno binary_integer;
begin
SELECT count(chem_id) INTO l_totalwork FROM sd_chemstructs;
l_rindex := DBMS_APPLICATION_INFO.set_session_longops_nohint;
l_sofar := 0;
for f in (SELECT c.chem_id FROM sd_chemstructs c, ca_chemstructs_ids i WHERE c.chem_id = i.chem_id(+) AND i.chem_id is null ORDER BY 1) loop
SELECT accord_mol INTO l_clob FROM sd_chemstructs WHERE chem_id = f.chem_id;
begin
l_new_id := jchem_table_pkg.jc_insert(l_clob, 'CA_CHEMSTRUCTS', null, null, 'false', 'haltOnBadFormat:y');
exception when others then
l_new_id := null;
end;
if l_new_id is not null and l_new_id.count > 0 then
INSERT INTO ca_chemstructs_ids(cd_id, chem_id)
VALUES (l_new_id(1), f.chem_id);
if mod(l_sofar, 100) = 0 then
commit;
end if;
dbms_application_info.set_session_longops(l_rindex, l_slno, 'CHEMAXON: MOL CONVERT', l_new_id(1), null, l_sofar, l_totalwork, f.chem_id, 'rows');
end if;
l_sofar := l_sofar + 1;
end loop;
commit;
end;
/
So, at the end I have some difference between source data and target - just for some errors.
And I will use your idea for additional column and just fill it with old IDs.
Best regards
Pavel