Import and discard duplicates?

User 169b52bbd8

06-03-2008 22:51:47

Dear IJC experts,





I have a database that does not allow duplicate structures. I would like to update it with an SDF that contains many duplicate structures, but when I try, the import quickly dies with too many 'duplicate entry' errors. Is there a straightforward way to do this?





Put another way, can I do an INSERT IGNORE easily?





Cheers





David

ChemAxon fa971619eb

07-03-2008 09:44:59

How are you doing this?


You should set the duplicate filtering property of the JChem table; duplicates should then not be imported. The import log should look something like this:





Structure is mapped to current field Structure


Starting to import data...


Structure 30 not imported. It is a duplicate of CD_ID 29 in the database.


Structure 31 not imported. It is a duplicate of CD_ID 29 in the database.


....


Structure 120 not imported. It is a duplicate of CD_ID 29 in the database.


Structure 121 not imported. It is a duplicate of CD_ID 29 in the database.


Structure 122 not imported. It is a duplicate of CD_ID 29 in the database.





Import completed in 1s.


100 entries successfully imported.


0 Errors.


93 were not imported as they were duplicates


Duplicate records can be found at


/home/timbo/structures/nci/NCI_aug00_100 with dups_duplicates.smiles
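For anyone who wants to see the general "INSERT IGNORE" idea outside of IJC, here is a minimal sketch using Python's built-in sqlite3 module. This is not how JChem or IJC implements duplicate filtering; the table name `structures`, the column `structure_key`, and the function `import_skipping_duplicates` are hypothetical, and the plain strings stand in for whatever canonical structure key a real chemistry database would compare. SQLite spells the statement `INSERT OR IGNORE`; MySQL uses `INSERT IGNORE`.

```python
import sqlite3


def import_skipping_duplicates(conn, keys):
    """Insert each key, skipping duplicates; return (imported, skipped)."""
    imported = skipped = 0
    for key in keys:
        # OR IGNORE turns a UNIQUE violation into a silent no-op
        # instead of an error that would abort the import.
        cur = conn.execute(
            "INSERT OR IGNORE INTO structures (structure_key) VALUES (?)",
            (key,),
        )
        if cur.rowcount == 1:
            imported += 1
        else:
            skipped += 1
    return imported, skipped


conn = sqlite3.connect(":memory:")
# Hypothetical schema: the UNIQUE constraint is what forbids duplicates.
conn.execute(
    "CREATE TABLE structures ("
    "cd_id INTEGER PRIMARY KEY, structure_key TEXT UNIQUE)"
)

# Illustrative input only: two distinct keys, one of them repeated twice more.
result = import_skipping_duplicates(conn, ["CCO", "c1ccccc1", "CCO", "CCO"])
print("%d imported, %d skipped as duplicates" % result)
```

Counting the skipped rows separately gives the same kind of summary as the log above, rather than letting the first duplicate stop the whole import.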











Tim