Problem with duplicate filtering using empty DB

User 677b9c22ff

12-10-2007 23:52:26

Hi,


I have a DB from which I removed all structures using the DB entity editor.

To prove that it is empty, I used the fingerprint statistics. Using "show all" in query mode also did not show any compound, so I think this DB must be empty (and it truly is empty):





Code:



Statistics for table: APP.NEW_JCHEMBASE_TABLE
--------------------
Row count: 0
NULL SMILES count: 0
Average SMILES length: 0.0
Average compressed SMILES length: 0.0
Markush structure count: 0 (0.0%)

Fingerprint settings:

Length (bits): 512
Pattern length: 6
Bits set per pattern: 2

Min. CFP darkness: 2.147483647E7%     cd_id: 0
Max. CFP darkness: 0.0%     cd_id: 0
Avg. CFP darkness: 0.0%

Chemical Fingerpint distribution:
--------------------------------
0% - 10% : 0.0 %
10% - 20% : 0.0 %
20% - 30% : 0.0 %
30% - 40% : 0.0 %
40% - 50% : 0.0 %
50% - 60% : 0.0 %
60% - 70% : 0.0 %
70% - 80% : 0.0 %
80% - 90% : 0.0 %
90% - 100% : 0.0 %







If I now import new SD files, the import always reports an error that says something about CD_ID 1,861.

How can it know and assign a number higher than 1200 if the database has only 1200 structures?





Code:



Created new field CdId 3
Created new field MolWeight 3
Created new field Formula 3
Structure is mapped to current field Structure
field_0 is mapped to current field field_0
 count is mapped to current field  count
 hits is mapped to current field  hits
Starting to import data...
...snip....
Structure 1,132 not imported. It is a duplicate of CD_ID 1,849 in the database.
Structure 1,133 not imported. It is a duplicate of CD_ID 1,849 in the database.
Structure 1,135 not imported. It is a duplicate of CD_ID 1,850 in the database.
Structure 1,139 not imported. It is a duplicate of CD_ID 1,853 in the database.
Structure 1,142 not imported. It is a duplicate of CD_ID 1,855 in the database.
Structure 1,145 not imported. It is a duplicate of CD_ID 1,857 in the database.
Structure 1,148 not imported. It is a duplicate of CD_ID 1,859 in the database.
Structure 1,149 not imported. It is a duplicate of CD_ID 1,859 in the database.
Structure 1,150 not imported. It is a duplicate of CD_ID 1,859 in the database.
Structure 1,151 not imported. It is a duplicate of CD_ID 1,859 in the database.
Structure 1,154 not imported. It is a duplicate of CD_ID 1,861 in the database.
Structure 1,159 not imported. It is a duplicate of CD_ID 1,487 in the database.
Structure 1,160 not imported. It is a duplicate of CD_ID 1,740 in the database.
Structure 1,161 not imported. It is a duplicate of CD_ID 1,450 in the database.
Structure 1,162 not imported. It is a duplicate of CD_ID 1,288 in the database.

Import completed in 4s.
703 entries successfully imported.
0 Errors.
459 were not imported as they were duplicates








Obviously, to get rid of the memory of the old DB fields, you have to drop the whole table or kill the whole template.

This also relates to my previous post about the complexity of Instant-JChem.





Instant JChem Version: 2.1 (build: 071002) JChem Version: 3.2.11 Marvin Version: 4.1.13 (build date: 2007-9-20) Java: 1.5.0_11; Java HotSpot(TM) Client VM 1.5.0_11-b03 System: Windows XP version 5.1 running on x86; Cp1252; en_US (instantjchem)





Tobias

ChemAxon fa971619eb

13-10-2007 08:15:06

The values for the CD_ID column in the database are automatically incremented each time a new value is entered. This is a standard and necessary approach for all relational databases (though the mechanism of how this happens does differ).


Deleting a row does not make that ID available for a new row that will be imported.





So if you import 1000 rows and then add one more row the new ID will be 1001.


And if you import 1000 rows, then delete those 1000 rows and then add one more row the new ID will still be 1001.
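
The behaviour described above can be reproduced in a small sketch. It uses Python's built-in sqlite3 module as a stand-in, not the database behind Instant JChem, and the table name `structures` and the function `next_id_after_delete` are hypothetical names chosen for illustration. In SQLite the `AUTOINCREMENT` keyword is what guarantees that deleted IDs are not reused; other databases get the same effect through their own mechanisms.

```python
import sqlite3


def next_id_after_delete(n_rows: int = 1000) -> int:
    """Insert n_rows, delete them all, insert one more row, return its ID."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table; cd_id mimics an auto-incremented ID column.
    conn.execute(
        "CREATE TABLE structures ("
        "cd_id INTEGER PRIMARY KEY AUTOINCREMENT, smiles TEXT)"
    )
    conn.executemany(
        "INSERT INTO structures (smiles) VALUES (?)", [("C",)] * n_rows
    )
    # Deleting the rows empties the table but does not reset the counter.
    conn.execute("DELETE FROM structures")
    cur = conn.execute("INSERT INTO structures (smiles) VALUES (?)", ("CCO",))
    new_id = cur.lastrowid
    conn.close()
    return new_id


print(next_id_after_delete(1000))
```

The table holds a single row at the end, yet that row's ID continues from where the counter left off rather than starting again at 1.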





If you want your IDs to start from 1, then you need to use a new table, or you can have the ID provided as an additional field in the file that you import.
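
The second option, supplying the ID yourself, can be sketched in generic SQL terms. This again uses Python's sqlite3 module purely as a stand-in; the table `structures` and the function `import_with_explicit_ids` are hypothetical, and how Instant JChem maps a file field onto the ID column is not shown here.

```python
import sqlite3


def import_with_explicit_ids(records):
    """Load (cd_id, smiles) pairs into a table whose counter is already past 1000."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE structures ("
        "cd_id INTEGER PRIMARY KEY AUTOINCREMENT, smiles TEXT)"
    )
    # Simulate earlier use of the table: 1000 rows imported, then deleted.
    conn.executemany(
        "INSERT INTO structures (smiles) VALUES (?)", [("C",)] * 1000
    )
    conn.execute("DELETE FROM structures")
    # IDs taken from the input are stored as given, regardless of the counter.
    conn.executemany(
        "INSERT INTO structures (cd_id, smiles) VALUES (?, ?)", records
    )
    rows = conn.execute(
        "SELECT cd_id, smiles FROM structures ORDER BY cd_id"
    ).fetchall()
    conn.close()
    return rows


print(import_with_explicit_ids([(1, "CCO"), (2, "c1ccccc1")]))
```

The IDs in the input must be unique, otherwise the insert fails with a primary key violation.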





Tim

User 677b9c22ff

13-10-2007 09:02:13

Hi,


Assume I am a chemist or biologist with no database and no SQL(??) knowledge, and I have a database with 1200 structures. I can figure out that CD means compound and ID means ID.

I get the following error:





Code:
Structure 1,132 not imported. It is a duplicate of CD_ID 1,849 in the database.






Should I be worried or concerned or just ignore it?





That's a pretty complex question, because on the one hand it's good to have many options, but on the other hand you now have many troubles that you would otherwise only have with larger DBs like Oracle (including millions of error messages).





Just compare the JChemManager GUI and the Instant-JChem entity manager.

Which one is more powerful and more complex?

The Instant-JChem manager is both more complex and more powerful (not shown here, follow the link).


Tobias