CD_HASH use

ChemAxon 60ee1f1328

07-01-2007 13:05:41

I need some advice please regarding the use of CD_HASH.





Firstly, is CD_HASH available in the Cartridge API? (I don't think so.) And if not, is it likely to be added to the Cartridge API sometime soon?





Obviously, I might wish to use CD_HASH as a filter to reduce the amount of data joined against in a target table. However, it is not clear to me how much data this would actually filter out relative to, say, using CD_FORMULA and CD_MOLWEIGHT combined as a pre-duplicate-check filter (which will of course include all stereoisomers for a given query). Is the CD_HASH number sufficiently accurate to eliminate stereoisomers in a pre-filter join step? Perhaps a certain level of recursion will give a certain probability of this?





Cheers,


Daniel.

ChemAxon aa7c50abf8

07-01-2007 20:44:08

Hi Daniel,





What is a "pre-filter step join"?





Thanks


Peter

ChemAxon 9c0afc9aaf

07-01-2007 21:06:48

Quote:
Is CD_HASH number sufficiently accurate to eliminate stereoisomers in a pre-filter step join?
No stereo information is used during hash code generation.





The cd_hash column is mainly intended for internal use, to speed up duplicate filtering. Structures with identical hash codes have a high probability of being identical, but there is also a chance that different structures receive the same hash code.


Therefore JChem always runs a graph search before reaching the final verdict.
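
For illustration only: if both sides of a duplicate check are JChem structure tables (so each already carries a cd_hash column), the hash can serve as a coarse join key ahead of an exact check, roughly as in the sketch below. STAGE and TARGET are hypothetical table names, and whether this actually outperforms JChem's built-in duplicate filtering would need to be measured.

Code:

-- Hedged sketch: use cd_hash only to narrow candidate pairs,
-- then confirm with an exact structure comparison.
SELECT s.cd_id AS stage_id, t.cd_id AS target_id
FROM   STAGE s, TARGET t
WHERE  s.cd_hash = t.cd_hash                        -- coarse filter: possible duplicates
  AND  jc_equals(t.cd_smiles, s.cd_smiles) = 1;     -- exact check resolves hash collisions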





I'm also not sure exactly what you want to achieve, but:


- We usually recommend filtering out duplicates during the import via the standard tools provided by JChem


- If you also need the duplicates (because of associated data, etc.), you could store each structure only once (without duplicates) in a structure table and refer to that table via a many-to-one relation
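
A minimal sketch of such a many-to-one layout, assuming a JChem structure table named structures with the standard cd_id key column (all other names here are hypothetical):

Code:

-- Structures are stored once, without duplicates, in the JChem structure table.
-- Supplier/lot data lives in a separate table that points back at the unique
-- structure via a foreign key (many data rows per structure).
CREATE TABLE compound_data (
    data_id      NUMBER PRIMARY KEY,
    structure_id NUMBER NOT NULL,
    supplier_no  VARCHAR2(50),
    quantity     NUMBER,
    price        NUMBER,
    CONSTRAINT fk_compound_structure
        FOREIGN KEY (structure_id) REFERENCES structures (cd_id)
);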





Best regards,





Szilard

ChemAxon 60ee1f1328

08-01-2007 14:47:46

A "pre-filter step join" is some meaningless jargon that I made up in a hurry, I will try and be clear about the questionsI pose in future but in this case I slipped!





We currently create a smaller target set (relative to the single-view set) to join against for duplicate filtering of a stage set that has been imported, salt-stripped and re-standardized. We use molweight and formula to create this target set, but it could very well include stereoisomeric forms, which is why we asked whether stereoisomer information is contained in cd_hash. That was a bit much to ask in the end, I think, so we can leave it there. This leaves us with what looks like a worst-case scenario for a join: the Chembridge file of ~0.5 million structures versus ~1 million or so identified as possible target molecules. We don't think this join will complete in a short time, so we are looking to manually parallelise our build, i.e. split the input into chunks...
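
For illustration only, that kind of molweight/formula pre-filter could be expressed along the lines of the sketch below (the table names are placeholders rather than our actual schema, cd_formula and cd_molweight are the standard JChem columns, and exact equality on the molweight float may need rounding or a tolerance):

Code:

-- Build a smaller target set: keep only target rows whose formula and
-- (rounded) molecular weight also occur in the incoming stage set.
CREATE TABLE target_subset AS
SELECT t.*
FROM   full_target t
WHERE  EXISTS (SELECT 1
               FROM   stage s
               WHERE  s.cd_formula = t.cd_formula
               AND    ROUND(s.cd_molweight, 2) = ROUND(t.cd_molweight, 2));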





That is, have you seen examples of a query like the one below completing


when t4 contains ~1 million rows and t3 ~0.5 million? Perhaps with infinite memory and time it may be OK? We are looking to expand our internal hardware, so more memory may become available, but it would be nice to know whether you consider this query unreasonable in your experience.





Code:

SELECT DISTINCT t3.smiles, t3.cleansmiles, t3.supplier_no, t3.original_molweight,
       t3.keepone_molweight, t3.logp, t3.quantity, t3.purity, t3.price,
       t3.tpsa, t3.meltingpoint, t3.solubility
FROM   QUERY t3, TARGET t4
WHERE  t3.cleansmiles IS NOT NULL
  AND (jc_equals(t4.cd_smiles, t3.cleansmiles) = 1
       OR jc_compare(t4.cd_smiles, t3.cleansmiles,
            't:e exactChargeMatchingOption:e doubleBondStereo:A HCountMatching:E exactStereoMatching:y absoluteStereo:a stereoSearch:y') = 1);








Thanks for your help,


Daniel.

ChemAxon aa7c50abf8

09-01-2007 11:09:31

The first step is to check the query plan to see how the search operators are used. Make sure that they are evaluated through domain index scans. If they are not evaluated that way with your original statement, try to reformulate it, e.g. by moving them out into subselects (one search operator per subselect). A domain index scan is their most efficient mode of operation.
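
For illustration, one possible reformulation along these lines is sketched below, using your tables and option string with one search operator per correlated subselect. I have not verified that the optimizer picks domain index scans for this form, so the plan still needs checking.

Code:

-- One search operator per subselect, so that each can potentially be
-- evaluated through its own domain index scan (verify in the plan).
SELECT DISTINCT t3.smiles, t3.cleansmiles, t3.supplier_no, t3.original_molweight,
       t3.keepone_molweight, t3.logp, t3.quantity, t3.purity, t3.price,
       t3.tpsa, t3.meltingpoint, t3.solubility
FROM   QUERY t3
WHERE  t3.cleansmiles IS NOT NULL
  AND (EXISTS (SELECT 1 FROM TARGET t4
               WHERE jc_equals(t4.cd_smiles, t3.cleansmiles) = 1)
    OR EXISTS (SELECT 1 FROM TARGET t4
               WHERE jc_compare(t4.cd_smiles, t3.cleansmiles,
                     't:e exactChargeMatchingOption:e doubleBondStereo:A HCountMatching:E exactStereoMatching:y absoluteStereo:a stereoSearch:y') = 1));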





The next step is to execute a few sample "stand-alone" structure searches (such as you expect to be typical in the join) and measure their speed. You can then roughly estimate the time required for the whole join.
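
For example, a single representative search could be timed roughly as in the sketch below (the SMILES string is just a placeholder; SET TIMING ON is the SQL*Plus timing switch):

Code:

-- Time one stand-alone exact search against TARGET, then multiply by the
-- number of query structures (~0.5 million) for a rough estimate of the join.
SET TIMING ON
SELECT COUNT(*)
FROM   TARGET t4
WHERE  jc_equals(t4.cd_smiles, 'CCO') = 1;   -- 'CCO' is a placeholder query structure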





Cheers,


Peter