How to remove duplicate structures from an SD file

User 1a928ad2db

11-06-2011 01:32:45

I wrote a test SD file of 100 compounds, 20 of which were duplicates. When I imported this SD file into IJC 5.5, all the structures were shown in the data table. How can I remove those duplicate structures in IJC 5.5? I tried "Data trees-->General Settings-->Duplicate filtering", but it didn't work.

By the way, how can I standardize all the structures at a certain pH (for example, by adding hydrogens)? Many thanks!

ChemAxon fa971619eb

11-06-2011 08:33:48

Whether duplicates are allowed is determined by the 'Duplicate filtering' setting of the JChem table at the time the data is imported. This means it needs to be set before the data is imported. The easiest way to do this is to set the property of the table in the import wizard: in the 'File and new table details' step, click the '...' options button next to the 'Table details' selector. Once this is set, duplicates will not be imported.
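If the data has already been imported, another option is to clean the SD file before re-importing it. The sketch below only illustrates the "keep the first occurrence of each structure" idea behind duplicate filtering. It is an assumption-laden stand-in, not what IJC does: it compares the raw text of each record's connection table (the molfile lines between the 3-line header and `M  END`), so it only catches records written identically. The same molecule drawn with a different atom order or different coordinates will not be detected, which a structure-based comparison would handle. All function names are hypothetical.

```python
def split_sd_records(sd_text: str) -> list[str]:
    """Split SD file text into records on the '$$$$' delimiter lines."""
    records, current = [], []
    for line in sd_text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):  # trailing record without '$$$$'
        records.append("\n".join(current))
    return records


def connection_table_key(record: str) -> str:
    """Text key: molfile lines after the 3-line header, up to 'M  END'.

    The name/program/comment header and SD data fields are ignored, so
    records that differ only there count as duplicates. This is a TEXT
    comparison, not a chemical one.
    """
    body = []
    for line in record.splitlines()[3:]:
        if line.startswith("M  END"):
            break
        body.append(line.rstrip())
    return "\n".join(body)


def remove_exact_duplicates(sd_text: str) -> str:
    """Return SD text keeping only the first record for each key."""
    seen, kept = set(), []
    for record in split_sd_records(sd_text):
        key = connection_table_key(record)
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return "".join(r + "\n$$$$\n" for r in kept)


def one_atom_record(name: str, element: str) -> str:
    """Hypothetical helper: a minimal one-atom V2000 record for testing."""
    return (f"{name}\n  sketch\n\n"
            "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
            f"    0.0000    0.0000    0.0000 {element:<3} 0  0  0  0  0  0  0  0  0  0  0  0\n"
            "M  END\n$$$$\n")


if __name__ == "__main__":
    sd = (one_atom_record("mol1", "C") + one_atom_record("mol2", "C")
          + one_atom_record("mol3", "O"))
    print(remove_exact_duplicates(sd).count("$$$$"))  # two records remain
```

Here `mol2` is dropped because its connection table matches `mol1` even though its name differs.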

When you say 'standardize all the structures at a certain pH', I assume you mean generating a representation of each structure in its most stable protonation state at that pH (the major microspecies)? Assuming so, this can be generated using a chemical terms field once the structures are imported. The chemical terms expression to use is 'majorMicrospecies("7.4")'.
See the documentation for more details on adding a chemical terms field. The field needs to be added as a text field and will initially be displayed (in grid or form view) as text, but you can change the renderer to the structure renderer in the settings of the column (grid view) or widget (form view). See the attached screenshot (MMS column).
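To see why the major microspecies depends on the chosen pH, consider a single ionizable site. By the Henderson-Hasselbalch relation, the fraction of an acid HA present as A⁻ is 1 / (1 + 10^(pKa − pH)), so whichever form exceeds 50% is the major one. This is a simplified illustration only: the chemical terms function works from ChemAxon's own calculations over every ionizable site, whereas this sketch handles one site with a pKa you supply. The pKa values in the usage lines (about 4.76 for acetic acid, about 10.6 for methylammonium) are approximate literature values used purely as examples.

```python
def fraction_deprotonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch for one acidic site (HA <-> A- + H+):
    fraction present as the deprotonated form A- at the given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def major_form(pka: float, ph: float) -> str:
    """Name the form of this single site that dominates at the given pH."""
    return "deprotonated" if fraction_deprotonated(pka, ph) > 0.5 else "protonated"


if __name__ == "__main__":
    # Carboxylic acid (acetic acid, pKa ~4.76): mostly the anion at pH 7.4.
    print(major_form(4.76, 7.4), round(fraction_deprotonated(4.76, 7.4), 4))
    # Primary amine (methylammonium, pKa ~10.6): mostly protonated at pH 7.4,
    # i.e. the major microspecies carries the extra hydrogen.
    print(major_form(10.6, 7.4))
```

The amine case corresponds to the "add hydrogen" example in the question: at pH 7.4 the protonated, charged form dominates.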

See the chemical terms language documentation for more details; it contains several other functions that could be useful in this context.

Also see Standardizer, which lets you convert molecules to a single representation so that charged groups are handled consistently. See particularly the 'Neutralize' action. Note that this does not necessarily give the 'correct' charged representation, just a standard one that is generated whichever charged form is entered.
Standardizer can be specified at the database level, so it can be included in the duplicate filtering process. This contrasts with the chemical terms field described earlier, which is only calculated once the structures have been imported.
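The difference matters because duplicate checking compares whatever representation exists at comparison time. The sketch below shows the ordering only: standardize first, then compare keys. `toy_neutralize` is a deliberately hypothetical stand-in that rewrites one carboxylate pattern in SMILES text; real standardization (such as Standardizer's 'Neutralize' action) works on the molecular graph, and plain SMILES text is not canonical. The sketch keeps the original text of the first entry and makes no claim about what JChem actually stores.

```python
def toy_neutralize(smiles: str) -> str:
    """HYPOTHETICAL toy standardizer: protonates a carboxylate only when it
    is written literally as 'C(=O)[O-]'. Not real chemistry handling."""
    return smiles.replace("C(=O)[O-]", "C(=O)O")


def unique_structures(smiles_list, standardize=None):
    """Keep the first entry for each key; if a standardize function is given,
    it is applied BEFORE comparison (as with a database-level standardizer)."""
    seen, kept = set(), []
    for smi in smiles_list:
        key = standardize(smi) if standardize else smi
        if key not in seen:
            seen.add(key)
            kept.append(smi)
    return kept


if __name__ == "__main__":
    entries = ["CC(=O)O", "CC(=O)[O-]", "CCN"]
    print(unique_structures(entries))                  # neutral and charged both kept
    print(unique_structures(entries, toy_neutralize))  # charged form now a duplicate
```

Without the standardizer, the neutral acid and its anion look like different structures; with it, they collapse to one key. A field computed after import, like the chemical terms field, cannot influence which rows were filtered out.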