How does jcunique work?

User d0132fa8bc

13-08-2015 11:41:09

Hi all,

I want to use jcunique to filter out duplicates.


The command itself works great, but I could not find any description of how it works.


Do I need to standardize my structures beforehand, or is standardization built in?

Best


Björn

User d0132fa8bc

14-08-2015 07:38:19

To add more information to my question:


I need to run a duplicate search on SD files. Standardizing the structures or applying other modifications beforehand is not a problem.


I am just not sure which command is best. I also need the IDs, or some other identifier, in my results file, because afterwards I want to merge additional information from the SD files.

What is the best way to do it from the command line?

Best


Björn

ChemAxon abe887c64e

14-08-2015 12:43:40

Hi Björn,


Instead of jcunique, we recommend the following steps. Unfortunately, jcunique handles only the structures, without any additional data or identifiers.



  1. Start JChem Manager by running 'jcman' from the command line. See the JChem Manager documentation for details.

  2. Create a table (table1) and import one of your SD files into table1.

  3. Export the cd_id field of table1 to an output.sdf file. (The structures are exported by default.)

  4. Create a second table (table2) in JChem Manager. Here you can set the desired standardization rules. Check the 'Filter out duplicate structures' checkbox.

  5. Modify table2 by adding an extra column (identifier).

  6. Import output.sdf into table2 and map the cd_id field of the SD file to the identifier column.


As a result, table2 will contain the unique structures from the original SD file, together with their identifier numbers.


The above steps can also be executed from the command line by running jcman with the appropriate parameters.
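Outside JChem, the idea behind the steps above can be sketched in plain Python. The goal is to keep one record per structure while carrying along the identifiers of all its duplicates, so that data can be merged later. This is only an illustration under stated assumptions. It is not how jcunique or JChem Manager work internally, and the function names and the DUPLICATE_IDS output field are hypothetical. The placeholder structure key catches only records whose connection tables are written identically. It performs no standardization. In practice you would standardize the structures first and replace the key with a canonical one, such as canonical SMILES or InChI from a cheminformatics toolkit.

```python
import re

# Data-item header such as '>  <ID>' or '> 12 <ID> (1)'; captures the field name.
FIELD_HEADER = re.compile(r"^>.*?<([^>]*)>")


def read_sd_records(text):
    """Yield (molblock_lines, data) for every record in SD-file text."""
    record = []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # record delimiter
            yield split_record(record)
            record = []
        else:
            record.append(line)
    if any(line.strip() for line in record):  # last record without '$$$$'
        yield split_record(record)


def split_record(lines):
    """Separate the molfile block (through 'M  END') from the data items."""
    end = next((i for i, l in enumerate(lines) if l.startswith("M  END")), None)
    if end is None:
        raise ValueError("record has no 'M  END' line")
    data, field = {}, None
    for line in lines[end + 1:]:
        match = FIELD_HEADER.match(line)
        if match:
            field = match.group(1)
            data[field] = []
        elif field is not None and line.strip():
            data[field].append(line)
        else:
            field = None  # a blank line ends the current data item
    return lines[: end + 1], {k: "\n".join(v) for k, v in data.items()}


def connection_table_key(molblock):
    """PLACEHOLDER key (assumption): counts line plus atom/bond blocks,
    skipping the three header lines. Matches only identically written
    connection tables; it does NOT standardize or canonicalize."""
    return tuple(line.rstrip() for line in molblock[3:])


def dedupe(records, key_fn, id_field):
    """Keep the first record per structure key and collect the identifier
    of every record that shares that key."""
    groups = {}  # dicts keep insertion order, so output follows input order
    for molblock, data in records:
        group = groups.setdefault(
            key_fn(molblock), {"molblock": molblock, "data": data, "ids": []}
        )
        group["ids"].append(data.get(id_field, ""))
    return list(groups.values())


def write_sd(groups, ids_field="DUPLICATE_IDS"):
    """Serialize the unique records, adding a field that lists all merged IDs."""
    out = []
    for group in groups:
        out.extend(group["molblock"])
        items = {**group["data"], ids_field: ", ".join(group["ids"])}
        for name, value in items.items():
            out += [f">  <{name}>", value, ""]
        out.append("$$$$")
    return "\n".join(out) + "\n"
```

For example, `write_sd(dedupe(read_sd_records(text), connection_table_key, "ID"))` returns SD text with one record per unique structure. Each record lists the IDs of all its duplicates, which you can then use as the join key when merging data from the other SD files. The field name "ID" is an assumption about your files.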


Please let us know if you need more information.


Krisztina