How to propagate an ID tag in SDF via JKlustor

User cd46b9a398

07-12-2011 00:12:16

I am running JKlustor as follows:


jklustor P10635a.sdf -o wrclus:smi:fw.smi  -o "wrmols:sdf:cluster_*.sdf" -c sphex:0.85


The file P10635a.sdf has 71 molecules, each tagged with an <ID> tag  as is normal in an SDF file.


File is attached, but the problem seems to exist with any SD file I try.


The output clusters cluster_*.sdf do not contain this <ID> tag.  I would like to propagate this <ID> tag into the clustered output. 


Question: how do I do this please?


JKlustor identifies itself as v0.07.  (Maybe that should have been my first clue?) 


Thanks


John

User cd46b9a398

14-12-2011 22:05:11

Hi Guys - 


Can I re-phrase my question:  I want to have the identifiers of the molecules in the clustered results. Is there a way to do this please?  At the moment, there does not appear to be any way to trace back from the clustered results to the input molecules.


Thanks


John

ChemAxon 8b644e6bf4

15-12-2011 00:53:51

Dear John,


 


Sorry for the late answer. Currently it is not possible to propagate input IDs or other properties in jklustor. Implementing this functionality is in our plans; however, it is not scheduled yet.


Using molconvert's canonical SMILES functionality and simple bash tools, a workaround can be constructed to assign input IDs to the generated cluster members. Overview:



Details
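A minimal sketch of the ID-assignment step, not necessarily the exact procedure referred to above. It assumes three files already exist: ids.txt (one ID per line, in input order), input.smi (one canonical SMILES per line, in the same order, for example written by molconvert) and cluster.smi (canonical SMILES of the members of one cluster, one per line). The function name assign_ids and all file names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: map each SMILES line of a cluster back to an input ID.
#   $1  ids.txt      one ID per line, same order as the input molecules
#   $2  input.smi    one canonical SMILES per line, same order as ids.txt
#   $3  cluster.smi  canonical SMILES of the cluster members, one per line
assign_ids() {
    # paste builds "SMILES<TAB>ID" pairs; awk reads them from stdin first,
    # then looks up every line of the cluster file in that table.
    paste "$2" "$1" | awk -F'\t' '
        NR == FNR { id[$1] = $2; next }
        { print $1 "\t" (($1 in id) ? id[$1] : "UNKNOWN") }
    ' - "$3"
}
```

Usage: assign_ids ids.txt input.smi cluster.smi > cluster_with_ids.txt. The lookup only works if both SMILES files were canonicalized in the same way; members without a match are reported as UNKNOWN.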



If you have further questions, please do not hesitate to ask.


 


regards,


Gabor

User cd46b9a398

19-01-2012 17:20:06

Hi Gabor


Thanks for your advice. I got your workaround to work!


I appreciate your support.


John

User 247c00dc1d

01-08-2012 13:28:02








Dear Gabor,


Has the problem with the ID in the output SDF file been fixed by now?


Or do I need to attach the IDs to the clustered files in the way you describe above?


Maybe there is a quicker way... I have to cluster more than 30 SDF files...


ChemAxon 8b644e6bf4

10-08-2012 15:57:28

Dear Igor,


 


Sorry, this is not solved yet.


The workaround above could be extended with an outer for loop iterating over all the SDF files that need clustering.


The step "cat P10635a.sdf | grep CHEMBL > ids.txt" relies on all IDs in the SDF files sharing a common prefix.
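The outer loop could look roughly like the sketch below. The function name cluster_all and the output file naming are hypothetical; the jklustor options are copied from the command line at the top of this thread, and it is an assumption that jklustor accepts a directory prefix in its output file names. The remaining steps of the workaround would go inside the same loop.

```shell
#!/bin/bash
# Sketch of an outer loop over all SDF files in one directory.
# Results go to a "clustered" subdirectory so that generated cluster
# files are not picked up as input on a later run.
cluster_all() {
    local dir="$1" out="$1/clustered" f name
    mkdir -p "$out"
    for f in "$dir"/*.sdf; do
        [ -e "$f" ] || continue          # no .sdf files: glob stayed unexpanded
        name=$(basename "$f" .sdf)
        # ID extraction step; relies on every ID containing "CHEMBL"
        grep CHEMBL "$f" > "$out/$name.ids.txt"
        # Clustering step; skipped when jklustor is not on the PATH
        if command -v jklustor >/dev/null 2>&1; then
            jklustor "$f" -o "wrclus:smi:$out/$name.fw.smi" \
                -o "wrmols:sdf:$out/${name}_cluster_*.sdf" -c sphex:0.85
        fi
    done
}
```

Usage: cluster_all /path/to/sdf/files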


Alternatively, a simple awk script can be used to extract an SDF property:


grepsdfid.sh:


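#!/bin/bash

# Reads an SDF file on stdin and prints the line that follows each
# "> <STRUCTURENAME>" header. The header must match exactly.
awk '
    BEGIN {
        nextNAME=0
    } {
        if ( $0 == "> <STRUCTURENAME>" ) {
            # header found: the next line holds the value
            nextNAME=1
        } else if ( nextNAME != 0 ) {
            nextNAME=0
            print $0
        } else if ( $0 == "$$$$" ) {
            # end of record: reset the flag
            nextNAME=0
        }
    }
    '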

Usage:


cat input.sdf | grepsdfid.sh > ids.txt
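If the ID is stored under a different tag, the same idea can be parameterized. This is only a sketch: the function name sdf_property is hypothetical, and it assumes the value sits on the single line directly after a data header that contains the tag in angle brackets (extra spaces or a record number in the header are tolerated).

```shell
#!/bin/bash
# Hypothetical helper: print the first value line of the given SD data
# field for every record read from standard input.
#   $1  tag name without angle brackets, e.g. ID or STRUCTURENAME
sdf_property() {
    awk -v tag="$1" '
        # A data header starts with ">" and contains the tag as <TAG>.
        /^>/ && index($0, "<" tag ">") { want = 1; next }
        # The line right after a matching header holds the value.
        want { print; want = 0 }
    '
}
```

Usage: cat input.sdf | sdf_property ID > ids.txt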

Regards,


Gabor