Mapping results from cluster back to smiles

User 749645d446

03-01-2012 11:00:59

I am trying to run a clustering operation over a SMILES file and then map the results back to the original file (using JChem version 5.5.1).


The input .smi file is in the format "CMPD_NAME SMILES"


Here is the command I am running:


'jklustor -c sphex:0.5 /home/input.smi -o wrmols:smiles:/home/output.smi -o wrstat:normal:/home/stat.txt'

The output.smi file is what I am really interested in. I would like to take the results it gives for each SMILES (GID and GSIZE) and map them back to my original SMILES file, so that it is in the format "CMPD_NAME SMILES GID GSIZE", for example. That could be done with a simple Python / bash script.


However, the SMILES before and after clustering do not match (in most cases), making it impossible to see which SMILES has which cluster information. After attempting to patch the problem by running molconvert on the input and output SMILES files, I still cannot get a good match.

Is there a known workaround, or a way of changing the format or order of the output SMILES file, so that it would be possible to match the information back?


Many Thanks,


Chris  

ChemAxon 8b644e6bf4

04-01-2012 20:15:52

Dear Chris,


 


Sorry for the late answer.


The input .smi file is in the format "CMPD_NAME SMILES"


This can be problematic since we expect the lines to start with the SMILES content.


You can swap them with awk if CMPD_NAME does not contain whitespace:


cat input.smi | awk '{ print $2 " " $1 }' > input_cxn.smi
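If CMPD_NAME can contain whitespace, the awk one-liner above will split the name in the wrong place. A minimal Python sketch of the same swap is shown here. It relies only on the fact that a SMILES string itself contains no whitespace, so the last whitespace-separated token on each line is taken as the SMILES. The function names and file paths are hypothetical, chosen for illustration.

```python
def swap_name_and_smiles(line):
    """Turn 'CMPD_NAME SMILES' into 'SMILES CMPD_NAME'."""
    # Split on the LAST run of whitespace: everything before it is the name,
    # which may itself contain spaces. Raises ValueError if the line has
    # fewer than two tokens.
    name, smiles = line.rstrip("\n").rsplit(None, 1)
    return f"{smiles} {name.strip()}"


def swap_file(src_path, dst_path):
    """Write a swapped copy of src_path to dst_path, skipping blank lines."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(swap_name_and_smiles(line) + "\n")
```

For example, swap_file("input.smi", "input_cxn.smi") would produce the same input_cxn.smi as the awk command when the names contain no whitespace.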


However the smiles post and pre clustering do not match (in most cases), making it impossible to see which smiles has which cluster information. After attempting to patch the problem using molconvert on input and output Smiles files, I cannot seem to get a good match.



Using molconvert to generate canonical SMILES can help. A workaround for a similar problem is described here: https://www.chemaxon.com/forum/ftopic8475.html
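Once both files carry the same canonical SMILES strings, the mapping itself is a plain lookup keyed on the SMILES. The sketch below is only an illustration under two assumptions that are not confirmed in this thread: that the clustered output has been reduced to whitespace-separated "SMILES GID GSIZE" lines, and that the original file is still in "CMPD_NAME SMILES" order with SMILES canonicalized the same way. The function names and the "NA" placeholder are hypothetical.

```python
def load_clusters(lines):
    """Map canonical SMILES -> (GID, GSIZE) from clustered output lines.

    Assumes each line looks like 'SMILES GID GSIZE' (an assumption about
    the output layout, not a documented format).
    """
    clusters = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        smiles, gid, gsize = fields[0], fields[1], fields[2]
        clusters[smiles] = (gid, gsize)
    return clusters


def annotate(input_lines, clusters):
    """Yield 'CMPD_NAME SMILES GID GSIZE' for each 'CMPD_NAME SMILES' line."""
    for line in input_lines:
        if not line.strip():
            continue
        # The SMILES is the last token; the name may contain spaces.
        name, smiles = line.rstrip("\n").rsplit(None, 1)
        # Unmatched SMILES are marked 'NA' so mismatches stay visible.
        gid, gsize = clusters.get(smiles, ("NA", "NA"))
        yield f"{name.strip()} {smiles} {gid} {gsize}"
```

Lines that come out with NA are the ones where the SMILES still differ between the two files, which makes it easy to check how well the canonicalization step worked.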


A possible outline for your workaround:



Regards,


Gabor